Contributing to the Future

The mission of the Web Application Security Working Group, part of the Security Activity, is to develop security and policy mechanisms to improve the security of Web Applications, and enable secure cross-origin communication.

If you work with websites, you are probably already familiar with the web features that the WebAppSec WG has recently introduced. But not many web developers (or even web security specialists) feel comfortable reading or referencing the specifications for these features.

I typically hear one of two things when I mention WebAppSec WG web features:

  1. Specifications are hard to read. I’d rather read up on this topic at [some other website].
  2. This feature does not really cover my use-case. I’ll just find a workaround.

Specifications are not always the best source if you are looking for an introduction to a feature, but once you are familiar with it, the specification should be the first place you go for a reference or clarification. And if you feel the language of the specification could be clearer or that more examples are needed, go ahead and propose a change!

To address the second complaint, I’d like to walk through our experience contributing a feature to a WebAppSec WG specification. I hope to clarify the process and debunk the myth that your opinion will go unheard or that you can’t, as an individual, make a meaningful contribution to a web feature used by millions.

Background

Content Security Policy is a WebAppSec WG standard that allows a website author to, among other things, declare the list of endpoints with which a web page is expecting to communicate. Another great WebAppSec WG standard, SRI, allows a website author to ensure that the resources received by their web page (like scripts, images, and stylesheets) have the expected content. Together, these web features significantly reduce the risk that an attacker can substitute web page resources for their own malicious resources.
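
As a quick illustration (the CDN host and hash below are placeholders), the two features are typically combined like this: the CSP header limits where scripts may be loaded from, while the integrity attribute pins down exactly what the fetched script must contain:

Content-Security-Policy: script-src 'self' https://cdn.example.com

<script src="https://cdn.example.com/library.js"
        integrity="sha384-BASE64HASHOFLIBRARYJSGOESHERE"
        crossorigin="anonymous"></script>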

I helped standardise and implement require-sri-for, a new CSP directive that requires SRI integrity metadata to be present before any subresource of a given type is requested.

Currently, SRI works with resources referenced by script and link HTML elements, and the Request interface of the Fetch API allows you to specify the expected integrity metadata. A policy such as Content-Security-Policy: require-sri-for script style; extends these expectations and forbids pulling in any script or style resources that lack integrity metadata.
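
For instance, under the policy below (hosts and hash are again placeholders), the first script loads because it carries integrity metadata, while the second is blocked outright:

Content-Security-Policy: require-sri-for script

<!-- Allowed: integrity metadata is present -->
<script src="https://cdn.example.com/app.js"
        integrity="sha384-BASE64HASHOFAPPJSGOESHERE"
        crossorigin="anonymous"></script>

<!-- Blocked: no integrity attribute -->
<script src="https://cdn.example.com/analytics.js"></script>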

Contributing to a WebAppSec WG Specification

Unlike some other working groups, W3C’s Web Application Security Working Group has no formal process for starting to contribute to its specifications, which might look scary. It actually isn’t, and work usually flows in the following order:

  1. A feature idea forms in somebody’s head enough to be expressed as a paragraph of text.
  2. A paragraph or two is proposed either on WebAppSec’s public mailing list or in the GitHub Issues section of the relevant specification. Ideally, examples, algorithms and corner cases are included.
  3. After some discussion, which can sometimes take quite a while, the proposal starts to be formalised as a patch to the specification.
  4. The specification strategy is debated, wording details are finalised, and the patch lands in the specification.
  5. Browser vendors implement the feature.
  6. Websites start using the feature.

Anyone can participate in any phase of feature development, and I’ll go over the require-sri-for development timeline to highlight the major phases and show how we participated in the process.

Implementing require-sri-for

  1. Development started in an issue on the WebAppSec GitHub repo, opened back in April 2014 by Devdatta Akhawe, who wondered how one might express a requirement that SRI metadata (e.g. an integrity hash) be present for all subresources of a given type.
  2. Much time passes. SRI gains support in Chrome and Firefox, and GitHub is among the first big websites to use it in the wild.
  3. A year later, GitHub folks raise the same question that Dev did on the public-webappsec mailing list, this time with an actual use case: they intended to have an integrity attribute on every script they load, but days after deploying SRI they discovered that they had missed one script file.
  4. Neil from GitHub Security starts writing up a paragraph for the CSP specification to cover a feature that would enforce integrity attributes on scripts and styles. Lots of discussion irons out the details that were not covered in the earlier email thread.
  5. I pick up the PR and move it to the SRI specification GitHub repo. 65 comments later, it lands in the SRI spec v2.
  6. Frederik Braun patches Firefox Nightly with a require-sri-for implementation.
  7. I submit a PR to Chromium with a basic implementation. 85 comments later, it evolves into a new Chrome platform feature and lands with plans to be released in Chrome 54.

Resources

Specification development happens on GitHub, and there are many great specifications there that are worth following.

We Are Hiring!

If you’ve read this far, there is a chance that Shape has a career opportunity that looks interesting to you.

Salvation is Coming (to CSP)

CSP (Content Security Policy) is a W3C candidate recommendation for a policy language that can be used to declare content restrictions for web resources, commonly delivered through the Content-Security-Policy header. Serving a CSP policy helps to prevent exploitation of cross-site scripting (XSS) and other related vulnerabilities. CSP has wide browser support according to caniuse.com.

  • Content Security Policy 1.0
  • Content Security Policy Level 2
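
For example, a minimal policy that restricts every type of resource to the page’s own origin looks like this:

Content-Security-Policy: default-src 'self'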

There’s no downside to starting to use CSP today. Older browsers that do not recognise the header or future additions to the specification will safely ignore them, retaining the current website behaviour. Policies that use deprecated features will also continue to work, as the standard is being developed in a backward compatible way. Unfortunately, our results of scanning the Alexa top 50K websites for CSP headers align with other reports which show that only major web properties like Twitter, Dropbox, and Github have adopted CSP. Smaller properties are not as quick to do so, despite how relatively little effort is needed for a potentially significant security benefit. We would be happy to see CSP adoption grow among smaller websites.

Writing correct content security policies is not always straightforward, and mistakes make it into production. Browsers will not always tell you that you’ve made a typo in your policy. This can provide a false sense of security.
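
For example, keyword sources such as 'self' must be quoted. In the policy below (the CDN host is a placeholder), the unquoted self is parsed as a host literally named "self", so scripts from the page’s own origin end up blocked even though the policy is accepted without error; a browser may print a console hint, but the server never sees it:

Content-Security-Policy: default-src 'none'; script-src self https://cdn.example.com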

Announcing Salvation

Today, Shape Security is releasing Salvation, a FOSS general purpose Java library for working with Content Security Policy. Salvation can help with:

  • parsing CSP policies into an easy-to-use representation
  • answering questions about what a CSP policy allows or restricts
  • warning about nonsensical CSP policies and deprecated or nonstandard features
  • safely creating, manipulating, and merging CSP policies
  • rendering and optimising CSP policies

We created Salvation with the goal of being the easiest and most reliable standalone tool available for managing CSP policies. Using this library, you will not have to worry about tricky cases you might encounter when manipulating CSP policies. Working on this project helped us to identify several bugs in both the CSP specification and its implementation in browsers.

Try It Out In Your Browser

We have also released cspvalidator.org, which exposes a subset of Salvation’s features through a web interface. You can validate and inspect policies found on a public web page or given through text input. Additionally, you can try merging CSP policies using one of the two following strategies:

  • Intersection combines policies in such a way that the result behaves similarly to how browsers enforce each policy individually. To better understand how it works, try intersecting default-src a b with default-src; script-src *; style-src c.
  • Union is useful when crafting a policy: start with a restrictive policy and allow each additional resource that is needed. See how union merging is not simply concatenation by merging script-src * with script-src a in the validator.

Contribute

You can check out the source code for Salvation on Github or start using it today by adding a dependency from Maven Central. We welcome contributions to this open source project.

Contributors

Detecting PhantomJS Based Visitors

These days, many web security incidents involve automation. Web-scraping, password reuse, and click-fraud attacks are perpetrated by adversaries trying to mimic real users, and thus will attempt to look like they are coming from a browser. As a website owner, you want to ensure you serve humans, and as a web service provider you want programmatic access to your content to go through your API instead of being scraped through your heavier and less stable web interface.

Assuming that you have basic checks for cURL-like visitors, the next reasonable step is to ensure that visitors are using real, UI-driven browsers — and not headless browsers like PhantomJS and SlimerJS.

In this article, we’re going to demonstrate some techniques for identifying visits by PhantomJS. We decided to focus on PhantomJS because it is the most popular headless browser environment, but many of the concepts that we’ll cover are applicable to SlimerJS and other tools.

NOTE: The techniques presented in this article are applicable to both PhantomJS 1.x and 2.x, unless explicitly mentioned.

First up: is it possible to detect PhantomJS without even responding to it?

HTTP stack

As you may be aware, PhantomJS is built on the Qt framework. The way Qt implements the HTTP stack makes it stick out from other modern browsers.

First, let’s take a look at Chrome, which sends out the following headers:

GET / HTTP/1.1
Host: localhost:1337
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ru;q=0.6

In PhantomJS, however, the same HTTP request looks like this:

GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.8 Safari/534.34
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection: Keep-Alive
Accept-Encoding: gzip
Accept-Language: en-US,*
Host: localhost:1337

You’ll notice the PhantomJS headers are distinct from Chrome’s (and, as it turns out, from those of all other modern browsers) in a few subtle ways:

  • The Host header appears last
  • The Connection header value is mixed case
  • The only Accept-Encoding value is gzip
  • The User-Agent contains “PhantomJS”

By checking for these HTTP header aberrations on the server, it should be possible to identify a PhantomJS browser.
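
Here is a minimal sketch of such a check, assuming a Node.js server purely for illustration (the port and response body are arbitrary); any single match should be treated as a hint rather than proof:

var http = require('http');

function looksLikePhantomJS(req) {
  var headers = req.headers;  // names are lower-cased, values keep their original case
  // rawHeaders alternates name, value, name, value, ... in wire order
  var names = req.rawHeaders.filter(function (entry, i) { return i % 2 === 0; });

  var hostIsLast = names.length > 0 &&
      names[names.length - 1].toLowerCase() === 'host';
  var mixedCaseConnection = headers['connection'] === 'Keep-Alive';
  var gzipOnly = headers['accept-encoding'] === 'gzip';
  var phantomUA = /PhantomJS/.test(headers['user-agent'] || '');

  return hostIsLast || mixedCaseConnection || gzipOnly || phantomUA;
}

http.createServer(function (req, res) {
  if (looksLikePhantomJS(req)) {
    console.log('Possible PhantomJS visitor: ' + req.url);
  }
  res.end('ok');
}).listen(1337);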

But, is it safe to believe these values? If an adversary uses a proxy to rewrite headers in front of the headless browser, they could modify those headers to appear like a normal modern browser instead.

Looks like tackling this problem purely on the server is not a silver bullet. So let’s take a look at what can be done on the client, using PhantomJS’s JavaScript environment.

Client-side User-Agent Check

We may not be able to trust the User-Agent value as delivered via HTTP, but what about on the client?

if (/PhantomJS/.test(window.navigator.userAgent)) {
    console.log("PhantomJS environment detected.");
}

Unfortunately, it is similarly trivial to change both the User-Agent header and the navigator.userAgent value in PhantomJS, so this check might not be enough.
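
For example, a short PhantomJS script can present itself as desktop Chrome (the user agent string below simply mirrors the Chrome headers shown earlier); page.settings.userAgent changes the request header, and the value reported by navigator.userAgent in the loaded page generally follows the same setting:

var page = require('webpage').create();

// Present the browser as desktop Chrome instead of PhantomJS
page.settings.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36';

page.open('http://localhost:1337/', function () {
  phantom.exit();
});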

Plugins

navigator.plugins contains an array of plugins that are present within the browser. Typical plugin values include Flash, ActiveX, support for Java applets, and the “Default Browser Helper”, which is a plugin that indicates whether this browser is the default browser in OS X. In our research, most fresh installs of common browsers include at least one default plugin — even on mobile.

This is unlike PhantomJS, which doesn’t implement any plugins, nor does it provide a way to add one (using the PhantomJS API).

The following check might then be useful:

if (!(navigator.plugins instanceof PluginArray) || navigator.plugins.length == 0) {
    console.log("PhantomJS environment detected.");
} else {
    console.log("PhantomJS environment not detected.");
}

On the other hand, it’s fairly trivial to spoof this plugin array by modifying the PhantomJS JavaScript environment before the page is loaded.

It’s also not difficult to imagine a custom build of PhantomJS with real, implemented plugins. This is easier than it sounds because the Qt framework on which PhantomJS is built provides a native API for implementing plugins.

Timing

Another point of interest is how PhantomJS suppresses JavaScript dialogs:

var start = Date.now();
alert('Press OK');
var elapse = Date.now() - start;
if (elapse < 15) {
    console.log("PhantomJS environment detected. #1");
} else {
    console.log("PhantomJS environment not detected.");
}

After measuring several times, it appears that if the alert dialog is suppressed within 15 milliseconds, the browser is probably not being controlled by a human. But using this approach means bothering real users with an alert they’ll have to close manually.

Global Properties

PhantomJS 1.x exposes two properties on the global object:

if (window.callPhantom || window._phantom) {
  console.log("PhantomJS environment detected.");
} else {
  console.log("PhantomJS environment not detected.");
}

However, these properties are part of an experimental feature and may change in the future.

Lack of JavaScript Engine Features

PhantomJS 1.x and 2.x use out-of-date WebKit engines, which means that some features available in newer browsers do not exist in PhantomJS at all. This extends to the JavaScript engine, where some native properties and methods are different or absent.

One such method is Function.prototype.bind, which is missing in PhantomJS 1.x. The following example checks whether bind is present and that it has not been spoofed in the executing environment.

(function () {
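  // Function.prototype.bind is absent from the old WebKit shipped with PhantomJS 1.x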
  if (!Function.prototype.bind) {
    console.log("PhantomJS environment detected. #1");
    return;
  }
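  // A genuine native bind stringifies like "function bind() { [native code] }",
  // so renaming it should reproduce exactly what Error.toString() returns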
  if (Function.prototype.bind.toString().replace(/bind/g, 'Error') != Error.toString()) {
    console.log("PhantomJS environment detected. #2");
    return;
  }
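  // Apply the same trick to toString itself, to catch a spoofed Function.prototype.toString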
  if (Function.prototype.toString.toString().replace(/toString/g, 'Error') != Error.toString()) {
    console.log("PhantomJS environment detected. #3");
    return;
  }
  console.log("PhantomJS environment not detected.");
})();

This code is a little too tricky to explain in detail here, but you can find out more from our presentation.

Stack Traces

Errors thrown by JavaScript code evaluated by PhantomJS via the evaluate command contain a uniquely identifiable stack trace, from which we can identify the headless browser.

For example, suppose that PhantomJS calls evaluate on the following code:

var err;
try {
  null[0]();
} catch (e) {
  err = e;
}
if (indexOfString(err.stack, 'phantomjs') > -1) {
  console.log("PhantomJS environment detected.");
} else {
  console.log("PhantomJS environment is not detected.");
}

Note that this example uses a custom indexOfString() function, left as an exercise for the reader, since the native String.prototype.indexOf can be spoofed by PhantomJS to always return a negative result.
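
One possible sketch of such a helper (by no means the only approach) scans using bracket indexing and the length property, which are own properties of the string value and therefore harder to tamper with than the methods inherited from String.prototype:

function indexOfString(haystack, needle) {
  // Naive substring scan that avoids String.prototype methods entirely
  for (var i = 0; i + needle.length <= haystack.length; i++) {
    var found = true;
    for (var j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) {
        found = false;
        break;
      }
    }
    if (found) {
      return i;
    }
  }
  return -1;
}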

Now, how do you get a PhantomJS script to evaluate this code? One technique is to override some frequently used DOM API functions that are likely to be called. For example, the code below overrides document.querySelectorAll to inspect the browser’s stack trace:

var html = document.querySelectorAll('html');
var oldQSA = document.querySelectorAll;
Document.prototype.querySelectorAll = Element.prototype.querySelectorAll = function () {
  var err;
  try {
    null[0]();
  } catch (e) {
    err = e;
  }
  if (indexOfString(err.stack, 'phantomjs') > -1) {
    return html;
  } else {
    return oldQSA.apply(this, arguments);
  }
};

Summary

In this article we’ve looked at 7 different techniques for identifying PhantomJS, both on the server and by executing code in PhantomJS’s client JavaScript environment. By combining the detection results with a strong feedback mechanism (for example, rendering a dynamic page inert, or invalidating the current session cookie), you can introduce a solid hurdle for PhantomJS visitors. Always keep in mind, however, that these techniques are not infallible, and a sophisticated adversary will get through eventually.

To learn more, we recommend watching this recording of our presentation from AppSec USA 2014 (slides). We’ve also put together a GitHub repository of example implementations — and possible circumventions — of the techniques presented here.

Thanks for reading, and happy hunting.

Contributors