Gartner Identifies Shape Security as New Deflection Technique

Avivah Litan
Avivah Litan, Gartner Research VP and Distinguished Analyst, highlights Shape Security her latest blog post.
To read more about her analysis on solutions for automated attacks, read below.

Gartner Research VP and Distinguished Analyst, Avivah Litan, mentioned Shape Security on her blog discussing the growing threat of automated attacks on websites. Shape Security has been mentioned in multiple other reports. The difference here is this blog that is publicly available for everyone (including those without a Gartner subscription).

According to Avivah:

[Shape Security is a] new web application security technique that scrambles website code using a process called polymorphism. This precludes the hackers’ ability to decipher how a web site can be attacked since the logic of the web application is no longer transparent (e.g. no more ‘in the clear’ HTML code).

In her blog, Avivah features Shape Security as a solution to these automated attacks. Specifically, she states that Shape’s polymorphic technology deflects malicious automation, preventing the attacks from executing at the point of entry. Deflection is better than detection – preventing attack is better that finding the attacker ex post facto.

Interested to learn more? Learn more here.

Detecting PhantomJS Based Visitors

These days, many web security incidents involve automation. Web-scraping, password reuse, and click-fraud attacks are perpetrated by adversaries trying to mimic real users, and thus will attempt to look like they are coming from a browser. As a website owner, you want to ensure you serve humans, and as a web service provider you want programmatic access to your content to go through your API instead of being scraped through your heavier and less stable web interface.

Assuming that you have basic checks for cURL-like visitors, the next reasonable step is to ensure that visitors are using real, UI-driven browsers — and not headless browsers like PhantomJS and SlimerJS.

In this article, we’re going to demonstrate some techniques for identifying visits by PhantomJS. We decided to focus on PhantomJS because it is the most popular headless browser environment, but many of the concepts that we’ll cover are applicable to SlimerJS and other tools.

NOTE: The techniques presented in this article are applicable to both PhantomJS 1.x and 2.x, unless explicitly mentioned. First up: is it possible to detect PhantomJS without even responding to it?

HTTP stack

As you may be aware, PhantomJS is built on the Qt framework. The way Qt implements the HTTP stack makes it stick out from other modern browsers.

First, let’s take a look at Chrome, which sends out the following headers:

GET / HTTP/1.1
Host: localhost:1337
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ru;q=0.6

In PhantomJS, however, the same HTTP request looks like this:

GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.8 Safari/534.34
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection: Keep-Alive
Accept-Encoding: gzip
Accept-Language: en-US,*
Host: localhost:1337

You’ll notice the PhantomJS headers are distinct from Chrome (and, as it turns out, all other modern browsers) in a few subtle ways:

  • The Host header appears last
  • The Connection header value is mixed case
  • The only Accept-Encoding value is gzip
  • The User-Agent contains “PhantomJS”

Checking for these HTTP header aberrations on the server, it should be possible to identify a PhantomJS browser.

But, is it safe to believe these values? If an adversary uses a proxy to rewrite headers in front of the headless browser, they could modify those headers to appear like a normal modern browser instead.

Looks like tackling this problem purely on the server is not a silver bullet. So let’s take a look at what can be done on the client, using PhantomJS’s JavaScript environment.

Client-side User-Agent Check

We may not be able to trust the User-Agent value as delivered via HTTP, but what about on the client?

if (/PhantomJS/.test(window.navigator.userAgent)) {
    console.log("PhantomJS environment detected.");
}

Unfortunately, it is similarly trivial to change user-agent header and navigator.userAgent values in PhantomJS, so this might not be enough.

Plugins

navigator.plugins contains an array of plugins that are present within the browser. Typical plugin values include Flash, ActiveX, support for Java applets, and the “Default Browser Helper”, which is a plugin that indicates whether this browser is the default browser in OS X. In our research, most fresh installs of common browsers include at least one default plugin — even on mobile.

This is unlike PhantomJS, which doesn’t implement any plugins, nor does it provide a way to add one (using the PhantomJS API).

The following check might then be useful:

if (!(navigator.plugins instanceof PluginArray) || navigator.plugins.length == 0) {
    console.log("PhantomJS environment detected.");
} else {
    console.log("PhantomJS environment not detected.");
}

On the other hand, it’s fairly trivial to spoof this plugin array by modifying the PhantomJS JavaScript environment before the page is loaded.

It’s also not difficult to imagine a custom build of PhantomJS with real, implemented plugins. This is easier than it sounds because the Qt framework on which PhantomJS is built provides a native API for implementing plugins.

Timing

Another point of interest is how PhantomJS suppresses JavaScript dialogs:

var start = Date.now();
alert('Press OK');
var elapse = Date.now() - start;
if (elapse < 15) {
    console.log("PhantomJS environment detected. #1");
} else {
    console.log("PhantomJS environment not detected.");
}

After measuring several times, it appears that if the alert dialog is suppressed within 15 milliseconds, the browser is probably not being controlled by a human. But using this approach means bothering real users with an alert they’ll manually have to close.

Global Properties

PhantomJS 1.x exposes two properties on the global object:

if (window.callPhantom || window._phantom) {
  console.log("PhantomJS environment detected.");
} else {
  console.log("PhantomJS environment not detected.");
}

However, these properties are part of an experimental feature and may change in the future.

Lack of JavaScript Engine Features

PhantomJS 1.x and 2.x currently use out-of-date WebKit engines, which means there are browser features that exist in newer browsers that do not exist in PhantomJS. This extends to the JavaScript engine — whereby some native properties and methods are different or absent in PhantomJS.

One such method is Function.prototype.bind, which is missing in PhantomJS 1.x and older. The following example checks whether bind is present, and that it has not been spoofed in the executing environment.

(function () {
  if (!Function.prototype.bind) {
    console.log("PhantomJS environment detected. #1");
    return;
  }
  if (Function.prototype.bind.toString().replace(/bind/g, 'Error') != Error.toString()) {
    console.log("PhantomJS environment detected. #2");
    return;
  }
  if (Function.prototype.toString.toString().replace(/toString/g, 'Error') != Error.toString()) {
    console.log("PhantomJS environment detected. #3");
    return;
  }
  console.log("PhantomJS environment not detected.");
})();

This code is a little too tricky to explain in detail here, but you can find out more from our presentation.

Stack Traces

Errors thrown by JavaScript code evaluated by PhantomJS via the evaluate command contain a uniquely identifiable stack trace, from which we can identify the headless browser.

For example, suppose that PhantomJS calls evaluate on the following code:

var err;
try {
  null[0]();
} catch (e) {
  err = e;
}
if (indexOfString(err.stack, 'phantomjs') > -1) {
  console.log("PhantomJS environment detected.");
} else {
  console.log("PhantomJS environment is not detected.");
}

Note that this example uses a custom indexOfString() function, left as an exercise for the reader, since the native String.prototype.indexOf can be spoofed by PhantomJS to always return a negative result.

Now, how do you get a PhantomJS script to evaluate this code? One technique is to override some frequently used DOM API functions that are likely to be called. For example, the code below overrides document.querySelectorAll to inspect the browser’s stack trace:

var html = document.querySelectorAll('html');
var oldQSA = document.querySelectorAll;
Document.prototype.querySelectorAll = Element.prototype.querySelectorAll = function () {
  var err;
  try {
    null[0]();
  } catch (e) {
    err = e;
  }
  if (indexOfString(err.stack, 'phantomjs') > -1) {
    return html;
  } else {
    return oldQSA.apply(this, arguments);
  }
};

Summary

In this article we’ve looked at 7 different techniques for identifying PhantomJS, both on the server and by executing code in PhantomJS’s client JavaScript environment. By combining the detection results with a strong feedback mechanism — for example, rendering a dynamic page inert, or invalidating the current session cookie — you can introduce a solid hurdle for PhantomJS visitors. Always keep in mind however, that these techniques are not infallible, and a sophisticated adversary will get through eventually.

To learn more, we recommend watching this recording of our presentation from AppSec USA 2014 (slides). We’ve also put together a GitHub repository of example implementations — and possible circumventions — of the techniques presented here.

Thanks for reading, and happy hunting.

Contributors

A Technical Comparison of the Shift and SpiderMonkey AST Formats

Since publishing our announcement of the Shift AST specification, many developers have asked for more details about how the Shift AST format compares to the SpiderMonkey AST format. We should first enumerate what we consider to be the properties of a good AST format.

A good AST format…

  • minimizes the number of inhabitants that do not represent a program.
  • is at least partially homogenous to allow for a simple and efficient visitor.
  • does not impede moving, copying, or replacing subtrees.
  • discourages duplication in code that operates on it.

Improvements

The following is a list of differences that we consider improvements over the SpiderMonkey AST format.

  • The top-level node returned from any successful parse is named Script, not Program, to ease upgrade to ECMAScript 6. ECMAScript 6 parsers need two modes: one mode that produces a Script and one mode that produces a Module. Modules allow import/export declarations at the top level and are always in strict mode.
  • Functions (including getters/setters) represent their body using a FunctionBody, not a BlockStatement, to support directives and because a function’s body is neither a generic statement position nor a block.
  • Script contains a FunctionBody, not a [Statement], to support top-level directives and for uniform handling of the shared FunctionBody structure.
  • The concepts of BlockStatement and Block have been separated. A BlockStatement contains a Block, not a [Statement]. A Block contains a [Statement]. This is better for transformation: a BlockStatement may be replaced by any other Statement, but a Block must be replaced only by another Block. Block is also used to represent the body and finalizer of a TryFinallyStatement and body of a CatchClause (all of which cannot be arbitrary statements).
  • Similarly, the concepts of VariableDeclarationStatement and VariableDeclaration have been separated. A VariableDeclaration is used within for and for-in statements (both of which cannot contain arbitrary statements in that position).
  • The VariableDeclaration declarators list is required to be non-empty.
  • The concepts of IdentifierExpression and Identifier have been separated. An IdentifierExpression contains an Identifier in expression position. Identifiers are also used for function names, break labels, and static member access property names.
  • MemberExpression has been separated into StaticMemberExpression and ComputedMemberExpression so that the computed flag and the type of property cannot conflict.
  • SwitchStatementWithDefault has been separated out of SwitchStatement to guarantee that a SwitchStatement has no more than one default clause.
  • TryStatement has been separated into TryFinallyStatement (try/catch/finally and try/finally) and TryCatchStatement (try/catch) to disallow a TryStatement with no handler and no finalizer.
  • SequenceExpression and LogicalExpression are just BinaryExpressions. AssignmentExpression remains separate in preparation for ECMAScript 6, where its left operand will need to be a Binding.
  • Separated Literal into LiteralBooleanExpression, LiteralNullExpression, LiteralNumericExpression, LiteralRegExpExpression, and LiteralStringExpression. The SpiderMonkey Literal node is overloaded to the point that it is not used anywhere without qualifying that only a subset of its values may be used.
  • LiteralRegExpExpression is represented by a string, not a RegExp. This allows for JSON serialization and a simpler equivalence definition.
  • Property has been separated into Getter, Setter, and DataProperty, each of which have a PropertyName. PropertyName has a kind (“identifier”, “string”, or “number”) and string value. FunctionExpressions are much too permissive to represent getters/setters, and the Property kind could conflict with the value.
  • Added UseStrictDirective and UnknownDirective nodes to represent directives. These nodes will be replaced with a single Directive node in the future.
  • Removed support for SpiderMonkey-specific language extensions (expression closures, multiple catch clauses, for-each-in, etc.) other than block-scoped declarations.

Insignificant Differences

The following is a list of differences that we believe are insignificant improvements over the SpiderMonkey AST format.

  • SourceLocation format. Source position information was not originally part of the Shift specification because it was not important for any of Shape Security’s usages. Support for source position tracking was only recently added with this experimental interface. If a use case for tracking end position without source content is identified, that information may be added to SourceLocation.
  • Names. We tried to be internally consistent with names like binding, value, and body. We made no effort to carry over SpiderMonkey naming conventions.
  • UpdateExpression and UnaryExpression are replaced by PrefixExpression and PostfixExpression. Ignoring the fact that the prefix flag on SpiderMonkey’s UnaryExpression is unnecessary, there are pros and cons to each way this set of operations is grouped. For example, during scope analysis, it is easier to group the increment/decrement operators together to generate write references, but during code generation it is easier to group the prefix/postfix operators separately.

Deviations From ECMAScript 5

The following is a list of intentional supported extensions to ECMAScript 5.

  • VariableDeclarationKind contains let and const, which should only be allowed in ECMAScript 6, but popular implementations had widespread support for these declaration kinds long before they had support for any other ECMAScript 6 feature. Because of this, many people consider them to be an unofficial extension to ECMAScript 5.
  • Similarly, FunctionDeclarations in arbitrary Statement position were allowed by many ECMAScript 5 implementations (with varying semantics), so the Declaration interface was removed, and FunctionDeclaration was moved to Statement.

Remaining Problems

The following is a list of restrictions that must be checked in addition to the structural correctness to ensure that a Shift AST represents an ECMAScript program. Ideally, this list would be as small as possible, but because of the context sensitivity inherent in the design of the ECMAScript language, these additional restrictions are either impossible or infeasible to enforce in the AST structure. The reason we want this list to be small is because each program that operates on a Shift format AST needs to either be aware of all of these restrictions and handle them gracefully or require consumers to guarantee that the input AST is valid.

Luckily for developers, at the time of the initial Shift specification announcement we made available the Shift validator, which both validates and enumerates validation errors for a given Shift format AST. This makes it very easy to ensure that a Shift format AST does not include any of the below listed problems, as well as debugging problems when they are detected.

  • BreakStatement without a label must be nested within an IterationStatement, SwitchCase, or SwitchDefault. BreakStatement with a label must be nested within a LabeledStatement with an equivalent label (without crossing a function boundary).
  • ContinueStatement without a label must be nested within an IterationStatement. ContinueStatement with a label must be nested within an IterationStatement that is labeled with an equivalent label (without crossing a function boundary).
  • LiteralRegExpExpression value must represent a valid RegExp.
  • Identifier name must always be an IdentifierName, and must not be a reserved word in any position other than a StaticMemberExpression property.
  • An IfStatement with alternate must not have another IfStatement without alternate nested within its consequent in a way that does not represent a valid program. See isProblematicIfStatement in estools/esutils for more details.
  • LabeledStatement must not be nested within a LabeledStatement with the same label name.
  • LiteralNumericExpression value must be non-negative, finite, and non-NaN.
  • ObjectExpression cannot contain data/getter or data/setter properties with the same name.
  • ObjectExpression cannot contain more than one data property with the name `__proto__`.
  • A PropertyName with kind of “number” can have a non-numeric value, and a PropertyName with a kind of “identifier” can have a non-IdentifierName value. It is possible this may one day be fixed.
  • ReturnStatement must be nested within a FunctionExpression, FunctionDeclaration, Getter, or Setter.
  • VariableDeclaration as ForInStatement left must have exactly one declarator. This can (and likely will) be fixed.
  • In strict mode (in other words, nested within a FunctionBody that has a UseStrictDirective), function names, function parameters, catch bindings, setter parameters, prefix/postfix increment/decrement operands, assignment bindings, and variable declaration bindings must not be restricted words (arguments or eval).
  • In strict mode, function parameters must be unique within their containing parameter list.
  • In strict mode, Identifier name must not be a future reserved word in expression position.
  • In strict mode, a PrefixExpression must not have both a delete operator and an Identifier operand.
  • In strict mode, WithStatement is disallowed.

Hopefully this has cleared up the questions you had about the Shift AST. If you think of anything we haven’t, or if you have additional questions, leave a comment below so that others may benefit from the discussion.

Edit: A previous version of this post did not distinguish BreakStatement/ContinueStatement nodes with a label from those without.

Announcing the Shift JavaScript AST Specification

In time for the holidays, we are happy to release Shape Security’s first open source contributions: a new JavaScript AST
specification named Shift, and a suite of tools to help you get started working with it.

What is an AST?

An Abstract Syntax Tree is simply a tree representation of a program’s source code. The nodes in an AST represent individual aspects of the language such as identifiers, statements, and literals. This structure is commonly the result of a successful parse of source code.

What can I do with it?

Having an easy to use data structure that represents a program’s source code allows you to write programs that treat code as they would any other piece of data. You can reliably generate new source, transform between languages, replace subtrees, analyze, lint, and auto-format code. ASTs are used by anything that needs to operate on code: IDEs, parsers, linters, analyzers, optimizers, compilers, and more. AST formats that are publicly standardized enable developers to centralize their efforts over a common structure, reducing duplicate work and allowing tools to be composed together.

This doesn’t exist already?

Mozilla exposed the SpiderMonkey Reflect.parse API in 2010 to encourage better tooling for JavaScript. This proved to be incredibly useful to the JavaScript community, enabling the creation of parsers like Esprima and Acorn and catalyzing a vast ecosystem of tools. Hundreds of projects rely upon these tools, including eslint, plato, istanbul, jscs,
browserify, and many more.

However, the SpiderMonkey AST format was not specifically created for these tools. The SpiderMonkey AST originated as the internal representation of a JavaScript program in the SpiderMonkey engine, which was intended to be used only for interpretation. As tools were created and more use cases for a standard AST were recognized, many difficulties in dealing with SpiderMonkey format ASTs surfaced.

The SpiderMonkey AST and its ecosystem of tools and parsers is formidable and we don’t take deviation lightly. Our work at Shape Security has presented us with many problems that involve deep analysis and transformation of JavaScript. We have been forced to rethink what it means to represent and transform a JavaScript program, and in doing so developed this alternative AST format. The main advantages of using the Shift AST format are that it makes it much more difficult to accidentally perform a transformation that creates an invalid AST, and the nodes align more closely to the syntactic features they represent.

More than just the AST

An AST specification doesn’t have much value without a surrounding ecosystem. We’ve open-sourced JavaScript and Java implementations of the foundational tooling necessary to foster development of a supporting ecosystem around the Shift AST format. The following tools have been made available for both environments.

  • AST Node Constructors
  • Parser
  • Code Generator
  • Reducer
  • Validator
  • Scope Analyzer

In addition, we’ve released a tool for converting back and forth between the Shift and SpiderMonkey AST formats. All of these are available on the Shape Security Github account.

The road forward

We will continue to develop tooling based on the Shift AST format and will iterate on the existing libraries, optimize for performance, and add ECMAScript 6 support.

The Shift AST format was developed with ECMAScript 6 in mind. The es6 branches of both the specification and the JavaScript AST constructors already include full support for ECMAScript 6, and we plan to add support to all of the tooling we have released so far. Contributors

Some of the developers behind the Shift AST format and associated tools are active contributors and maintainers of JavaScript language tools that are popular in the JavaScript community. Work on those tools is not ending, nor does the work here immediately affect any future plans for those tools.

Shape Security Named a 2014 Cool Vendor by Gartner

Gartner designated Shape amongst new and innovative vendors in the application and endpoint security space. Gartner, the renowned technology research firm, recognized Shape for its impactful and intriguing application and endpoint security innovation for 2014.

The list can be found in the Gartner “Cool Vendors in Application and Endpoint Security, 2014” May 2, 2014 report.

Shape Security’s flagship product, the ShapeShifter, is the industry’s first botwall to defend websites against the most sophisticated automated threats — offering a comprehensive defense against major website attacks, including account takeover, Man-in-the-Browser and application DDoS attacks. By continuously refactoring the HTML, CSS and JavaScript which implement a website’s user interface, the ShapeShifter is able to automatically deflect sophisticated attacks against websites. By robbing cybercriminals of their ability to use malware for their attacks, Shape radically shifts the economics of web hacking.