World Kill the Password Day

This World Password Day, let’s examine why the world has not yet managed to kill the password.

Today is World Password Day. It’s also Star Wars Day, which will get far more attention from far more people (May the Fourth be with you). It also happens to be National Orange Juice Day. And a few other days. This confusion is appropriate for World Password Day, because while the occasion is about improving password habits, the world has turned decidedly against passwords. Headlines from the past few years demonstrate a consistent stream of invective toward them:

2013: “PayPal and Apple Want to Kill Your Password”
2014: “Inside Twitter’s ambitious plan to kill the password”
2015: “White House goal: Kill the password”
2016: “Google aims to kill passwords by the end of this year”
2017: “Facebook wants to kill the password”

And yet, not one of these efforts has succeeded in “killing the password”—as we can see from the fact that every major online service still requires them.

Why is this the case? To explore this question, it is useful to first examine the function that passwords serve. Online applications must ensure that only authorized users are able to access their data or functionality. To do this, the application requires some form of proof that the user accessing the application is who they say they are. Passwords are a “shared secret” between the authorized user and the application: if the user accessing the application demonstrates they know this secret, the application assumes they are the authorized user. Unfortunately, unauthorized users can learn this shared secret through various types of attacks, so passwords simply do not provide strong proof of identity. And yet, the password continues to be the universal method of online authentication.
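To make the mechanism concrete, here is a minimal sketch (illustrative only, not any particular service’s implementation) of how an application typically verifies that shared secret without storing the password itself:

// Minimal sketch of shared-secret verification using a salted hash (Node.js).
const { randomBytes, scryptSync, timingSafeEqual } = require('crypto');

// At registration, store only a random salt and the derived hash, never the password.
function register(password) {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 64);
  return { salt, hash };
}

// At login, re-derive the hash from the submitted password and compare.
// Anyone who has learned the password passes this check, which is the core weakness.
function verify(password, { salt, hash }) {
  const candidate = scryptSync(password, salt, 64);
  return timingSafeEqual(candidate, hash);
}

const record = register('correct horse battery staple');
console.log(verify('correct horse battery staple', record)); // true
console.log(verify('not the secret', record));               // false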

So what about all of the technologies that have gained popularity in recent years, like two-factor authentication using mobile devices and fingerprint scanners? Let’s take a look at some of these alternatives and why they haven’t been able to replace passwords.

Standard biometrics, like fingerprint and iris-based authentication, are convenient in that you always have them with you, but you obviously cannot change them. Soft biometrics, like voice and typing pattern analysis, are similarly convenient, but have too much variation to be used for anything but negative authentication. Hard and soft tokens, in the form of dedicated hardware or personal mobile devices, are inconvenient to access and often difficult to use. And finally, device-based authentication is also only suitable for negative authentication, since users often work across multiple devices and may lose an authorized device.

Common benefits and drawbacks start to emerge across these approaches. This is because every authentication system fits into the well-known framework of:

1. Something you know (such as a password)
2. Something you have (such as a mobile phone)
3. Something you are (such as a fingerprint)

The problem is that each part of this framework has different strengths and weaknesses. “Something you know” is convenient and changeable, but it can also be stolen easily, especially if copied somewhere and stored insecurely. “Something you have” is harder to steal, but is also not always with you. And “Something you are” is always available to you, but the description of what you are (say, a scan of your iris) cannot be changed if stolen from an insecure service that stored it. What this means is that the only true replacement for passwords will come from a mechanism that offers the same benefits as “something you know”, and yet somehow addresses its drawbacks.

Security challenge questions: the worst second factor

Some systems have tried to use security challenge questions as an additional authentication factor, especially for password recovery, but these are one of the worst developments in online security. Their problem is that they combine the drawbacks of passwords (answers can be stolen through data breaches) with the drawbacks of biometrics (you can’t change your mother’s maiden name or the street where you grew up), and add their own unique drawbacks (answers can be guessed through social media). Most security professionals now enter random information into such security challenge questions, but that effectively creates additional passwords, which offer no benefit over a single, strong password, except for use as a backup password.

But there is a more fundamental conflict underpinning our continued reliance on passwords: security and convenience are usually at odds. By combining one factor from each category, such as a password, a soft token, and a biometric, one can create a relatively secure three-factor authentication mechanism, but it is much less convenient for most users.

Users value convenience over security (yet still expect security)

For many years, the public has been told that everyone needs to select strong passwords. But most people still don’t. More recently, because of the Yahoo breach and other data breaches, the public started to learn that even strong passwords should never be reused across sites. But most people still reuse them. Password managers aren’t silver bullets, and are subject to their own vulnerabilities, but their widespread use would dramatically improve both of the above issues. Unfortunately, most people don’t use them. Multi-factor authentication, specifically two-factor authentication using mobile phones, is now offered by most major online services. While everyone should enable it, most people won’t, because of the difficulty or inconvenience of using it.

Security professionals and other security-conscious users are getting more and more options, but the average person continues to value convenience and ease of use above all else, and would like security to simply be provided for them automatically. They don’t want to have to take responsibility for preventing their online bank account from being hacked—they want the bank to take care of that.

In fact, since users will quickly abandon services that are too difficult to use, online services focus much more on improving usability than on security. This is illustrated by a step back in security that technology companies have taken over the years, by standardizing on the use of email addresses as usernames. In the past, you could set a unique username for each account, making it far more difficult for cybercriminals to gain access to your account on one service by stealing your credentials from another. But since remembering both usernames and passwords was hard for users, and online services needed users’ email addresses anyway, they have collectively chosen to consolidate the username and email address into a single identifier. This, of course, has fuelled credential stuffing attacks and automated fraud across all major online services, leveraging billions of spilled credentials through attack tools like Sentry MBA.

The future includes more passwords, for now

The reason that we still have passwords is because we as users continue to demand their advantages, and haven’t come up with anything that preserves those while addressing their drawbacks. Similar to Winston Churchill’s observation on democracy, we might say that passwords are the worst form of authentication—except for all the others that have been tried.

While users are becoming more security conscious, and are learning to accept the friction of multi-factor authentication for the benefit of security, a sea change in user behavior isn’t happening anytime soon. This shifts the burden for security and fraud protection back to online service providers. Given the constraint of delivering a friction-free experience to their users, they are now investing in layered, invisible security mechanisms. These mechanisms allow them to provide the benefits of passwords while defending against their drawbacks, for example by detecting when stolen passwords are used (as recommended by NIST) or by protecting against credential stuffing attacks.
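As one illustration of the kind of check NIST recommends (a minimal sketch under assumed inputs, not a description of Shape’s or any vendor’s mechanism), a service can compare incoming passwords against a locally maintained list of hashes of known-spilled passwords:

// Sketch: reject passwords that appear on a known-spilled list (Node.js).
// Assumes a hypothetical file 'spilled-sha1.txt' with one uppercase SHA-1
// hex digest per line, compiled from published credential spills.
const { createHash } = require('crypto');
const { readFileSync } = require('fs');

const spilled = new Set(
  readFileSync('spilled-sha1.txt', 'utf8').split('\n').filter(Boolean)
);

function isSpilled(password) {
  const digest = createHash('sha1').update(password).digest('hex').toUpperCase();
  return spilled.has(digest);
}

// At registration or password-reset time, block passwords that have already spilled.
if (isSpilled('P@ssw0rd')) {
  console.log('This password has appeared in a known breach; please choose another.');
}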

It’s World Password Day. While technologies like Apple’s Touch ID afford us great conveniences, and may eventually let many people avoid re-entering their passwords much of the time, they do not replace those passwords. We’re not “killing” the password anytime soon, so this May 4th, let’s make sure we continue to promote good password practices.

2017 Credential Spill Report

Over the past 12 months, we have seen dozens of the world’s largest online services report that they had been breached by attackers who were able to gain access to their customers’ login credential data. By the end of 2016, over three billion credentials in total were reported stolen, at an average pace of one new credential spill reported every week.

These numbers set a record and include the two largest reported credential spills of all time, both by Yahoo. Near the end of the year, the National Institute of Standards and Technology published the Draft NIST Special Publication 800-63B Digital Identity Guidelines, recommending that online account systems check their users’ passwords against known spilled credential lists.

As the size and frequency of credential spills appear to be increasing, today we are publishing the 2017 Credential Spill Report. This report combines key findings from the credential spills reported in the past year with data from the Shape network to provide insight into the scale of credential theft and how stolen credentials are used.

In particular, stolen credentials are now used every day in credential stuffing attacks on all major online services. In these attacks, cybercriminals test for the reuse of passwords across websites and mobile applications. In the past, announcements of credential spills would focus on the security of accounts at the organization which reported the data breach, but now people are realizing that the widespread reuse of passwords by users across websites means that a breach on one account system endangers all other account systems.

At Shape, we have a unique view into this activity because our technology protects the world’s most attacked web and mobile applications—those run by the largest corporations in financial services, retail, travel, and other industries, as well as the largest government agencies—on a 24/7 basis.

Key statistics from spills reported in the past year include:

Over 3 billion credentials were reported stolen in 2016.

  • 51 companies reported suffering a breach where user credentials were stolen.
  • Yahoo in 2016 reported the two largest credential spills of all time. The next largest credential spills in 2016 were reported by Friend Finder, MySpace, Badoo and LinkedIn.
  • Tech companies had the largest total number of spilled credentials (1.75 billion).
  • The gaming industry had the largest number of companies with spills (11).

From Shape’s network data, we also observed:

  • 90% of login requests on many of the world’s largest web and mobile applications are attributable to traffic from credential stuffing attacks.
  • There is up to a 2% success rate for account takeover from credential stuffing attacks, meaning that cybercriminals are taking over millions of accounts across the Internet on a daily basis as a result of credential spills.
  • Credential stuffing attacks are now the single largest source of account takeover on most major websites and mobile applications.
  • One Fortune 100 retailer experienced a credential stuffing attack with over 10,000 login attempts in one day coming from the cybercriminal attack tool Sentry MBA, which is the most popular credential stuffing software and appears to be used to attack nearly every company in every industry.
  • In an analysis of 15.5M account login attempts for one customer during a four-month period, over 500K accounts were confirmed to be on publicly spilled credential lists.

Dealing with credential spills and the credential stuffing attacks that they fuel is a complex topic. Here are some basic recommended actions for consumers and enterprises:

The most important takeaway for consumers is that you should never reuse passwords across online accounts. Selecting a strong password is not enough; if you have reused that same password on multiple sites, and one of those sites is breached, your accounts on all of the other sites where you have used the same password are now at risk.

For companies, a lot of public attention is focused on any organization that experiences a data breach and loses control of their users’ credentials. However, the real issue other companies should focus on is protecting themselves against those passwords being used to attack them and their own users. Credential stuffing attacks easily bypass simple security controls like CAPTCHA and Web Application Firewalls, so relying on those mechanisms does not offer any protection. Controls like two-factor authentication can help, but of course come with other drawbacks.

In any case, getting educated is the best course of action. The Open Web Application Security Project (OWASP) provides a starting point for learning about credential stuffing and other automated attacks in their list of OWASP Automated Threats To Web Applications.

To learn more, download the full 2017 Credential Spill Report.

Dan Woods,

Director, Shape Intelligence Center

Shift Semantics

A few months ago, Shape released Shift Semantics, the newest tool in the Shift family of ECMAScript tooling.

The Shift Semantics library defines a structure for representing the behaviour of an ECMAScript program as well as a function for deriving one of these structures from a Shift AST.

Background

While ECMAScript has many syntactic features, not every one of them maps to a unique behaviour. For instance, all static member accesses (a.b) can be represented in terms of computed member accesses on strings (a['b']), and all for loops can be written in terms of while loops. Shift Semantics defines a data structure called an abstract semantic graph (ASG) which has a node for each individual operation that an ECMAScript interpreter would need to be able to interpret all ECMAScript programs. When an ASG is created, information about the original syntactic representation is lost. Only the important bit about what that program is supposed to do remains.
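For example (plain ECMAScript, independent of the Shift Semantics API), these pairs of programs behave identically even though their syntax differs, which is why the ASG can represent each pair with the same nodes:

// Static vs. computed member access: different syntax, same behaviour.
const obj = { b: 42 };
console.log(obj.b);    // 42
console.log(obj['b']); // 42

// A for loop and the equivalent while loop.
for (let i = 0; i < 3; i++) {
  console.log(i);
}

{
  let j = 0;
  while (j < 3) {
    console.log(j);
    j++;
  }
}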

Why is this useful?

Many use cases for ECMAScript tooling do not involve knowledge of the original representation of the program. For instance, if one is compiling an ECMAScript program to another lower-level language, it is convenient to define a compilation once for all loops instead of repeating much of the same code for each looping construct in the language. This greatly reduces the number of structures you have to consider for your transformation/analysis and eliminates any need for code duplication.

What does it look like?

There are 52 ASG nodes. Many of them have an obvious meaning: Loop, MemberAccess, GlobalReference, LiteralNumber, etc. Others are not as obvious. Keys is an Object.keys analogue, used to represent the for-in behaviour of retrieving the enumerable keys of an object before looping over them. RequireObjectCoercible represents the internal spec operation of the same name. Halt explicitly indicates that the program has run to completion, though it is also allowed to be used in other places within the graph if one desires.
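As a rough sketch of what the Keys node captures (an approximation: real for-in also visits inherited enumerable properties, which this snippet ignores), a for-in loop behaves like snapshotting an object’s enumerable keys and then looping over that snapshot:

const obj = { a: 1, b: 2 };

// The original for-in loop.
for (const k in obj) {
  console.log(k, obj[k]);
}

// Roughly what the ASG models with Keys: collect the enumerable keys up
// front, then iterate over that snapshot.
const keys = Object.keys(obj);
for (let i = 0; i < keys.length; i++) {
  console.log(keys[i], obj[keys[i]]);
}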

Update: If you’d like to see GraphViz visualisations, see the examples in Pull Request #8 or run the visualiser yourself on a program of your choosing.

How do I use it?

If you don’t already have a Shift AST for your program, you will need to use the parser to create one:

import com.shapesecurity.shift.ast.Module;
import com.shapesecurity.shift.ast.Script;
import com.shapesecurity.shift.parser.Parser;

String js = "f(0);";

// if you have a script
Script script = Parser.parseScript(js);
// if you have a module
Module module = Parser.parseModule(js);

Once you have your Shift AST, pass it to the Explicator:

import com.shapesecurity.shift.semantics.Explicator;
import com.shapesecurity.shift.semantics.Semantics;

// pass the Script or Module produced by the parser above as `program`
Semantics semantics = Explicator.deriveSemantics(program);

The Semantics object’s fields, including the ASG, can be accessed directly. One can also define a reducer over the ASG, in the same way a reducer is defined over a Shift AST, by subclassing the ReconstructingReducer. Note that our ASG reducers have not yet been open-sourced, but they will be soon.

Limitations

Currently, WithStatements and direct calls to eval are explicated into Halt nodes. There’s no reason that these cannot be supported, but they were not part of the initial effort. Similarly, not all of ECMAScript 2015 and beyond is supported. We will be adding support for newer ECMAScript features piece by piece as development continues.

Acknowledgements

Thanks to Shape engineer Kevin Gibbons for figuring out all the hard problems during the design of the ASG.

The Right to Buy Tickets

Young people waiting in line to buy tickets in New York.

With President Obama’s signing of the Better Online Ticket Sales (BOTS) Act of 2016 and the passing of recent legislation in New York, there are signs of hope that beginning in 2017, humans may once again have a fighting chance of purchasing a ticket to a hot concert, show or event.

It took ticket prices for the award-winning Broadway show “Hamilton” reaching $1,000 per head to force action against the ticket bots grabbing the best seats in the house. Lin-Manuel Miranda, who created and stars in Hamilton, wrote a compelling op-ed in the New York Times in June 2016 entitled “Stop the Bots from Killing Broadway.” Finally, in December, New York Gov. Cuomo signed a bill making ticket bot purchases illegal. As one of the founding fathers of the United States, Hamilton, it seems, would have approved of an amendment protecting “the right to buy tickets.”

So how did ticket bots gain control over ticket purchases? The cybercriminal ecosystem has evolved over the past few years to make it easier to launch automated attacks on web and mobile apps with the purpose of stealing assets. In the case of ticket bots, automated scripts running on rented botnets enable the immediate and rapid purchase of tickets to popular events as soon as they go on sale. Humans don’t have a chance against a machine intent on purchasing tickets. Until now.

With the recently passed ticket bot legislation, it is now officially illegal to use ticket bots for automated purchasing, and state fines and possible jail time serve as deterrents that protect ticket sellers against this fraud. With this new legislation, ticket sellers must also tighten up their defenses so that they can proactively prevent the use of ticket bots. Simply stating that the use of automation and ticket bots is not allowed will no longer be a sufficient defense.

Enforcing this legislation will have some challenges given the number of parties involved in automated ticket purchases. The illegal ticket reseller is in many cases at the outer edge of a cybercriminal ecosystem that is rapidly building out infrastructure and services on the Dark Web. In addition to automated ticket purchases, automated credential stuffing attacks for account takeover and malicious content scraping are affecting retail, travel and ecommerce businesses. The threat of fines and possible jail time for ticket bots will hopefully go some way toward drying up the demand for cybercriminal automation.

Shows such as Hamilton were created for humans to enjoy, and at Shape Security we believe consumers shouldn’t have to fight bots to get a ticket. Every day at Shape Security we help major companies defend against automated attacks by bots, and we applaud this new legislation outlawing ticket bots.

Contributing to the Future

The mission of the Web Application Security Working Group, part of the Security Activity, is to develop security and policy mechanisms to improve the security of Web Applications, and enable secure cross-origin communication.

If you work with websites, you are probably already familiar with the web features that the WebAppSec WG has recently introduced. But not many web developers (or even web security specialists) feel comfortable reading or referencing the specifications for these features.

I typically hear one of two things when I mention WebAppSec WG web features:

  1. Specifications are hard to read. I’d rather read up on this topic at [some other website].
  2. This feature does not really cover my use-case. I’ll just find a workaround.

Specifications are not always the best source if you are looking to get an introduction to a feature, but once you are familiar with it, the specification should be the first place you go when looking for a reference or clarification. And if you feel the language of the specification can be better or that more examples are needed, go ahead and propose the change!

To address the second complaint, I’d like to detail our experience contributing a feature to a WebAppSec WG specification. I hope to clarify the process and debunk the myth that your opinion will go unheard or that you can’t, as an individual, make a meaningful contribution to a web feature used by millions.

Background

Content Security Policy is a WebAppSec WG standard that allows a website author to, among other things, declare the list of endpoints with which a web page is expecting to communicate. Another great WebAppSec WG standard, SRI, allows a website author to ensure that the resources received by their web page (like scripts, images, and stylesheets) have the expected content. Together, these web features significantly reduce the risk that an attacker can substitute web page resources for their own malicious resources.

I helped standardise and implement require-sri-for, a new CSP directive that mandates that SRI integrity metadata be present before requesting any subresource of a given type.

Currently, SRI works with resources referenced by script and link HTML elements, and the Request interface of the Fetch API also allows you to specify the expected integrity metadata. A policy such as Content-Security-Policy: require-sri-for script style; extends these expectations and forbids pulling in any script or style resources without integrity metadata.
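As an illustration (the URL and digest below are placeholders, not real values), here is how a page supplies that integrity metadata, both on a script element and through the Fetch API’s integrity option:

// Programmatically adding a script that carries SRI metadata: the
// integrity attribute holds the expected digest (placeholder value below).
const s = document.createElement('script');
s.src = 'https://cdn.example.com/library.js';
s.integrity = 'sha384-EXAMPLEPLACEHOLDERDIGESTNOTAREALHASH';
s.crossOrigin = 'anonymous';
document.head.appendChild(s);

// The Fetch API exposes the same expectation via the integrity option.
fetch('https://cdn.example.com/data.json', {
  integrity: 'sha384-EXAMPLEPLACEHOLDERDIGESTNOTAREALHASH',
}).then(response => response.json());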

Contributing to a WebAppSec WG Specification

Unlike some other working groups, W3C’s Web Application Security Working Group has no formal process for how to start contributing to its specifications, and that might look intimidating. It is actually not, and contribution usually flows in the following order:

  1. A feature idea forms in somebody’s head enough to be expressed as a paragraph of text.
  2. A paragraph or two is proposed either in WebAppSec’s public mailing list or in the Github Issues section of the relevant specification. Ideally, examples, algorithms and corner-cases are included.
  3. After some discussion, which can sometimes take quite a while, the proposal starts to be formalised as a patch to the specification.
  4. The specification strategy is debated, wording details are finalised, and the patch lands in the specification.
  5. Browser vendors implement the feature.
  6. Websites start using the feature.

Anyone can participate in any phase of feature development, and I’ll go over the require-sri-for development timeline to highlight the major phases and show how we participated in the process.

Implementing require-sri-for

  1. Development started in an issue on the WebAppSec GitHub repo, opened back in April 2014 by Devdatta Akhawe. He wonders how one might express the desire to require that SRI metadata, e.g. the integrity hash, be present for all subresources of a given type.
  2. Much time passes. SRI is supported by Chrome and Firefox. GitHub is among the first big websites to use it in the wild.
  3. A year later, GitHub folks raise the same question that Dev did on the public-webappsec mailing list, this time with an actual use case: they intended to have an integrity attribute on every script they load, but days after deploying the SRI feature, they discovered that they had missed one script file.
  4. Neil from GitHub Security starts writing up a paragraph in the CSP specification to cover a feature that would enforce the integrity attribute on scripts and styles. Lots of discussion irons out details that were not covered in the earlier email thread.
  5. I pick up the PR and move it to the SRI specification GitHub repo. 65 comments later, it lands in the SRI spec v2.
  6. Frederik Braun patches Firefox Nightly with require-sri-for implementation.
  7. I submit a PR to Chromium with a basic implementation. 85 comments later, it evolves into a new Chrome platform feature and lands with plans to be released in Chrome 54.

Resources

Specification development is happening on Github, and there are many great specifications that you should be looking at:

We Are Hiring!

If you’ve read this far, there is a chance that Shape has a career that looks interesting to you.

Announcing SuperPack

Shape Security is proud to announce the release of SuperPack, a language-agnostic schemaless binary data serialisation format.

First of all, what does it mean to be schemaless?

Data serialisation formats like JSON or MessagePack encode values in a way that the structure of those values (schema) can be determined by simply observing the encoded value. These formats, like SuperPack, are said to be “schemaless”.

In contrast, a schema-driven serialisation format such as Protocol Buffers makes use of ahead-of-time knowledge of the schema to pack the encoded values into one extremely efficient byte sequence, free of any schema markers. Schema-driven encodings have some obvious downsides: the schema must remain fixed (ignoring versioning), and if the encoding party is not also the decoding party, the schema must be shared among them and kept in sync.

Choose the right tool for the job. Usually, it is better to choose a schema-driven format if it is both possible and convenient. For other occasions, we have a variety of schemaless encodings.

What separates it from the others?

In short, SuperPack payloads are very compact without losing the ability to represent any type of data you desire.

Extensibility

The major differentiator between SuperPack and JSON or bencode is that it is extensible. Almost everyone has had to deal with JSON and its very limited set of data types. When you try to JSON serialise a JS undefined value, a regular expression, a date, a typed array, or countless other more exotic data types, your JSON encoder will either give you an error or give you an encoding that will not decode back to the input value. You will never have that problem with SuperPack.
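A quick illustration of that lossiness, using plain JavaScript and nothing SuperPack-specific:

JSON.stringify(undefined);                 // undefined (not a JSON string at all)
JSON.stringify(/ab+c/i);                   // '{}' (the RegExp is lost)
JSON.stringify(new Date(0));               // '"1970-01-01T00:00:00.000Z"' (decodes to a plain string)
JSON.stringify(new Uint8Array([1, 2, 3])); // '{"0":1,"1":2,"2":3}' (decodes to a plain object)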

SuperPack doesn’t have a very rich set of built-in data types. Instead, it is extensible. Say we wanted to encode/decode (aka transcode) regular expressions, a data type that is not natively supported by SuperPack. This is all you have to do:

SuperPackTranscoder.extend(
  // extension point: 0 through 127
  0,
  // detect values which require this custom serialisation
  x => x instanceof RegExp,
  // serialiser: return an intermediate value which will be encoded instead
  r => [r.source, r.flags],
  // deserialiser: from the intermediate value, reconstruct the original value
  ([pattern, flags]) => RegExp(pattern, flags),
);

And if we want to transcode TypedArrays:

SuperPackTranscoder.extend(
  1,
  ArrayBuffer.isView,
  a => [a[Symbol.toStringTag], a.buffer],
  ([ctor, buffer]) => new self[ctor](buffer),
);

Compactness

The philosophy behind SuperPack is that, even if you cannot predict your data’s schema in advance, the data likely has structures or values that are repeated many times in a single payload. Also, some values are just very common and should have efficient representations.

Numbers between -15 and 63 (inclusive) are a single byte; so are booleans, null, undefined, empty arrays, empty maps, and empty strings. Strings which don’t contain a null (\0) character can avoid storing their length by using a C-style null terminator. Boolean-valued arrays and maps use a single bit per value.

When an encoder sees multiple strings with the same value, it will store them in a lookup table, and each reference will only be an additional two bytes. Note that this string deduplication optimisation could have been taken further to allow deduplication of arbitrary structures, but that would allow encoders to create circular references, which is something we’d like to avoid.

When an encoder sees multiple maps with the same set of keys, it can make an optional optimisation that is reminiscent of the schema-directed encoding approach but with the schema included in the payload. Instead of storing the key names once for each map, it can use what we call a “repeated keyset optimisation” to refer back to the object shape and encode its values as a super-efficient contiguous byte sequence.

The downside of this compactness is that, unlike JSON, YAML, or edn, SuperPack payloads are not human-readable.

Conclusion

After surveying existing data serialisation formats, we knew we could design one that would be better suited to our particular use case. And our use case is not so rare as to make SuperPack only useful to us; it is very much a general purpose serialisation format. If you want to create very small payloads for arbitrary data of an unknown schema in an environment without access to a lossless data compression algorithm, SuperPack is for you. If you want to see a more direct comparison to similar formats, see the comparison table in the specification.

I’m sold. How do I use it?

As of now, we have an open-source JavaScript implementation of SuperPack.

$ npm install --save superpack