Welcome to Black Hat Conference Season…
Last week, news started to spread from the Black Hat conference about a new oracle attack against HTTPS, called the BREACH attack, which may allow an attacker to guess desirable values contained within DEFLATE-compressed responses in a very short time – typically under a minute, according to the presenters.
We call this an “oracle attack” because we’re not attempting to crack the HTTPS encryption and read the content directly. Instead, we monitor the compressed size of the content, which fluctuates depending on what is being compressed, thus leaking information about what was encrypted. In a sense it’s also a side-channel attack on the compression algorithm. Another example of a side-channel attack is using the time it takes a server to compare strings in order to enumerate valid usernames and emails known to an application, all without accessing the database directly – some frameworks feature a fixed-time string comparison function for this very reason.
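To illustrate that last point, here’s a minimal sketch (in Python purely for illustration; `tokens_match` is a made-up helper name) of the kind of fixed-time comparison such frameworks provide:

```python
import hmac

def tokens_match(expected: str, provided: str) -> bool:
    # A naive "expected == provided" returns as soon as a character differs,
    # so response timing leaks how many leading characters were correct.
    # compare_digest examines every character regardless, leaking nothing.
    return hmac.compare_digest(expected, provided)
```

The same comparison should be used anywhere a secret (token, signature, API key) is checked against user input.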
BREACH is shorthand for Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext. Someone went to a lot of trouble inventing a name to fit that acronym. As the name suggests, though it’s not always made clear in current articles online, the attack isn’t just good for HTTPS – it can also work against HTTP in situations where attackers cannot get hold of the content but can access a good record of the metadata for responses. If you’re an NSA employee this may be just a few keystrokes away.
Edit: Anthony Ferrara (ircmaxell) has posted his own thoughts on the BREACH attack: http://blog.ircmaxell.com/2013/08/dont-worry-about-breach.html
Note: If you do some more research on the BREACH attack, you’ll notice the prominence of CSRF token references. CSRF tokens tend to be weakly controlled in applications and frameworks, i.e. a single token per session without limited use scopes or narrow expiration times. A compromised CSRF token could conceivably be valid for a user across an entire site, for every single form, so long as their session remains open. That said, the targeted data is not exclusively tokens – email addresses, real names, credit card information, order numbers, delivery/business addresses and pretty much ANYTHING, including personally identifiable information you want hidden by HTTPS encryption, could also be targeted.
The attack itself isn’t that hard to understand. The data compression algorithm we call DEFLATE (the basis of gzip) uses the LZ77 algorithm which takes advantage of repeated strings to more efficiently compress output. The more repeating characters there are, the smaller the compressed output becomes. This holds true regardless of whether the compressed content is HTTPS encrypted or not.
If an attacker can inject a string into an HTTPS response intended to match another, unknown string (the target secret), they can iteratively guess the secret value by monitoring the compressed size of the responses to different guesses. The more of the guess that matches the secret (i.e. the longer the run of sequential characters they share), the more efficiently LZ77 can compress the content, and the smaller the response becomes. In hindsight this appears obvious, but we’ve never had a concrete proof of concept targeting content bodies in encrypted responses before now.
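A toy demonstration of the principle (Python, with a made-up page body and token) shows the effect directly – the guess that repeats an existing string compresses away to a back-reference, while a wrong guess adds fresh literal bytes:

```python
import zlib

# Hypothetical page body: a secret CSRF token plus a reflected search term.
TEMPLATE = b"<input name='csrftoken' value='7f3a9b2c4d'> You searched for: %s"

def response_size(reflected: bytes) -> int:
    # Size of the DEFLATE-compressed response containing the attacker's guess.
    return len(zlib.compress(TEMPLATE % reflected))

wrong_guess = b"csrftoken' value='zqwvkxjyum"
right_guess = b"csrftoken' value='7f3a9b2c4d"

# The correct guess duplicates a string already in the page, so LZ77 replaces
# it with a short back-reference and the compressed response shrinks.
print(response_size(wrong_guess), response_size(right_guess))
```

In a real attack the attacker never sees the plaintext sizes, only the lengths of the encrypted records, which track the compressed sizes closely.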
I’ll save you the trouble of a long read if you’re short on time. The only complete, surefire defence against BREACH attacks is to disable HTTP compression, i.e. mod_deflate for Apache and the gzip module for nginx. This may have significant performance implications, but there is no other known comprehensive defence. I’ll mention some other possible solutions later in this post, but all of them have their weaknesses and limitations. BREACH is basically a fundamental flaw in HTTP; any permanent solution will need to come from the HTTP layer.
The attacker needs three capabilities to pull off a BREACH attack:
1. The ability to read responses received by the user’s browser.
2. The ability to cause the user to send requests from their browser.
3. Some part of the request must be reflected in the response.
The first is an eavesdropping capability that might be gained using ARP poisoning or something more concrete like cable splitting. Some attackers may have specialised rooms at ISPs across the planet funded by three letter agencies. Note that this does not require cracking any HTTPS encryption – we just need a way to collect data on content sizes from responses.
The second can be gained by having the victim load attacker-controlled content, for example a plain HTTP page or an injected script or iframe, which silently fires off requests to the target HTTPS site from the victim’s browser.
The third is a question of application design. A multipage form, for example, might carry across extra hidden fields containing attacker-injected strings (bad validation). A search form might redisplay the search terms on the page. A messages tab would redisplay submitted messages. There are lots of valid direct and indirect (via database) reflections of user data in responses, some of which will be perfectly normal and others the result of security weaknesses.
Other Factors To Consider
It’s important to note that HTTPS encryption has little to no impact on the viability of this attack. BREACH works on all SSL/TLS versions and cipher suites. Some ciphers actually make it easier and others make it harder. None pose insurmountable problems, however, since attackers can adjust the makeup of the injected guesses, the number of measurements and the scoring of the results to filter out the impact of the most difficult ciphers. The attack can only become more effective over time.
The LZ77 algorithm does not operate alone – it has a partner called Huffman coding within DEFLATE. This coding pollutes the compression measurements since it sometimes prevents repeated strings from providing compression efficiencies under LZ77. This is actually simple to solve using a bit of arcane spellcasting (literally just a bit of string padding) and using twice the number of measurements as LZ77 would require were it an attacker’s sole concern.
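The padding trick can be sketched quite simply (Python, with an invented page and token; a simplified take on the presenters’ “two tries” idea): each guess is measured twice, once adjacent to the bootstrap string and once separated from it by padding. Both probes contain exactly the same characters, so Huffman coding affects them equally, and any size difference is down to LZ77 alone.

```python
import zlib

# Hypothetical page: a secret token plus a reflected query string.
TEMPLATE = b"<a href='/buy?token=d8f1c'>buy</a> Results for: %s"
PAD = b"{}()~"  # filler characters that don't occur elsewhere in the page

def probe(payload: bytes) -> int:
    return len(zlib.compress(TEMPLATE % payload))

def score(bootstrap: bytes, candidate: bytes) -> int:
    # Same character multiset in both probes cancels Huffman-coding noise;
    # only the LZ77 match length against the secret can differ.
    adjacent = probe(bootstrap + candidate + PAD)   # candidate may extend match
    separated = probe(bootstrap + PAD + candidate)  # candidate never can
    return adjacent - separated  # more negative = better guess

# Guessing the next chunk of the token after the known "token=" prefix:
print(score(b"token=", b"d8f"), score(b"token=", b"zqw"))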
The attacker will also need to “bootstrap” the attack with a known starting string to match against. For example, the string “csrftoken=” can be known in advance if it appears in the target site’s output (CSRF tokens can be used in GET URLs, not just forms). This quickly becomes more difficult for data delimited by tightly controlled characters; for example, a CSRF token in a form will be delimited with quotes as part of the form markup. Unless you are injecting user data into markup without escaping it, the attacker won’t be able to inject a matching bootstrap string (the quote would be escaped), which limits their ability to perform attacks.
They could try something else – perhaps you pad tokens with something predictable or guessable, perhaps you inject some user data into attributes so they end up quoted anyway. For example, a token prefixed with a time element would be predictable. Attackers could also blind guess the first few characters but the math isn’t favourable under those conditions. There may be other scenarios, e.g. attribute reflection, where the quotes can still be used though we hope reflecting user data in an attribute is rare given it’s an obvious Cross-Site Scripting risk.
Application Weaknesses and Defenses
From the above, we can ascertain that an attacker needs the right application behaviour in which to develop this attack into a viable threat:
1. Responses must be served from a server which has HTTP compression enabled.
2. Some part of the user request must be reflected within a response so guesses are compressed with the targeted information.
3. The response body must contain some information desired by the attacker.
4. While not strictly necessary, it would be nice for the response to have as little noise (changing content) as possible.
We can also deduce likely defenses against BREACH attacks from the same list:
1. We can disable HTTP compression altogether.
Since the BREACH attack relies on compression to execute a side-channel attack, it’s fairly obvious that disabling mod_deflate for Apache or the gzip module for nginx stops this attack dead in its tracks. This is the standard recommended defence at this time, pending something provably better. You may not like the performance implications, and the attack may not yet be widespread, but don’t doubt that there are blackhats and criminals out there working on engineering easy-to-deploy BREACH attacks as you read this. The theory is so simple that I wouldn’t expect it to take more than a few hours to prototype; the delays would come from automating it effectively and then figuring out what to charge the criminal markets for it. The attack takes just a few thousand requests, which can be completed in under a minute. Time and requests increase with the length of the targeted value, but only linearly.
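For reference, the nginx knob looks something like this (exact placement depends on your existing config; for Apache the equivalent is disabling mod_deflate, e.g. `a2dismod deflate` on Debian-style systems, or removing any `SetOutputFilter DEFLATE` directives):

```nginx
# nginx: disable response compression in the http {} or server {} block
gzip off;
```

Remember to reload the server afterwards and verify with a request that the `Content-Encoding: gzip` header is gone.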
2. We can prevent the direct and indirect reflection of user input in responses.
Direct reflection is simply dumping data from the request parameters straight into the response. Indirect reflection just means it takes a scenic route, e.g. via the database or a third party API. It’s impossible to eradicate all user data reflection – the very notion is silly. However, you can be wary of what you are reflecting back and whether it is strictly necessary. Reducing the attack’s surface area is better than doing nothing at all. For example, monitor how you transfer hidden values across multipage forms (discard invalid parameters). The attacker will need to meddle with request URIs and/or form encoded data so your validation may catch some of what they’re attempting. This will be very hit and miss unless you simply stop users from submitting and storing any data!
3. We could randomly adjust the length of responses so compression output size also becomes randomised.
Length hiding is a common enough defence that you’ll see it quoted everywhere for lots of security-related topics. Sadly, people forget that if you take sufficient measurements of anything that has both a fixed and a random element to its length, there’s this annoying thing called “standard error” which stubbornly insists on being inversely proportional to the square root of the number of measurements. Yet more statistical arcane spellcasting by hellspawn! The more you measure, the more the length hiding is averaged out, until it’s rendered pointless. It will force the attacker to make more requests (which means more time and coffee trips) in their BREACH attack, but that’s all it will accomplish.
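A quick simulation makes the point (Python; the sizes are invented for illustration). A two-byte signal buried under up to 255 bytes of random padding reappears once you average enough samples:

```python
import random
import statistics

RIGHT_GUESS_SIZE = 1000  # hypothetical compressed size for a correct guess
WRONG_GUESS_SIZE = 1002  # a wrong guess compresses two bytes worse

def measure(true_size: int) -> int:
    # The server "hides" the length by appending 0-255 random padding bytes.
    return true_size + random.randint(0, 255)

def averaged(true_size: int, samples: int) -> float:
    # Averaging n noisy measurements shrinks the noise by a factor of sqrt(n).
    return statistics.mean(measure(true_size) for _ in range(samples))

random.seed(1)
print(averaged(RIGHT_GUESS_SIZE, 20_000), averaged(WRONG_GUESS_SIZE, 20_000))
```

With enough samples the two averages separate cleanly despite the padding dwarfing the two-byte difference on any single measurement.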
4. We could prevent the compression of secrets and other desirable data in responses.
You can mutate compressed secret data in such a way that BREACH attacks can’t iteratively uncover it (i.e. masking). For example, you can concatenate a per-request random pad with an XOR of that pad against the actual data. This shifts the data on every request, making a BREACH attack against it impossible. Obviously, this will NOT work for data which is meant to be user readable. For tokens and similar values it may be simpler just to generate a unique value per request and be done with it, though either approach will probably break things for users with multiple browser tabs open.
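A sketch of the masking idea (Python; `mask_token`/`unmask_token` are hypothetical helper names, not from any framework):

```python
import os

def mask_token(token: bytes) -> bytes:
    # One-time pad per response: emit pad || (pad XOR token). The bytes on
    # the wire differ on every render, so compression measurements of the
    # token never repeat and cannot be iteratively refined.
    pad = os.urandom(len(token))
    return pad + bytes(p ^ t for p, t in zip(pad, token))

def unmask_token(masked: bytes) -> bytes:
    # The server reverses the mask when the token is submitted back.
    half = len(masked) // 2
    pad, xored = masked[:half], masked[half:]
    return bytes(p ^ x for p, x in zip(pad, xored))

secret = b"7f3a9b2c4d1e"
assert unmask_token(mask_token(secret)) == secret  # round-trips correctly
assert mask_token(secret) != mask_token(secret)    # differs on every render
```

The underlying secret stays stable (so multiple tabs keep working at the session level) while its on-the-wire representation changes per response.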
5. We could rate limit requests to the server.
The BREACH attack requires thousands of requests, so rate limiting may be helpful. Unfortunately, the attack doesn’t really need a massive number of requests – a couple of thousand in the space of a minute isn’t as big a surge as you might think – so this may or may not be realistic. High traffic sites would barely notice such a surge, and it might even be expected as a result of normal user interaction.
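If you do go down this road, the mechanism is a plain sliding-window counter per client; a rough sketch (Python, illustrative only – real deployments would use the web server or a shared store, not in-process state):

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `limit` requests per client within a sliding `window`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # client id -> deque of request timestamps

    def allow(self, client: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(client, deque())
        while q and q[0] <= now - self.window:
            q.popleft()  # drop timestamps that fell out of the window
        if len(q) >= self.limit:
            return False  # over the limit: reject (or delay) the request
        q.append(now)
        return True
```

The hard part, as noted, is choosing a limit low enough to slow the attack without tripping up busy legitimate users.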
As it stands, disabling HTTP compression is the simplest and most effective solution. For those applications/frameworks emitting CSRF tokens in GET URLs, defence 4 is a bare minimum at this point – defence in depth dictates doing something unless you have a crystal ball to predict all possible user-written code. CSRF tokens in forms should have quote delimitation making BREACH attacks, we hope, more difficult, but they should also be checked to ensure there is no common or predictable token padding that would help bootstrap these attacks. The community consensus may well move against CSRF tokens generated once per session altogether, despite the user impact. This is likely the most immediate concern for ALL frameworks with form capabilities.
Want to read more? There’s a BREACH attack website online by the paper authors with a link to the original whitepaper.
- Step into the BREACH: New attack developed to read encrypted web data (go.theregister.com)
This is a branch off from a separate discussion on the PHP-FIG mailing list about other ways the Framework Interoperability Group can encourage and foster wider interoperability among its member projects (and by extension, the whole PHP community). I’ll start by noting two interesting developments in recent months and one long standing best practice.
1. Launch of the SensioLabs Security Advisory Checker
The SensioLabs Security Advisory Checker is described on its website as follows.
You manage your PHP project dependencies with Composer, right? But are you sure that your project does not depend on a package with known security issues? The SensioLabs security advisories checker is a simple tool, available as a web service or as an online application, that uses the information from your composer.lock file to check for known security vulnerabilities. This checker is a frontend for the security advisories database.
The service operates by having people submit vulnerability data, as YAML files, to a centralised Github repository through pull requests. The upside is that the vulnerability data can be peer reviewed and centrally dispersed either online or via a service API. The downside is that you need to find vulnerability disclosures and people to submit them. The service currently covers Symfony, Zend Framework, Doctrine, Twig and FriendsOfSymfony bundles. It’s a tiny sample of packages available through Composer. I’m also not entirely sure if it’s sufficiently fine grained to report vulnerabilities on a project’s sub-packages where you have no direct dependency on the aggregate package (e.g. using zendframework/zend-db instead of zendframework/zendframework). That said, this is a working model of a service for checking your dependencies.
Above all, the service embodies an ambitious idea: projects sharing their vulnerability disclosures or advisories in a way that allows anyone to automatically check whether their projects need dependencies updated for security reasons.
2. OWASP’s Top 10 security risks for 2013 includes “A9 – Using Components with Known Vulnerabilities”
This is a new entry onto OWASP’s Top 10 (which is currently at release candidate status for 2013). In summary, it recognises that applications are becoming ever more dependent on code not developed internally. We’ve had web application frameworks for years. Composer and Github have unleashed a storm of accessible libraries, bundles, modules, and other units of reuse that have revealed Not Invented Here (NIH Syndrome) as a psychological problem in ways not previously possible.
As reliance on externally controlled dependencies increases, so too does the risk of your applications using insecure dependencies. This is a risk that requires a lot of work to mitigate. For each dependency, you need to do a security review (no, I’m not kidding), check for security disclosures (whether voluntary or involuntary) and ensure that you end up rolling out to production with safe versions.
Quoting from the OWASP advice on preventing the use of components with known vulnerabilities…
One option is not to use components that you didn’t write. But realistically, the best way to deal with this risk is to ensure that you keep your components up-to-date. Many open source projects (and other component sources) do not create vulnerability patches for old versions. Instead, most simply fix the problem in the next version. Software projects should have a process in place to:
1. Identify the components and their versions you are using, including all dependencies. (e.g., the versions plugin)
2. Monitor the security of these components in public databases, project mailing lists, and security mailing lists, and keep them up-to-date.
3. Establish security policies governing component use, such as requiring certain software development practices, passing security tests, and acceptable licenses.
3. Disclosing security vulnerabilities in a timely and responsible manner is a best practice
As programmers, we have a responsibility to users to disclose security vulnerabilities and fix them in a timely manner to ensure that those users are protected from harm. It’s almost impossible not to end up in such a situation at some point in your career. In fact, it may even be impossible for it not to happen multiple times in a single year!
The sad truth, however, is that disclosing security vulnerabilities can be terribly hit and miss. I’ve seen people ignore vulnerabilities or fix them but fail to disclose the fact to their users. Opinions over the severity of a vulnerability can vary dramatically within even a small group of programmers. Nobody likes to air their dirty laundry in public but not doing so can mean someone including a dependency with a known vulnerability without any means of becoming aware of that vulnerability.
It is always a good thing to come clean. Fixing a vulnerability, disclosing it, and having a good security policy in place prevents the reputational damage you might suspect would occur. It’s usually the secretive rollout of fixes that gets you in trouble when someone is attacked or the reporter discloses the vulnerability through other means (usually making note of your refusal to come clean).
The method of disclosure is usually in release notes, commit messages, blog posts or emails. This article suggests using formats that are more fundamentally consumable and standardised.
Centralised Tracking Of Decentralised Vulnerability Data?
Being aware of these three, we can see the immediate value in something like SensioLabs security advisory checking service. You have dependencies which very likely have had or will have vulnerabilities, and you probably would love to know about those before releasing a new project build to production servers. The problem is that this involves work in importing vulnerability data into the checking service and, failing that at present, a trawl of the internet for vulnerability disclosure blog posts, commit messages and emails. What would happen if, as a means of improving interoperability and common security, more vendors published their disclosures at fixed URIs in just one or two easily consumable formats (e.g. YAML or RSS/Atom)?
For example, instead of relying on someone submitting a pull request to SensioLabs each time Library X discloses a vulnerability, one could simply store a URI to Library X’s disclosure feed and/or a YAML-formatted summary stored in its git repository. The SensioLabs service, or something like it, could then pull in vulnerabilities automatically, assuming Library X uses a predetermined consumable format. This sounds, at least to me, like a more sustainable system.
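Concretely, such a per-project disclosure file might look something like this (a purely hypothetical format sketched for illustration – the field names, file name and CVE are invented; the real work would be agreeing a schema):

```yaml
# security-advisories.yml shipped in the library's own repository
advisories:
  - id: "2013-001"
    title: "XSS in the example form helper"
    cve: CVE-2013-0000          # if one was assigned
    affected_versions: ">=1.0,<1.2.4"
    fixed_in: "1.2.4"
    link: "https://example.com/advisories/2013-001"
```

A checker in the spirit of the SensioLabs service could then resolve each entry in your composer.lock against the feeds of its dependencies, with no central pull requests required.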
If a sufficient number of packages on Composer followed this practice, we’d have something quite brilliant and possibly easier to promote in the community. People are now very familiar with maintaining a composer.json file. Adding one more file (or, alternatively, an RSS/Atom feed) is not that big of a stretch if enough projects request it of their dependencies. The rest would be down to the boring work of agreeing formats, procedures and other technical aspects with a view towards, *if* called for, a PSR on the topic.
Let me know what you all think in the comments or catch me on the PHP-FIG mailing list.