Nanosecond Scale Remote Timing Attacks On PHP Applications: Time To Take Them Seriously?
This article concerns the concept of a Timing Attack (described below) performed remotely over the internet or a local area network. Specifically, it addresses Remote Timing Attacks based on timing differences from a few microseconds to as little as 1 nanosecond (one billionth of a second), a timescale which has been assumed to be impossible to detect over the internet due to the interference of “network jitter”. In the article, I will be summarising some of the recent developments in the area with the goal of demonstrating that a dependence on network jitter as a defence is not sustainable and that PHP applications need to come to terms with these forms of attacks while they are still in their infancy.
I’ve been following the progress of Remote Timing Attacks with a lot of interest over the years, during which time there has been an obvious trend in improving the technique. The most recent reported cases of Remote Timing Attack vulnerabilities, for example, were against the OpenID and OAuth protocols when it was reported in July 2010 that numerous open source implementations of these protocols did not prevent the disclosure of timing information that could enable a Remote Timing Attack. It is important to note that, as with many potential attacks, the protocols themselves contain no vulnerability. This is strictly a potential vulnerability contingent on the method of implementation.
What is a Timing Attack?
A Timing Attack is a form of Side Channel Attack which allows an attacker to discover some secret input to an operation by measuring the operation’s execution time, often across a set of attacker-derived inputs. It’s a side channel attack because it relies on non-invasive observations which don’t involve altering the victim application. Other forms of side channel attack can rely on phenomena such as sound, electromagnetic emissions, power consumption and even temperature. For a Remote Timing Attack, the non-invasive observation typically targets response time differentials.
A simple example, quite relevant to web applications, is that of someone attempting to compile a list of valid usernames recognised by a web application. This is a pointless exercise against many websites where usernames are often public by design (usually to enable social interaction) but there are web applications being designed where usernames are a form of sensitive data not intended for public disclosure. It has long been noted by programmers that, where usernames are not explicitly public by design, disclosing usernames indirectly is a problem. If nothing else it increases the risk of cross referencing username/password combinations from other more seriously compromised websites. We all use a different password for every single site, don’t we?
At first glance this seems like an impossible task but, in reality, it doesn’t take much thinking to realise that many web applications likely treat existing and non-existing usernames differently during a login attempt. Differing treatment may lead to clues about the validity of any username in a few ways:
- The website might reveal that the username does/does not exist via an error message or more subtle response elements (e.g. slight markup differences).
- The client might be redirected to different URLs depending on whether the username exists or not.
- An attacker might measure the response time difference between processing a login with an invalid username versus one with a known valid username.
Timing Attacks refer to the third option. Over time, many sites have figured out enough not to give away obvious clues about username existence but they still often neglect to hide the impact of the login checks performed by the server. If the login processing time for an existing user is measurably different from that for a non-existing user (bearing in mind the password is wrong for both), the additional delay imposed before sending a response back to a nefarious individual may be so noticeable that it can be reliably distinguished, over many attempts, from the normal background internet jitter. In this way, anyone could compile/validate a list of existing usernames for a web application using dictionaries of known usernames.
One simple step that is not uncommon is to log attempts to authenticate with a bad password. This audit log step might be skipped for non-existing usernames. Immediately we have one timing difference due to the differing treatment between usernames that do exist and those that don’t. In this case, a simple optimisation introduced the flaw. This alone, or in combination with other operations, is all it might take to give away clues about a username’s state based on the leaked execution time information.
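To make the leak concrete, here is a minimal sketch of such a login routine. All the names and the in-memory “user database” are invented for illustration; a real application would hit an actual database and log store, making the timing gap larger still:

```php
<?php
// Hypothetical login check illustrating the timing difference described
// above: only usernames that actually exist trigger the audit-log write.
$users = array('alice' => sha1('correct horse'));
$auditLog = array();

function attemptLogin($username, $password, array $users, array &$auditLog)
{
    if (!isset($users[$username])) {
        return false; // unknown user: fast path, no audit entry written
    }
    if ($users[$username] !== sha1($password)) {
        // Known user with a bad password: the extra audit write (a real
        // database insert in practice) measurably lengthens the response.
        $auditLog[] = date('c') . ' bad password for ' . $username;
        return false;
    }
    return true;
}
```

An attacker timing many login attempts could distinguish the fast “unknown user” path from the slower “known user, bad password” path even though both return the same generic failure.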
The important tool in everything I’ve mentioned so far is the ability to reliably measure small timing differences despite the interference of network jitter and other environmental factors (the natural and oft quoted defences against Remote Timing Attacks). Rudimentary statistical analysis is often sufficient to reduce this background noise for noticeably large differences that require few samples to differentiate, and over the years the resolution at which these timing differences can be measured over even the internet has been improving. The more the methods used improve, the smaller the timing difference that can be detected even with network jitter. Where network jitter is too much for an attacker to handle, there may be other ways to amplify the application’s timing differences, for example by conditionally triggering expensive operations through an SQL Injection if one is available.
Those Who Do Not Learn From History…
The simple examples above, which are real and known security threats, are an easy way to introduce the topic of Remote Timing Attacks at the macro level. It’s at this level that we’re dealing with databases, interacting with web APIs, writing files to disk, engaging in battle with laboriously constructed piles of poorly performing code, and doing all the other stuff we bother measuring with xhprof or xdebug when building web applications. We’re all familiar with time measurement at this level since we actively try to optimise these operations; they are, almost by definition, noticeable. At this level, execution times can differ so substantially that even network jitter can’t hide them from simpler forms of analysis.
But what about the little things? What about operations that are completed in far shorter timeframes ranging from a few microseconds down to the nanosecond (ns) range?
First, let’s be clear that even tiny differences of as little as 1ns are useful to attackers. Of course they are! Any indirect leak of information concerning a suitably sensitive operation is a “leak”. Whether it leaks something worth the trouble of an attacker to attempt detecting it is another question, but it has been proven beyond any doubt that timing information leaks can be serious security vulnerabilities. Whether they are exploitable over the internet is something we’ll discuss shortly but first, let’s look at an example of a tiny timing information leak that could be useful to attackers.
In the C language there is a commonly used function called memcmp() which is used to compare two buffers (strings in PHP terms), returning zero if they are equal or a non-zero value if not. As you would expect, PHP string comparisons using “==” will resolve to similar C function calls - the same is true of every programming language built on C (with varying overheads). What you might not know (unless you’re a C developer - and I barely qualify ;)) is that memcmp() is typically optimised to return a non-zero value at the first sign of any inequality between the strings being compared. It will not process the entire string needlessly. Depending on compiler optimisations, this comparison will often be done on a byte-by-byte basis or an optimised 32/64-bit basis that may still attempt to locate the errant byte (i.e. it still operates at the level of single bytes even if not apparent at face value).
This means that memcmp() leaks timing information: the more correct one string is compared to another (starting sequentially from the first byte), the more byte comparisons/locations memcmp() must perform, and the longer it takes memcmp() to return a result. An attacker can use the timing differential to judge the correctness of their guesses.
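The effect can be mimicked in PHP itself. This sketch (names mine) counts the byte comparisons an early-return comparison performs; in C, that count is what translates directly into execution time:

```php
<?php
// A PHP rendition of memcmp()-style early return. The number of byte
// comparisons performed grows with the length of the matching prefix,
// which is exactly the signal a timing attack measures.
function earlyReturnCompare($a, $b, &$opsPerformed)
{
    $opsPerformed = 0;
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        $opsPerformed++;
        if ($a[$i] !== $b[$i]) {
            return false; // bail out at the first mismatched byte
        }
    }
    return strlen($a) === strlen($b);
}
```

Comparing “secret” against “sXcret” performs 2 byte comparisons; against “secreX” it performs 6. The more correct the prefix, the more work is done.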
An attacker can take advantage of this in a somewhat obvious way. Assuming memcmp() compares byte by byte, we know that a byte only has 256 possible values. So, if we can reliably tell when memcmp() takes slightly longer to complete, we can figure out, one by one, the correct byte sequence matching an otherwise unknown comparison string (the secret an attacker is attempting to discover). Starting from the first byte, we just try each of the 256 possible byte values, note which one caused memcmp() to take longer to complete (once memcmp() sees a correct byte it will move on to compare the next byte, thus needing more time), and rinse and repeat for the rest of the buffer length. In this way, the leaked timing information allows us to figure out the exact content of the secret string we’re being compared against.
Of course, this has immense value. Instead of having to guess the entire string all at once which would be computationally challenging for any long random string, we now only need to setup an iterative process based on guessing the string byte by byte using very simple steps of 256 possible byte values. In fact, 256 attempts on a single byte turns out to only take 128 guesses on average! This is far from computationally challenging. A 64 byte string would only require, on average, 8,192 guesses (64 x 128).
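The whole iterative process can be simulated. In the sketch below (entirely hypothetical names), the byte-comparison count stands in for the measured response time so the logic runs deterministically; a real attacker would substitute averaged network timings for `comparisonOps()`:

```php
<?php
// Simulation of the byte-at-a-time attack described above. A real attack
// measures response times over many requests; here the number of byte
// comparisons stands in for time so the logic runs deterministically.
function comparisonOps($guess, $secret)
{
    // Count the byte comparisons an early-return compare would perform.
    $ops = 0;
    $len = min(strlen($guess), strlen($secret));
    for ($i = 0; $i < $len; $i++) {
        $ops++;
        if ($guess[$i] !== $secret[$i]) {
            break;
        }
    }
    return $ops;
}

function recoverSecret($length, $secret)
{
    $recovered = '';
    // Recover all but the last byte from the "timing" signal: the guess
    // that provokes the most comparisons holds the correct next byte.
    for ($pos = 0; $pos < $length - 1; $pos++) {
        $bestByte = 0;
        $bestOps = -1;
        for ($byte = 0; $byte < 256; $byte++) {
            $guess = $recovered . chr($byte)
                   . str_repeat('?', $length - $pos - 1);
            $ops = comparisonOps($guess, $secret);
            if ($ops > $bestOps) {
                $bestOps = $ops;
                $bestByte = $byte;
            }
        }
        $recovered .= chr($bestByte);
    }
    // The final byte yields no extra comparison to measure; in practice
    // the request simply succeeding confirms it.
    for ($byte = 0; $byte < 256; $byte++) {
        if ($recovered . chr($byte) === $secret) {
            return $recovered . chr($byte);
        }
    }
    return $recovered;
}
```

Each position costs at most 256 guesses, so the whole secret falls in linear rather than exponential time - exactly the economics described above.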
As hinted at earlier, this isn’t always feasible. memcmp() can be optimised in a few ways depending on the OS, CPU, compiler and compiler options we use. If, for example, memcmp() was optimised not to compare single bytes but a larger block size of, say, 32 or 64 bits, our 256 possible attempts would swell towards a computationally challenging number of possibilities since our increment size is now huge. This assumes the memcmp() implementation in question does not attempt to locate errant bytes (which leaks timing information regardless). Since my C knowledge is limited (I merely dabble for the sake of PHP extensions), I’ll leave aside the techno-babble comparing implementations across different operating systems and versions.
So what are some simple examples where guessing an unknown string would have value to an attacker? Granted, you probably already predicted the first one (which may be different from everyone else’s if you own an Xbox 360)!
The first is obvious - let’s assume the existence of a particularly naïve web application where, during the login process, the submitted password is directly compared against a database stored plaintext password using “==” (this may also hold true of comparisons performed by the database itself). With the omission of one-way hashing, this direct comparison means that the application is leaking timing information about the nature of the actual password string, thus allowing an attacker to employ a Remote Timing Attack to figure out the correct password for any account they wish. Using one-way hashing should defeat this because it’s very hard to work backwards from timing leaks when the attacker-derived input keeps changing (it hashes to a different digest each time a byte changes in the original plaintext input). Just be wary where hashing is employed but the hash is actually sent directly by a client, e.g. HTTP Digest, since this is equivalent to comparing plaintext passwords except that interception of the plaintext password en route is prevented. You might suspect that my example of hash forcing, HTTP Digest, has some replay protection using a nonce but, in practice, nonces may be omitted, poorly implemented and/or easily predicted, thus robbing naïve implementations of their replay protection.
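A sketch of why the hashed comparison resists the attack (names are mine, and a real application should also salt its stored hashes):

```php
<?php
// Even if this == comparison leaks a matching-prefix length, the prefix
// belongs to sha256($input), not to the password itself. Changing one
// byte of the input scrambles the whole digest, so an attacker cannot
// steer the comparison towards the secret byte by byte.
function checkPassword($input, $storedDigest)
{
    return hash('sha256', $input) === $storedDigest;
}

// The stored value is a digest, never the plaintext password.
$stored = hash('sha256', 'correct horse battery staple');
```

The timing leak still exists, but what it leaks is a prefix of an unpredictable digest - useless for reconstructing the password.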
A second example would be any secure protocol relying on a digital signature such as an HMAC. If the server side compares the HMAC attached to a client message to its own generated HMAC (based on a shared secret) from the message data using “==”, we have a similar result. The comparison leaks timing information about how correct the attacker’s HMAC guess was for a given request. This was the specific issue reported against open source implementations of OpenID and OAuth. Of course, there are mitigating factors - a well designed protocol would also employ a single-use nonce value included in each message to prevent attacker replays (and so prevent Timing Attacks). This nonce is never reused within a reasonable time period, so the signature of any otherwise identical message would always be different due to the variability introduced by the added nonce value.
The problem, as identified with OpenID, is that nonce protection requires that the nonce be checked first as a cause for refusing a request before the server executes the HMAC comparison (otherwise it performs the HMAC comparison and leaks timing information anyway - good nonce or bad!). The other problem is whether implementations use a reasonably sufficient nonce generation/verification process.
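The safer ordering can be sketched as follows (function names invented; the array store stands in for a real shared persistence layer such as a database or cache):

```php
<?php
// Verify the nonce and refuse replays *before* any HMAC comparison is
// performed, so replayed requests never reach the code path that leaks
// timing information.
function verifyRequest($message, $nonce, $clientHmac, $secret, array &$seenNonces)
{
    if (isset($seenNonces[$nonce])) {
        return false; // replay: reject before doing any HMAC work
    }
    $seenNonces[$nonce] = time(); // burn the nonce regardless of outcome
    $expected = hash_hmac('sha256', $nonce . $message, $secret);
    // Note: this == comparison still leaks timing on the first use of a
    // nonce; pair this ordering with a fixed-time comparison (discussed
    // later) for a complete fix.
    return $expected === $clientHmac;
}
```

Because the nonce check short-circuits first, an attacker replaying a captured request to collect timing samples gets the same fast rejection every time.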
Implementors can be easily confused into thinking a nonce is some form of timestamp, which is a mistake, or that nonces are unnecessary complexity, which is even worse. Nonces based solely on a timestamp are subject to time - and time is predictable if you have a good clock. Nonces interpreted as a Session ID (to eliminate storage requirements) are fixed over a specific time frame which, again, makes them predictable. Not implementing nonces at all (or really poorly) means that replay attacks suffer no preventative measure, giving free rein to attackers in intercepting and replaying requests from a client.
These examples are all good, but let’s look at what kind of timing information is being disclosed to the world by them. After all, the examples given are likely to require an attack over the internet (assuming direct LAN or even direct machine access is impractical).
…Are Doomed To Repeat It
What measure of leaked timing information does memcmp() actually give away? It’s hard to stick a concrete number on it so we’ll assume it needs 1 nanosecond (ns) on a reasonable CPU to compare any two bytes during its execution. I borrowed this from the Lawson/Nelson talk at Blackhat USA so it’s as good as any other arbitrary number. If you want to get really conservative, you can bump it up a bit (i.e. making the timing difference more detectable). Concrete numbers are dangerous of course since they will obviously vary depending on hardware, so we really should assume that somewhere it will take longer than 1ns.
Timing Attacks have been demonstrated for years with local access to servers going back as far as 1996 when Paul C. Kocher wrote about Timing Attacks against Diffie-Hellman and RSA. When you can directly observe the actual machine doing the work, or even over a minimal network setup, network jitter doesn’t have much of a chance to interfere. Once you throw in enough network jitter, however, nanosecond/microsecond level timing differences become extremely difficult to measure. So let’s shift over to the next question of interest. What kind of timing differences can we reliably resolve over the internet?
This is where using “Timing Attack” and “Internet” in the same sentence seems to border on the insane. We are so used as web developers to having our time measurements bottom out at the microsecond level (not all of us work at Facebook) that we live with the comfortable knowledge that nothing below this is likely to be relevant. So what if we are leaking 1ns clues on string comparisons to the entire world? Network jitter should swallow that up with gusto! Right?
If Timing Attacks at the nanosecond resolution were possible on web applications over the internet, we’d likely be in serious trouble. It’s not like memcmp() is the only possible item subject to tiny timing differences depending on inputs. There can be little doubt about this - it is not an area generally worried about in web application security so there are a lot of applications, libraries and even C extensions/libraries which routinely leak timing information about sensitive operations all the time. Some of them, like OpenSSL, have already been previously patched for Timing Attack vulnerabilities in the past but, despite this awareness, may refuse to patch similar vulnerabilities more suited to remote exploitation. The exploitability of these attacks over the internet is entirely dependent on eliminating the impact of network jitter, and it is this feat that will determine the viability of exploiting unpatched vulnerabilities.
The most recent research was carried out by Nate Lawson and Taylor Nelson of Root Labs and presented at Blackhat USA in July of this year. In the talk, Lawson/Nelson explain that there are a number of factors to consider in figuring out just what kind of detectable resolution for timing information is possible over networks. Broadly speaking there are three essential factors - a good vantage point and configuration for your client, the capability to gather sufficient samples, and an appropriate method of statistical analysis. While the research was, in my mind, still in its data collection stage, the results were extremely interesting. If you can spare an hour sometime, you can watch the entire session over on Youtube.
Briefly, let me explain the three factors I noted earlier. A good vantage point is helpful in establishing a high quality connection to the target system which minimises unfiltered network jitter. This can depend on a few other factors like ensuring the client system is configured for the task, using high quality connections/infrastructure, and even locating the client closer to the target in a network (where it makes a difference - fewer network hops doesn’t necessarily lessen network jitter). These, among other factors, can assist in minimising network jitter and improving the resolution at which timing differences can be detected. Gathering sufficient samples is obvious - the greater the number of samples, the more accurately timing differences can be distinguished from the largely additive background noise and the more resolution you can theoretically obtain. An appropriate method of analysis relies on the field of statistics. Unsurprisingly, statistical analysis is capable of picking out outrageously tiny signals from all the noise in sampled data once you use an appropriate approach.
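As a toy illustration of the statistics factor (all numbers invented): jitter is largely additive, meaning it only ever makes a probe slower, so over enough samples the minimum observed time per probe converges on the true cost:

```php
<?php
// Two simulated operations whose true costs differ by just 5 time units,
// each buried under random jitter of up to 500 units per sample.
mt_srand(1); // fixed seed so the sketch is repeatable
$fast = array();
$slow = array();
for ($i = 0; $i < 9000; $i++) {
    $fast[] = 100 + mt_rand(0, 500); // true cost 100 + jitter
    $slow[] = 105 + mt_rand(0, 500); // true cost 105 + jitter
}
// Individually the samples are useless, but filtering with min() strips
// the additive noise and recovers the 5-unit difference almost exactly.
$difference = min($slow) - min($fast);
```

With 9,000 samples the minimum jitter in each batch is almost certainly zero, so $difference lands on (or within a unit of) the true 5-unit gap. Real analyses use more robust filters than a plain minimum, but the principle is the same.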
Considering this, here’s the table of results noted from Lawson/Nelson’s Blackhat USA talk (based on a sample size of 9,000):
- Localhost: < 25ns
- Crossover Cable: < 25ns
- Amazon EC2 to EC2: 15 - 400ns
- Internet DSL to Amazon EC2: < 20µs
Compare this against the last set of numbers I knew about, from Crosby et al.’s Opportunities And Limits Of Remote Timing Attacks, which also indicated 20µs for the internet and 100ns or less across a LAN. The results at Blackhat USA have improved on these in some areas and added new data points. Further, they reconfirm that the sample sizes needed for these kinds of resolutions are very low. 9,000 samples may seem nuts at first glance but it’s easily accomplished.
The results show us that detecting timing differences over the internet is limited to a resolution of 20 microseconds (µs) over what we consider typical internet connections. This is based on a sample size of 9,000 requests so we can (and should) assume that a higher sample size would offer an even better resolution. A determined attacker could feasibly resort to millions of measurements depending on the target, perhaps offloaded to a botnet or to innocent clients using, albeit far less reliable, Cross-Site Timing attacks. While we are primarily concerned with the internet here, the results also back up Crosby et al.’s 100ns measurements over a LAN, improving the resolution to less than 40ns. Lawson/Nelson’s talk also reported a LAN resolution of less than 15 nanoseconds (almost a threefold improvement over 40ns) using a sample size of 49,000 - I can’t emphasise enough the importance of a sufficient sample size. People tend to quote numbers similar to the Lawson/Nelson table as being constant without understanding that the results are entirely sample size dependent.
Considering our memcmp() execution time for comparing two bytes is assumed to be around 1 nanosecond, it’s obvious that under the internet resolution of 20µs such timing differences are undetectable. This conclusion, while apparently reasonable, is a bit presumptuous. What happens with a server that has a slow CPU/limited RAM and is running a slightly more expensive operation than a simple memcmp() byte compare, perhaps on one of those new Intel Atom based SeaMicro SM10000s? Yes, they really do use Intel Atom CPUs. What if we increase the sample size substantially over 10,000? What if we locate a better method of collecting/analysing the resulting data? These questions all suggest that dismissing Remote Timing Attacks based on one single example is questionable. Of course, 1ns vs 20µs presents a huge gulf to demonstrating the exploitability of such a scenario over an internet connection - but this does not preclude cases where better approaches to detecting timing differences and/or the existence of more measurable operations would give rise to a true exploit.
Let’s back off from the internet though. Network jitter over your basic internet is noisy, too noisy for a 1ns resolution without going through a lot of hoops and waiting for more research or Internet 2.0 to emerge with magical anti-jitter properties. In any case, who cares about a consumer connection over dodgy cables potentially planted in the ground decades ago (unless you’re Swedish)? This is the 21st Century, and computing has a new 21st Century solution: The Cloud.
If we want to get really clever, let’s just target someone using Amazon’s EC2 and leverage off Amazon’s own superior network. The Lawson/Nelson results contain another interesting data point - the resolution at which timing differences could be discerned between EC2 instances was between 15ns and 400ns (again based on a 9,000 sample size). We can just filter out samples that are unfortunate enough to hit the 400ns wall (due to some process within the EC2 system that has not been identified…yet) and revel in 15ns resolutions between our EC2 instance and a victim’s. The key to this is using a really great vantage point - another EC2 instance located close to, if not on the same hardware as, the victim. Achieving this is not as far fetched as you might suspect given the security cost of relying on a shared network.
Throw in the possible improvements in an attacker’s situation due to CPU/RAM constraints on the target, increased sampling and better analysis, and we now have something a heck of a lot more worrying: the real possibility that our network jitter defence is dead in the water. The gulf between a 1ns memcmp() comparison and a 15ns detection resolution no longer looks insurmountable. It looks downright dangerous if you assume detection can only improve with more samples and that other expensive operations worth targeting regularly exceed 1ns.
I’m sure we all love cloud computing regardless but there can be no doubt that pointing fingers at the internet’s 20µs achievable resolution is near pointless. Especially in the open source world, we don’t have any control over whether or not our applications and libraries will be hosted in the cloud and therefore we must assume they will be (whether it be EC2, Azure, Mosso and such). Following that logic, this means we must now ensure that any sensitive operations subject to Remote Timing Attacks can stand up to the 15ns detection resolution at 9,000 samples (or whatever it would be with a much better sample size) without leaking information about sensitive operations.
Removing Timing Differences: What Are Your Options?
Back in July, Lawson/Nelson reported that many open source implementations of OpenID leaked timing information. The resulting mailing list thread contains some interesting ideas to resolve this. For the moment, we’ll ignore network jitter - a 15ns resolution (less with better sampling) between Amazon EC2 instances just puts paid to that as a plausible defence (not that I doubt it will be argued otherwise forever).
Proposed fixes in the thread included adding random noise (i.e. a random delay before returning responses). Lawson/Nelson’s Blackhat USA talk addressed this suggestion directly, as have previous studies, showing that random noise achieved nothing other than to require that an attacker take an increased sample size to eliminate it. The random noise route increases the difficulty of an attack but does not obviate the need for a fuller solution.
Another suggestion was that this was not a problem since OpenID also requires non-repeating nonces, which means that any message’s HMAC digital signature (the secret a timing attack might target) should change on every request even though its data was otherwise identical, thus preventing the replay attacks needed to take advantage of guessing the correct HMAC. This was also picked apart, to a degree, by noting that timing information was leaked anyway unless implementations checked the nonce before the digital signature and terminated request verification if an invalid nonce was detected. It turned out that the order of these operations, and this specific treatment, is not specified for OpenID 2.0 and so implementations may or may not actually use the more secure order of checks. That is assuming they use secure nonces in the first place, given how easily developers can implement them poorly. Another suggested fix was implementing a random start point within each string for performing a string comparison. While it would probably work, it would add complexity and performance issues that might prove needless.
The simplest fix, for string comparisons at least, is to ensure that for any two strings being compared, the comparison takes the exact same number of operations (and thus a fixed execution time) no matter what. This completely eliminates the leak from any C compare function’s optimisation and it’s a relatively simple function even in PHP. Of course the downside is that this must be implemented in native PHP for now, not C, and therefore it does involve a small performance penalty. As usual, you either pick better performance or better security. Since the performance difference is minimal, I’d prefer the security.
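Such a fixed-time comparison can be written in native PHP along these lines (the function name is mine):

```php
<?php
// Compare two strings in a fixed number of operations. The XOR of each
// byte pair is folded into a tracking variable which is only inspected
// once every byte has been processed.
function secureCompare($expected, $actual)
{
    if (strlen($expected) !== strlen($actual)) {
        // The one early exit: it leaks only the expected string's length.
        return false;
    }
    $result = 0;
    for ($i = 0; $i < strlen($expected); $i++) {
        $result |= ord($expected[$i]) ^ ord($actual[$i]);
    }
    return $result === 0;
}
```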
The code above is fairly straightforward. It ensures that every single byte of each string parameter is compared in the correct order, with the XOR value of both added to a tracking variable. If, at the end of the process, the tracking variable equals zero then the strings were identical. This method does obviously leak another piece of information - the function returns quickly if the two strings being compared are not the same length. It’s a compromise of sorts on the basis that this does not tell an attacker anything other than the expected string length (something they may even know already in the case of a standard protocol where it would be specified).
Outside of string comparison, the strategy is obviously to minimise, if not eliminate, observable timing differences where such differences may disclose sensitive information. Don’t expect total response time to be the only target. In the case of content chunking (where it applies), inter-chunk timing can help an attacker focus more clearly on a specific subset of the processing being applied for a request. Obviously, eliminating timing differences on everything is ridiculous so focus your efforts. The best targets are those where sensitive information is being used when operating on user sourced data.
Time To Take This Seriously?
The prevailing belief is simple: if a timing difference is sufficiently small, it will be impossible to distinguish it from network jitter. As I’ve examined above, the evidence simply does not support this line of reasoning any longer. It’s a fallacy; it always was a fallacy; and continuing to blindly believe in it is misleading. If we acknowledge that network jitter can be sufficiently filtered out to allow the measurement of very small timing differences, then Remote Timing Attacks are no longer impossible.
Over the past year, Remote Timing Attacks have been reported as vulnerabilities and have proven compelling enough to make vendors issue fixes. Google’s Keyczar library was fixed during 2009 for a timing attack vulnerability, as was Java SE 6 in Update 17, though Java remains far more susceptible to timing attacks than PHP due to the poor performance of its comparison methods. While not web applications in and of themselves, these are used within actual web applications against which any attacks would be targeted. Ruby On Rails, something closer to our environment being a web application framework, was also patched for a potential Timing Attack vulnerability in September 2009. In March 2010, CouchDB had its own run-in with memcmp()-like string comparisons as a timing attack vulnerability. More recently, Microsoft is dealing with the same issue after the reporters of the ASP.NET padding oracle vulnerability stated that a Timing Attack would work in place of the easier error message leaks originally removed by a Microsoft workaround to the issue. These are just the tip of a very large iceberg of unfixed timing information leaks.
So yes, perhaps it is time to take this seriously. If there is a problem with Remote Timing Attacks, it’s one of perception. While the data piles up indicating they are already exploitable, real world exploit and security vulnerability examples are thin on the ground. On the other hand, Timing Attacks without the remote element are far more established, so the only actual defence employed in most other cases is hiding behind network jitter. It would be great for a researcher to literally prove a specific exploit as part of a study instead of simply referring to the, albeit valuable, separate parts that make the R in RTA possible. Some people need the tangible proof that a fully proven exploit provides, and others will continue to play down the attack form until it turns up. Such research would be a much needed bombshell to provoke more vendors into fixing timing information leaks. Unfortunately, Timing Attacks aren’t as easy to demonstrate as alert(“XSS”) ;). In the meantime, let’s play it safe and keep Remote Timing Attacks in mind. Where possible, implement the necessary preventative measures yourself, or report the issue to maintainers if you see it in a library or application you use. Forewarned is forearmed.
As of version 1.11 which has been released as a beta, Zend Framework has been patched to remove several timing leaks primarily as a countermeasure against any future exploitation attempts of this nature. This is not considered a critical security vulnerability in the framework given the paucity of evidence that exploits are actively being pursued. Implementing it now, before exploits are widely in use, is simply an act of prudence that offers a measure of forward looking protection for Zend Framework users. I would encourage other frameworks and libraries to take similar steps. Let’s nip this in the bud. And, add decent nonce implementations where necessary if we haven’t already, please ;).
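For the curious, a decent nonce implementation need not be complicated. A sketch follows (names are mine; the array store stands in for a shared database or cache, and mt_rand() is a placeholder for a properly unpredictable source such as openssl_random_pseudo_bytes()):

```php
<?php
// Issue an unpredictable single-use nonce and record it; verification
// deletes it on first use so any replay of the same nonce fails.
function issueNonce(array &$store)
{
    $nonce = '';
    for ($i = 0; $i < 16; $i++) {
        // Placeholder randomness: use a CSPRNG in production code.
        $nonce .= sprintf('%02x', mt_rand(0, 255));
    }
    $store[$nonce] = time();
    return $nonce;
}

function consumeNonce($nonce, array &$store, $maxAge = 300)
{
    if (!isset($store[$nonce]) || time() - $store[$nonce] > $maxAge) {
        return false; // unknown, already used, or expired
    }
    unset($store[$nonce]); // single use: a replay of this nonce now fails
    return true;
}
```

Random generation, server-side storage, expiry, and deletion on first use - that combination is what defeats the timestamp and Session ID shortcuts criticised earlier.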