Automatic Output Escaping In PHP And The Real Future Of Preventing Cross-Site Scripting (XSS)
A while back, the Zend Framework 2.0 team decided that automatic escaping for Zend\View (a template engine where all templates are written in PHP itself) was unsuitable and potentially confusing, and so it would not be included. As a result, Zend\View templates will continue relying on manual escaping against Cross-Site Scripting (XSS) vulnerabilities using a new Zend\Escaper component.
Nevertheless, the decision was not taken lightly. Automatic escaping has a certain appeal given its goal of removing the need to type escape() all over your templates. Funny thing, though, is that this is basically its one and only advantage. The second claimed goal is to remove a factor of human error (i.e. forgetting to type escape() somewhere). However, this hasn’t posed an issue for me in the past, since simple analysis of templates can quickly locate such omissions. And no, using automatic escaping does not remove the need to analyse templates for security issues – that’s still needed regardless. At some point, we seem to have lost the plot and overinflated these benefits in our minds. So, rather than muddy the waters with confusing object proxies and spend too many CPU cycles for too little benefit, ZF 2.0 is back on manual escaping.
In reality, automatic escaping doesn’t resolve all (or even most) of your security problems with XSS. Why? Because all escaping, regardless of how automatic it claims to be, still needs one fundamental factor to be successful: manual oversight by a knowledgeable programmer.
Whether you choose manual or automatic escaping, neither will prevent poorly educated programmers from shooting themselves in the foot with XSS vulnerabilities because those kinds of programmers just don’t understand XSS. Worse, it still won’t prevent even really good programmers from making errors of omission or misjudged context – nobody is perfect and any process needing human input inevitably experiences human errors.
In the game of mitigating the risks of XSS, how you escape is not as important as knowing why you are escaping. That second point, understanding why you escape data on output, is unfortunately commonly misunderstood. Yet without that basic understanding, your choice of how to escape is quite possibly incorrect. Worse, it allows insecure escaping practices to thrive as that misunderstanding becomes embedded in what we pass on to other PHP programmers. We’re perpetuating our own ignorance – Stack Overflow answers and articles still commonly present escaping notions that are plain wrong.
So, let’s travel down the automatic escaping rabbit-hole to understand why automatic escaping hasn’t progressed much further than serving as a method for reducing template verbiage. At the end, I’ll explain what is being done on the browser-side of XSS prevention to offset the problems ALL programming languages are having with getting escaping done perfectly.
What Is Automatic Escaping?
Defining automatic escaping would probably help to explain why manual oversight of escaping is unavoidable. There are broadly two definitions in use these days, or perhaps it’s more accurate to call them styles. The first automates escaping by applying a fixed escaping strategy to all data being output in a template. This is the “scope limited” style where the auto escaper is incapable of automatically switching escaping strategies depending on the context into which the data is being output. Another phrase used to describe this form of automatic escaping is Poka-Yoke (a Japanese phrase for “mistake-proofing”). A good example of this style is the Twig library used in Symfony 2:
If you’re using Twig templates, then output escaping is on by default. This means that you’re protected out-of-the-box from the unintentional consequences of user-submitted code. By default, the output escaping assumes that content is being escaped for HTML output.
Context Always Determines Your Escaping Strategy
When I said my brain would explode, did I mention my symptoms are contagious?
Scope-Limited Automatic Escaping
Back to our two automatic escaping styles… In a “scope limited” automatic escaper, such as Twig’s, having a fixed escaping strategy means only one thing. You’ll go to Hell if you forget to track contexts and manually intervene to prevent the automatic one-trick escaper from introducing a security vulnerability. Being automatic does not make a scope limited escaper infallible – it just makes it dumber than a programmer. Someone still has to read the template, ensure the automatic escaping is appropriate, and manually insert escape calls or disable the auto escaper altogether where it isn’t appropriate.
Based on the above, scope limited automatic escaping targets just one of the two manual tasks associated with secure escaping against XSS: typing escape calls into templates. In essence, its only purpose is to eliminate template verbiage to the degree that your templates require the single escaping strategy it exposes by default – usually HTML escaping via htmlspecialchars(). It doesn’t concern itself with the second task of determining context to ensure the escaping used is safe.
That’s your job.
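To make the point concrete, here is a minimal sketch of how a fixed HTML-escaping strategy behaves once the output lands in a JavaScript context. The autoEscape() and escapeForJs() helpers and the sample payload are hypothetical names invented for this illustration; they are not Twig’s or Zend\Escaper’s API, and json_encode() with the JSON_HEX_* flags is just one commonly used way to emit a JavaScript string literal.

```php
<?php
// Hypothetical stand-in for a scope-limited auto escaper: one fixed
// strategy (HTML escaping) applied to every value, whatever the context.
function autoEscape(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Escaping chosen for a JavaScript context instead: json_encode() emits a
// quoted string literal, and the JSON_HEX_* flags keep <, >, &, ' and "
// from appearing literally inside a <script> block.
function escapeForJs(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Illustrative attacker-controlled values (assumptions for this sketch).
$markup  = '<script>alert(1)</script>';
$payload = '1; alert(document.cookie)';

// HTML body context: the fixed strategy does its job.
$htmlBody = '<p>' . autoEscape($markup) . '</p>';

// JavaScript context: the payload contains no HTML special characters,
// so HTML escaping passes it through untouched and it runs as code.
$unsafeJs = '<script>var id = ' . autoEscape($payload) . ';</script>';

// Same context with a JavaScript-appropriate strategy: the payload
// stays inert data inside a string literal.
$saferJs = '<script>var id = ' . escapeForJs($payload) . ';</script>';

echo $htmlBody, PHP_EOL, $unsafeJs, PHP_EOL, $saferJs, PHP_EOL;
```

The escaper did exactly what it was configured to do in all three cases. Only the second line is dangerous, and nothing in the escaper itself could have known that; a person reading the template had to.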
Context-Aware Automatic Escaping
The second style of automatic escaping is “context aware” automatic escaping. In context aware escaping, the escaping mechanism can analyse your templates to detect the contexts that apply to each output. Based on the contexts detected, this mechanism can then select an appropriate escaping strategy to apply. In theory, a reliable implementation of context aware escaping would eliminate all manual programmer involvement in escaping against XSS which would, very obviously, render all other forms of manual or automatic escaping obsolete. An example of such a solution with these claimed benefits is the Latte template engine used by Nette Framework:
If the coder omits the escaping a security hole is made. That’s why template engines implement automated escaping. The problem is that the web page has different contexts and each has different rules for escaping printed data. A security hole then shows up if the wrong escaping functions are used. But Latte is sophisticated. It features [the] unique technology of Context-Aware Escaping which recognizes the context in which the macro is placed and chooses the right escaping mode. What does that mean? Latte doesn’t need any manual work. All is done automatically, consistently and correctly. You don’t have to worry about security holes.
Once again, automatic escapers are dumb. They don’t replace your brain.
Where Is Automatic Escaping Going?
Given the state of automatic escapers in PHP, I’m not too keen on their direction. As a tool for convenience, they have benefits: they reduce typing and assist forgetful programmers. As a tool to be blindly relied upon – are you nuts? And that’s the real problem that automatic escaping has: the potential for blind reliance.
The primary symptom of this sits squarely with the documentation for automatic escapers. Much of it is couched in language which may downplay their disadvantages or neglect to mention them (or any other conceivable problem) at all. The average reader could be forgiven for arriving at a general conclusion that automatic escaping replaces the need for manual oversight – a bad conclusion that leads to a false sense of security and a blind spot to potential XSS vulnerabilities. This is not to say that libraries/frameworks are doing this deliberately – often it’s simply a case of assuming the reader knows what secure escaping is and knows enough not to put too much faith into automation. There is a minority of cases, however, where the documentation is completely silent about the downsides. That silence makes those flaws the next worst thing to a reportable security vulnerability: they will never be fixed, users remain blissfully unaware of them, and so those users will unknowingly introduce vulnerabilities into their applications because they trusted the wrong solution.
This is PHP – presuming everyone knows about good secure escaping, or will read your source code to find security weaknesses, is the wrong assumption to make when your programming language has a long established history of sucking at security, in particular sucking at preventing XSS through escaping.
All frameworks are in the same boat regardless of their escaping practices – if manual oversight of escaping is sacrificed, insecure applications will inevitably be the result. Heck, even with amazingly good oversight it will still happen – just more rarely. The Homo sapiens programmer is always the weakest link. We could wait for evolution to make us all insanely obsessive about examining every potential piece of data exhaustively for security issues but let’s face it – it ain’t going to happen. Automatic escaping is not our saviour. It can help, it could help more in time, but it will never become the complete answer to the manually intensive and mind-boggling task called escaping.
Luckily, we have a helping hand on the way…
Content Security Policy (CSP) To The Rescue!
There are two obvious problems with preventing XSS that all programmers (in all programming languages) have battled with since HTML was invented:
- Escaping is too complicated; and
- HTML standards continue supporting a status-quo where XSS’s greatest ally is HTML itself.
Let’s face it – the first problem is never going to vanish. We’re stuck with it forever.
HTML is also never going to vanish. We’re stuck with it forever too.
Unless…we cheat and forcefully alter how the HTML specifications apply to our applications.
By abandoning the default practice of automatically trusting inline and external resources in HTML, we’re basically flipping a finger at HTML in the best possible sense by neutering the insecure practices it allows. The new approach asks that you whitelist the inline and external resources that should be trusted. It removes the automatic trust-everything problem with HTML that allows XSS to thrive.
All PHP programmers should consider adopting the Content Security Policy. While the specification is being drafted by the W3C, and while it will take time to gain majority coverage as newer browser versions are adopted, it can be implemented right now with an eye towards the future. This will very obviously become a best-practice security defense for web applications, so get used to it being preached to you.
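As a rough sketch of what that whitelisting can look like from PHP, the snippet below assembles a policy and sends it as a Content-Security-Policy response header. The buildCspHeader() helper, the particular directives chosen, and the https://cdn.example.com host are all assumptions made for illustration; a real policy has to be worked out against the resources your own application actually loads.

```php
<?php
// Hypothetical helper: turns a map of directive => allowed sources into
// a single Content-Security-Policy header line.
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return 'Content-Security-Policy: ' . implode('; ', $parts);
}

// Illustrative policy (an assumption, not a recommendation):
// - everything defaults to same-origin only
// - scripts may also come from one whitelisted external host
// - plugins (object/embed) are disallowed entirely
// With no 'unsafe-inline' source listed, inline scripts are not trusted.
$policy = [
    'default-src' => ["'self'"],
    'script-src'  => ["'self'", 'https://cdn.example.com'],
    'object-src'  => ["'none'"],
];

$cspHeader = buildCspHeader($policy);

// Headers must be sent before any body output.
if (!headers_sent()) {
    header($cspHeader);
}
```

Under a policy like this, a script injected through an escaping mistake is refused by a supporting browser because it is neither same-origin nor on the whitelist, which is exactly the safety net escaping alone cannot provide.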
Escaping is really hard. Automatic escaping can offset some of the risks of manual omission, but that offset pales in comparison to what happens when manual oversight is removed from the equation. However automated escaping becomes, it needs to become far more complex to deal with how modern applications actually work – and the complexity needed makes a perfect solution improbable. In the near term, undermining XSS by removing its ability to rely on browsers to trust the HTML source markup being rendered is simply more effective. Good escaping, of any kind, matched with the Content Security Policy creates a defense-in-depth approach that will quickly become best practice in PHP. We’ll always need to practice secure escaping, but the CSP, if implemented, will allow us to tolerate the inevitable mistakes far better.