PHP, Zend Framework and Other Crazy Stuff
Archive for June, 2012
Automatic Output Escaping In PHP And The Real Future Of Preventing Cross-Site Scripting (XSS)
Jun 17th

A while back, the Zend Framework 2.0 team decided that automatic escaping for Zend\View (a template engine where all templates are written in PHP itself) was unsuitable and too potentially confusing to be included. As a result, Zend\View templates will continue to rely on manual escaping against Cross-Site Scripting (XSS) vulnerabilities using a new Zend\Escaper component.
Nevertheless, the decision was not taken lightly. Automatic escaping has a certain appeal given its goal of removing the need to type escape() all over your templates. Funny thing, though, is that this is basically its one and only advantage. The second claimed goal is to remove a factor of human error (i.e. forgetting to type escape() somewhere). However, this hasn’t posed an issue for me in the past, since simple analysis of templates can quickly locate such omissions. And no, using automatic escaping does not remove the need to analyse templates for security issues - that’s still needed regardless. At some point, we seem to have lost the plot and overinflated these benefits in our minds. So, rather than muddy the waters with confusing object-proxies and expend too many CPU cycles for too little benefit, ZF 2.0 is back on manual escaping.
In reality, automatic escaping doesn’t resolve all (or even most) of your security problems with XSS. Why? Because all escaping, regardless of how automatic it claims to be, still needs one fundamental factor to be successful: manual oversight by a knowledgeable programmer.
Whether you choose manual or automatic escaping, neither will prevent poorly educated programmers from shooting themselves in the foot with XSS vulnerabilities because those kinds of programmers just don’t understand XSS. Worse, it still won’t prevent even really good programmers from making errors of omission or misjudged context - nobody is perfect and any process needing human input inevitably experiences human errors.
In the game of mitigating the risks of XSS, how you escape is not as important as knowing why you are escaping. That second point, understanding why you escape data on output, is unfortunately commonly misunderstood. Yet without that basic understanding, your choice of how to escape is quite possibly incorrect and, worse, it allows insecure escaping practices to thrive as that misunderstanding becomes embedded in what we pass on to other PHP programmers. We’re perpetuating our own ignorance - Stack Overflow answers and articles still commonly present escaping notions that are plain wrong.
So, let’s travel down the automatic escaping rabbit-hole to understand why automatic escaping hasn’t progressed much further than serving as a method for reducing template verbiage. At the end, I’ll explain what is being done on the browser-side of XSS prevention to offset the problems ALL programming languages are having with getting escaping done perfectly.
What Is Automatic Escaping?
Defining automatic escaping would probably help to explain why manual oversight of escaping is unavoidable. There are broadly two definitions in use these days, or perhaps it’s more accurate to call them styles. The first automates escaping by applying a fixed escaping strategy to all data being output in a template. This is the “scope limited” style where the auto escaper is incapable of automatically switching escaping strategies depending on the context into which the data is being output. Another phrase used to describe this form of automatic escaping is Poka-Yoke (a Japanese phrase for “mistake-proofing”). A good example of this style is the Twig library used in Symfony 2:
If you’re using Twig templates, then output escaping is on by default. This means that you’re protected out-of-the-box from the unintentional consequences of user-submitted code. By default, the output escaping assumes that content is being escaped for HTML output.
In a scope limited automatic escaper, the programmer must be able to spot when the fixed escaping strategy of the escaper is inappropriate to the current context. So, sticking with Twig templates, when injecting data into a Javascript string you would need to bypass Twig’s automatic HTML escaping in order to manually apply Javascript escaping to that string. Without that manual bypass, Twig’s escaping would have enabled a potential XSS vulnerability by wrongfully applying HTML escaping to Javascript.
Context Always Determines Your Escaping Strategy
Briefly, the context of a data insertion determines how a client browser will interpret that data. For example, data output into a HTML attribute value is, surprise, in the HTML Attribute context. This means a browser’s HTML renderer will treat it as a HTML attribute. I know, this revelation is so shocking that my brain is in danger of exploding. However, let’s imagine that the output is inserted into an onmouseover attribute. This obviously means that it’s in the HTML Attribute context. However, it also means that it’s in a Javascript context - the attribute value is executable by the browser’s Javascript engine when a mouse-over action is detected for that element. Contexts can be nested - so can escaping needs.
Each such context demands a specific escaping strategy. Escaping for the HTML Attribute context is not the same as escaping for the Javascript context. Both have completely different escaping rules (i.e. different special characters and replacement strings). If you apply HTML escaping (e.g. htmlspecialchars()) to a Javascript string - you completely fail to escape properly against XSS. Worse, if the output has entered two contexts (i.e. our onmouseover attribute value), you must escape it twice - once for HTML, and once for Javascript. Oh, and you need to escape them in the correct order: Javascript first and HTML second. Why? Because attribute values are HTML unescaped before the browser will interpret the Javascript it might contain.
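To make the ordering concrete, here is a minimal sketch of the onmouseover case. The escapeJs() helper and the showTip() handler are hypothetical names invented for this illustration; the helper assumes UTF-8 input and the mbstring extension, and html_entity_decode() merely stands in for the decoding a browser performs on attribute values before handing them to its Javascript engine.

```php
<?php
// Hypothetical helper for illustration only: hex-escape every character that
// is not alphanumeric (or , . _) so nothing can terminate a Javascript string.
function escapeJs(string $value): string
{
    $escaped = preg_replace_callback('/[^a-zA-Z0-9,._]/u', function (array $m): string {
        $chr = $m[0];
        if (strlen($chr) === 1) {
            return sprintf('\\x%02X', ord($chr));
        }
        // Multi-byte characters become one \uXXXX escape per UTF-16 code unit.
        $hex = strtoupper(bin2hex(mb_convert_encoding($chr, 'UTF-16BE', 'UTF-8')));
        return '\\u' . implode('\\u', str_split($hex, 4));
    }, $value);

    if ($escaped === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $escaped;
}

$payload = "');alert(1);//";

// HTML escaping only: looks safe in the page source...
$htmlOnly = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');
// ...but the attribute value is HTML-decoded before the Javascript runs,
// so the quote comes straight back and breaks out of the string.
$seenByJsHtmlOnly = html_entity_decode($htmlOnly, ENT_QUOTES, 'UTF-8');

// Javascript escaping first, HTML escaping second.
$both = htmlspecialchars(escapeJs($payload), ENT_QUOTES, 'UTF-8');
$seenByJsBoth = html_entity_decode($both, ENT_QUOTES, 'UTF-8');

echo '<a onmouseover="showTip(\'' . $both . '\')">hover</a>', "\n";
```

After decoding, the HTML-only version hands the original payload back to the Javascript engine, whereas the doubly escaped version contains no quote character at all and so cannot close the string literal it sits in.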
The main contexts to be aware of are: HTML Body (element text nodes), HTML Attribute, Javascript, CSS, Untrusted URI, GET/POST parameters (also URI related) and DOM. All have varying escaping/validation strategies that may also depend on the actual content. For example, inserting strings in HTML Body contexts is quite different from inserting HTML markup into that context - the latter needs a HTML sanitiser rather than an escaper! As such, an escaping strategy may require a validation task instead of, or complementary to, an escaping function. Determining context also relies on understanding how your output is manipulated between a HTTP request being received and a client browser rendering a viewable form in response to user interaction or pre-programmed events. Just because your templates look nicely escaped, it doesn’t mean that the rendered version is still properly escaped by the time Javascript has finished scrambling it.
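As a sketch of validation working alongside escaping, take the Untrusted URI context. The helper names isSafeHttpUrl() and renderLink() are made up for this example, and the rule applied (accept only absolute http/https URLs) is just one plausible policy, not a complete URI validator.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs. Anything else
// (javascript:, data:, relative paths, malformed input) is rejected.
function isSafeHttpUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true);
}

function renderLink(string $url, string $label): string
{
    // Validation decides whether the URL may be used at all...
    $href = isSafeHttpUrl($url) ? $url : '#';
    // ...escaping then makes it safe for the quoted HTML Attribute context.
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo renderLink('https://example.com/?a=1&b=2', 'go'), "\n";
echo renderLink('javascript:alert(1)', 'click me'), "\n";
```

Escaping alone would happily pass javascript:alert(1) through unchanged, since it contains nothing that HTML escaping touches; only the validation step catches it.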
When I said my brain would explode, did I mention my symptoms are contagious?
Scope-Limited Automatic Escaping
Back to our two automatic escaping styles… In a “scope limited” automatic escaper, such as Twig’s, having a fixed escaping strategy means only one thing. You’ll go to Hell if you forget to successfully track contexts and manually intervene to prevent the automatic one-trick escaper from introducing a security vulnerability. Being automatic does not make a scope limited escaper infallible - it just makes it dumber than a programmer. Someone still has to read the template, ensure the automatic escaping is appropriate, and manually insert escape calls or disable the auto escaper altogether where it isn’t appropriate.
Based on the above, scope limited automatic escaping targets just one of the two manual tasks associated with secure escaping against XSS: typing escape calls into templates. In essence, its only purpose is to eliminate template verbiage to the degree that your templates require the single escaping strategy it exposes by default - usually HTML escaping via htmlspecialchars(). It doesn’t concern itself with the second task of determining context to ensure the escaping used is safe.
That’s your job.
Context-Aware Automatic Escaping
The second style of automatic escaping is “context aware” automatic escaping. In context aware escaping, the escaping mechanism can analyse your templates to detect the contexts that apply to each output. Based on the contexts detected, this mechanism can then select an appropriate escaping strategy to apply. In theory, a reliable implementation of context aware escaping would eliminate all manual programmer involvement in escaping against XSS which would, very obviously, render all other forms of manual or automatic escaping obsolete. An example of such a solution with these claimed benefits is the Latte template engine used by Nette Framework:
If the coder omits the escaping a security hole is made. That’s why template engines implement automated escaping. The problem is that the web page has different contexts and each has different rules for escaping printed data. A security hole then shows up if the wrong escaping functions are used. But Latte is sophisticated. It features [the] unique technology of Context-Aware Escaping which recognizes the context in which the macro is placed and chooses the right escaping mode. What does that mean? Latte doesn’t need any manual work. All is done automatically, consistently and correctly. You don’t have to worry about security holes.
Context aware escaping is clearly the next step in automatic escaping but it remains a juvenile development to be taken with a pinch of salt. Its primary problem is that the reliability of solutions under this flag is frequently in question due to the complexity of tracking output context in modern applications which can combine multiple web technologies and programming languages. Most context-aware escapers limit themselves to a specific template language (almost certainly XHTML-compatible with minimal inline Javascript/CSS support) and ignore all other possible influences on the rendered output.
Several such solutions (PHP or not) incorporate potentially fatal design decisions which can include poor/insecure escaping strategies, opt-in disabling of escaping, lack of manual overrides to allow programmers to select preferred escaping strategies, and error-prone context determination (e.g. due to poor quality HTML and Javascript parsers or a lack of analysis of outside influences). All the variants I examined have one or more of these problems. While these are potentially solvable, the main problem will always be that a programmer is significantly better equipped to determine context since they are not blindfolded against what happens outside of the templates a context-aware escaper is limited to parsing.
Once again, automatic escapers are dumb. They don’t replace your brain.
Where Is Automatic Escaping Going?
Given the state of automatic escaping in PHP, I’m not too keen on its direction. As a tool for convenience, automatic escapers have benefits: they reduce typing and assist forgetful programmers. As a tool to be blindly relied upon - are you nuts? And that’s the real problem that automatic escaping has: the potential for blind reliance.
The primary symptom of this sits squarely with the documentation for automatic escapers. Many are couched in language which may downplay their disadvantages or neglect to mention them (or any other conceivable problem) at all. The average reader could be forgiven for arriving at a general conclusion that automatic escaping replaces the need for manual oversight - a bad conclusion that leads to a false sense of security and a blind spot to potential XSS vulnerabilities. This is not to say that libraries/frameworks are doing this deliberately - often it’s simply a case of assuming the reader knows what secure escaping is and knows enough not to put too much faith into automation. There is a minority of cases, however, where the documentation is completely silent as to the downside thus rating all their flaws as the next worst thing to a reportable security vulnerability - stuff that will never be fixed and which users are blissfully unaware of and so will use, unknowingly introducing vulnerabilities into their applications because they trusted the wrong solution.
This is PHP - presuming everyone knows about good secure escaping, or will read your source code to find security weaknesses, is the wrong assumption to make when your programming language has a long established history of sucking at security, in particular sucking at preventing XSS through escaping.
All frameworks are in the same boat regardless of their escaping practices - if manual oversight of escaping is sacrificed, insecure applications will inevitably be the result. Heck, even with amazingly good oversight it will still happen - just more rarely. The Homo sapiens programmer is always the weakest link. We could wait for evolution to make us all insanely obsessive about examining every potential piece of data exhaustively for security issues but let’s face it - it ain’t going to happen. Automatic escaping is not our saviour. It can help, it could help more in time, but it will never become the final solution to the manually intensive and mind-boggling task called escaping.
Luckily, we have a helping hand on the way…
Content Security Policy (CSP) To The Rescue!
There are two obvious problems with preventing XSS that all programmers (from all programming languages) have battled with since HTML was invented:
- Escaping is too complicated; and
- HTML standards continue supporting a status-quo where XSS’s greatest ally is HTML itself.
The first is obvious. Escaping is a pain in the ass. You need to educate yourself about it, which most programmers probably don’t do. Even when you are educated, it’s still an error-prone task. Existing escaping strategies used in the wild are too often insecure, insufficient and downright funny at times (using json_encode() for Javascript escaping is a good one). Programming languages barely recognise the problem. PHP, for example, has no native Javascript or CSS escaper. Its URI escaper was until recently out-of-sync with the applicable RFC (relax, a minor transgression that just irked OAuth devs) and remains unaware of character encoding. Its HTML escaper needs a dedicated wrapper function to completely lock it down and assure security (see Twig for a well done example of such a wrapper) because it’s not specifically for escaping at all. PHP desperately needs a native escaper class or set of functions to eliminate all the manual torture and programmer uncertainty.
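For illustration, a locked-down wrapper around htmlspecialchars() might look something like the sketch below. The function name escapeHtml() is invented here and the sketch assumes all templates are served as UTF-8; it is not Twig’s actual implementation. The point is simply to pin the flags and the character set explicitly, since the defaults have changed between PHP versions.

```php
<?php
// Hypothetical wrapper: always escape both quote styles, always substitute
// invalid byte sequences instead of returning an empty string, and always
// state the encoding rather than trusting the configured default.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escapeHtml("<a href='x'>Tom & \"Jerry\"</a>"), "\n";
```

A wrapper like this only covers the HTML Body and quoted HTML Attribute contexts; it does nothing for Javascript, CSS or URI contexts.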
Let’s face it - the first problem is never going to vanish. We’re stuck with it forever.
The second refers to how XSS works. Often, XSS relies on injecting inline Javascript or source file references into HTML documents. The HTML spec allows this, and HTML5 even allows unquoted attribute values which is well known to make XSS easier since htmlspecialchars(), for example, is worthless in an unquoted scenario (i.e. there are no quotes to break out of anyway so escaping for them is useless in preventing XSS). HTML doesn’t concern itself with preventing XSS because it values backwards compatibility and feature completeness. So, the mountain is definitely NOT coming to Mohammed.
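A small sketch shows why the unquoted scenario defeats htmlspecialchars(). The payload and the input element are invented for the example.

```php
<?php
$payload = 'x onmouseover=alert(1)';
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

// Nothing was changed: the payload contains no <, >, &, " or ' to escape.
var_dump($escaped === $payload); // bool(true)

// Unquoted: the space ends the value, so the browser sees a second,
// attacker-supplied onmouseover attribute.
$unquoted = '<input value=' . $escaped . '>';

// Quoted: the whole payload stays inside the value attribute.
$quoted = '<input value="' . $escaped . '">';

echo $unquoted, "\n", $quoted, "\n";
```

The escaping call is identical in both cases; only the quoting in the template decides whether the payload becomes a live event handler.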
HTML is also never going to vanish. We’re stuck with it forever too.
Unless…we cheat and forcefully alter how the HTML specifications apply to our applications.
This “cheat” is known as the Content Security Policy (CSP). The CSP is a policy which communicates to clients and browsers, via an X-Content-Security-Policy header (X-WebKit-CSP for Chrome/Safari), how we want them to behave when parsing HTML. Specifically, it puts limits on which scripts and styles are to be trusted. For example, the CSP mandates that, by default, all inline Javascript and CSS in a HTML document is not to be trusted. Browsers which support the CSP (Firefox 4+, Chrome, IE10 (where it’s a WIP), Safari etc.) will therefore refuse to execute any inline scripts or styles - by default. Where an attacker manages to find a gap in your escaping, and injects an inline script, style or pretty much any form of inlined naughty stuff - it will be ignored by the browser and rendered completely harmless. The same goes for external resources - the CSP can whitelist trusted domains and browsers will ignore all other external script/style resource URIs. You can whitelist certain inline resources and other useful bits if you are careful - the point here is to eliminate XSS, not make HTML impossible to use.
By abandoning the practice of automatically trusting inline and external resources in HTML by default, we’re basically flipping a finger at HTML in the best possible sense by neutering the insecure practices it allows. The new approach asks that you whitelist the inline and external resources that should be trusted. It removes the automatic trust-everything problem with HTML that allows XSS to thrive.
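Sending a policy from PHP can be sketched as follows. The buildCspHeader() helper and the static.example.com host are hypothetical, and the sketch uses the unprefixed Content-Security-Policy header name that the W3C specification settled on rather than the vendor-prefixed names that browsers accepted while the draft was in progress.

```php
<?php
// Hypothetical helper: turn a map of directive => allowed sources into a
// single policy header line.
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return 'Content-Security-Policy: ' . implode('; ', $parts);
}

$header = buildCspHeader([
    // By default, only trust resources served from our own origin.
    'default-src' => ["'self'"],
    // Additionally trust scripts from one whitelisted (made-up) host.
    'script-src'  => ["'self'", 'https://static.example.com'],
    'style-src'   => ["'self'"],
]);

// Headers must be sent before any output.
if (!headers_sent()) {
    header($header);
}
```

With a policy like this, inline scripts and styles are not on the whitelist, so a supporting browser refuses to run them even when an injection slips past the escaping.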
All PHP programmers should consider adopting the Content Security Policy. While the specification is being drafted by the W3C, and while it will take time to gain majority coverage as newer browser versions are adopted, it can be implemented right now with an eye towards the future. This will very obviously become a best-practice security defense for web applications, so get used to it being preached to you.
Conclusion
Escaping is really hard. Automatic escaping can offset some of the risks of manual omission but this risk offsetting pales in comparison to what happens when manual oversight is removed from the equation. No matter how automatic escaping becomes, it needs to become far more complex to deal with how modern applications actually work - and the complexity needed makes a perfect solution improbable. In the near term, undermining XSS by removing its ability to rely on browsers to trust the HTML source markup being rendered is simply more effective. Good escaping, of any kind, matched with the Content Security Policy creates a defense in depth approach that will quickly become best practice in PHP. We’ll always need to practice secure escaping but the CSP will allow us to tolerate the inevitable mistakes far better if implemented.
The Framework Interoperability Group (FIG): Openness, Accountability and Community Involvement in PHP Standards
Jun 1st

Recently, Anthony Ferrara posted “Open Standards - The Better Way”, a blog post questioning the operation of the PHP Framework Interoperability Group (FIG). The FIG is a body representative of PHP “frameworks” (with a broad definition since it includes phpBB, Drupal and even more divergent members) which issues PSR standards such as the PSR-0 standard which governs requirements for autoloader interoperability and the upcoming PSR-1 and PSR-2 standards which, together, form a coding standard for PHP.
Anthony, whose views always make good reading, raises concerns about the way in which this group generates standards. He contrasts the current approach to RFC 2026 which defines the IETF’s Internet Standards Process. That approach evolved to balance conflicting requirements. On one hand the rapidly evolving nature of the internet demands the timely production of standards while, on the other hand, the standards must be subject to an open and fair process, proper testing and technical development. Surrendering to either side of this conflict tends to give rise to poor standards.
Where Anthony’s arguments seemingly fall flat is that the FIG is not the IETF. The Framework Interoperability Group was founded to allow cooperating members to develop shared standards. It does not claim to be PHP’s standards body and so there is no obligation for any PHP programmer to adopt their standards (unless they work on a member project obviously!). On this simplified basis, the FIG doesn’t need to balance the conflicts described by the IETF. It serves its own interests, i.e. those of its members who may voluntarily adopt accepted standards at their convenience to boost interoperability between their software. If they be happy, they be happy.
However, the FIG is not a closed standards body. Anthony’s article simply targeted the FIG’s processes which have been slowly evolving from a semi-closed process (there was a time when the mailing list was not open) towards a more open process. By inches, the FIG is becoming a de facto PHP Standards Working Group and I’m skeptical of anyone who doesn’t see that. The only technical reason they can’t really use their original title as-is is probably because of licensing concerns in using the “PHP” moniker. The phrase “Framework Interoperability Group” is much…er… better. It’s catchy even if nobody will have the slightest clue what the heck it means.
It gets even more interesting though. While it’s easy to assume that the FIG is operated by some undetermined number of individuals one can ignore as a bunch of power hungry people, the truth is that the voting members are not people at all - they are nearly all open source projects. FIG is operated by individuals on the basis that they represent the interests of the FIG’s real members, i.e. projects which include Zend Framework, Symfony 2, Lithium, Aura, phpBB, Joomla, Drupal and even an individual who, as an honorific, represents “the community at large”.
This is why PHP programmers need to pay attention to what the FIG is doing. The Group is, by definition, an open standards working group. It does actually wield real authority and influence by virtue of the fact that its members represent a not inconsiderable number of open source programmers both within their projects and among those who build any sort of dependent library or plugin against those projects. Once PSR standards are issued, they also benefit from the reputation of the members, the knowledge that it was subject to an open process, and an assurance of independence since it was agreed to by a majority of the member projects and not just one of them.
In other words, the FIG is actually something really really good for PHP. PHP needs standards so we can make interoperability between various frameworks and applications a true reality. The hodgepodge of APIs and standards we’ve relied on to this point only serves to reinforce PHP’s NIH obsession (the bad type, not the good type that keeps PHP reinventing better variants of everything under the Sun). These days you can’t simply pick a library to replace another - they’ll have different APIs. Besides NIH, this also encourages stagnation since different APIs create a barrier to new replacements. Thankfully, that whole “APIs are copyrightable” lark is now officially dead in the EU (and close to it in the US!).
But let’s not forget Anthony’s article. Can the FIG improve its processes for creating standards? I’d venture that it could and articles like Anthony’s are part of that improvement. Far from being a case of “pitchforking”, questioning the open process of FIG should be as acceptable as having some weird people called Mr. Grumpy ranting and raving on an open source mailing list. Open source projects embrace the programmer community as their most valuable resource. Open standards bodies are no different - they derive their authority and influence from the communities they serve.
What the FIG should do, in my opinion, is clearly define its purpose and better document its bylaws/processes. At the moment, the messaging is confusing. Some member statements leave no doubt that the purpose of the FIG is to serve its member projects in producing shared standards but other statements demonstrate a hope that PHP at large will also adopt the same standards to improve PHP’s overall open source ecosystem. Do observers need to flip coins to figure out what the FIG’s mission is? Personally, I see absolutely nothing wrong with the FIG encouraging the PHP community to adopt its standards. It’s not a power grab or a selfish act to do so - it’s just a group of open source minded folk trying to promote greater cooperation among programmers. Documenting bylaws and processes would clear up any other elements of how the community can become involved and remove any perception of a closed network of individuals. What are the criteria for membership? Who can submit new proposals? How are member representatives selected, replaced, etc? How are proposals handled, reviewed and progressed? Why are proposed standards not obviously available in the GitHub repo under the proposed directory?
It really all comes down to better communication and pushing the community to engage with the FIG. To this end, bear in mind that the FIG members are open source projects. The FIG membership is also growing (presumably it will stop growing eventually unless the group wants an unwieldy number of chefs in the kitchen). Each project has a voting representative who is assumed to represent the interests of that project. If you are a contributing member to or a user of any represented open source software, your first step should be to hold your representative accountable. Open source projects tend towards meritocracies with an optional Benevolent Dictator - you may already have a voice in how the FIG operates and how your representative should be voting. You just have to use it!
On the flipside of this communication route, ensure your FIG representative keeps the project’s community involved and informed. They should be seeking your feedback, building support for their votes, and using whatever internal proposal mechanism exists (formal or informal) typical for such decisions prior to casting their vote. The FIG should never involve bypassing an open source project’s normal operations. Hold your representative accountable and ensure they represent that project’s interests. If they don’t - pitchforks can be bought at most reputable hardware stores.
Related articles
- Open Standards - The Better Way - Anthony Ferrara (ircmaxell.com)