Monday, August 9. 2010
Posted by Pádraic Brady
in PHP General, PHP Security, Zend Framework
HTML Sanitisation: The Devil's In The Details (And The Vulnerabilities)
HTML Sanitisation (defined below) has been with us for a long time, ever since the first genius came up with the idea of allowing potentially untrustworthy third-party HTML to be dynamically patched into their own markup. The years have not been kind to that idea, and third-party HTML inclusion remains one of the most complex and underappreciated vectors for security vulnerabilities.
In this article, I take a look at some of the solutions PHP developers rely upon to perform HTML Sanitisation, mostly because few others have done so or written about such solutions in any great detail (at least publicly). HTML Sanitisation has a very low profile in PHP. It's rarely mentioned and usually not understood all that well, so examining some of the solutions in this area with more deliberate attention is worth doing. It's also valuable research since I am writing my own HTML Sanitisation library (bias alert!) for a future Zend Framework 2.0 proposal. Knowing what the competition is up to does no harm! Finally, I was simply curious. Nobody seems too pushed to look closely at all these HTML Sanitisation solutions despite the fact that there are other developers (I think) who wouldn't touch most of them with a barge pole.

One somewhat remarkable example, just to illustrate why I figured this article was worth the time, is HTMLPurifier's Comparison analysis, where HTMLPurifier is compared against a number of other HTML Sanitisers. The comparison is remarkable because it seems inclined to err on the side of giving HTMLPurifier's competitors the benefit of the doubt. Unfortunately, this means the analysis is often flawed and its conclusions suspect. It also helps legitimise other solutions in the minds of readers by making assumptions of safety. Not that this reflects on HTMLPurifier's functionality, incidentally, which I have always maintained is the only HTML Sanitiser worth looking at.

Back on track…

What is HTML Sanitisation?

HTML is an amazingly dangerous thing. It can contain Javascript, CSS, or malformed markup, or even gigantic images that laugh at your dual 32" monitor setup. Each of these, in its own way, can damage the experience of an end user, whether by Cross-Site Scripting (XSS), Phishing or simply mangling the page until it's unusable and/or defaced with scriptkiddie jibes.
There are two ways of dealing with these threats to the HTML output of an application: escape output so that the only HTML rendered by the browser is the application's (anything else being neutered by HTML entities), or sanitise output so that any additional HTML it contains, that is renderable by a browser, is stripped of any potentially damaging markup. This article concerns the second option.

HTML Sanitisation may therefore be defined as any means of filtering HTML to ensure that:
a) Cross-Site Scripting (XSS) vulnerabilities are removed;
b) Phishing vulnerabilities are removed;
c) the HTML is well formed and adheres to an acceptable HTML standard; and
d) the HTML contains no obvious means of breaking expected web page rendering.

I won't claim this is a perfect definition but it covers most of the salient points you'll likely encounter.

So there are, broadly speaking, four primary objectives of HTML Sanitisation, any one of which is capable of preventing damage to end users or web application functionality (including javascript powered client side functionality). Each is, in its own way, quite a difficult proposition requiring suitable tools and specialised knowledge. However, with some objectives we can measure our success somewhat reliably. The question of this article is: how well do HTML Sanitisers in PHP measure up to these objectives?

The Candidates

Since this is intended as a brief examination (just a few million words long!), I decided to select four candidate HTML Sanitisers meeting certain conditions. These conditions included:
1. Having a release at some point in the past two years;
2. Describing itself as a HTML sanitiser/filter to prevent Bad Things;
3. Having a design clearly in line with an intent to filter XSS/Phishing; and
4. Having no publicly acknowledged long standing security vulnerabilities.

The great part about applying these conditions is that I pretty much eliminated stacks of HTML Sanitisers (as some might claim them as being).
Outside of those, it also eliminates anything users might misconstrue as a HTML Sanitiser (for example PHP's strip_tags() function or Zend Framework's Zend_Filter_StripTags class). What we are left with is pretty thin on the ground, but fits what I'd expect a reasonably educated PHP developer to swing with. From what remained, I selected four candidates (or maybe these were the only four left - I'll never tell):
1. PEAR's HTML_Safe
2. htmLawed
3. Wordpress' Kses
4. HTMLPurifier

With four candidates in tow, I proceeded to examine each against the four objectives of HTML Sanitisation I noted earlier. Just to emphasise to readers, this examination was not so in-depth as to identify every possible flaw or issue with each candidate. My intent was to attempt to locate one security vulnerability and assess each candidate in general terms for the other non-security related HTML Sanitisation objectives.

Before I go any further, let me clarify that all security vulnerabilities discovered were notified almost immediately to the parties responsible for each candidate solution. All such parties confirmed receipt of these reports within one week, and all were given a period (approx. six or more weeks to today) in which to apply fixes, make new releases, update documentation, perform additional security reviews, etc. I've sat on this article for a long time.

Regardless of the effectiveness of any such actions (or lack of action as the case may be), this article discloses all perceived security vulnerabilities discovered during my examination whether or not all parties agree with my opinion. In some cases a disclosure may clearly indicate a fundamental flaw in the underlying design of the candidate.
In these cases, it was emphasised to the responsible parties that reported vulnerabilities were limited to the scope of my examination and that I believed it was likely that additional and possibly related vulnerabilities remained unreported but easily discoverable as a result of public disclosure. This concludes my rendition of "Cover Your Ass".

Note that all discussions below relate to whichever version of each candidate solution was initially examined (before any fixed releases). I have noted resolutions as necessary for each.

PEAR's HTML_Safe

To get us started, PEAR's HTML_Safe is one of the older candidates despite its examined release being in April 2010 (the previous release being a beta in 2005). HTML_Safe's description states that "This parser strips down all potentially dangerous content within HTML". HTML_Safe operates on the basis of parsing HTML with regular expressions and applying filtering logic which is dependent on predefined blacklists of potentially harmful elements, attributes and CSS properties.

Unfortunately, HTML_Safe's blacklists prove its undoing. The problem with blacklists is that they require constant attention and updates for new problems. A cursory examination of the CSS property blacklist showed that it omitted many browser specific CSS properties such as -ms-behavior. The -ms-behavior CSS property (specific to Internet Explorer 8) may contain as a value a URI reference to a locally hosted HTC file (which contains executable Javascript). While such a file would need to exist on the local domain, this is obviously a security vulnerability in that it allows the execution of any arbitrary Javascript an attacker can store or reuse on the local domain, thus opening up XSS possibilities.

HTML_Safe also has another vulnerability shared by practically all HTML Sanitisation solutions based on the use of regular expressions.
Regular expression parsing typically assumes that all HTML special characters are encoded in ASCII (an encoding subset which is common across other encodings such as ISO-8859-1 and UTF-8). However, UTF-7 encodes the greater than (>) and less than (<) characters differently. This means that typical regular expression parsing does not detect these characters when encoded in UTF-7. If you can't detect them, you can't sanitise them! This sanitisation bypass requires a secondary vulnerability where an attacker either forces a webpage containing unsanitised UTF-7 encoded markup to be rendered with a charset of UTF-7 (IE has vulnerabilities here, as do some versions of Firefox) or where the target application actually allows a user to select a nonsensical custom character set (as happened with Google and Yahoo when struck with this same exploit, and several CMS applications as recently as last Spring).

Finally, HTML_Safe's blacklisting also misses out on CSS properties not directly tied to XSS (e.g. position), but which may be used to perform Phishing attacks by using CSS styling to alter or overlay HTML elements. This could, for example, allow an attacker to re-style their injected HTML to replace (as in positioning above) specific page elements (if not the entire page). This can lead to Clickjacking among other forms of Phishing.

In terms of HTML well-formedness, HTML_Safe does not necessarily emit standards compliant or well formed HTML, nor does it check for some other common page breaking tactics such as overlarge images as a precaution.

HTML_Safe may well be the least secure of our four candidates. Its use of blacklisting, its relative age, and a lack of peer review have left it woefully outdated for the task of HTML Sanitisation. In short, it should be avoided at all costs and my main request to PEAR at the time of reporting the above vulnerabilities was to seriously consider removing it from PEAR.
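The UTF-7 bypass described above can be illustrated with a short sketch. It is written in Python purely for illustration (the libraries examined here are PHP), and `naive_strip_tags` is a hypothetical stand-in for a regular-expression-based filter, not code taken from HTML_Safe or any other candidate.

```python
import re

# Hypothetical stand-in for a regex-based sanitiser: it only recognises
# literal ASCII angle brackets as tag delimiters.
TAG_PATTERN = re.compile(r"<[^>]*>")


def naive_strip_tags(markup: str) -> str:
    """Remove anything that looks like an ASCII-delimited tag."""
    return TAG_PATTERN.sub("", markup)


# "<script>alert(1)</script>" with the angle brackets written in UTF-7:
# "<" becomes "+ADw-" and ">" becomes "+AD4-".
utf7_payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# The filter finds no "<" or ">" and passes the payload through untouched.
sanitised = naive_strip_tags(utf7_payload)

# A browser tricked into rendering the page as UTF-7 decodes it like this.
as_seen_by_browser = sanitised.encode("ascii").decode("utf-7")

print(sanitised)
print(as_seen_by_browser)
```

The payload is plain ASCII, so no amount of tuning the regular expression helps. The bypass only closes when the page declares its charset explicitly and input is normalised to that charset before filtering.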
Personally, I find it almost tragic that a library of such limited capability may benefit from PEAR's reputation and lead to users trusting it over far more secure alternatives.

Of the issues noted above, the UTF-7 vulnerability has been resolved in a new release. The CSS blacklist has not yet been revised though I remain confident that the security advisory will go out any day now. PEAR does do security advisories, right? My recommendation, as originally suggested, remains that HTML_Safe should be removed, or overhauled, given it is not up to the task of HTML Sanitisation in its current condition. Blacklists are simply the worst approach ever to HTML sanitisation.

htmLawed

Our second candidate is, at least based on Google's omnipotence, one of the more popular open source standalone HTML Sanitisers. htmLawed has garnered a reputation as something of a rebellious spirit built for speed, a stark contrast to the slow resource intensive operation of HTMLPurifier with which it is often compared. htmLawed's description states that it is "PHP code to purify & filter HTML" and that it can "make HTML markup in text secure and standard-compliant".

The problem with htmLawed is that its operation is not much more than a very short stone's throw away from HTML_Safe. On the face of it, htmLawed is significantly more complex than HTML_Safe, being loosely based on Kses (an older HTML Sanitisation script). Its documentation is also huge, finely detailed, and packed full of options. Its source code is also complex and heavily obscured (presumably to discourage anyone from trying to examine it too closely). But...complexity and options a HTML Sanitiser does not make.

Similar to HTML_Safe, htmLawed carries no functionality to thwart character encoding based attacks. The UTF-7 exploit as described for PEAR's HTML_Safe works perfectly well.

In addition, htmLawed has a number of oddities which are not vulnerabilities but may potentially become such in the future.
These are related to attempts by htmLawed to fix insecure attribute values and CSS values. I have no idea why htmLawed tries to fix them (just remove them!), but the results come very close to enabling reconstructive logic based attacks (i.e. where you can trick a fixing mechanism into reconstructing a more obvious attempt at a vulnerability into a less obvious but equally dangerous one). I can't say these are anything more than oddities however. One example is that you can get the parser to create a CSS string containing something like "exp ress ion" (expression properties allow the execution of Javascript). The spaces in the string prevent it from being used by all current browsers, but what if a new browser release decides to ignore a little whitespace? Several other oddities also exist but are either harmless or fixed in the most recent htmLawed release.

Back firmly in current reality, htmLawed's CSS filtering is seriously flawed. For example, it does not filter out the CSS behavior property, which leads to the same result as HTML_Safe's allowance of -ms-behavior: the execution of local domain hosted HTC files containing Javascript. And, again, htmLawed is open to the exact same Phishing and Clickjacking vulnerabilities as HTML_Safe, allowing all CSS position/height/width and other properties capable of re-styling web pages.

htmLawed is not quite as easy to fool as HTML_Safe, but both share a remarkably similar lack of attention to specific areas of HTML Sanitisation. The conclusion I've come to hold is that both of these libraries (and others in the wild) are reading from the same book. Which might sound ridiculous before you consider that these libraries all seem to revolve around the year 2005 or so - the same period when Kses was King of HTML Sanitisation. There's no originality when you write new libraries in various styles which all follow the exact same assumptions and knowledge level.

htmLawed performs tag balancing and other cleanup tasks.
Nevertheless, it will not necessarily output well-formed or standards compliant HTML. You should bring ext/tidy to the party as a post-processor.

My main gripe with htmLawed, besides the above, is that it does not appear these issues are going to be fixed any time soon. The script's author has taken the approach that htmLawed filters HTML, and only HTML. Its CSS filtering can't be trusted and is not secure, and this IS part of HTML Sanitisation (HTML contains CSS!). Nevertheless, htmLawed documentation has been updated since my reports to clarify what developers may need to do before using htmLawed (i.e. normalise input character encoding, ensure all CSS is stripped/disabled or sanitised by another process, etc.).

In short, htmLawed is not even remotely in the race as a fully featured HTML Sanitiser. It is lacking too many features, or using features which are obviously incomplete, and it is pushing too many complex responsibilities onto end users who we all know will never bother with them.

The issues noted above have not, to date, been fully resolved to my satisfaction though the updated documentation means I can't call them vulnerabilities anymore. While documentation updates and minor fixes have resulted in some improvement and clarification, it appears that the fundamental flaws in CSS sanitisation and character encoding will not be addressed until some future unscheduled major revision of htmLawed (encoding has been documented as an end user concern instead). In effect, the security vulnerabilities reported resulted not in fixes, but in pushing the fixes back to the end users to deal with and denying them as a responsibility of htmLawed. In my book, that simply removes htmLawed entirely as a usable HTML sanitiser. Pushing responsibility back to users may let you off the security vulnerability hook but it does call the entire description and goals of htmLawed into question.
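To make the blacklist problem concrete, here is a minimal sketch contrasting a CSS property blacklist with an allowlist. It is in Python for illustration only, and both property lists and all function names are hypothetical; none of this is taken from HTML_Safe, htmLawed or HTMLPurifier.

```python
# Hypothetical lists for illustration; neither comes from a real library.
BLACKLISTED_PROPERTIES = {"behavior", "-moz-binding"}
ALLOWED_PROPERTIES = {"color", "font-weight", "text-align"}


def parse_declarations(style):
    """Split a style attribute value into (property, value) pairs."""
    pairs = []
    for declaration in style.split(";"):
        name, separator, value = declaration.partition(":")
        if separator and name.strip():
            pairs.append((name.strip().lower(), value.strip()))
    return pairs


def blacklist_filter(style):
    """Keep every declaration whose property is not explicitly banned."""
    kept = [(n, v) for n, v in parse_declarations(style)
            if n not in BLACKLISTED_PROPERTIES]
    return "; ".join(f"{n}: {v}" for n, v in kept)


def allowlist_filter(style):
    """Keep only declarations whose property is explicitly permitted."""
    kept = [(n, v) for n, v in parse_declarations(style)
            if n in ALLOWED_PROPERTIES]
    return "; ".join(f"{n}: {v}" for n, v in kept)


style = "color: red; -ms-behavior: url(evil.htc); position: absolute; top: 0"

print(blacklist_filter(style))  # the vendor-prefixed property and positioning survive
print(allowlist_filter(style))  # only the known-safe property survives
```

The blacklist fails open: anything its author did not think of gets through. The allowlist fails closed. Even so, filtering property names alone is not a complete defence, since values need validating too and the parser has to split declarations the same way browsers do.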
Users are advised to ensure that they follow the documentation to the letter (read it in detail!) in disabling all CSS styling and enabling safe mode for htmLawed (sanitisation is disabled by default for this one). Using ext/iconv to do some minimal character encoding normalisation is also highly recommended. As with PEAR, there has been no formal security advisory issued, and the release notes for the latest release make no reference to any security vulnerability existing.

Wordpress (Kses)

Candidate three is Wordpress 3.0, or more specifically the Kses script bundled with Wordpress that is used internally for HTML Sanitisation. It should be noted that while the original Kses is stuck in 2005 (and should be avoided like the plague), Wordpress have heavily updated their internal copy. This creates a viable HTML Sanitisation solution which is simple to extract for personal use (it just needs a handful of extra functions borrowed from an includes file), and which may well be the de facto winner in the popularity stakes just because it's a core part of everyone's favourite blogging platform. What could possibly go wrong?

Well, surprisingly not as much as I was half expecting after my earlier candidates! Wordpress' Kses proved far more challenging than the previous two candidates. It's obvious that the widespread use of Wordpress has enforced constant peer review and improvement. Nevertheless, where there's a will (and a quiet afternoon), there's a way.

Wordpress' Kses first of all offers no protection against character encoding attacks when used as a standalone sanitiser, thus the previously described UTF-7 XSS attack works quite well. I do NOT consider this a security vulnerability since the use of Kses within Wordpress prevents manipulation of the HTML's declared charset (though presumably this protection could be mangled by some Wordpress template author).
Nevertheless, anyone using Kses in standalone mode should note that character encoding normalisation is necessary and should be implemented.

Fast forward three coffees, and it finally clicked that Kses did have one obvious flaw in that it uses a little blacklisting for CSS filtering. In short, it removes all CSS attribute values which contain any of the characters /, \, * and (. It's actually very clever (in that it's indiscriminately simple) since it just removes CSS using these characters altogether. Unfortunately, it never considered the use of the right-handed curly brace, }. Under all versions of Internet Explorer, in either quirks or standards mode, the right curly brace is treated as a CSS terminator (i.e. just like a semi-colon). This means that something like "position: absolute; top: 5px; left: 10px;" can be written for Internet Explorer as "position: absolute} top: 5px} left: 10px}". IE extends another unwanted helping hand... The result is that such CSS styling values are not intercepted by Kses (which relies on semi-colon terminators), and may allow for Phishing and Clickjacking attacks where Kses filtered user input is rendered to an end user from Wordpress.

After those three coffees and some head banging, I thanked my numb brain for finally giving me something and left well enough alone. Later on, I also reported to the Wordpress team that Internet Explorer in quirks mode (it doesn't work in standards mode) also accepted the equals sign in place of a colon, i.e. "position: absolute;" could be written as "position=absolute;" or even "position=absolute}". This is of far less concern than the previous issue - and more of a funny aside on IE's incredible silliness.

As with previous candidates, Kses will not always output well formed HTML, nor will it check for other page breaking tactics. Again, this will require some custom checks and perhaps ext/tidy in some cases.

Wordpress' Kses is surprisingly (well, it should be!) adept at the HTML Sanitisation game. If it bundled encoding normalisation, HTML tidying and a few other bits and pieces, I'd almost use it myself. Almost. At the end of the day, however, I just don't trust regular expressions enough.

The issues noted above were fixed in Wordpress 3.0.1 released in late July 2010 (and weeks beforehand in SVN). The reported Phishing issue was mysteriously absent from the list of 54 fixed issues for 3.0.1, which isn't really all that surprising (it would not have been a public issue anywhere) but it would have been nice to see a public disclosure from the vendor before I published this article.

HTMLPurifier

HTMLPurifier is the brain child (no doubt with Angelina Jolie) of Edward Z. Yang. In short, it is nothing remotely like the other HTML Sanitisation candidates. It bundles a HTML parser/validator/supercomputer (rumour says it might be The Stig's brain). HTMLPurifier describes itself with "HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications."

Short answer. I completely failed to find anything that got past it. I tried. Then I tried some more. Then I tried multiple caffeine injections but to no avail. It just sat there looking smug. HTMLPurifier is seemingly impervious to pretty much anything. Worse, it refused to produce any mangled attributes, or reconstructions, or anything else I normally expect of a typical HTML Sanitisation solution.

Out of four candidates, HTMLPurifier was the only one to successfully meet all four objectives of HTML Sanitisation. Its only problem? It eats RAM and sucks CPU cycles far more than any other candidate.
A point which at least some HTML Sanitisers may note as a disadvantage since they can't hang it on anything else.

Since I cannot just leave it there, and since it is always worth noting, this does not mean that HTMLPurifier is invincible. The problem with the world of Cross-Site Scripting is that new ways to attack users are constantly evolving. HTMLPurifier has fixed several vulnerabilities in the past, and these are publicly disclosed by the author (as security advisories even). Many of them are reported directly by discoverers of new vectors, and that attention adds to HTMLPurifier's security. The author is also quite proactive about locating and researching fixes to possible future vulnerabilities.

HTMLPurifier is, quite simply, the only fully functioning and fully featured HTML Sanitiser in PHP. It literally stands alone.

What Does It All Mean?

Besides the fact that PHP developers are sheep to the slaughter? It means that HTML Sanitisation is incredibly misunderstood even by the authors of HTML Sanitisation solutions. It means that such solutions have minimal peer review by individuals who are relatively knowledgeable of security issues. It probably means that the days of combating XSS and Phishing and other concerns linked to HTML Sanitisation are far from over, at least for PHP.

Consider the nature of the vulnerabilities. None of them are particularly obscure or poorly documented in the public domain. There are several excellent sources of XSS/Phishing vectors for browsers, but it truly appears that everyone relies entirely on just one: the XSS (Cross Site Scripting) Cheatsheet. It's blatantly obvious that it's the only source for XSS vectors utilised by most solutions in whatever passes for their testing (I doubt anyone checks the ha.ckers.org forum for more either).
This is a limited exposure issue - if you focus on the same subset of possible exploits and ignore anything else, you are entirely at the mercy of a single source of knowledge that omits quite a lot (the XSS Cheatsheet doesn't contain all possible exploits - far from it!). For example, the CSS vulnerability approach for Wordpress Kses is clearly documented on the HTML5 Security Cheatsheet but not mentioned anywhere on the more commonly referenced XSS Cheatsheet. It's a very old CSS vector.

In addition, many HTML Sanitisation solutions share common approaches and vulnerabilities. I was only looking for one vulnerability, but cross checks netted those in common. This is one indication of an isolated evolutionary line - everybody feeds off everybody else's work in the area with no attempt to look outside the PHP house to see what the weather is like over in Rubyville or Javatown (it looks really sunny in both incidentally) or even just to keep up to date on new resources, ideas and research.

More interestingly, all this in-feeding seems to start with Kses in 2005. It's like the past five years dropped into a timeless black hole insulated from the harsh realities of Terra Firma even as tidal forces slowly tore them apart. The singularity being a library from 2005 that is no longer maintained (outside of Wordpress).

What is surprising is that this isn't surprising to me anymore. It's par for the course in PHP and that hasn't changed since the 90s. I love PHP to bits, but it's a language that is tragically short on security expertise, and where a security expert can be nothing more than someone who read a book long ago and who has no incentive to move beyond the obvious or do something extraordinary - like using Google.

If this article achieves nothing more than a blip on some faraway radar, forgotten the next day as the download count for HTML_Safe, htmLawed, and all the other solutions I haven't examined goes unchanged, then at least I got a blip. It's a start.
Maybe it's a few less XSS reports on Bugtraq or a few less people choosing an insecure solution because it claims to be fast and routinely misleads people about its efficacy in between taking senseless potshots at the only reliable solution out there. Maybe a few more developers will question all these miraculous HTML sanitisers and stop taking their summary descriptions at face value.

Developers need to start using their heads when it comes to selecting any security related solution. It's truly amazing to see developers recommend something purely on the basis of speed, or to watch them argue against all logic in support of an insecure option and just plain ignore the security implications (or assume there are none, or point at an RFC and misquote/misread it). The whole point of a security related solution IS security. If you want to compromise security to gain performance then at least be brutally honest about what it entails for the applications you build.

Do yourself a favour, use HTMLPurifier.

And Ambush Commander, update the fracking comparison page already!

Comments
Let's not forget that PHP's DOM is quite good at stopping a lot of malicious code too, as it will automatically escape any content you set for a DOMTextNode. It's not as dedicated as HTMLPurifier but it's something you get for free.
The best PHP security related post I have read in a while. Well researched and put together. I will be passing this on.
You may also want to try the Secure HTML parser and filter Open Source package.
This is an extensive package for parsing HTML, XML, DTD and CSS. It consists of a chain of parser classes that you can connect to perform several actions. For instance, you can parse HTML and filter the document with an included filter class that uses DTD to exclude invalid tags and attributes or malformed tag structures. Another filter class can exclude insecure HTML that may include dangerous Javascript either in the HTML tags or in the CSS definitions. There are other filter classes for other purposes, like extracting links, or adding nofollow to external links. You can develop your own custom filter classes with small effort, taking advantage of existing filters. http://www.phpclasses.org/secure-html-filter Here you may find an online demo to try it as a security filter. It was extensively tested, but just let me know if you find any insecure cases that can bypass its security filters. http://www.meta-language.net/markupparser/secure_html_filter.html
CSS code in HTML can be used not just for security attacks but also for other purposes like breaking web-page layout or style. Sanitization/filtering of CSS code to prevent only the first is not enough. Which HTML sanitizer library accomplishes the larger objective?
The htmLawed documentation recommends admins to deny the 'style' attribute altogether, and permit only 'class' with names indicating pre-set style values. htmLawed has functionalities that allow permission of only certain class names (for specific elements if desired). It also can let an admin run his own custom filtering code on in-line style attribute values.
Hi Santosh,
As the article notes, CSS may be used to style elements in such a way that may overlay or expand themselves to alter page layout. This could be simply to deface the page, or to architect a Phishing attack. As also noted, HTMLPurifier filters out the CSS properties that allow this to occur. Such CSS alterations are indeed a security vulnerability. htmLawed also includes CSS filtering which, as reported, is not sufficient. The fact that it is present indicates an attempt to eliminate security issues arising from CSS. For example, htmLawed indicates that it prevents attacks arising from the XSS Cheatsheet (which includes CSS attacks). Since the attempt has not worked and/or is flawed (an assumption not obvious prior to documentation changes arising from my reports), I've assumed this is a reversal of intent - i.e. rather than fix the issue, the docs now push that responsibility back to users. It's a transfer of responsibility previous users may not yet be aware of since it is not contained in the release notes, for example.
So if HTML Purifier is that good, will you still be proposing your own for inclusion into Zend?
Yes, it's being proposed to Zend Framework. HTMLPurifier really is that good, largely because it properly normalises and parses HTML. There are ways to achieve that from another direction (DOM/Tidy), which is my intent. Note, I'm not suggesting mine is as secure as HTMLPurifier until it's finished and thoroughly tested.
I believe you're incorrect regarding the -ms-behavior css error in HTML_Safe. The blacklist includes "behavior" which is matched if the style attribute value contains any portion of the blacklisted words. Another blacklisted keyword is 'absolute' which would be triggered if the positioning was attempted to be altered from relative to absolute.
I agree, HTML_Safe still has problems, but accurately reporting these problems is important. Thanks for the great read.
Quoting from the original report (26 June '10):
"Bonus vulnerability from a brief look through of the blacklists: The IE8 compatible -ms-behavior CSS property (identical to behavior) is omitted from the blacklist." It was obviously not tested which is why I referred in the article to it being from a cursory examination of the blacklist. If you really really want, I could swap it for something else which allows one to still use a CSS behavior. The position property was an example, and absolute positioning isn't required for Phishing (though it would make a Phishing attack a wee bit simpler). I haven't actually noted the "absolute" value when referring to HTML_Safe.
Thanks for the excellent article. Very informative.
Looking forward to your solution. To be honest, we're using HTMLPurifier and I have yet to encounter big problems with its speed or lack thereof, but it'd be nice to have a sanitisation solution built into the framework we use.
@Padraic: What will you do in six months when HTML5 becomes popular and, along with it, a standardized parser? It's probably going to take forever before PHP's HTML DOM parser and Tidy catch up.
I haven't decided on it yet. At the moment, many server-side development tools are in the same boat. libxml2 and Tidy are popularly used on Linux platforms. There is a PHP implementation of Python's html5lib parser (which is pretty much the only open source option I know of at the moment). If it becomes necessary, I can add an optional adapter to reuse that as a parser. It won't be fast, but it should be enough to cover the gap until libxml2/Tidy are eventually updated. That said, the current libxml2/Tidy combo can operate on subsets of HTML 5 for the simple reason that HTML 5 is backwards compatible with HTML 4 or so. It just won't accept the new elements (and these normally would not crop up in HTML snippets from users/blogs - not yet anyway!).
html5lib (http://code.google.com/p/html5lib/) is the one I ran across a few days ago, so I'm guessing that this is the one you are talking about.
It hasn't been updated for a while... but it's better than nothing.
OMG. What did I write. You mentioned html5lib in your post. And I go on mentioning just that.
/me is now ashamed and hiding under the table.
Is it a big table?
Does anyone have any input on "Universal Feed Parser" and its effectiveness?
I just wanted to thank you for the article and the research. I was looking for a solution and was surprised to find very little information on a topic that seems so important. So thank you once again. It's amazing that HTML sanitisation is not covered more often, but maybe that's just because I live in PHPville.
This is quite an interesting post and also informational. Certainly one of those posts that brings a fresh perspective to a wonderful topic.


