Archive for September, 2010

Zend Framework Proposal: Zend\Html\Filter (HTML Sanitisation And Manipulation)

For a while now, I’ve been keen to build an HTML sanitisation solution for PHP. Where else would I end up putting it other than in Zend Framework? As I’ve explored in past articles [1] [2], HTML sanitisation in PHP is a very inconsistent practice. Sanitisers like HTMLPurifier are very secure out of the box but undeniably slow and resource intensive, while others based on regular-expression-powered HTML parsing are much faster but tend to lose out badly in the security stakes. Isn’t it possible to create a sanitiser that is both secure by default and performs well?

This was the core of the idea that became Wibble, my prototype for Zend\Html\Filter. Wibble borrowed sanitisation routines from a few programming languages to ensure secure operation, but relied entirely on PHP DOM and HTML Tidy for speed and HTML parsing. The resulting prototype was benchmarked [1], which showed that Wibble is definitely faster than HTMLPurifier and, in scenarios where HTML is being manipulated, can be faster than even regular-expression-based sanitisers, all without sacrificing security. Wibble is thus capable of the best of both worlds: security and performance. The tradeoff imposed by current solutions no longer applies.
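
To make the general approach concrete, here is a minimal sketch of parser-driven whitelist sanitisation: the markup is read by a real HTML parser rather than matched with regular expressions, and only explicitly allowed elements and attributes are written back out. This is not Wibble's code. It is written in Python, with the standard library's `html.parser` standing in for PHP DOM and HTML Tidy, and the whitelist, the `WhitelistSanitiser` class and the `sanitise` function are all hypothetical names chosen for illustration. Unlike Tidy, it makes no attempt to repair unbalanced markup.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for illustration.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
# Elements whose text content is dropped along with the tags themselves.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitiser(HTMLParser):
    """Re-emits only whitelisted tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # tag is dropped; its text children are still kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Only absolute URLs with a whitelisted scheme survive.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitise(html):
    parser = WhitelistSanitiser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <script>alert(1)</script><a href="javascript:alert(1)">link</a></p>'
print(sanitise(dirty))  # <p>Hi <a>link</a></p>
```

The point of the design is that anything not on the whitelist is never copied to the output: event-handler attributes, `javascript:` links, comments and script bodies all disappear without the sanitiser needing a pattern for each attack. A real implementation would need a far more carefully reviewed whitelist and URL handling than this sketch.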

You may read and comment on the proposal, which is up for review for Zend Framework 2.0.