<?xml version="1.0" encoding="utf-8" ?>

<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
<channel>
    <title>Maugrim The Reaper's Blog (Entries tagged as benchmark)</title>
    <link>http://blog.astrumfutura.com/</link>
    <description>Pádraic Brady on PHP, PHP Game Development and More</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 1.1 - http://www.s9y.org/</generator>
    <pubDate>Wed, 14 Jul 2010 22:31:41 GMT</pubDate>

    <image>
        <url>http://blog.astrumfutura.com/templates/default/img/s9y_banner_small.png</url>
        <title>RSS: Maugrim The Reaper's Blog - Pádraic Brady on PHP, PHP Game Development and More</title>
        <link>http://blog.astrumfutura.com/</link>
        <width>100</width>
        <height>21</height>
    </image>

<item>
    <title>HTML Sanitisation Benchmarking With Wibble (ZF Proposal)</title>
    <link>http://blog.astrumfutura.com/archives/430-HTML-Sanitisation-Benchmarking-With-Wibble-ZF-Proposal.html</link>
            <category>PHP General</category>
            <category>PHP Security</category>
            <category>Zend Framework</category>
    
    <comments>http://blog.astrumfutura.com/archives/430-HTML-Sanitisation-Benchmarking-With-Wibble-ZF-Proposal.html#comments</comments>
    <wfw:comment>http://blog.astrumfutura.com/wfwcomment.php?cid=430</wfw:comment>

    <slash:comments>27</slash:comments>
    <wfw:commentRss>http://blog.astrumfutura.com/rss.php?version=2.0&amp;type=comments&amp;cid=430</wfw:commentRss>
    

    <author>nospam@example.com (Pádraic Brady)</author>
    <content:encoded>
    In January of this year, I had the idea of writing a HTML Sanitiser for PHP. Why not? All PHP has is HTMLPurifier and a bunch of random solutions that are about as secure as the average wooden gate. If you think that&#039;s harsh, wait for my next blog post &lt;img src=&quot;http://blog.astrumfutura.com/templates/default/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;. HTMLPurifier is the only secure by default HTML Sanitiser in PHP. Fact. But the darn thing is gigantic and slow. That has never stopped me using it (for years), even if I had to do a little funky engineering so I could minimise the performance hit. Other developers, however, have often abandoned HTMLPurifier, falling into the trap of believing that alternative solutions will serve them just as well.&lt;br /&gt;
&lt;br /&gt;
That&#039;s the state of HTML Sanitisation in PHP - pick a big slow library that crushes Cross-Site Scripting and Phishing attacks, or use yet another regular expression based sanitiser that a) barely manages a fraction of HTMLPurifier&#039;s features and b) can probably be exploited by any scriptkiddie working with a stack of data cards. It says an awful lot about security standards among PHP developers that such delusions are uncomprehendingly rampant.&lt;br /&gt;
&lt;br /&gt;
In case you haven&#039;t noticed, I&#039;m biased. Sue me.&lt;br /&gt;
&lt;br /&gt;
I have opined since forever that regular expression sanitisers are nothing short of insane. Since the problem with HTMLPurifier is speed and size, I started thinking about ways to build something like HTMLPurifier that was fast, small and almost as feature packed as HTMLPurifier. At first, this sounds like an impossible task. The typical suggestion is to use regular expressions, but I&#039;m not completely insane...yet. Instead I borrowed a concept called a DOM Filter and chucked in a helpful dose of HTML Tidy. The result was &lt;a href=&quot;http://github.com/padraic/wibble&quot;&gt;Wibble&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Wibble is basically a DOM Filter. It loads up HTML into PHP DOM, applies a set of filters against all nodes in the DOM, passes the output through HTML Tidy, and then hands it back to the user - sanitised and well-formed. It&#039;s almost stupid in its obviousness. Better, this allows Wibble to skip regular expression dependence. It operates far more like HTMLPurifier by relying on a DOM representation (no string parsing to funk around with) partnered with Tidy for cleanup.&lt;br /&gt;
&lt;br /&gt;
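As a rough illustration of the filter step described above, here is a minimal PHP sketch of a whitelist-based DOM filter. It is not Wibble&#039;s actual code: the function names sketch_filter() and sketch_sanitise() are made up for this example, encoding normalisation and the Tidy pass are left out, and it assumes the strictest policy of dropping every attribute.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&amp;lt;?php&lt;br /&gt;
// Hypothetical helpers for illustration only, not part of Wibble.&lt;br /&gt;
function sketch_filter(DOMNode $node, array $allowedTags)&lt;br /&gt;
{&lt;br /&gt;
    // Copy the child list first: the live node list changes as nodes are edited.&lt;br /&gt;
    foreach (iterator_to_array($node-&gt;childNodes) as $child) {&lt;br /&gt;
        if (!($child instanceof DOMElement)) {&lt;br /&gt;
            if (!($child instanceof DOMText)) {&lt;br /&gt;
                $node-&gt;removeChild($child); // drop comments and other non-text nodes&lt;br /&gt;
            }&lt;br /&gt;
            continue;&lt;br /&gt;
        }&lt;br /&gt;
        $name = strtolower($child-&gt;nodeName);&lt;br /&gt;
        if ($name === &#039;script&#039; || $name === &#039;style&#039;) {&lt;br /&gt;
            $node-&gt;removeChild($child); // remove the element and its content&lt;br /&gt;
            continue;&lt;br /&gt;
        }&lt;br /&gt;
        sketch_filter($child, $allowedTags); // filter descendants first&lt;br /&gt;
        if (in_array($name, $allowedTags, true)) {&lt;br /&gt;
            // Whitelisted tag: keep it, but strip every attribute.&lt;br /&gt;
            while ($child-&gt;attributes-&gt;length &gt; 0) {&lt;br /&gt;
                $child-&gt;removeAttribute($child-&gt;attributes-&gt;item(0)-&gt;nodeName);&lt;br /&gt;
            }&lt;br /&gt;
        } else {&lt;br /&gt;
            // Unknown tag: keep its (already filtered) children, drop the tag itself.&lt;br /&gt;
            while ($child-&gt;firstChild) {&lt;br /&gt;
                $node-&gt;insertBefore($child-&gt;firstChild, $child);&lt;br /&gt;
            }&lt;br /&gt;
            $node-&gt;removeChild($child);&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function sketch_sanitise($html, array $allowedTags)&lt;br /&gt;
{&lt;br /&gt;
    libxml_use_internal_errors(true); // malformed input should not emit warnings&lt;br /&gt;
    $dom = new DOMDocument();&lt;br /&gt;
    // Wrap the fragment so there is a single known root to filter and serialise.&lt;br /&gt;
    $dom-&gt;loadHTML(&#039;&amp;lt;div&gt;&#039; . $html . &#039;&amp;lt;/div&gt;&#039;);&lt;br /&gt;
    libxml_clear_errors();&lt;br /&gt;
    $root = $dom-&gt;getElementsByTagName(&#039;div&#039;)-&gt;item(0);&lt;br /&gt;
    sketch_filter($root, $allowedTags);&lt;br /&gt;
    $out = &#039;&#039;;&lt;br /&gt;
    foreach ($root-&gt;childNodes as $child) {&lt;br /&gt;
        $out .= $dom-&gt;saveHTML($child);&lt;br /&gt;
    }&lt;br /&gt;
    return $out; // a real pipeline would hand this to Tidy next&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Passing an empty whitelist strips every tag and leaves only text.&lt;br /&gt;
echo sketch_sanitise(&#039;&amp;lt;p onclick=&quot;x()&quot;&gt;Hi &amp;lt;b&gt;there&amp;lt;/b&gt;&amp;lt;/p&gt;&#039;, array(&#039;p&#039;));&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;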
Of course, there have to be regular expressions somewhere. And whitelists. And other stuff. Wibble is really an amalgamation of borrowed concepts. It&#039;s hard to be too original in HTML Sanitisation because originality is a good way to shoot yourself in the foot (hence regex is EVIL!), so I wasn&#039;t going to spend too long digging my own grave when there is a wealth of sanitisation resources in the programming world. Wibble&#039;s approach borrows elements from Ruby&#039;s loofah, Python&#039;s HTML5Lib, and Java&#039;s AntiSamy. Wibble mixes and matches from the useful design elements each of these offers, serving them up on top of PHP&#039;s DOM and Tidy extensions with its own distinctive twists.&lt;br /&gt;
&lt;br /&gt;
I completed the first Wibble prototype recently, so I figured that with something that was at that 90% point where the remaining 10% would be in-depth sanity testing, cleanup and documentation, it was time to see how it compared to some other PHP solutions (&lt;a href=&quot;http://www.htmlpurifier.org&quot;&gt;HTMLPurifier&lt;/a&gt; and &lt;a href=&quot;http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/&quot; &gt;HtmLawed&lt;/a&gt;). I had some fairly conservative performance objectives so the results came as a pleasant surprise.&lt;br /&gt;
&lt;br /&gt;
If you are a benchmark fiend, you can download and independently fiddle with my benchmark process from &lt;a href=&quot;http://github.com/padraic/wibble-benchmarks&quot;&gt;http://github.com/padraic/wibble-benchmarks&lt;/a&gt;. Note that the current benchmark uses a Wibble prototype - there are additional elements that need to be added over time. The benchmark currently uses three sample snippets of HTML: Small (blog comment size), Medium (markup heavy with limited textual content), and Big (markup light with lots of textual content). It operates by filtering each HTML sample 200 times with each benchmarked HTML sanitisation solution. Each iteration includes the instantiation and setup phases of each solution (where relevant) to reflect the most likely real world experience of using sanitisation as a once-off (non-repeating in same request) process. I use PEAR&#039;s Benchmark package to record the aggregate run time per loop of sanitisation tasks. All operations occur within one single PHP process with HTMLPurifier caching enabled (Wibble and HtmLawed do not use caching). Each solution is configured as closely as possible to strip all HTML from the content.&lt;br /&gt;
&lt;br /&gt;
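The shape of that loop can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: it swaps PEAR&#039;s Benchmark package for plain microtime() calls, sketch_benchmark() is a made-up name, and strip_tags() stands in for a real sanitiser purely so the example runs on its own (it is not a safe HTML sanitiser).&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&amp;lt;?php&lt;br /&gt;
// Hypothetical harness: $makeSanitiser is a factory returning a callable sanitiser.&lt;br /&gt;
function sketch_benchmark($makeSanitiser, $html, $iterations = 200)&lt;br /&gt;
{&lt;br /&gt;
    $start = microtime(true);&lt;br /&gt;
    for ($i = 0; $i &amp;lt; $iterations; $i++) {&lt;br /&gt;
        // Build the sanitiser inside the loop so setup cost is counted on every run.&lt;br /&gt;
        $sanitise = call_user_func($makeSanitiser);&lt;br /&gt;
        call_user_func($sanitise, $html);&lt;br /&gt;
    }&lt;br /&gt;
    return microtime(true) - $start; // aggregate seconds for the whole loop&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Stand-in factory for illustration only.&lt;br /&gt;
$factory = function () {&lt;br /&gt;
    return &#039;strip_tags&#039;;&lt;br /&gt;
};&lt;br /&gt;
printf(&quot;%.4f seconds for 200 runs\n&quot;, sketch_benchmark($factory, &#039;&amp;lt;p&gt;Hello &amp;lt;b&gt;world&amp;lt;/b&gt;&amp;lt;/p&gt;&#039;));&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;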
You can view a sample result at &lt;a href=&quot;http://gist.github.com/468426&quot;&gt;http://gist.github.com/468426&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
The results show that both Wibble and HtmLawed outperform HTMLPurifier by a very wide margin. Wibble underperforms HtmLawed by a variable margin - from twice as slow on small to medium sized input, to four times slower on large inputs with minimal HTML tags. In Wibble&#039;s slowest benchmark, it outperformed HTMLPurifier by a factor of four.&lt;br /&gt;
&lt;br /&gt;
Wibble&#039;s intent is to try and replicate the completeness of HTMLPurifier, so its speed deficit when compared to HtmLawed is expected (when stripping all tags). There is not a lot to be done to improve this specific benchmark result since Wibble does a lot of stuff behind the scenes like encoding normalisation, DOM manipulation and HTML tidying. It also does all three of these things far more consistently and completely than HtmLawed is capable of.&lt;br /&gt;
&lt;br /&gt;
So how does Wibble match up against Big Daddy? Wibble is a prototype, so obviously it still has ground to gain on HTMLPurifier in terms of features. But on the most significant points it only has one specific problem - it&#039;s not HTML 5 ready. Neither DOM nor Tidy supports HTML 5, though you can &quot;pretend&quot; it&#039;s HTML 4.01 (or even XHTML 1.0) for HTML 5 fragments so long as you are aware Tidy will strip unsupported HTML 5 tags and attributes.&lt;br /&gt;
&lt;br /&gt;
The other points are syncing up with HTMLPurifier quite nicely. Wibble will sanitise all HTML by default using strict filters (i.e. by default it strips every tag and only outputs plain text). It handles multiple encodings including conversion if necessary. It outputs standards compliant (other than HTML 5) HTML or XHTML. It fixes all the usual page breaking stuff like unclosed tags and illegal tag nesting. It is entirely reliant on whitelists and strict validation rather than blacklists and loose reconstructive parsing. It includes minimal regular expression usage (only needed for attribute and CSS validation) based on regular expressions widely used and tested in other languages. While testing will (and must) continue, it has so far proven resistant to XSS and Phishing attacks. This can&#039;t be absolutely assured until sufficient testing has been performed.&lt;br /&gt;
&lt;br /&gt;
Otherwise, it will be interesting to see the final version of Wibble. HTMLPurifier has a tough reputation to follow, but having something which can even up the odds and do it with a pronounced advantage in speed will be really nice. Well, until someone needs to install it on CentOS &lt;img src=&quot;http://blog.astrumfutura.com/templates/default/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;.  
    </content:encoded>
    <dc:creator>P&#225;draic Brady</dc:creator>

    <pubDate>Thu, 08 Jul 2010 20:50:31 +0000</pubDate>
    <guid isPermaLink="false">http://blog.astrumfutura.com/archives/430-guid.html</guid>
    <category>benchmark</category>
<category>php general</category>
<category>php security</category>
<category>wibble</category>
<category>xss</category>
<category>zend framework</category>
<category>zf proposal</category>
<creativeCommons:license>http://creativecommons.org/licenses/by/1.0/</creativeCommons:license>
</item>
<item>
    <title>PHP Framework Benchmarks: Entertaining But Ultimately Useless</title>
    <link>http://blog.astrumfutura.com/archives/421-PHP-Framework-Benchmarks-Entertaining-But-Ultimately-Useless.html</link>
            <category>PHP General</category>
            <category>PHP Security</category>
            <category>Zend Framework</category>
    
    <comments>http://blog.astrumfutura.com/archives/421-PHP-Framework-Benchmarks-Entertaining-But-Ultimately-Useless.html#comments</comments>
    <wfw:comment>http://blog.astrumfutura.com/wfwcomment.php?cid=421</wfw:comment>

    <slash:comments>49</slash:comments>
    <wfw:commentRss>http://blog.astrumfutura.com/rss.php?version=2.0&amp;type=comments&amp;cid=421</wfw:commentRss>
    

    <author>nospam@example.com (Pádraic Brady)</author>
    <content:encoded>
    Some recent attention in the PHP framework community has been focused on &lt;a href=&quot;http://symfony-reloaded.org/fast&quot;&gt;the recent publication of Symfony 2 Preview benchmarks&lt;/a&gt; showing that Symfony 2 outperforms Zend Framework by a factor of 3.5. It also outperforms every other benchmarked framework. This reminded me of the &lt;a href=&quot;http://www.yiiframework.com/performance/&quot;&gt;earlier Yii Framework benchmarks&lt;/a&gt; which allegedly proved that Yii outperforms Zend Framework by a factor of 7. At this point, the spirit of Glenn Beck would have me demonstrate that 3.5 multiplied by 2 (the number of eyes in Barack Obama&#039;s skull) equals 7, thus proving the existence of a Liberal conspiracy led by a closet Muslim. That&#039;s probably bullshit though.&lt;br /&gt;
&lt;br /&gt;
My fellow Zend Frameworkers, we cannot allow this to stand. We must put paid to these wild claims and prove, once and for all, that Zend Framework is the fastest. To present this undeniable truth rooted in solid statistics and Paul M. Jones (aka Benchmark God) inspired techniques, I have created the benchmark of benchmarks. Well, to be honest, I only really edited &lt;a href=&quot;http://github.com/fabpot/framework-benchs&quot;&gt;another benchmark&lt;/a&gt;. But still, it will prove Zend Framework is faster than everything else out there.&lt;br /&gt;
&lt;br /&gt;
Before everyone gets too excited, the Symfony 2 Preview performance is quite impressive and it&#039;s consistent in my own rerun of the original benchmark. A lot of effort has obviously gone into speeding things up and it shows. My benchmark is not about busting some untruth, but about demonstrating how it is possible to outrace Symfony 2 and Yii with the Zend Framework. I&#039;ll dissect the methods after the results are reported &lt;img src=&quot;http://blog.astrumfutura.com/templates/default/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;.&lt;br /&gt;
&lt;br /&gt;
To ensure the unchallenged fairness of my benchmark, I will &lt;a href=&quot;http://github.com/padraic/framework-benchs&quot;&gt;publish its entire source code to Github&lt;/a&gt;. Unless there are any more sarcastic Irishmen out there, that should be relatively safe. I will also follow in the path of the Almighty Paul M. Jones, in whose shadow all PHP framework benchmarkers will find peace, flowing waters, milk and honey and, if not too much trouble, the extermination of all console noobs who insist that PC gaming is dying. Is that too much to ask, Paul? A little biblical gnashing of teeth would do them good. Paul (may he live forever) promotes using a standardised benchmarking platform, so I will be using an AWS instance &lt;a href=&quot;http://github.com/padraic/framework-benchs/blob/master/replicating.markdown&quot;&gt;reflecting Fabien Potencier&#039;s&lt;/a&gt; except it will be an m1.large instance instead (the other seeing a few issues in testing) and include some baseline HTML to make sure siege or ab are not misbehaving.&lt;br /&gt;
&lt;br /&gt;
With all these preparations, the stage is set to demolish Symfony 2 and Yii and run them into the ground. Prior to running the benchmark, it&#039;s suggested to play the Battlestar Galactica theme tune in the background to set the right mood. Just think of Fabien Potencier dressed up as a Cylon (if only my Photoshop skills were better...sigh). It fits in well with the siege feedback. You can try some HipHop, but those Facebook guys must have been smoking something illegal because it had no impact during the Zend Framework runs. I also experimented with Metallica (surely that would work!) but no...nothing. Quite disappointing.&lt;br /&gt;
&lt;br /&gt;
For those with attention deficit disorder, here are the results:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;framework                |      rel |      avg |        1 |        2 |        3 |        4 |        5&lt;br /&gt;
------------------------ | -------- | -------- | -------- | -------- | -------- | -------- | --------&lt;br /&gt;
baseline-html            |   1.1972 |  5702.79 |  5527.29 |  5775.74 |  5779.59 |  5671.24 |  5760.10&lt;br /&gt;
baseline-php             |   1.0000 |  4763.50 |  4718.24 |  4751.51 |  4760.75 |  4763.68 |  4823.32&lt;br /&gt;
zend-1.10                |   0.1596 |   760.45 |   766.90 |   764.70 |   758.62 |   752.04 |   759.98&lt;br /&gt;
symfony-2.0.0alpha1      |   0.1366 |   650.61 |   641.98 |   655.41 |   653.86 |   656.68 |   645.12&lt;br /&gt;
solar-1.0.0beta3         |   0.1131 |   538.86 |   539.76 |   536.95 |   540.63 |   540.10 |   536.87&lt;br /&gt;
yii-1.1.1                |   0.0821 |   390.87 |   401.90 |   392.59 |   386.18 |   395.98 |   377.72&lt;br /&gt;
symfony-1.4.2            |   0.0441 |   210.22 |   211.20 |   209.78 |   210.72 |   210.49 |   208.92&lt;br /&gt;
cakephp-1.2.6            |   0.0406 |   193.56 |   193.84 |   193.35 |   193.27 |   192.57 |   194.75&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
As you can clearly see, Zend Framework is faster than all other frameworks. More enlightening, we beat both Symfony 2 and Yii. Lithium and Flow3 were omitted from the original benchmark because, besides being non-functional when forked from Fabien&#039;s benchmark (for whatever reason), they are both fairly young frameworks (Flow3 is alpha). Note: Lithium works fine outside of the benchmark (I&#039;ve been working with it) and it is fairly fast.&lt;br /&gt;
&lt;br /&gt;
Zend Framework wins. You can go now.&lt;br /&gt;
&lt;br /&gt;
Still here? I suppose you&#039;re trying to figure out how I manipulated...er...improved the benchmark to reflect my reality of Zend Framework being superior at all levels. Well, I&#039;m not sure I should tell you. It&#039;s hard enough to create and run these benchmarks without having to explain how to fuck them over so they show whatever you need them to. Alright then, I&#039;ll &#039;fess up.&lt;br /&gt;
&lt;br /&gt;
To create a positive benchmark, you need to understand that all frameworks were born as festering piles of unoptimised stinking crap. They were all born bad and get worse with age. This sounds quite sad, but actually it&#039;s an inevitable compromise between performance and features. It&#039;s also a compromise between performance and ease-of-use. So you see, performance is unfairly faced by two opponents: features and ease-of-use. All performance is sacrificed in the name of serving the needs of rapid development, flexibility, prototyping, and making your source code look prettier than the other guy&#039;s. As if.&lt;br /&gt;
&lt;br /&gt;
What happens if you move away from the enemies of performance and do some propping up behind the scenes? You get...wait for it...oodles of extra performance!&lt;br /&gt;
&lt;br /&gt;
This is partly how both Symfony 2 and Yii manage to outperform Zend Framework. But hey, we can do it too!&lt;br /&gt;
&lt;br /&gt;
To ascertain the perfect method of &quot;propping up&quot; to get the Zend Framework into speed demon territory without it costing me dozens of hours of in-depth analysis, I created four variants to play with: All-In, Optimised, More-Optimised and What-The-Fuck-Optimised. Here are the four final alternatives benchmarked against themselves (same run as the previous results).&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;framework                |      rel |      avg |        1 |        2 |        3 |        4 |        5&lt;br /&gt;
------------------------ | -------- | -------- | -------- | -------- | -------- | -------- | --------&lt;br /&gt;
baseline-html            |   1.1972 |  5702.79 |  5527.29 |  5775.74 |  5779.59 |  5671.24 |  5760.10&lt;br /&gt;
baseline-php             |   1.0000 |  4763.50 |  4718.24 |  4751.51 |  4760.75 |  4763.68 |  4823.32&lt;br /&gt;
zend-1.10-wtfoptimised   |   0.1748 |   832.63 |   830.64 |   837.20 |   836.20 |   824.90 |   834.23&lt;br /&gt;
zend-1.10-moreoptimised  |   0.1596 |   760.45 |   766.90 |   764.70 |   758.62 |   752.04 |   759.98&lt;br /&gt;
zend-1.10-optimised      |   0.0586 |   279.25 |   280.71 |   279.30 |   279.62 |   276.51 |   280.10&lt;br /&gt;
zend-1.10                |   0.0292 |   138.89 |   138.43 |   137.56 |   139.96 |   139.93 |   138.56&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
Pretty illuminating &lt;img src=&quot;http://blog.astrumfutura.com/templates/default/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;. Yes, the What-The-Fuck-Optimised variant is faster than the All-In variant by a factor of 6. The More-Optimised variant (fairer for reasons explained below) is 5.5 times faster than the base variant. In fact, the improvements are fairly smooth across the first two levels of optimisation, at least doubling each time (the What-The-Fuck-Optimised step only adds roughly another 10% on top).&lt;br /&gt;
&lt;br /&gt;
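For anyone who wants to check those factors, the arithmetic is just each variant&#039;s avg column divided by the All-In figure, which works out to roughly 2.0x, 5.5x and 6.0x. A small PHP sketch, using only the avg values from the table above (the array layout and variable names are just for this example):&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&amp;lt;?php&lt;br /&gt;
// avg column from the table above, keyed by variant name.&lt;br /&gt;
$avg = array(&lt;br /&gt;
    &#039;zend-1.10&#039;               =&gt; 138.89,&lt;br /&gt;
    &#039;zend-1.10-optimised&#039;     =&gt; 279.25,&lt;br /&gt;
    &#039;zend-1.10-moreoptimised&#039; =&gt; 760.45,&lt;br /&gt;
    &#039;zend-1.10-wtfoptimised&#039;  =&gt; 832.63,&lt;br /&gt;
);&lt;br /&gt;
$base = $avg[&#039;zend-1.10&#039;]; // the All-In variant is the reference point&lt;br /&gt;
foreach ($avg as $name =&gt; $rate) {&lt;br /&gt;
    // Speed-up factor relative to the All-In variant.&lt;br /&gt;
    printf(&quot;%-24s %.2fx\n&quot;, $name, $rate / $base);&lt;br /&gt;
}&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;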
So just how does this miraculous optimisation work to such a degree that we can beat Symfony 2 and Yii at their speed games? There are four solid steps to being a speed demon. I only implemented these in the More-Optimised variant which was enough to destroy Symfony 2 (as configured) and Yii in the benchmarks. The Optimised version simply discarded some optional ease-of-use features. What-The-Fuck-Optimised is identical to More-Optimised except it exits in the middle of a controller (we only had to echo Hello, right? &lt;img src=&quot;http://blog.astrumfutura.com/templates/default/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;).&lt;br /&gt;
&lt;br /&gt;
1. Don&#039;t use Zend_Application. While Zend_App is great for creating consistent complex bootstraps within a standardised structure, it comes with a significant hit to baseline performance. A more direct bootstrap (typical of ZF until Zend_App arrived) is far faster and can also be done without configuration files.&lt;br /&gt;
&lt;br /&gt;
2. Skip using the ViewRenderer plugin. Without the ViewRenderer, you need to manually configure Zend_View and add render() calls to Controllers. This is actually very simple to do and is fairly fast - fast was never really part of the ViewRenderer&#039;s genetics.&lt;br /&gt;
&lt;br /&gt;
3. Use autoloading. Strip require_once calls from the framework library so unneeded files are ignored. Replace uses of Zend_Loader_Autoloader with a not-so-crazy autoloader function. In fact, pray Zend_Loader is never used - it does a lot of file ops that, to date, have never been explained to me as having any value.&lt;br /&gt;
&lt;br /&gt;
4. Preload everything (Symfony 2 Preview does!). It buys you some performance cookies and equalises the speed baseline. Using a simple preload script is not that hard.&lt;br /&gt;
&lt;br /&gt;
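To give a flavour of step 3, a &quot;not-so-crazy&quot; autoloader can be as small as the PHP sketch below. It is an assumption-laden illustration rather than the code used in the benchmark: the function names are made up, it assumes the library directory is already on the include_path, and it relies on the PEAR-style naming where a class such as Zend_Controller_Front lives in Zend/Controller/Front.php.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&amp;lt;?php&lt;br /&gt;
// Hypothetical helper: map a PEAR-style class name to a relative file path.&lt;br /&gt;
function sketch_class_to_path($class)&lt;br /&gt;
{&lt;br /&gt;
    return str_replace(&#039;_&#039;, DIRECTORY_SEPARATOR, $class) . &#039;.php&#039;;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Hypothetical autoloader: one include_path lookup, one include, nothing else.&lt;br /&gt;
function sketch_autoload($class)&lt;br /&gt;
{&lt;br /&gt;
    $path = stream_resolve_include_path(sketch_class_to_path($class));&lt;br /&gt;
    if ($path !== false) {&lt;br /&gt;
        include $path; // on a miss, any other registered autoloader still gets a turn&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
spl_autoload_register(&#039;sketch_autoload&#039;);&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;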
With these four techniques, Symfony 2 and Yii are left in the dust.&lt;br /&gt;
&lt;br /&gt;
It&#039;s time to ask the question on everybody&#039;s minds: what does it all mean? Give up. There is no meaning in benchmarks. They are designed to compare the relative performance of frameworks with different goals, design philosophies, development practices, and features. Might as well flip a coin, roll some dice or perhaps examine the entrails of a sacrificed goat for answers. The goat is your best bet. If all else fails at least you can have a nice goat-stew dinner while you think things over some more.&lt;br /&gt;
&lt;br /&gt;
Benchmarks. Useless. Final words?&lt;br /&gt;
&lt;br /&gt;
Know your framework! All this benchmarking nonsense does little good unless it&#039;s plastered with disclaimers. Symfony&#039;s preloading is an obvious example mainly because it&#039;s the only framework in Fabien&#039;s benchmark (that I know of) using it. Does it increase base performance out of the box? Yes. Can every other framework use it with little effort? Yes. So what&#039;s the point? Should everything else preload by default? Maybe, who can tell. Most developers simply don&#039;t work in scenarios where such a performance boost would make a difference. And if they do, it&#039;s easy to make the necessary changes.&lt;br /&gt;
&lt;br /&gt;
What&#039;s interesting is that I just fiddled with the Zend Framework. You could probably find ways of making every other framework in there run a hell of a lot faster than currently presented. It&#039;s just a bit sweeter for Zend Framework since we&#039;re the self-proclaimed use-at-will framework. And it&#039;s really fast when developers need it to be.
    </content:encoded>
    <dc:creator>P&#225;draic Brady</dc:creator>

    <pubDate>Tue, 23 Feb 2010 16:27:11 +0000</pubDate>
    <guid isPermaLink="false">http://blog.astrumfutura.com/archives/421-guid.html</guid>
    <category>benchmark</category>
<category>php general</category>
<category>php security</category>
<category>zend framework</category>
<creativeCommons:license>http://creativecommons.org/licenses/by/1.0/</creativeCommons:license>
</item>

</channel>
</rss>