PHP, Zend Framework and Other Crazy Stuff
PHP Security
Mockery 0.7.2 Released (And On Packagist.org!)
Jan 25th
Mockery is a simple yet flexible PHP mock object framework for use in unit testing with PHPUnit, PHPSpec or any other testing framework. Its core goal is to offer a framework for creating test doubles like mock objects through the use of a simple and succinct API capable of clearly defining all possible object operations and interactions using a human readable Domain Specific Language (DSL). Designed as a drop-in alternative to PHPUnit’s phpunit-mock-objects library, Mockery is easy to integrate with PHPUnit and can happily operate alongside phpunit-mock-objects.
Today, I am pleased to announce the release of Mockery 0.7.2, a maintenance release fixing a small number of bugs and annoyances. A special thanks to all those who forked the GitHub project and submitted pull requests! Leaving a developer with hardly any work to do other than a quick test and merge is always appreciated! You can install or upgrade to the new version from the survivethedeepend.com PEAR channel.
Another piece of news is that Mockery is now available on Packagist.org for users of Composer. Composer is a tool to help you manage your own projects’ or libraries’ dependencies and it can handle and mix dependencies from Composer compatible repositories like Packagist.org, any git repository using tags, and any PEAR channel. I do this of my own free will and not because Luis Cordova and Benjamin Eberlei are standing behind me with pitchforks.
The more pertinent fixes include:
- Fixed a problem in resolving method chains which abuse the Law of Demeter (thanks to the wizardly Robert Basic).
- Fixed unexpected static calls to an alias mock which were causing fatal errors (thanks to Luis Cordova).
- Fixed a crash present since PHP 5.3.6 due to a referenced $this variable entering a closure (thanks to Martin Sadovy).
- Added support for PHP_CodeCoverage 1.1 whose filter class is no longer a singleton (thanks to Matthew Vivian).
- Added non-halting exception handling (for Mockery exceptions) to the PHPUnit TestListener (thanks to Adrian Slade).
- Added boolean $prepend (defaults to FALSE) parameter to \Mockery\Loader::register() to allow for registering Mockery’s autoloader to the top of the autoloader stack even after other autoloaders have been registered (thanks to Hermann Kosselowski).
- Updated documentation/tests for the release of Hamcrest 1.0.0 several days ago (thanks to me, me, me – who finally got to do something nobody else had a pull request for!).
- Added new \Mockery::self() static method to make retrieving the current mock object simpler and more readable while setting expectations without the need to refer back to past variable assignments.
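The $prepend flag mentioned above behaves like the third argument of PHP’s own spl_autoload_register(). Here is a minimal sketch of the effect using plain spl_autoload_register() rather than Mockery’s loader; the two closures and their labels are stand-ins invented for illustration.

```php
<?php
// Records the order in which autoloaders are consulted.
$calls = [];

// An autoloader registered first, as an application or framework might do.
spl_autoload_register(function ($class) use (&$calls) {
    $calls[] = 'app';
});

// A later autoloader registered with $prepend = true jumps to the front of
// the stack. "mockery-like" is only a label; this is not Mockery's loader.
spl_autoload_register(function ($class) use (&$calls) {
    $calls[] = 'mockery-like';
}, true, true);

// Asking about an unknown class walks the whole stack in order.
class_exists('Hypothetical\\Missing\\ClassName');

print_r($calls);
```

Without the prepend flag the second loader would only be consulted after the first one had already had its turn.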
Users should also note that Hamcrest 1.0.0, which includes a small filename change (hamcrest.php was capitalised to Hamcrest.php), was released several days ago. If you use Hamcrest matchers with Mockery, you should ensure that both libraries are updated on your system.
As always, please report any bugs or potential improvements to the Github issue tracker using the relevant label or, even more appreciated, send me a pull request.
Storing Session Data In Cookies: Problems And Security Concerns To Be Aware Of
Jan 23rd
Back from my extended leave of absence, I’ll re-open the dusty cobwebbed depths of this blog to echo the sentiments of Paul Reinheimer in his recent article “Cookies don’t replace Sessions“. The topic is actually an old one since Ruby on Rails has adopted the strategy of storing application session data in cookies by default (take note, performance hounds). The purpose of storing sessions in userland cookies rather than the conventional “stick-it-on-the-filesystem/database” approach used by many apps is one of performance and a little obscuration. Cookie data can be accessed faster than hitting the filesystem/database plus it has the dubious ability to disguise the session-targeted programming language. Really though, PHP is assumed to be on all web servers so hiding its existence is a bit like trying to hide an elephant in a zoo. Hide it all you want – we still know there has to be one in there!
In exchange for speeding up session reading, storing session data in cookies has some fairly uncomfortable costs.
Now, developers are not unaware of the problems of storing potentially sensitive application data in plain text files on the user’s PC which users can manipulate, copy, and mangle to their (or the hacker’s currently fiddling with the user’s PC) heart’s content. It’s dangerous depending on just how much you rely on session data to drive other security rules or restrictions on business logic within the application. Technically, the reliance placed on sessions should be close to nothing – session data should drive the application towards other storage solutions for the really essential stuff and just stay around as a minimal identifier/stash of basic ID info. Such minimal information can be dumped, corrupted, or overwritten with the only cost being to perhaps require a user to login again when that happens. Stuffing a bank balance into a session, on the other hand, is one (very exaggerated!) example of the kind of data you should be shot for relying on a session for.
Programmers being programmers, it’s not rare to see the session become a more intrinsically important storage location than it should be. In those cases, being able to manipulate the session data can become a problem and may give rise to exploitation scenarios where tampering with the stored data leads to some benefit for the manipulator. Obviously we want to make sure that that can’t happen even in scenarios where programmers may be a bit loose with where they store data. We don’t build frameworks and libraries for Gurus, we build them for all programmers – even the sometimes ignorant and undertrained ones. This cookie stored session data is often coupled with the ability to encrypt that data. However…
As Paul Reinheimer remarks in his article, “Encryption is often viewed as a panacea for security problems, you sprinkle a little encryption dust around, and your problems dissolve”. This is an absolute truth in programming – programmers often view encryption as a solution without regard for one teeny tiny problem. If you encrypt a set of data for any purpose, even though it’s encrypted, the user (or the hacker hacking the user’s account) still has the data in some usable form!
With perfectly intact data, and even though it’s hidden by encryption, that data can be recycled simply by copying it to another machine. Depending on the data that is stored (which admittedly may require the hacker/user to figure out by doing actual work like finding your open source app on Github or breaking a developer’s fingers until they spill the beans), you can restore past data just by copying over a backup of a prior cookie or repeat a past transaction by continually reusing the original cookie it required. Paul offers a few trivial examples in his article.
Such reuse of data is known as a replay attack: a scenario where even encrypted data can be constantly reused to give rise to a positive result – all without any need whatsoever to break the encryption. The antidote to this vulnerability is to ensure that all data sets are unique and can be used only once, i.e. you include a single-use nonce (some generated set of characters or bits) in the data which is updated whenever that data is used. This continually forces the update of the relevant digital HMAC signature and/or encryption result (even for the exact same data otherwise) in order to prevent any reuse of old data in a replay attack. Once a nonce is used, it’s discarded, and the old data can no longer be accepted by your application. Of course, the downside is that since the nonce must be single-use, you need to keep track of all nonces to ensure they are not accidentally used again. You will need a database, possibly using a nonce-included timestamp as a time limit so your storage requirements aren’t completely insane, which obviously means that just using the traditional database storage for sessions in the first place would have been a much better and simpler choice.
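To make the nonce idea concrete, here is a rough sketch of signing cookie data with a single-use nonce and rejecting a second presentation of the same cookie. Everything in it is an assumption for illustration: the function names, the SECRET_KEY constant and the in-memory $usedNonces array (standing in for the database described above) are hypothetical, the payload is signed but not encrypted, and it relies on random_bytes() from PHP 7+.

```php
<?php
// Hypothetical server-side secret; a real application would load this from
// configuration and keep used nonces in a database, as the text notes.
const SECRET_KEY = 'replace-with-a-long-random-secret';

/** Wrap session data with a fresh nonce and sign the result. */
function issueCookie(array $data): string
{
    $payload = json_encode(['nonce' => bin2hex(random_bytes(16)), 'data' => $data]);
    $signature = hash_hmac('sha256', $payload, SECRET_KEY);
    return base64_encode($payload) . '.' . $signature;
}

/** Accept a cookie only if the signature is valid and the nonce is unused. */
function acceptCookie(string $cookie, array &$usedNonces): ?array
{
    $parts = explode('.', $cookie, 2);
    if (count($parts) !== 2) {
        return null; // malformed
    }
    $payload = base64_decode($parts[0], true);
    if ($payload === false
        || !hash_equals(hash_hmac('sha256', $payload, SECRET_KEY), $parts[1])) {
        return null; // tampered or malformed
    }
    $decoded = json_decode($payload, true);
    if (isset($usedNonces[$decoded['nonce']])) {
        return null; // replayed: this nonce was already spent
    }
    $usedNonces[$decoded['nonce']] = true; // burn the nonce
    return $decoded['data'];
}

$usedNonces = [];
$cookie = issueCookie(['user_id' => 42]);
$first = acceptCookie($cookie, $usedNonces);   // accepted
$second = acceptCookie($cookie, $usedNonces);  // rejected as a replay
var_dump($first, $second);
```

Note that the server-side $usedNonces store is exactly the bookkeeping that erodes the speed advantage of cookie storage.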
So, in summary, encryption prevents the reading of data but it does not prevent the reuse of existing data. For that to be prevented you need a nonce implementation. And, due to the complexity of using and tracking nonces, practically no cookie stored session solutions will actually offer nonce support because it would eliminate their speed advantage. Which means they are susceptible to replay attacks, which means they are dangerous tools to be swinging around blindly, which means that the old local session storage strategies are still far superior from a security perspective, which all means that you should avoid cookie stores like the damned plague and stick to the old, traditional but secure session storage strategies we already have unless you a) are crazy or b) trust your colleagues (and yourself) not to screw it up.
Even without the security concerns, there is also another less critical downside to storing sessions in cookies which is that cookies have a storage limit of around 4KB. No other storage solution for session data should have that problem but you need to be aware of it anyway as using encryption may push you there sooner than the base data size might suggest (encrypted data size is usually larger than the original data). While noting this, you should never really hit that limit unless you are storing data there that you likely shouldn’t be anyway!
So, cookie based session storage: It’s very fast but lethally insecure if you store the wrong type of data. If you’re going to use it, make sure you keep a tight rein on what data is being stored.
Zend Framework 2.0: Dependency Injection (Part 1)
Oct 4th
If you’ve been watching the PHP weather vane (we call it Twitter for short), you may have noticed a shift in Symfony and Zend Framework. Version 2.0 of both web application frameworks features Dependency Injection Containers (DICs) as the primary means of creating the objects (and even Controllers) your application will use. This is an interesting shift in a programming language that often stubbornly evaded adopting DICs to any great extent. In this mini-series of articles, I’ll take a look at the marvellous world of Dependency Injection as we run up to an examination of Zend Framework 2.0’s Zend\Di component in the next part.
What is Dependency Injection (DI)?
The short answer to this question is that Dependency Injection is a design pattern where, instead of dependent objects creating their dependencies internally, they instead define setters, constructor parameters or public properties which allow a user to “inject” dependencies from the outside into the dependent object and where such dependencies adhere to an expected interface.
If the definition sounds familiar, it’s because Dependency Injection is an obvious design pattern. As a programmer who knows how to use PHPUnit, you probably use the pattern every time you open an editor. So let’s quickly look at why the pattern is both obvious and ubiquitous.
Imagine a class implementation called Leprechaun. In writing the class, we realise we have a dependency on another class called PotOfGold. A naïve implementation would start out very simply with the Leprechaun object creating an instance of PotOfGold for use.
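A naïve version along those lines might look something like the sketch below. The class names come from the text; the contents() and brag() methods are invented purely so there is something to call.

```php
<?php
class PotOfGold
{
    public function contents(): string
    {
        return 'gold';
    }
}

class Leprechaun
{
    private $pot;

    public function __construct()
    {
        // The dependency is hard-wired: nobody outside can swap it.
        $this->pot = new PotOfGold();
    }

    public function brag(): string
    {
        return 'My pot holds ' . $this->pot->contents();
    }
}

$leprechaun = new Leprechaun();
echo $leprechaun->brag(), PHP_EOL;
```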
If you think this through, you may notice the problems. What if we want our Leprechaun to instead have a PotOfRareEarthElementsFromChina? What if we need to replace PotOfGold with a mock object during unit testing? What if another user locates a bug in PotOfGold and needs to replace it without editing the original class (since it’s under 3rd party version control)?
The answer to all these questions is to allow external parties to inject dependencies instead of relying on the object to create them internally. Based on our ridiculous example from above, we would define a setter called setPot(), and allow it to accept any object which implements a new Pot interface. Using an interface merely ensures the dependency that is set obeys some interface the dependent object is expecting.
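As a minimal sketch of that setter approach: setPot() and the Pot interface are named in the text, while the contents() and brag() methods and the anonymous test double are assumptions added for illustration (anonymous classes need PHP 7+).

```php
<?php
// Pot is the interface named in the text; contents() is a hypothetical method.
interface Pot
{
    public function contents(): string;
}

class PotOfGold implements Pot
{
    public function contents(): string { return 'gold'; }
}

class PotOfRareEarthElementsFromChina implements Pot
{
    public function contents(): string { return 'rare earth elements'; }
}

class Leprechaun
{
    private $pot;

    // Any Pot implementation can be injected from the outside.
    public function setPot(Pot $pot): void
    {
        $this->pot = $pot;
    }

    public function brag(): string
    {
        return 'My pot holds ' . $this->pot->contents();
    }
}

$leprechaun = new Leprechaun();
$leprechaun->setPot(new PotOfRareEarthElementsFromChina());
$production = $leprechaun->brag();

// In a unit test, a stand-in can be injected in exactly the same way.
$leprechaun->setPot(new class implements Pot {
    public function contents(): string { return 'nothing (test double)'; }
});
$underTest = $leprechaun->brag();

echo $production, PHP_EOL, $underTest, PHP_EOL;
```

The Leprechaun class never changes between the two uses; only the object handed to setPot() does.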
That, in a nutshell, is why Dependency Injection is obvious. It’s a simple shuffling of creational responsibilities from within an object to some external agent which makes the dependent object more flexible, testable and amenable to the wisdom that Composition is preferred over Inheritance (i.e. injecting objects beats monkey patching!).
Some External Agent
In applying Dependency Injection, we eventually reach a state where all objects in a system are created by a mysterious external agent. What is this entity?
One possible candidate is whatever passes for a Controller in your framework based application. In Zend Framework, this would be an instance of Zend_Controller_Action. Our Controller, in this instance, would define an action method which would perform a necessary application task and create all the objects needed to perform that task. This makes a lot of immediate sense to programmers since allowing you to write Controllers with as little fuss as possible is a fundamental goal of any framework.
However, Controllers are objects! If you had a NewsletterController defining an emailAction method, you might expect that creating an instance of Zend_Mail inside that action is obvious (which it is). Think again! In Dependency Injection parlance, your Controller is a dependent object and an instance of Zend_Mail is one of its dependencies. This is no different from our Leprechaun example. If we create the Zend_Mail instance inside the Controller we get the same irritatingly stubborn question. How do we replace the Zend_Mail instance with an alternative, test double or monkey patched version containing an emergency bug fix?
Controllers, alas, are not the external agent we’re looking for to create objects. And yes, you really should be testing your Controllers.
The next entity a level above Controllers can be loosely termed the Bootstrap. In Zend Framework 1, this started out as a relatively simple script to do just enough that you could start the FrontController and dispatch a request. In other words, Zend Framework traditionally did not offer a final external agent as needed for Dependency Injection. It left it to individual users to create something of their own or, as became inevitable, to just create objects in the Controllers themselves.
More recent Zend Framework versions offer Zend_Application, a method of bootstrapping that allows users to define Resources, i.e. a method or class which creates an object (and injects its dependencies) and returns it on demand when it is needed by a Controller. This was the first consistent approach to handling object creation in ZF which effectively involved defining any number of Factory classes or methods in one location and passing the managing object (the Bootstrap) around the application wherever specific objects needed to be retrieved. In effect, this was a Dependency Injection Container. So, surprise, users of Zend Framework already have a DIC. An even lesser surprise: Zend Framework 2.0 will be no different.
Dependency Injection Containers Are The Devil
The concept of a Dependency Injection Container (DIC) is to act as a programmable object assembler. You take your DIC, tell it how to construct objects (including how to construct and inject their dependencies), pass the DIC to wherever it’s needed, and eventually ask it to create an object it knows about. This is not rocket science. DICs are simple animals to understand, however the devilish suspicion that PHP developers have for DICs is not rooted in what they do but how they do it and whether they make a developer’s life easier.
There’s a widely known belief that the Ruby language doesn’t need a DIC. I’ll use Ruby as an example because it has a few features PHP programmers can salivate over (like how it uses a new method for classes vs PHP’s new keyword making class substitutions stupidly easy). One investigator of Dependency Injection from the Ruby world is Jamis Buck. For Ruby he wrote two DICs: Copland (a port of Java’s HiveMind) and Needle (it’s like Pimple on steroids which…defeats the purpose). After fighting Ruby for a few years, he finally gave up on trying to write a Ruby DIC and documented his thoughts on his blog in “LEGOs, Play-Doh, and Programming“.
The core lesson from the article holds true even in PHP – by and large, complex DICs are a complete waste of time in most scenarios. Indeed, if you ever use a DIC and discover it requires just as many (if not more) lines of DIC code and configuration as it would to do the same thing in plain old PHP, you should start asking where the fabulous benefits have vanished to because it’s not delaying the onset of cramped finger muscles as advertised.
Most PHP developers understand this instinctively. Unlike Jamis, most PHP programmers probably won’t have a strong Java background. As a programming group, we’re less inclined to assume we need a special DIC blessed by the PHP Gods so we fall back to whatever strikes us as a simpler solution.
But here’s the rub – the simplest solution is itself a DIC.
In referring to Dependency Injection Containers as the devil, cursing their name, and blaming them as Java imports designed to make life more complex than needed, it’s easy to lose sight of the fact that such criticism is about the implementation of DICs and not their actual function. There is nothing wrong with having object assemblers – we use them all the time and call them Service Locators, or Factory Classes, or Zend_Application (Resources), or any of a dozen terms slightly different and probably not entirely accurate. Most of the time we’re trying to create a DIC without being aware of the term.
Needles and Pimples (It’s Not What You Imagine)
Jamis Buck hit the nail on the head back in 2004 with the creation of his Needle DIC for Ruby. Instead of creating something inspired by Java that relied on static configuration and too many features, he realised that Ruby excelled (as does PHP to a growing degree – thank Closures) in expressing logic through a Domain Specific Language (DSL). The result was a DIC captured by a simple DSL – well, until he went and overcomplicated it (read his article).
You can see the exact same fundamental simplicity that a DIC is capable of in PHP. It’s a small so-tiny-you-won’t-believe-it DIC called Pimple. Try calling that complex, hard, stupid or any other adjective you might instinctively think of when faced with the term “Dependency Injection”.
The core of Pimple is that you define object creations as closures. This immediately resolves a few traditional DIC problems. There’s no static configuration, you hand code all creation logic exactly once, and objects are named services you can recall and inject into other objects from your closure bodies. It basically takes everything you’d do in creating objects by hand and captures it all in one container. Other than the fact I hate arrays (my version uses object properties instead – it’s 50 lines; nobody was killed during its 5 minute development period), Pimple is like Dependency Injection itself – so blindingly obvious you may kick yourself.
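For a feel of how little is involved, here is a rough sketch of a closure-based container that uses object properties instead of array access. It is not Pimple’s code, nor the 50 line version mentioned above; TinyContainer, Mailer, Newsletter and the SMTP host are all hypothetical. It also assumes every closure defines a shared service that is built once.

```php
<?php
// A hypothetical closure-based container in the spirit of Pimple, using
// object properties via __set/__get. It is not Pimple's actual code.
class TinyContainer
{
    private $definitions = [];
    private $instances = [];

    public function __set($name, $value)
    {
        $this->definitions[$name] = $value;
        unset($this->instances[$name]); // redefining resets any cached object
    }

    public function __get($name)
    {
        if (!array_key_exists($name, $this->definitions)) {
            throw new InvalidArgumentException("Unknown service: $name");
        }
        $definition = $this->definitions[$name];
        if (!$definition instanceof Closure) {
            return $definition; // plain parameter
        }
        if (!array_key_exists($name, $this->instances)) {
            // Build once, passing the container so closures can pull dependencies.
            $this->instances[$name] = $definition($this);
        }
        return $this->instances[$name];
    }
}

// Two stand-in classes so there is something to assemble.
class Mailer
{
    public $host;
    public function __construct(string $host) { $this->host = $host; }
}

class Newsletter
{
    public $mailer;
    public function __construct(Mailer $mailer) { $this->mailer = $mailer; }
}

$c = new TinyContainer();
$c->smtpHost = 'smtp.example.test';
$c->mailer = function ($c) { return new Mailer($c->smtpHost); };
$c->newsletter = function ($c) { return new Newsletter($c->mailer); };

$newsletter = $c->newsletter; // assembled on demand, dependencies included
echo $newsletter->mailer->host, PHP_EOL;
```

Swapping the Mailer for a test double is a one-line change: assign a different closure to $c->mailer before asking for the newsletter.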
Pimple proves that DICs are not the devil – they can be incredibly simple and useful tools if you can tame the urge to complicate their implementation.
Then There Were Frameworks
As you can probably see, making a strong case for DICs is not hard. Dependency Injection is obvious and omnipresent in PHP. Dependency Injection Containers can be a simple 50 line class you can write over a coffee break. The going gets tough when the simple notions we desperately want to cling to meet the complexity of PHP’s now standard tool: the application framework.
Frameworks: Not Written By Monkeys
As we’ve already covered, Zend Framework 1.0 covered off the external agent problem in Dependency Injection by creating Zend_Application. As Zend Framework 2.0 moves towards beta, it also needs a Dependency Injection Container to do similar heavy lifting. This time around, we called a spade a spade and the O’Phinney/Schindler hive mind wrote Zend\Di\DependencyInjector.
The DICs used by Symfony and Zend Framework are not like Pimple. Symfony’s DIC is driven by static configuration (preferably YAML for brevity). Zend Framework 2.0’s DIC is driven by a PHP API (no static configuration). Both have their own set of performance boosting measures to minimise any overhead in using a more complex DIC.
In the next part of this mini-series, we’ll take a deeper look at Zend\Di and see how it fares compared to Pimple or Symfony 2. In the meantime, I hope I’ve busted a few apprehensions you might have about using a DIC.
Zend Framework Contributors Mailing-List Summary; Edition #2 (July 2011)
Aug 24th
It’s been a busy month in Zend Framework land which I’ll blog about shortly so, after a few weeks of delay, here’s the July 2011 Summary of the zf-contributor’s mailing list.
ZF2 Feedback
Late June kicked off with this topic from Robert Basic with a set of notes on his experiences in getting started with ZF2 by migrating a ZF1 application. Adam Lundrigan noted, correctly, that a lot of “bleeding edge” code is not included in the main repository at this time and is distributed across contributor Github forks. He also raised the suggestion for a ZF2 Status Page. Derek Miranda voiced his agreement with Adam. Robert also agreed noting the difficulty in assessing the state of components.
Summary: ZF2 is scattered across multiple forks – be prepared to rely on notes such as Robert’s if jumping in at the deep end.
Creating a 1.11.9 Hotfix Release
A short note from Matthew Weier O’Phinney announced that a 1.11.9 hotfix release would be made to fix a number of backwards compatibility breaks introduced in 1.11.8. Issue tickets involved were ZF-11548, ZF-11550, ZF-10991 and ZF-10725.
Summary: It’s a maintenance release. It fixes stuff.
Zend\Http and MVC Developments
Ralph Schindler presented a document outlining a requirement list and the overall architecture of classes and interfaces for Zend\Http, noting work would commence on a prototype once any outstanding items suggested were cleared. Rob Zienart commented that the document indicated interfaces for Zend\Http Client and Server components and mentioned they needed proposals. Matthew responded that Zend\Http’s Server would deal with classes extending Zend\Service\Abstract such as SOAP and AMF but would not comprise a HTTP Server given it was covered by PHP 5.4. Anthony Shireman asked whether there were any Zend\Http Server plans or whether it was a “time will tell” situation. Matthew confirmed that that was the case given PHP 5.4 would include a HTTP Server and ZF2 could piggy back that implementation in offering a development server environment.
Summary: HTTP work continues. We’ll need it to communicate with all those big tubes connecting PCs.
[Proposal] ActiveRecord Proposal
Artur Bodera raised the proposal and offered to implement an ActiveRecord solution noting its benefits compared to Zend\Db. The proposal was published at http://framework.zend.com/wiki/display/ZFDEV2/ActiveRecord+-+Arthur+Bodera with a working branch at https://github.com/Thinkscape/zf2/branches/ActiveRecord.
Nicolas Bérard-Nault asked why it was necessary to reinvent the wheel instead of integrating with other existing and mature implementations. Artur responded that other solutions did not integrate with Zend Framework noting his proposal is built on Zend\Db from ZF2 and he wondered what was the point of Zend\Db\Table otherwise in the face of Doctrine or Propel. Peter Kokx responded to note that Zend\Db\Table implements the Table Data and Row Data Gateway patterns as distinct from ActiveRecord and that users shouldn’t interpret MVC as referring solely to ActiveRecord. Artur conceded that this was a good point but pressed his point that ActiveRecord was one tool which did not impose on any others available to Zend Framework using Zend\Db. Tomáš Fejfar voiced his support for adding ActiveRecord noting its value in simple use cases to get things done fast.
Ralph Schindler leaped to the rescue by noting that ActiveRecord is indeed planned for ZF2 and noting the significant work done to date on Zend\Db in his own feature branch. Artur Bodera welcomed the progress stating he would migrate his ActiveRecord solution over to the improved Zend\Db once complete.
Summary: We’re getting an ActiveRecord implementation for ZF2.
ZF2 Docbook Sources Converted to DocBook 5
Another short note from Matthew Weier O’Phinney informed the community that ZF2’s docbook formatted manual files had been migrated to Docbook 5. The community silently admired the completion of this task (nobody responded but I assume they silently admired all the same!). Matthew noted the README for manual generation would be updated and that Docbook 5 made certain tasks a lot easier.
Summary: ZF2 Manual will be written in Docbook 5, those using a visual XML editor may celebrate.
ZF2 Zend\Mail: To strip/validate or not to strip/validate (email addresses)
Status of the Test Suite (ZF2)
Keith Pope asked after the status of the Test Suite mentioning that phpunit.xml was mostly commented out, Zend\Di was not using the @group annotation for the test runner, and TestConfiguration.php was nearing 800 lines. He suggested that the configuration be spread into a conf.d setup (i.e. each configuration segment split into a separate file and all combined at runtime). Matthew responded noting the ease with which ZF2 tests could be run by passing the necessary directory to phpunit from the main /tests directory, and noted configuration may be pushed into phpunit.xml instead of the current PHP file. While expressing an interest in a conf.d setup, Matthew noted this would depend on support in PHPUnit.
Summary: Ignore runtests.sh and just use the stock phpunit commands for ZF2.
Serious Question about Mcrypt
Artur Ejsmont observed that the mcrypt filter calls srand() with a limited range of potential seeds thus suggesting it would impact on the security of the filter. Enrico Zimuel replied that the srand() is only used in limited circumstances (where a better source of randomness is not available) and that it’s not a serious problem since the encryption security is not wholly based on the initialisation vector (IV) that uses srand() on some platforms. Nevertheless, he did note that some improvements could be made.
Artur responded with a general query on the efficacy of using srand() and rand() to avoid collisions. Pádraic Brady responded that rand() was particularly bad noting you could create a collision in a matter of minutes. Pádraic also noted that mt_rand() was far more effective but also not entirely random (as a graph of its output would prove) suggesting that it was advisable to use better random sources such as /dev/random and /dev/urandom where feasible. Enrico also noted the availability of openssl_random_pseudo_bytes().
Summary: Getting random bytes is a tricky business.
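As a hedged illustration of the advice in this thread, the sketch below prefers an operating system random source and shows why a seeded generator is a poor fit for IVs. The secureBytes() helper is hypothetical and is not the mcrypt filter’s code; random_bytes() needs PHP 7+, which postdates the discussion, and the fallback assumes the OpenSSL extension is loaded.

```php
<?php
// Hypothetical helper: prefer an operating-system CSPRNG for keys, IVs and
// tokens, falling back to the OpenSSL function mentioned in the thread.
function secureBytes(int $length): string
{
    if (function_exists('random_bytes')) {
        return random_bytes($length);
    }
    $bytes = openssl_random_pseudo_bytes($length, $strong);
    if ($bytes === false || !$strong) {
        throw new RuntimeException('No strong random source available');
    }
    return $bytes;
}

$iv = secureBytes(16);
echo bin2hex($iv), PHP_EOL;

// By contrast, a seeded generator is fully determined by its seed, so a
// small range of seeds means a small range of possible outputs.
mt_srand(1234);
$a = mt_rand();
mt_srand(1234);
$b = mt_rand();
var_dump($a === $b); // same seed, same "random" number
```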
ZF2 Zend\Code Bugfix
Nick Belhomme mentioned he had been looking at Zend\Code which is used heavily by Zend\Di. He noted his first impressions that it should work well by being token based but also referred to his opinion that it was quite error prone and the unit tests were not satisfactory.
To explain his case, he used an example of a method signature accepting four type hinted object parameters noting this could fail to be analysed correctly due to the whitespace in the parameter list (after each comma) not being handled correctly by the ParameterScanner. Nick noted he’d committed a fix using a short trim function to his own git fork.
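The kind of problem described can be illustrated with a deliberately simplified sketch. This is not Zend\Code’s ParameterScanner, which works on tokens; splitParameters() is a hypothetical helper that only shows why trimming each piece matters when spacing after commas varies.

```php
<?php
// Hypothetical helper, not Zend\Code's ParameterScanner: split a parameter
// list and trim each piece so spacing after commas cannot leak into names.
function splitParameters(string $parameterList): array
{
    $parts = explode(',', $parameterList);
    // Trim every piece, then drop empty ones (e.g. from a trailing comma).
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

// Four type hinted parameters written with inconsistent spacing.
$signature = 'Foo $a, Bar $b,  Baz $c,Qux $d';
print_r(splitParameters($signature));
```

A naive comma split like this would break on default values that themselves contain commas, which is one reason a token based scanner is the sturdier design.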
Regarding the unit tests, Nick explained why the current unit tests were insufficient in testing parameters and suggested rectifying the test doubles to account for whitespace.
Summary: Zend\Code needs to build up a fuller test suite accounting for different coding styles.
What is Mutation Testing?
Aug 2nd
Some time ago, in between working on Zend Framework, I booted up a couple of libraries that I really wanted to integrate into my workflow. Recently, I’ve been putting these through the mill so they can be properly released and supported for public consumption across PEAR. Just as Mockery fell out of older work on PHPMock, Mutagenesis will fall out of another project called MutateMe. This is a short introductory article as to what Mutagenesis will do and why. In other words, what the heck is Mutation Testing?
First, some background.
The most common means of measuring confidence in a test suite is the Code Coverage metric. Code Coverage essentially checks, on a per class basis, how many of the lines of code in the class are executed by a test suite and expresses this as a percentage. For example, a Code Coverage of 85% means 85% of the lines of code in a class were executed and 15% were not. The greater the number of lines of code executed, the more confidence one can presumably have that a test suite is doing its job, i.e. verifying class behaviour, preventing the introduction of bugs, supporting refactoring, and so on.
I have a huge and insurmountable problem with Code Coverage. For starters, my average Code Coverage is closer to 80% than the 90% expected of projects such as Zend Framework. The gap is explained by me not testing what I call “braindead” functions, i.e. methods which are either ridiculously simple, where a malfunction would quickly become self-evident, or which are marginalised (on the borders of deprecation). So Code Coverage actually increases the amount of work I need to do for very little gain and a lot of frustration.
Secondly, Code Coverage is easy to spoof or misinterpret. Since it’s a metric measuring the execution of source code, you need only…well…execute the source code. It’s a simple matter to construct a series of wonderfully useless tests to do just that and obtain a high Code Coverage result – it’s done all the time in my experience once someone’s patience in writing quality unit tests runs out. It is particularly evident in cases where unit tests are written after the source code is completed – a still too common practice in PHP. The less villainous flipside is that certain nuggets of source code are fundamentally difficult to test. For example, a complex algorithm suffering from poor documentation may make composing a suitable unit test near impossible. The rollout of OAuth was filled with such examples.
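A contrived sketch of such a wonderfully useless test follows. The applyDiscount() function and both test functions are invented for illustration and use no testing framework; the point is only that the first test executes every line while verifying nothing.

```php
<?php
// Hypothetical function under test: cap the discount at 100%.
function applyDiscount(float $price, float $percent): float
{
    if ($percent > 100.0) {
        $percent = 100.0;
    }
    return $price - ($price * $percent / 100.0);
}

// A "test" that runs every line (full line coverage) yet verifies nothing.
function uselessTest(): bool
{
    applyDiscount(50.0, 10.0);
    applyDiscount(50.0, 150.0);
    return true; // always passes, whatever applyDiscount() returns
}

// A test that actually pins down the behaviour.
function usefulTest(): bool
{
    return applyDiscount(50.0, 10.0) === 45.0
        && applyDiscount(50.0, 150.0) === 0.0;
}

var_dump(uselessTest(), usefulTest());
```

Both tests would yield the same coverage figure, which is exactly why the figure alone says so little.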
This leads into my opinion of Code Coverage. I view the venerable Code Coverage metric as a near pointless exercise. While it may tell how much source code a test suite exercises, it tells you nothing about the actual quality of those unit tests. They could be good tests, sort-of-good tests or absolutely horrendous tests – Code Coverage will never tell you either way. I say near pointless because there are precious few alternatives. We need something to give us a reason to trust and have confidence in test suites and Code Coverage is easy to implement and has been a part of PHPUnit since forever. So, by and large, we make do. We measure Code Coverage just to make certain some kind of unit testing was performed.
Is there nothing better?
A good unit test serves a simple purpose. It verifies a behaviour of an object. In PHP, we’re more likely to verify umpteen million behaviours in a single test (count your assertions!) but we’ll let that slide. Since a test verifies behaviour, it follows that a test should fail when that behaviour is changed. If a test does not fail when class behaviour is changed, it also follows that the original behaviour was not fully tested, i.e. there is a gaping hole in our test suite whether due to a flawed or missing test that could allow bugs entry into our application. So, to really stick unit tests under a microscope to assess their quality and our confidence in them, we need to introduce changes into the source code under test and see if the unit test suite can or cannot detect them.
This process is known as Mutation Testing. Mutagenesis is a Mutation Testing framework for PHP 5.3+.
Mutation Testing, as you have probably surmised, is not a super-complex activity. You take a set of source code and compile a list of possible “mutations” that are likely to break the behaviour of the source code. Then, you apply one mutation to that source to create a “mutant”, i.e. a copy of the source code with the mutation change applied. Next, you run the source code’s test suite against the mutant and see if any tests fail. If a test fails, celebrate – the mutation was detected so your tests were, in this instance, adequate. If no test fails, curse the Gods – the mutation was not detected and you’ll need to figure out whether a new test is needed or an old one modified/corrected. Rinse and repeat the above for each mutation you’ve compiled.
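The loop described above can be sketched in a few lines. This is an illustration only, written in Python for brevity; it is not how Mutagenesis is implemented, and `is_adult`, `MUTANTS`, `TESTS` and `run_session` are all hypothetical names. Mutants are written out by hand here rather than generated from source.

```python
def is_adult(age):
    """Hypothetical original code under test."""
    return age >= 18


# Each mutant is a copy of is_adult with exactly one small change applied.
MUTANTS = [
    (">= becomes >", lambda age: age > 18),
    (">= becomes <", lambda age: age < 18),
]

# A tiny test suite: each test takes an implementation and returns
# True when it passes, False when it fails.
TESTS = [
    lambda impl: impl(30) is True,
    lambda impl: impl(5) is False,
]


def run_session(mutants, tests):
    """Run the suite against every mutant and sort mutants by outcome."""
    killed, escaped = [], []
    for label, mutant in mutants:
        # A mutant is detected as soon as any single test fails.
        detected = any(not test(mutant) for test in tests)
        (killed if detected else escaped).append(label)
    return killed, escaped


killed, escaped = run_session(MUTANTS, TESTS)
```

With this suite the `<` mutant is killed, but the `>` mutant escapes because no test checks the boundary value 18. That escaped mutant is the signal that a test is missing.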
Mutations are typically quite simple such as replacing operators, booleans, strings and other scalar values with either an opposing form or a random value. Expressions might also be reversed or driven to zero to give an opposing boolean or zero value. Making such minor changes seems like a minor irritation but behind every serious flaw in an application is one or more smaller contributing errors. If your test cases can detect the potentially contributing errors, then there’s an excellent chance they would detect the bigger ones anyway. This is known as the Coupling Effect in Mutation Testing.
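As a rough sketch of how such simple mutations might be generated automatically, the following uses Python's standard `ast` module (and `ast.unparse`, which needs Python 3.9 or later) to produce one mutant per swappable operator or boolean literal. The `SWAPS` table and the function names are assumptions made for this example; a real framework would cover many more mutation types.

```python
import ast

# Hypothetical table of "opposing forms" for a handful of operators.
SWAPS = {
    ast.Lt: ast.GtE, ast.GtE: ast.Lt,
    ast.Gt: ast.LtE, ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
    ast.Add: ast.Sub, ast.Sub: ast.Add,
}


def mutation_sites(tree):
    """Yield every node in the tree that this sketch knows how to mutate."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Compare) and type(node.ops[0]) in SWAPS:
            yield node
        elif isinstance(node, ast.BinOp) and type(node.op) in SWAPS:
            yield node
        elif isinstance(node, ast.Constant) and isinstance(node.value, bool):
            yield node


def mutate(node):
    """Apply the opposing form to a single node, in place."""
    if isinstance(node, ast.Compare):
        node.ops[0] = SWAPS[type(node.ops[0])]()
    elif isinstance(node, ast.BinOp):
        node.op = SWAPS[type(node.op)]()
    else:
        node.value = not node.value


def generate_mutants(source):
    """Return one mutated copy of the source per mutation site."""
    count = sum(1 for _ in mutation_sites(ast.parse(source)))
    mutants = []
    for index in range(count):
        tree = ast.parse(source)  # fresh copy so mutations never stack
        mutate(list(mutation_sites(tree))[index])
        mutants.append(ast.unparse(tree))
    return mutants


mutants = generate_mutants("def in_range(x):\n    return x + 1 < 10\n")
```

For the one-line function above this yields two mutants: one where `<` becomes `>=` and one where `+` becomes `-`. Each contains a single change, which is what keeps an undetected mutant easy to trace back to a specific gap in the tests.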
Some of you will be vaguely aware of Mutation Testing. In terms of implementations, Ruby has Heckle, Python has Pester, and Java has Jumbler, Jester and a couple of others. Those who prefer Microsoft’s technologies can use Nester. There’s a running rhyme apparent since so much is inspired by the original Jester framework for Java. To my knowledge, Mutagenesis will be the only Mutation Testing framework for PHP (though I sincerely wish I was wrong).
Examining those libraries, you eventually realize a few problems with Mutation Testing which explain its lack of popularity until relatively recently: performance is a concern and Mutation Testing requires a Human Brain to complete the process.
Performance is a concern because each mutation requires a test suite to be executed. Imagine a set of classes from which you extract 100 possible mutations, coupled with a test suite that takes 5 minutes to run. A basic Mutation Testing framework (e.g. Ruby’s Heckle) would therefore take 500 minutes to complete a Mutation Testing session. That’s 8.3 hours of continuous Mutation Testing. Mutation Testing for Zend Framework would be very interesting.
Similar to Jumbler for Java, Mutagenesis will utilise a few heuristics (shortcuts) to significantly improve performance without compromising results. We only need one single test to fail in order to rule that a mutation was detected and killed, so we can do a few things to boost performance:
1. Terminate the test suite on first failure/error or exception.
2. Execute test cases in order of execution time ascending (fastest first; slowest last).
3. Prioritise execution of last test case to detect a mutant to take advantage of same-class detection.
4. Log which tests detect which mutations, and prioritise those associations in subsequent runs.
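The four heuristics above might be combined along these lines. This is a sketch under stated assumptions, written in Python for brevity: the function names, the shape of the `tests` and `durations` dictionaries, and in particular the relative priority given to heuristic 4 over heuristic 3 are my own choices for illustration, not a description of how Mutagenesis or Jumbler actually orders things.

```python
def order_tests(names, durations, known_killers=(), last_killer=None):
    """Order tests so the likeliest and cheapest killers run first."""
    def rank(name):
        if name in known_killers:      # heuristic 4: logged in an earlier run
            priority = 0
        elif name == last_killer:      # heuristic 3: killed the previous mutant
            priority = 1
        else:
            priority = 2
        # Heuristic 2: within a priority band, fastest first.
        return (priority, durations.get(name, float("inf")))
    return sorted(names, key=rank)


def first_killer(mutant, tests, ordered_names):
    """Run tests in order and stop at the first failure, error or exception."""
    for name in ordered_names:
        try:
            passed = tests[name](mutant)
        except Exception:
            passed = False
        if not passed:
            return name                # heuristic 1: no need to run the rest
    return None                        # the mutant escaped every test


# Hypothetical timings (seconds) and tests for an age check.
durations = {"test_full": 3.0, "test_edge": 1.0, "test_types": 2.0}
tests = {
    "test_full": lambda impl: impl(30) is True and impl(5) is False,
    "test_edge": lambda impl: impl(18) is True,
    "test_types": lambda impl: isinstance(impl(1), bool),
}

order = order_tests(list(tests), durations)
killer = first_killer(lambda age: age > 18, tests, order)
```

Here the fastest test happens to be the one that catches the mutant, so the other two never run. Recording `killer` against that mutation and feeding it back in as `known_killers` on the next session is what makes repeat runs cheap.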
The effect of the above is to speed up Mutation Testing by a significant degree. The final heuristic ensures that for gradually changing source code and tests, the first Mutation Testing process might take a while but subsequent runs will be significantly faster making them far more usable in a Test-Driven Development setting. Mutation Testing is best served with a healthy dose of efficiency.
The second reason for its lack of popularity is that Mutation Testing can’t analyse the logic of the source code under test. For example, an expression might accept any integer less than 10 to evaluate to TRUE. If the input from another class were 7, and a mutation were generated to swap this for a 9, then the associated unit test would still pass (the mutation of switching 7 for 9 still allows the <10 expression to evaluate to TRUE). If you recall, if a mutant passes a test suite then we assume either the presence of a flawed test or the lack of a suitable test. Obviously, as the above suggests, this isn’t always the case. Mutation Testing can and often will report false positives.
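The 7-versus-9 case looks like this when written out. It is a Python sketch with hypothetical names throughout; the point is only that the mutant is reported as undetected even though the test is arguably doing its job.

```python
def is_small(value):
    """Accepts any integer less than 10."""
    return value < 10


def original_caller():
    return is_small(7)


def mutant_caller():
    return is_small(9)   # the mutation: the literal 7 swapped for 9


def caller_test(caller):
    """The associated unit test: the caller is expected to get True back."""
    return caller() is True


original_passes = caller_test(original_caller)
mutant_passes = caller_test(mutant_caller)
```

Both calls pass, so a framework would flag the mutant as having escaped. Deciding that nothing is actually wrong with the test is the part that still needs a human.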
Ruling out false positives, coupled with the need to improve test suites to detect more mutations, makes Mutation Testing a source of extra work. Who likes extra work least? Programmers, especially the lazy kind.
Mutation Testing is not a far-fetched idea. The principles are sound and it beats the pants off Code Coverage when it comes to measuring what confidence we can have in our test suites. It is still hampered, as a methodology, by the lack of good implementations in other programming languages. Mutagenesis, by adopting implementation heuristics from Java’s Jumbler, should avoid that fate and offer a decent framework in PHP that performs as well as can be expected.
Once it’s released…of course. Mutagenesis is in development but should see a fresh release in a couple of weeks alongside Mockery. I’ll be looking forward to seeing how people perceive it. Mutation Testing has zero presence in PHP to date but having something to complement Code Coverage can’t do any harm!







