PHP, Zend Framework and Other Crazy Stuff
Archive for August, 2011
Zend Framework Contributors Mailing-List Summary; Edition #2 (July 2011)
Aug 24th
It’s been a busy month in Zend Framework land which I’ll blog about shortly so, after a few weeks of delay, here’s the July 2011 Summary of the zf-contributor’s mailing list.
ZF2 Feedback
Late June kicked off with this topic from Robert Basic with a set of notes on his experiences in getting started with ZF2 by migrating a ZF1 application. Adam Lundrigan noted, correctly, that a lot of “bleeding edge” code is not included in the main repository at this time and is distributed across contributor Github forks. He also raised the suggestion for a ZF2 Status Page. Derek Miranda voiced his agreement with Adam. Robert also agreed noting the difficulty in assessing the state of components.
Summary: ZF2 is scattered across multiple forks - be prepared to rely on notes such as Robert’s if jumping in at the deep end.
Creating a 1.11.9 Hotfix Release
A short note from Matthew Weier O’Phinney announced that a 1.11.9 hotfix release would be made to fix a number of backwards compatibility breaks introduced in 1.11.8. Issue tickets involved were ZF-11548, ZF-11550, ZF-10991 and ZF-10725.
Summary: It’s a maintenance release. It fixes stuff.
Zend\Http and MVC Developments
Ralph Schindler presented a document outlining a requirement list and the overall architecture of classes and interfaces for Zend\Http, noting work would commence on a prototype once any outstanding items suggested were cleared. Rob Zienart commented that the document indicated interfaces for Zend\Http Client and Server components and mentioned they needed proposals. Matthew responded that Zend\Http’s Server would deal with classes extending Zend\Service\Abstract such as SOAP and AMF but would not comprise a HTTP Server given it was covered by PHP 5.4. Anthony Shireman asked whether there were any Zend\Http Server plans or whether it was a “time will tell” situation. Matthew confirmed that that was the case given PHP 5.4 would include a HTTP Server and ZF2 could piggy back that implementation in offering a development server environment.
Summary: HTTP work continues. We’ll need it to communicate with all those big tubes connecting PCs.
[Proposal] ActiveRecord Proposal
Artur Bodera raised the proposal and offered to implement an ActiveRecord solution noting its benefits compared to Zend\Db. The proposal was published at http://framework.zend.com/wiki/display/ZFDEV2/ActiveRecord+-+Arthur+Bodera with a working branch at https://github.com/Thinkscape/zf2/branches/ActiveRecord.
Nicolas Bérard-Nault asked why it was necessary to reinvent the wheel instead of integrating with other existing and mature implementations. Artur responded that other solutions did not integrate with Zend Framework noting his proposal is built on Zend\Db from ZF2 and he wondered what was the point of Zend\Db\Table otherwise in the face of Doctrine or Propel. Peter Kokx responded to note that Zend\Db\Table implements the Table Data and Row Data Gateway patterns as distinct from ActiveRecord and that users shouldn’t interpret MVC as referring solely to ActiveRecord. Artur conceded that this was a good point but pressed his point that ActiveRecord was one tool which did on impose on any others available to Zend Framework using Zend\Db. Tomáš Fejfar voiced his support for adding ActiveRecord noting its value in simple use cases to get things done fast.
Ralph Schindler leaped to the rescue by noting that ActiveRecord is indeed planned for ZF2 and noting the significant work done to date on Zend\Db in his own feature branch. Artur Bodera welcomed the progress stating he would migrate his ActiveRecord solution over to the improved Zend\Db once complete.
Summary: We’re getting an ActiveRecord implementation for ZF2.
ZF2 Docbook Sources Converted to DocBook 5
Another short note from Matthew Weier O’Phinney informed the community that ZF2′s docbook formatted manual files had been migrated to Docbook 5. The community silently admired the completion of this task (nobody responded but I assume they silently admired all the same!). Matthew noted the README for manual generation would be updated and that Docbook 5 made certain tasks a lot easier.
Summary: ZF2 Manual will be written in Docbook 5, those using a visual XML editor may celebrate.
ZF2 Zend\Mail: To strip/validate or not to strip/validate (email adresses)
Status of the Test Suite (ZF2)
Keith Pope asked after the status of the Test Suite mentioning that phpunit.xml was mostly commented out, Zend\Di was not using the @group annotation for the test runner, and TestConfiguration.php was nearing 800 lines. He suggested that the configuration be spread into a conf.d setup (i.e. each configuration segment split into a separate file and all combined at runtime). Matthew responded noting the ease with which ZF2 tests could be run by passing the necessary directory to phpunit from the main /tests directory, and noted configuration may be pushed into phpunit.xml instead of the current PHP file. While expressing an interest in a conf.d setup, Matthew noted this would depend on support in PHPUnit.
Summary: Ignore runtests.sh and just use the stock phpunit commands for ZF2.
Serious Question about Mcrypt
Artur Ejsmont observed that the mcrypt filter calls srand() with a limited range of potential seeds thus suggesting it would impact on the security of the filter. Enrico Zimuel replied that the srand() is only used in limited circumstances (where a better source of randomness is not available) and that it’s not a serious problem since the encryption security is not wholly based on the initialisation vector (IV) that uses srand() on some platforms. Nevertheless, he did note that some improvements could be made.
Artur responded with a general query on the efficacy of using srand() and rand() to avoid collisions. Pádraic Brady responded that rand() was particularly bad noting you could create collision in a matter of minutes. Pádraic also noted that mt_rand() was far more effective but also not entirely random (as a graph of its output would prove) suggesting that it was advisable to use better random sources such as /dev/random and /dev/urandom where feasible. Enrico also noted the availability of openssl_random_pseudo_bytes().
Summary: Getting random bytes is a tricky business.
ZF2 Zend\Code Bugfix
Nick Belhomme mentioned he had been looking at Zend\Code which is used heavily by Zend\Di. He noted his first impressions that it should work well by being token based but also referred to his opinion that it was quite error prone and the unit tests were not satisfactory.
To explain his case, he used an example of a method signature accepting four type hinted object parameters noting this could fail to be analysed correctly due to the whitespace in the parameter list (after each comma) not being handled correct by the ParameterScanner. Nick noted he’d committed a fix using a short trim function to his own git fork.
Regarding the unit tests, Nick explained why the current unit tests were insufficient in testing parameters and suggested rectifying the test doubles to account for whitespace.
Summary: Zend\Code needs to build up a fuller test suite accounting for different coding styles.
What is Mutation Testing?
Aug 2nd
Some time ago, in between working on Zend Framework, I booted up a couple of libraries that I really wanted to integrate into my workflow. Recently, I’ve been being putting these through the grindmill so they can be properly released and supported for public consumption across PEAR. Just as Mockery fell out of older work on PHPMock, Mutagenesis will fall out of another project called MutateMe. This is a short introductory article as to what Mutagenesis will do and why. In other words, what the heck is Mutation Testing?
First, some background.
The most common means of measuring confidence in a test suite is the Code Coverage metric. Code Coverage essentially checks, on a per class basis, how many of the lines of code in the class are executed by a test suite and expresses this as a percentage. For example, a Code Coverage of 85% means 85% of the lines of code in a class was executed and 15% were not. The greater the number of lines of code executed, the more confidence one can presumably have that a test suite is doing its job, i.e. verifying class behaviour, preventing the introduction of bugs, supporting refactoring, and so on.
I have a huge and insurmountable problem with Code Coverage. For starters, my average Code Coverage is closer to 80% than the 90% expected of projects such as Zend Framework. The gap is explained by me not testing what I call “braindead” functions, i.e. methods which are either ridiculously simple, where a malfunction would quickly become self-evident, or which are marginalised (on the borders of deprecation). So Code Coverage actually increases the amount of work I need to do for very little gain and a lot of frustration.
Secondly, Code Coverage is easy to spoof or misinterpret. Since it’s a metric measuring the execution of source code, you need only…well…execute the source code. It’s a simple matter to construct a series of wonderfully useless tests to do just that and obtain a high Code Coverage result - it’s done all the time in my experience once someone’s patience in writing quality unit test runs out. It is particularly evident in cases where unit tests are written after the source code is completed - a still too common practice in PHP. The less villainous flipside is that certain nuggets of source code are fundamentally difficult to test. For example, a complex algorithm suffering from poor documentation may make composing a suitable unit test near impossible. The rollout of OAuth was filled with such examples.
This leads into my opinion of Code Coverage. I view the venerable Code Coverage metric as a near pointless exercise. While it may tell how much source code a test suite exercises, it tells you nothing about the actual quality of those unit tests. They could be good tests, sort-of-good tests or absolutely horrendous tests - Code Coverage will never tell you either way. I say near pointless because there are precious few alternatives. We need something to give us a reason to trust and have confidence in test suites and Code Coverage is easy to implement and has been a part of PHPUnit since forever. So, by and large, we make do. We measure Code Coverage just to make certain some kind of unit testing was performed.
Is there nothing better?
A good unit test serves a simple purpose. It verifies a behaviour of an object. In PHP, we’re more likely to verify umpteen million behaviours in a single test (count your assertions!) but we’ll let that slide. Since a test verifies behaviour, it follows that a test should fail when that behaviour is changed. If a test does not fail when class behaviour is changed, it also follows that the original behaviour was not fully tested, i.e. there is a gaping hole in our test suite whether due to a flawed or missing test that could allow bugs entry into our application. So, to really stick unit tests under a microscope to assess their quality and our confidence in them, we need to introduce changes into the source code under test and see if the unit test suite can or cannot detect them.
This process is known as Mutation Testing. Mutagenesis is a Mutation Testing framework for PHP 5.3+.
Mutation Testing, as you have probably surmised, is not a super-complex activity. You take a set of source code and compile a list of possible “mutations” that are likely to break the behaviour of the source code. Then, you apply one mutation to that source to create a “mutant”, i.e. a copy of the source code with the mutation change applied. Next, you run the source code’s test suite against the mutant and see if any tests fail. If a test fails, celebrate - the mutation was detected so your tests were, in this instance, adequate. If no test fails, curse the Gods - the mutation was not detected and you’ll need to figure out whether a new test is needed or an old one modified/corrected. Rinse and repeat the above for each mutation you’ve compiled.
Mutations are typically quite simple such as replacing operators, booleans, strings and other scalar values with either an opposing form or a random value. Expressions might also be reversed or driven to zero to give an opposing boolean or zero value. Making such minor changes seems like a minor irritation but behind every serious flaw in an application is one or more smaller contributing errors. If your test cases can detect the potentially contributing errors, then there’s an excellent chance it would detect the bigger ones anyway. This is known as the Coupling Effect in Mutation Testing.
Some of you will be vaguely aware of Mutation Testing. In terms of implementations, Ruby has heckler, Python has Pester, and Java has Jumbler, Jester and a couple of others. Those who prefer Microsoft’s technologies can use Nester. There’s a running ryhme apparent since so much is inspired by the original Jester framework for Java. To my knowledge, Mutagenesis will be the only Mutation Testing framework for PHP (though I sincerely wish I was wrong).
Examining those libraries, you eventually realize a few problems with Mutation Testing which explain its lack of popularity until relatively recently: performance is a concern and Mutation Testing requires a Human Brain to complete the process.
Performance is a concern because each mutation requires a test suite to be executed. Imagine a set of classes from which you extract 100 possible mutations, coupled with a test suite that takes 5 minutes to run. A basic Mutation Testing framework (e.g. Ruby’s heckler) would therefore take 500 minutes to complete a Mutation Testing session. That’s 8.3 hours of continuous Mutation Testing. Mutation Testing for Zend Framework would be very interesting .
Similar to Jumbler for Java, Mutagenesis will utilise a few heuristics (shortcuts) to significantly improve performance without compromising results. We only need one single test to fail in order to rule that a mutation was detected and killed, so we can do a few things to boost performance:
1. Terminate the test suite on first failure/error or exception.
2. Execute test cases in order of execution time ascending (fastest first; slowest last).
3. Prioritise execution of last test case to detect a mutant to take advantage of same-class detection.
4. Log which tests detect which mutations, and prioritise those associations in subsequent runs.
The effect of the above is to speed up Mutation Testing by a significant degree. The final heuristic ensures that for gradually changing source code and tests, the first Mutation Testing process might take a while but subsequent runs will be significantly faster making them far more usable in a Test-Driven Development setting. Mutation Testing is best served with a healthy dose of efficiency.
The second reason for its lack of popularity is that Mutation Testing can’t analyse the logic of the source code under test. For example, an expression might accept any integer less than 10 to evaluate to TRUE. If the input from another class were 7, and a mutation were generated to swap this for a 9, then the associated unit test would still pass (the mutation of switching 7 for 9 still allows the <10 expression evaluate to TRUE). If you recall, if a mutant passes a test suite than we assume either the presence of a flawed test or the lack of a suitable test. Obviously, as the above suggests, this isn’t always the case. Mutation Testing can and often will report false positives.
Ruling out false positives, coupled with the need to improve test suites to detect more mutations, makes Mutation Testing a source of extra work. Who likes extra work least? Programmers, especially the lazy kind .
Mutation Testing is not a far fetched idea. The principles are sound and it beats the pants off Code Coverage when it comes to measuring what confidence we can have in our testing suites. It is still hampered, as a methodology, by the lack of good implementations in other programming languages. Mutagenesis, by adopting implementation heuristics from Java’s Jumbler, should avoid that fate and offer a decent framework in PHP that performs as well as can be expected.
Once it’s released…of course . Mutagenesis is in development but should see a fresh release in a couple of weeks alongside Mockery. I’ll be looking forward to seeing how people perceive it. Mutation Testing has zero presence in PHP to date but having something to complement Code Coverage can’t do any harm!