PHP, Zend Framework and Other Crazy Stuff
Archive for February, 2009
Unit Testing: One Test, One Assertion – Why It Works
Feb 16th
In my last post I touched on the topic of multiple assertions in a unit test and linked it, somewhat harshly, to “lazy” or shallow testing. I didn’t actually intend to link the two at all, but the truth is that unit testing isn’t a series of mutually exclusive practices – they are all linked to varying degrees. I’m sure I have at least a degree of laziness in my own testing.
The Basic Idea
As a recap, the idea is that every unit test should have only a single assertion. It’s a fairly well-known way to combat a range of problems in assertion-loaded tests: the numerous assertions obscure the meaning of the test, and a failed test does not tell you which specific assertions failed (a test stops at the first failed assertion, so the remaining assertions are never executed, leaving you blind as to whether they would have passed or failed). Multiple assertions cause other problems as well, which is why I do not like seeing them in a single test. I’m not the only one – I think. And just as there are problems, there are also exceptions where multiple assertions might prove necessary, though personally I think that scenario is pretty rare. In my years of unit testing, I certainly have not encountered many.
These effects alone make me fairly comfortable enforcing Single Assertions in tests as a rule, not a guideline. Guidelines are easily broken, but with rules you can at least strike back and make certain you don’t run aground on someone else’s inexperience or ill-fated experiments in making testing “easier”. It’s also extremely easy to detect. PHPUnit, for example, reports both the number of tests and the number of assertions executed, so you can calculate a quick ratio of assertions to tests in mere seconds. I’m going to start calling it the Obscurity Ratio! Obviously, a ratio of 1:1 would be perfect, but I’m sure we can accept some multiple assertions within a tolerance level. A ratio above 1.1:1 indicates creeping obscurity (a signal it’s time for me to intervene, refactor, and do some mutation testing for extra assurance). A ratio of 2:1 indicates a lost cause. Anything higher and God help you all.
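As a rough illustration, here is a minimal PHP sketch of the Obscurity Ratio check, assuming you copy the test and assertion totals from the last line of a PHPUnit run by hand. The function names `obscurityRatio()` and `classifyObscurity()` are hypothetical; the thresholds are the ones given above.

```php
<?php
// Hypothetical helper: the "Obscurity Ratio" is simply assertions per test,
// computed from the two totals PHPUnit prints at the end of a run.
function obscurityRatio(int $tests, int $assertions): float
{
    if ($tests <= 0) {
        throw new InvalidArgumentException('At least one test is required.');
    }
    return $assertions / $tests;
}

// Thresholds from the text: up to 1.1:1 is tolerable, above that is
// creeping obscurity, and 2:1 or worse is a lost cause.
function classifyObscurity(float $ratio): string
{
    if ($ratio >= 2.0) {
        return 'lost cause';
    }
    if ($ratio > 1.1) {
        return 'creeping obscurity';
    }
    return 'acceptable';
}

// Totals from a run that reported "OK (1 test, 4 assertions)".
$ratio = obscurityRatio(1, 4);
echo $ratio, ' => ', classifyObscurity($ratio), PHP_EOL; // prints "4 => lost cause"
```

Fed the totals from the multi-assertion example in the 12 February post (1 test, 4 assertions), the sketch lands firmly in lost-cause territory.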
This Sounds Like A Bad Idea!
In watching the reactions to my original post (and not just the comments here), I’ve noticed that the excuses are generally similar. People have justified multiple assertions for complex return values, pointed out that my example was too simple (since that really, really shows it’s a Bad Idea™), argued that all the mini-tests would take too long to write and even longer to execute, or warned that the world will end since Peter Petrelli thinks this is all a Bad Idea™ and everyone knows Sylar invented it anyway.
I don’t completely disagree with all of those opinions (it really was a simple example), but people are spending too much time looking at the end result, and not enough at the process behind it all. And by missing the process, you are missing the point.
Now let me add a disclaimer: I’m really, really sure there are times when multiple assertions are needed. So my main point is not that this is an absolute rule with no possible exceptions, so help me God, but that it should be the rule unless you are genuinely forced to make an exception to it.
Why It’s A Good Idea…
Unit testing has various interpretations. In its original form it promoted the verification of source code, a means of making sure code worked. It was not long before the obvious flaw in that thinking emerged: how do you verify the verified? You already know the code works before you write the test, so what value does the test really bring to the table? Does it really promote code quality and rapid development, or does it just create a horrible task that some low-ranking developer gets landed with?
This sparked a revolution: Test-Driven Design (TDD). In TDD, tests stepped back from verifying and took on a role as specifications. Before writing the source code, you first wrote a test specifying the behaviour you intended to add. Once that behaviour was specified, you wrote just enough source code to make the test pass. The result was a process of simple steps, and here’s where I diverge from others. A simple example works, because all half-decent source code is nothing more than an amalgamation of simple examples. In fact, I dare anyone to prove otherwise. Then I can show you the PHP manual and you can explain how the hell you made PHP that complex.
In TDD we move away from testing complex, highly involved code after the fact (too late by then!) and towards writing tests up front that drive simpler, more straightforward code, built in steps. Simpler is better – always. The differences can be startling. Try writing the same library both ways and you’ll see how.
It’s this notion of complexity that is often used to justify multiple assertions. But if you break the complexity down into the simple components it was built from, multiple assertions don’t have a leg to stand on (apart from the rare cases where they are genuinely unavoidable). Just to preempt you all: the most common sources of complexity in assertions boil down to BIG, deeply nested stuff, with arrays and XML as the usual suspects. Mathematical equations for relativity are simpler than those!
But let’s move on to the other benefits instead of obsessing over just one.
If your tests double as a specification describing how a class should behave, or as its documentation, you’re doing nothing wrong (and yes, I’m robbing Behaviour-Driven Development, or BDD, blind for that definition). Tests should document behaviour. This is why test method names are important: they summarise the behaviour the test is verifying.
Now add another little rule: every test verifies exactly one behaviour. When we make that connection, we realise something even more startling…
1 Behaviour == 1 Assertion
Now we’re into the meat of my madness, the gritty reality of the one assertion per test rule. Any behaviour is, by its very nature, both specific and unique. It does not require multiple assertions. Either it does one single thing, or it doesn’t do it at all. It’s black and white. The only danger is that you get so clever that you think in terms of big behaviours, which are really just lots of little behaviours bundled together. Enjoy applying TDD in that case… it will sink you faster than you can say “Peter Petrelli Is Depressing”, and you’ll end up back in the land of writing tests AFTER the code. Anyone going to admit to doing that?
We all know who we are (yes, I do it too now and again).
That’s why Test-Driven Design is so effective. It enforces a habit (once you actually understand the damn practice, which almost seems designed to hide its focus on behaviour!) of specifying a class behaviour by behaviour, assertion by assertion. It breaks down the complex overall purpose of any class into discrete, simple steps: a series of tiny goals you can easily achieve.
Simple, tiny, discrete – like individual assertions. It’s THAT simple.
That’s why one assertion per test is a good idea: in TDD it is the obvious outcome. It works, and it ensures your tests do exactly what they’re supposed to be doing – verifying discrete behaviour, and documenting that behaviour to make other developers’ lives a bit easier (and saner).
The Spontaneous Test Population Explosion Myth
This is the one myth everyone wheels out against the one assertion per test rule. The question is whether it really is a myth, or whether something else is. The idea is that requiring one assertion per test leaves you with more tests, more code, longer execution times, and so on. Which is to say… more behaviours. Which is to say: where the hell did the new behaviours come from?
Unwittingly, the excuse reveals itself to be the myth. If your tests can be split into multiple tests, you have pretty much admitted that the originals were collections of behaviours. Welcome to the land of Obscure Tests: they test many things, and nobody can figure out just what.
Tests are documentation. As with any form of writing, you can lump ideas together into a monologue, or use properly formatted paragraphs and bullet points to break them down into a more digestible form. It comes with a cost, but discrete tests are more helpful as documentation.
As for execution times, I really don’t understand why people find this a problem. PHPUnit lets you run any test class in isolation, and you can group test suites in any shape or form. If your tests are running slowly, treat them like any other code: use Xdebug to locate the bottlenecks and slow tests, and do something about them. Use more efficient code, refactor them to hell and back, isolate the slow tests – we can all do performance optimisation.
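Xdebug’s profiler is the right tool for digging into bottlenecks, but a crude first pass at spotting the slowest tests can be as simple as timing each one. The sketch below is hypothetical and is not a PHPUnit feature: tests are modelled as plain closures, `timeTests()` is an invented helper, `hrtime()` (PHP 7.3+) supplies a monotonic nanosecond clock, and `usleep()` merely simulates a slow test.

```php
<?php
// Crude, hypothetical timing pass over tests modelled as closures.
// hrtime(true) returns a monotonic timestamp in nanoseconds (PHP 7.3+).
function timeTests(array $tests): array
{
    $timings = [];
    foreach ($tests as $name => $test) {
        $start = hrtime(true);
        $test();
        $timings[$name] = (hrtime(true) - $start) / 1e6; // milliseconds
    }
    arsort($timings); // slowest first, keeping test names as keys
    return $timings;
}

$timings = timeTests([
    'testFast' => function (): void {
    },
    'testSlow' => function (): void {
        usleep(20000); // simulates roughly 20 ms of slow setup
    },
]);

foreach ($timings as $name => $ms) {
    printf("%s: %.2f ms%s", $name, $ms, PHP_EOL);
}
```

Once the slowest tests stand out, they are candidates for a profiler run or for moving into a separately grouped suite.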
Another alternative is to let tests run in the background and use a notification system to report failures while you go off and write another test or two. I do this myself, so see what notification apps you can hook into from PHP or some other test environment tool. I have working Snarl code for a PHP extension under Windows somewhere, if you can use a compiler and really need it.
Attitude Counts
Another compelling reason to adopt one assertion per test is your ego. Look at me: I have an ego. I write stuff because I enjoy doing it and am motivated by other people reading it and thinking it’s the best thing ever written. We all have egos. My ego has me writing free books (though the prospect of a new MacBook Pro is another motive… hehehe).
Unfortunately our ego is also a major enemy in unit testing. If you find yourself saying you prefer multiple assertions because it works for you, and you can understand them, and they make perfect sense to you, and hey, your code works with 200% code coverage – then count how many “you” words you just used (or “I” if writing first person!).
It’s not about you!
Other developers will have to read your tests eventually, unless it’s a top-secret project only you will ever work on. Don’t saddle the rest of the world with the product of your ego. Everyone finds simpler tests easier to work with. The rule of thumb is pretty simple: if you gave someone a copy of your tests, and nothing else, could they write the code to fit those tests from scratch in a reasonable time without begging you for assistance?
Conclusion
More food for thought. There’s no pretty code to look at in this one, so if you haven’t read the last post yet, go read it now.
Until next time…when I find something else to moan about.
Unit Testing: Multiple Assertions And Lazy/Shallow Testing Are Evil
Feb 12th
Unit Testing as a practice is like any other – there are good practices, and bad practices. Two of the worst practices are overloading tests with assertions, and writing lazy or shallow tests.
Before we recount the dire consequences of these practices, it’s worth knowing why they are so attractive and not immediately perceived as bad. In short, every test you write requires that you set up the test environment, create a scenario for possible failure, add an assertion, and then ensure the source code makes that assertion pass. This requires code – sometimes a lot of code. Adding multiple assertions to each test minimises that work, since each extra assertion reuses the existing setup code instead of adding new code to clutter your test classes. It can also help to tackle multiple but related results in the same test.
So long as you know the assertions will pass, this can make writing unit tests quite a bit faster. Unfortunately, a preoccupation with minimising test code also encourages developers to keep tests so simple that they never dig deep enough into whether the code actually accomplishes its objective, often because that objective has never been documented.
These considerations lead to tests similar to the following (using PHPUnit):
[geshi lang=php]class GameTest extends PHPUnit_Framework_TestCase
{
    public function testScoreIsZeroWithNoScoring()
    {
        $game = new Game;
        $this->assertEquals(0, $game->score);
        $this->assertEquals(0, $game->scoreTotal);
        $game->score(1);
        $this->assertEquals(1, $game->score);
        $this->assertEquals(1, $game->scoreTotal);
    }
}[/geshi]
Here we have a simple test with four assertions. The first two check that $score and $scoreTotal are zero after the object is initialised and before any score is assigned. The last two revisit both properties after a score of 1 has been assigned. To the naked eye, the test is easy to understand. Here’s a class which will pass the above test (if you spot the problem after seeing the class, give yourself a treat).
[geshi lang=php]class Game
{
    public $score = 0;
    public $scoreTotal = 0;

    public function score($score)
    {
        $this->score = $score;
        $this->scoreTotal += $score;
    }
}[/geshi]
Back to those dire consequences. Consider the test output after I introduce a failure by initially setting $score to 1 in the class:
PHPUnit 3.3.14 by Sebastian Bergmann.
F
Time: 0 seconds
There was 1 failure:
1) testScoreIsZeroWithNoScoring(GameTest)
Failed asserting that <integer:1> matches expected value <integer:0>.
D:\projects\tinker\GameTest.php:11
FAILURES!
Tests: 1, Assertions: 1, Failures: 1.
Multiple assertions unfortunately have side effects. The most obvious one is that it only takes one assertion to fail, and the entire test fails with it. Whether the other assertions in the same test would have passed is ignored (the last line of the results shows that PHPUnit executed only the first assertion and skipped all the others), which leaves you blind as to whether the error affects other assertions too. This creates a maintenance nightmare: for every test that fails, you’re never certain what else should have failed! You only get one part of the puzzle to work with.
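To see why only one assertion ran, here is a minimal plain-PHP sketch of the mechanism (not PHPUnit itself). Like PHPUnit, the hypothetical `MiniTest` helper throws as soon as an assertion fails, so nothing after it in the same test executes. `Game` is the class from this post with the deliberately introduced bug ($score starting at 1).

```php
<?php
// Hypothetical mini assertion helper mimicking one PHPUnit behaviour:
// a failed assertion throws, ending the test on the spot.
class AssertionFailed extends Exception
{
}

class MiniTest
{
    public $assertionsRun = 0;

    public function assertEquals($expected, $actual): void
    {
        $this->assertionsRun++;
        if ($expected != $actual) {
            throw new AssertionFailed("Failed asserting that $actual matches expected value $expected.");
        }
    }
}

// The Game class from the post, with the deliberate bug: $score starts at 1.
class Game
{
    public $score = 1;
    public $scoreTotal = 0;

    public function score($score)
    {
        $this->score = $score;
        $this->scoreTotal += $score;
    }
}

$t = new MiniTest();
$game = new Game();
$failure = null;

try {
    $t->assertEquals(0, $game->score);      // fails here...
    $t->assertEquals(0, $game->scoreTotal); // ...so this and the rest never run,
    $game->score(1);                        // even though they would all pass
    $t->assertEquals(1, $game->score);
    $t->assertEquals(1, $game->scoreTotal);
} catch (AssertionFailed $e) {
    $failure = $e->getMessage();
}

echo $failure, PHP_EOL;
echo 'Assertions: ', $t->assertionsRun, PHP_EOL; // prints "Assertions: 1"
```

The three skipped assertions would all have passed, but nothing in the output tells you so.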
Unit tests should be specific. In fact, as a general rule, there should only be one assertion per test method. If a failed test doesn’t immediately tell you where the problem is and which assertions fail, and offer at least some minimal description of the failed behaviour (typically the test name should be sufficiently descriptive), then its utility is severely reduced. You end up doing the same detective work needed in the absence of unit tests. That makes those unit tests less beneficial: you rob the maintainer of instantaneous, specific feedback and force them to edit the tests (often rewriting them to be more specific) and/or fall back on typical debugging to locate the problem in the source code.
Another impact is that multiple assertions are often a sign that the tests were written after development, or without attention to the behaviour of the class. This increases the risk that the tests are not only confusing when they fail, but also incomplete. Truly paying attention to behaviour discourages multiple assertions and promotes specificity.
There’s a side story here about the foolishness of believing that code coverage is an absolute measure of the effectiveness of a unit test suite. It’s not. It’s only one metric to assist in that measurement, and not a very reliable one at that. It’s entirely possible to achieve 90% or even 100% code coverage without writing tests that cover even a quarter of the expected behaviour of the class. Code coverage measures how many lines of source code are actually executed; it doesn’t tell you whether they were executed enough times, in the right order, or whether the tests were even appropriate to start with.
This is the problem some of you will have spotted earlier (give yourself a treat!). The class obviously has more behaviour than the original passing test was aware of, because the test never calls score() more than once. Despite this, guess what code coverage the original test achieved? 100%.
Here’s how the test should have been written, code coverage and multiple assertions be damned. Your tests are only complete when you are absolutely certain they cover all of the class’s behaviour – and not a second sooner.
[geshi lang=php]class GameTest extends PHPUnit_Framework_TestCase
{
    public function testStartingLastScoreIsZero()
    {
        $game = new Game;
        $this->assertEquals(0, $game->score);
    }

    public function testStartingTotalScoreIsZero()
    {
        $game = new Game;
        $this->assertEquals(0, $game->scoreTotal);
    }

    public function testLastScoreOnlyStoresLastScoreRewarded()
    {
        $game = new Game;
        $game->score(2);
        $game->score(5);
        $this->assertEquals(5, $game->score);
    }

    public function testTotalScoreAccumulatesRewardedScores()
    {
        $game = new Game;
        $game->score(1);
        $game->score(2);
        $this->assertEquals(3, $game->scoreTotal);
    }
}[/geshi]
The first test class was very obviously smaller and simpler. But it is not only hampered by multiple confusing assertions; its simplicity also indicates a lack of good test design. By reusing one setup instead of building a specific scenario for each behaviour (i.e. seeking a failure before editing the code to pass), it produces a test suite that passes but doesn’t quite verify everything. Unfortunately, in my experience the two problems go hand in hand. To make the first test pass and the later, more specific tests fail, use the following version of the Game class.
[geshi lang=php]class Game
{
    public $score = 0;
    public $scoreTotal = 0;

    public function score($score)
    {
        $this->score += $score;
        $this->scoreTotal = $score;
    }
}[/geshi]
Here we’ve simply assumed someone got confused and mixed up the purpose of the two properties in the score() method, so the += operator ended up on the wrong line. Now how many times has that happened to you?
If you run the original, simpler, assertion-stuffed test, it shows everything working as intended (as an aside, this is one example of where Mutation Testing could have picked up a problem which the original test couldn’t detect).
PHPUnit 3.3.14 by Sebastian Bergmann.
.
Time: 0 seconds
OK (1 test, 4 assertions)
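The aside about Mutation Testing deserves a quick illustration. Mutation testing makes small deliberate changes (mutants) to working code and checks whether the tests notice; a mutant that survives points at a gap in the tests. This is a hand-rolled, hypothetical sketch rather than a real mutation tool: the single mutant `GameMutant` is written by hand and swaps = and += exactly as in the broken class above, each test is a plain closure instead of a PHPUnit method, and `killsMutant()` is an invented helper.

```php
<?php
// Hand-rolled illustration of mutation testing, not a real mutation tool:
// one mutant is written by hand, and a suite "kills" it if any test fails.

class Game
{
    public $score = 0;
    public $scoreTotal = 0;

    public function score($score)
    {
        $this->score = $score;
        $this->scoreTotal += $score;
    }
}

// Hypothetical mutant: identical except that = and += are swapped.
class GameMutant extends Game
{
    public function score($score)
    {
        $this->score += $score;
        $this->scoreTotal = $score;
    }
}

// Each test receives the class name under test and returns true on a pass.
$originalSuite = [
    'testScoreIsZeroWithNoScoring' => function (string $class): bool {
        $game = new $class();
        if ($game->score != 0 || $game->scoreTotal != 0) {
            return false;
        }
        $game->score(1);
        return $game->score == 1 && $game->scoreTotal == 1;
    },
];

$specificSuite = [
    'testStartingLastScoreIsZero' => function (string $class): bool {
        return (new $class())->score == 0;
    },
    'testStartingTotalScoreIsZero' => function (string $class): bool {
        return (new $class())->scoreTotal == 0;
    },
    'testLastScoreOnlyStoresLastScoreRewarded' => function (string $class): bool {
        $game = new $class();
        $game->score(2);
        $game->score(5);
        return $game->score == 5;
    },
    'testTotalScoreAccumulatesRewardedScores' => function (string $class): bool {
        $game = new $class();
        $game->score(1);
        $game->score(2);
        return $game->scoreTotal == 3;
    },
];

// A mutant is killed when at least one test in the suite fails against it.
function killsMutant(array $suite, string $class): bool
{
    foreach ($suite as $test) {
        if (!$test($class)) {
            return true;
        }
    }
    return false;
}

echo 'Original suite kills mutant: ', var_export(killsMutant($originalSuite, GameMutant::class), true), PHP_EOL;
echo 'Specific suite kills mutant: ', var_export(killsMutant($specificSuite, GameMutant::class), true), PHP_EOL;
```

Both suites pass against the correct class, but only the specific suite kills the mutant. The surviving mutant is precisely the weakness the 100% coverage figure hid.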
The later, more detailed and specific tests tell a different story:
PHPUnit 3.3.14 by Sebastian Bergmann.
..FF
Time: 0 seconds
There were 2 failures:
1) testScoreOnlyStoresLastScoreRewarded(GameTest)
Failed asserting that <integer:7> matches expected value <integer:5>.
D:\projects\tinker\GameTest.php:38
2) testTotalScoreAccumulatesRewardedScores(GameTest)
Failed asserting that <integer:2> matches expected value <integer:3>.
D:\projects\tinker\GameTest.php:46
FAILURES!
Tests: 4, Assertions: 4, Failures: 2.
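For contrast with the multi-assertion run, here is a rough plain-PHP imitation of what happens with the four specific tests: each test runs on its own, and a failure is recorded rather than aborting the rest. The runner loop and the closure-per-test structure are hypothetical stand-ins for PHPUnit, and `Game` is the swapped-operator version from above.

```php
<?php
// Hypothetical stand-in for a test runner: each test is a closure returning
// [expected, actual], so every test makes exactly one comparison.

// The broken Game class from above: += and = swapped by mistake.
class Game
{
    public $score = 0;
    public $scoreTotal = 0;

    public function score($score)
    {
        $this->score += $score;
        $this->scoreTotal = $score;
    }
}

$tests = [
    'testStartingLastScoreIsZero' => function (): array {
        $game = new Game();
        return [0, $game->score];
    },
    'testStartingTotalScoreIsZero' => function (): array {
        $game = new Game();
        return [0, $game->scoreTotal];
    },
    'testLastScoreOnlyStoresLastScoreRewarded' => function (): array {
        $game = new Game();
        $game->score(2);
        $game->score(5);
        return [5, $game->score];
    },
    'testTotalScoreAccumulatesRewardedScores' => function (): array {
        $game = new Game();
        $game->score(1);
        $game->score(2);
        return [3, $game->scoreTotal];
    },
];

// Run every test independently; a failure is recorded, never fatal.
$failures = [];
foreach ($tests as $name => $test) {
    [$expected, $actual] = $test();
    if ($expected != $actual) {
        $failures[$name] = "Failed asserting that $actual matches expected value $expected.";
    }
}

foreach ($failures as $name => $message) {
    echo "$name: $message", PHP_EOL;
}
printf("Tests: %d, Failures: %d%s", count($tests), count($failures), PHP_EOL);
```

Because each closure makes one comparison, every failure maps straight to one named behaviour, which is what makes the output above so much easier to act on.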
Now imagine someone has written hundreds of tests in the manner of my first example. Can you imagine the world of hurt anyone attempting to refactor and maintain the underlying source code is facing? The countless tests they’ll need to debug, rewrite, and expand? The tears of frustration? The hair pulling? The talking to your reflection because of a psychotic break?
Don’t do that the next time you write unit tests. Remember: be as specific as possible about what the class should do, and you will quickly realise that you only need one assertion per test. Sure, it means writing additional test code, but at least you’re now writing tests that truly work!
