PHP, Zend Framework and Other Crazy Stuff
PHP General
Zend Framework Contributors Mailing-List Summary; Edition #2 (July 2011)
Aug 24th
It’s been a busy month in Zend Framework land, which I’ll blog about shortly, so after a few weeks of delay, here’s the July 2011 summary of the zf-contributors mailing list.
ZF2 Feedback
Late June kicked off with this topic from Robert Basic, with a set of notes on his experiences in getting started with ZF2 by migrating a ZF1 application. Adam Lundrigan noted, correctly, that a lot of “bleeding edge” code is not included in the main repository at this time and is distributed across contributors’ GitHub forks. He also suggested a ZF2 Status Page. Derek Miranda voiced his agreement with Adam. Robert also agreed, noting the difficulty in assessing the state of components.
Summary: ZF2 is scattered across multiple forks – be prepared to rely on notes such as Robert’s if jumping in at the deep end.
Creating a 1.11.9 Hotfix Release
A short note from Matthew Weier O’Phinney announced that a 1.11.9 hotfix release would be made to fix a number of backwards compatibility breaks introduced in 1.11.8. Issue tickets involved were ZF-11548, ZF-11550, ZF-10991 and ZF-10725.
Summary: It’s a maintenance release. It fixes stuff.
Zend\Http and MVC Developments
Ralph Schindler presented a document outlining a requirements list and the overall architecture of classes and interfaces for Zend\Http, noting work would commence on a prototype once any outstanding suggestions were cleared. Rob Zienert commented that the document indicated interfaces for Zend\Http Client and Server components and mentioned they needed proposals. Matthew responded that Zend\Http’s Server would deal with classes extending Zend\Service\Abstract, such as SOAP and AMF, but would not comprise an HTTP server, given that was covered by PHP 5.4. Anthony Shireman asked whether there were any Zend\Http Server plans or whether it was a “time will tell” situation. Matthew confirmed that that was the case, given PHP 5.4 would include an HTTP server that ZF2 could piggyback on to offer a development server environment.
Summary: HTTP work continues. We’ll need it to communicate with all those big tubes connecting PCs.
[Proposal] ActiveRecord Proposal
Artur Bodera raised the proposal and offered to implement an ActiveRecord solution noting its benefits compared to Zend\Db. The proposal was published at http://framework.zend.com/wiki/display/ZFDEV2/ActiveRecord+-+Arthur+Bodera with a working branch at https://github.com/Thinkscape/zf2/branches/ActiveRecord.
Nicolas Bérard-Nault asked why it was necessary to reinvent the wheel instead of integrating with existing, mature implementations. Artur responded that other solutions did not integrate with Zend Framework, noting his proposal is built on ZF2’s Zend\Db, and wondered what the point of Zend\Db\Table would otherwise be in the face of Doctrine or Propel. Peter Kokx responded to note that Zend\Db\Table implements the Table Data Gateway and Row Data Gateway patterns, as distinct from ActiveRecord, and that users shouldn’t interpret the Model in MVC as referring solely to ActiveRecord. Artur conceded that this was a good point but pressed his case that ActiveRecord was one tool which did not impose on any others available to Zend Framework users of Zend\Db. Tomáš Fejfar voiced his support for adding ActiveRecord, noting its value in simple use cases to get things done fast.
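To make Peter’s distinction concrete, here is a minimal sketch of two of the patterns in Python with sqlite3. It is not Zend\Db’s API; the `users` table and the `UserTable` and `User` classes are hypothetical. A Table Data Gateway is one object fronting a whole table, while an Active Record is one object per row that persists itself (the Row Data Gateway pattern is not shown).

```python
import sqlite3


class UserTable:
    """Table Data Gateway: one object handles SQL for the whole users table."""

    def __init__(self, conn):
        self.conn = conn

    def insert(self, name):
        cursor = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cursor.lastrowid

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None


class User:
    """Active Record: each instance wraps one row and knows how to save itself."""

    conn = None  # shared connection, assigned once at startup

    def __init__(self, name, user_id=None):
        self.id = user_id
        self.name = name

    def save(self):
        if self.id is None:
            cursor = self.conn.execute(
                "INSERT INTO users (name) VALUES (?)", (self.name,)
            )
            self.id = cursor.lastrowid
        else:
            self.conn.execute(
                "UPDATE users SET name = ? WHERE id = ?", (self.name, self.id)
            )
        return self
```

Both styles can sit on the same connection, which is roughly Artur’s argument that ActiveRecord need not displace anything else.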
Ralph Schindler leaped to the rescue by noting that ActiveRecord is indeed planned for ZF2 and noting the significant work done to date on Zend\Db in his own feature branch. Artur Bodera welcomed the progress stating he would migrate his ActiveRecord solution over to the improved Zend\Db once complete.
Summary: We’re getting an ActiveRecord implementation for ZF2.
ZF2 DocBook Sources Converted to DocBook 5
Another short note from Matthew Weier O’Phinney informed the community that ZF2’s DocBook-formatted manual files had been migrated to DocBook 5. The community silently admired the completion of this task (nobody responded, but I assume they silently admired it all the same!). Matthew noted the README for manual generation would be updated and that DocBook 5 made certain tasks a lot easier.
Summary: The ZF2 manual will be written in DocBook 5; those using a visual XML editor may celebrate.
ZF2 Zend\Mail: To strip/validate or not to strip/validate (email addresses)
Status of the Test Suite (ZF2)
Keith Pope asked after the status of the Test Suite, mentioning that phpunit.xml was mostly commented out, Zend\Di was not using the @group annotation for the test runner, and TestConfiguration.php was nearing 800 lines. He suggested that the configuration be spread into a conf.d setup (i.e. each configuration segment split into a separate file and all combined at runtime). Matthew responded, noting how easily ZF2 tests could be run by passing the relevant directory to phpunit from the main /tests directory, and that configuration may be pushed into phpunit.xml instead of the current PHP file. While expressing interest in a conf.d setup, Matthew noted this would depend on support in PHPUnit.
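A conf.d setup boils down to “read every fragment, merge in a predictable order”. The sketch below shows that idea in Python under assumptions of my own: fragments are JSON files, later filenames win, and `load_conf_d` is a hypothetical helper. The thread did not settle on any format, and ZF2’s TestConfiguration.php is plain PHP.

```python
import json
from pathlib import Path


def load_conf_d(directory):
    """Merge every *.json fragment in `directory` into one settings dict.

    Fragments are read in filename order, so a later file (say 99-local.json)
    overrides top-level keys set by an earlier one (say 10-db.json).
    """
    settings = {}
    for fragment in sorted(Path(directory).glob("*.json")):
        settings.update(json.loads(fragment.read_text()))
    return settings
```

The numeric filename prefixes are just a convention for controlling merge order; note that `update()` only overrides top-level keys.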
Summary: Ignore runtests.sh and just use the stock phpunit commands for ZF2.
Serious Question about Mcrypt
Artur Ejsmont observed that the Mcrypt filter calls srand() with a limited range of potential seeds, suggesting this would weaken the security of the filter. Enrico Zimuel replied that srand() is only used in limited circumstances (where a better source of randomness is not available) and that it’s not a serious problem, since the encryption’s security is not wholly based on the initialisation vector (IV) that uses srand() on some platforms. Nevertheless, he did note that some improvements could be made.
Artur responded with a general query on the efficacy of using srand() and rand() to avoid collisions. Pádraic Brady responded that rand() was particularly bad, noting you could create a collision in a matter of minutes. Pádraic also noted that mt_rand() was far more effective but also not truly random (as a graph of its output would show), suggesting it was advisable to use better random sources such as /dev/random and /dev/urandom where feasible. Enrico also noted the availability of openssl_random_pseudo_bytes().
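Why a limited seed range matters can be shown in a few lines. This is a Python sketch, not the Mcrypt filter’s code: Python’s `random` module (a Mersenne Twister, like mt_rand()) stands in for PHP’s seeded generator, and the 10,000-seed range and helper names are hypothetical. If the seed space is small, an attacker can simply try every seed until the IV matches.

```python
import random
import secrets


def weak_iv(seed, size=8):
    """Build an IV from a PRNG seeded with a small-range value (the weak pattern)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


def recover_seed(iv, seed_space):
    """Brute-force the seed; cheap because the seed space is small."""
    for seed in seed_space:
        if weak_iv(seed, len(iv)) == iv:
            return seed
    return None


def strong_iv(size=8):
    """Draw the IV from the operating system's CSPRNG instead."""
    return secrets.token_bytes(size)
```

`secrets.token_bytes()` plays the role that reading /dev/urandom or calling openssl_random_pseudo_bytes() would play in PHP: the output no longer depends on a guessable seed.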
Summary: Getting random bytes is a tricky business.
ZF2 Zend\Code Bugfix
Nick Belhomme mentioned he had been looking at Zend\Code, which is used heavily by Zend\Di. His first impression was that it should work well, being token based, but he also felt it was quite error prone and that its unit tests were not satisfactory.
To explain his case, he used the example of a method signature accepting four type-hinted object parameters, noting it could fail to be analysed correctly because the whitespace in the parameter list (after each comma) was not handled correctly by the ParameterScanner. Nick noted he’d committed a fix, using a short trim function, to his own git fork.
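Here is a minimal sketch of the failure mode Nick described. Zend\Code works on PHP tokens, whereas this Python function works on a plain string and ignores default values and references, so it only illustrates why trimming matters. `scan_parameters` and its `trim` flag are hypothetical.

```python
def scan_parameters(param_list, trim=True):
    """Split a PHP parameter list such as "Foo $a, Bar $b" into (hint, name) pairs."""
    params = []
    for piece in param_list.split(","):
        if trim:
            piece = piece.strip()  # the fix: drop the space that follows each comma
        hint, _, name = piece.partition(" ")
        if not name:  # no type hint, e.g. "$x"
            hint, name = None, hint
        params.append((hint, name))
    return params
```

Without the trim, " Bar $b" splits on its leading space and yields an empty type hint with "Bar $b" as the name. A test fixture written with ", " separators catches this immediately, which is Nick’s point about the test doubles.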
Regarding the unit tests, Nick explained why the current unit tests were insufficient in testing parameters and suggested rectifying the test doubles to account for whitespace.
Summary: Zend\Code needs to build up a fuller test suite accounting for different coding styles.
What is Mutation Testing?
Aug 2nd
Some time ago, in between working on Zend Framework, I booted up a couple of libraries that I really wanted to integrate into my workflow. Recently, I’ve been putting these through the grindmill so they can be properly released and supported for public consumption via PEAR. Just as Mockery fell out of older work on PHPMock, Mutagenesis will fall out of another project called MutateMe. This is a short introductory article on what Mutagenesis will do and why. In other words, what the heck is Mutation Testing?
First, some background.
The most common means of measuring confidence in a test suite is the Code Coverage metric. Code Coverage essentially checks, on a per class basis, how many of the lines of code in the class are executed by a test suite and expresses this as a percentage. For example, a Code Coverage of 85% means 85% of the lines of code in a class were executed and 15% were not. The greater the number of lines of code executed, the more confidence one can presumably have that a test suite is doing its job, i.e. verifying class behaviour, preventing the introduction of bugs, supporting refactoring, and so on.
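The metric itself is just a ratio. Here is a minimal Python sketch, assuming you already have the set of executable line numbers for a class and the set the test suite actually ran (tools like PHPUnit collect these for you; `line_coverage` is a hypothetical helper):

```python
def line_coverage(executed_lines, executable_lines):
    """Percentage of executable lines that the test suite actually ran."""
    executable = set(executable_lines)
    covered = executable & set(executed_lines)
    return 100.0 * len(covered) / len(executable)
```

Note what is missing: nothing in the calculation looks at whether any assertion was made about those lines.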
I have a huge and insurmountable problem with Code Coverage. For starters, my average Code Coverage is closer to 80% than the 90% expected of projects such as Zend Framework. The gap is explained by me not testing what I call “braindead” functions, i.e. methods which are either ridiculously simple, where a malfunction would quickly become self-evident, or which are marginalised (on the borders of deprecation). So Code Coverage actually increases the amount of work I need to do for very little gain and a lot of frustration.
Secondly, Code Coverage is easy to spoof or misinterpret. Since it’s a metric measuring the execution of source code, you need only…well…execute the source code. It’s a simple matter to construct a series of wonderfully useless tests to do just that and obtain a high Code Coverage result; in my experience it’s done all the time once someone’s patience for writing quality unit tests runs out. It is particularly evident in cases where unit tests are written after the source code is completed, a still too common practice in PHP. The less villainous flipside is that certain nuggets of source code are fundamentally difficult to test. For example, a complex algorithm suffering from poor documentation may make composing a suitable unit test near impossible. The rollout of OAuth was filled with such examples.
This leads into my opinion of Code Coverage. I view the venerable Code Coverage metric as a near pointless exercise. While it may tell you how much source code a test suite exercises, it tells you nothing about the actual quality of those unit tests. They could be good tests, sort-of-good tests or absolutely horrendous tests, and Code Coverage will never tell you either way. I say near pointless because there are precious few alternatives. We need something to give us a reason to trust and have confidence in test suites, and Code Coverage is easy to implement and has been a part of PHPUnit since forever. So, by and large, we make do. We measure Code Coverage just to make certain some kind of unit testing was performed.
Is there nothing better?
A good unit test serves a simple purpose. It verifies a behaviour of an object. In PHP, we’re more likely to verify umpteen million behaviours in a single test (count your assertions!) but we’ll let that slide. Since a test verifies behaviour, it follows that a test should fail when that behaviour is changed. If a test does not fail when class behaviour is changed, it also follows that the original behaviour was not fully tested, i.e. there is a gaping hole in our test suite, whether due to a flawed or missing test, that could let bugs into our application. So, to really stick unit tests under a microscope to assess their quality and our confidence in them, we need to introduce changes into the source code under test and see whether the unit test suite can detect them.
This process is known as Mutation Testing. Mutagenesis is a Mutation Testing framework for PHP 5.3+.
Mutation Testing, as you have probably surmised, is not a super-complex activity. You take a set of source code and compile a list of possible “mutations” that are likely to break the behaviour of the source code. Then, you apply one mutation to that source to create a “mutant”, i.e. a copy of the source code with the mutation change applied. Next, you run the source code’s test suite against the mutant and see if any tests fail. If a test fails, celebrate – the mutation was detected so your tests were, in this instance, adequate. If no test fails, curse the Gods – the mutation was not detected and you’ll need to figure out whether a new test is needed or an old one modified/corrected. Rinse and repeat the above for each mutation you’ve compiled.
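The loop above fits in a few lines of code. This is a Python sketch of the generic process, not Mutagenesis (a PHP library, and unreleased at the time of writing). Plain text substitution stands in for proper token-level mutation, and every name here is hypothetical.

```python
# The "source code under test": a tiny function held as text so it can be mutated.
SOURCE = "def is_small(n):\n    return n < 10\n"

# Hypothetical mutation list: (label, text to find, replacement text).
MUTATIONS = [
    ("< becomes <=", "n < 10", "n <= 10"),
    ("< becomes >", "n < 10", "n > 10"),
    ("10 becomes 11", "n < 10", "n < 11"),
]


def build(source):
    """Compile a (possibly mutated) copy of the source and return is_small."""
    namespace = {}
    exec(source, namespace)
    return namespace["is_small"]


def weak_suite(is_small):
    """A deliberately weak test suite: it never checks the boundary value 10."""
    return is_small(3) and not is_small(42)


def mutation_test(source, mutations, suite):
    """Apply one mutation at a time, run the suite, and record the outcome."""
    report = {}
    for label, old, new in mutations:
        mutant = build(source.replace(old, new, 1))
        # A failing suite means the mutant was detected ("killed").
        report[label] = "survived" if suite(mutant) else "killed"
    return report
```

Against `weak_suite`, the "<=" and "11" mutants survive even though every line of `is_small` is executed. Both survivors point at the same gap: no test asserts that `is_small(10)` is false. Add that assertion and all three mutants are killed.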
Mutations are typically quite simple, such as replacing operators, booleans, strings and other scalar values with either an opposing form or a random value. Expressions might also be reversed or driven to zero to give an opposing boolean or zero value. Such small changes may seem trivial, but behind every serious flaw in an application is one or more smaller contributing errors. If your test cases can detect the potentially contributing errors, then there’s an excellent chance they would detect the bigger ones anyway. This is known as the Coupling Effect in Mutation Testing.
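Generating these simple mutations is mechanical once you work on a syntax tree rather than raw text. The sketch below uses Python’s `ast` module (Python 3.9+ for `ast.unparse`) and covers only a small, illustrative subset of operators: comparisons and +/- flip to an opposing form, booleans invert, and integers are driven to zero. It is not how Mutagenesis generates PHP mutations, and the helper names are hypothetical.

```python
import ast

# Opposing forms for a small, illustrative subset of operators.
OPPOSITES = {
    ast.Lt: ast.GtE, ast.GtE: ast.Lt,
    ast.Gt: ast.LtE, ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
    ast.Add: ast.Sub, ast.Sub: ast.Add,
}


def _is_mutable(node):
    if isinstance(node, ast.Compare):
        return type(node.ops[0]) in OPPOSITES
    if isinstance(node, ast.BinOp):
        return type(node.op) in OPPOSITES
    # bool is a subclass of int, so this covers True/False as well as integers.
    return isinstance(node, ast.Constant) and isinstance(node.value, int)


def _mutate(node):
    if isinstance(node, ast.Compare):
        node.ops[0] = OPPOSITES[type(node.ops[0])]()  # e.g. < becomes >=
    elif isinstance(node, ast.BinOp):
        node.op = OPPOSITES[type(node.op)]()  # e.g. + becomes -
    elif isinstance(node.value, bool):
        node.value = not node.value  # True becomes False
    else:
        node.value = 0 if node.value != 0 else 1  # drive integers to zero


def mutants(source):
    """Yield one mutated copy of `source` per mutable node, one change each."""
    count = sum(1 for n in ast.walk(ast.parse(source)) if _is_mutable(n))
    for index in range(count):
        tree = ast.parse(source)  # fresh tree so only one mutation is applied
        target = [n for n in ast.walk(tree) if _is_mutable(n)][index]
        _mutate(target)
        yield ast.unparse(tree)
```

Each yielded string is one mutant. Producing one change per mutant keeps results attributable: if a mutant survives, you know exactly which change slipped through.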
Some of you will be vaguely aware of Mutation Testing. In terms of implementations, Ruby has Heckle, Python has Pester, and Java has Jumble, Jester and a couple of others. Those who prefer Microsoft’s technologies can use Nester. The running rhyme is no accident, since so much is inspired by the original Jester framework for Java. To my knowledge, Mutagenesis will be the only Mutation Testing framework for PHP (though I sincerely wish I were wrong).
Examining those libraries, you eventually realise two problems with Mutation Testing which explain its lack of popularity until relatively recently: performance is a concern, and Mutation Testing requires a Human Brain to complete the process.
Performance is a concern because each mutation requires a test suite to be executed. Imagine a set of classes from which you extract 100 possible mutations, coupled with a test suite that takes 5 minutes to run. A basic Mutation Testing framework (e.g. Ruby’s Heckle) would therefore take 500 minutes to complete a Mutation Testing session. That’s 8.3 hours of continuous Mutation Testing. Mutation Testing for Zend Framework would be very interesting.
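Written out, with M as the number of mutations and t_suite as the time one full test-suite run takes (symbols introduced here for illustration), a naive framework pays for a full suite run per mutation:

```latex
T_{\text{naive}} = M \times t_{\text{suite}} = 100 \times 5\,\text{min} = 500\,\text{min} \approx 8.3\,\text{h}
```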
Similar to Jumble for Java, Mutagenesis will utilise a few heuristics (shortcuts) to significantly improve performance without compromising results. We only need a single test to fail in order to rule that a mutation was detected and killed, so we can do a few things to boost performance:
1. Terminate the test suite on first failure/error or exception.
2. Execute test cases in order of execution time ascending (fastest first; slowest last).
3. Prioritise the test case that most recently detected a mutant, to take advantage of same-class detection.
4. Log which tests detect which mutations, and prioritise those associations in subsequent runs.
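The four heuristics can be expressed as a sort order plus an early exit. Here is a minimal Python sketch, assuming per-test runtimes from a previous run and an in-memory dict standing in for the detection log. `TimedTest`, `order_tests` and `detect` are hypothetical names, not Mutagenesis’s API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TimedTest:
    name: str
    seconds: float                   # runtime measured on a previous run
    check: Callable[[object], bool]  # returns True if the test passes


def order_tests(tests, history, mutation, last_killer):
    """Heuristics 2 to 4 expressed as a sort key."""
    def priority(test):
        if test.name in history.get(mutation, ()):
            rank = 0  # heuristic 4: this test killed this mutation before
        elif test.name == last_killer:
            rank = 1  # heuristic 3: this test killed the previous mutant
        else:
            rank = 2
        return (rank, test.seconds)  # heuristic 2: fastest first within a rank
    return sorted(tests, key=priority)


def detect(mutant, tests, history, mutation, last_killer=None):
    """Heuristic 1: stop at the first failing test and return its name."""
    for test in order_tests(tests, history, mutation, last_killer):
        if not test.check(mutant):
            history.setdefault(mutation, set()).add(test.name)  # log for next run
            return test.name
    return None  # mutant survived the whole suite
```

A caller would pass the returned name back in as `last_killer` for the next mutant, and persist `history` between sessions so later runs start with the known killers.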
The effect of the above is to speed up Mutation Testing by a significant degree. The final heuristic ensures that for gradually changing source code and tests, the first Mutation Testing process might take a while but subsequent runs will be significantly faster making them far more usable in a Test-Driven Development setting. Mutation Testing is best served with a healthy dose of efficiency.
The second reason for its lack of popularity is that Mutation Testing can’t analyse the logic of the source code under test. For example, an expression might evaluate to TRUE for any integer less than 10. If the input from another class were 7, and a mutation swapped this for 9, then the associated unit test would still pass (switching 7 for 9 still allows the <10 expression to evaluate to TRUE). If you recall, when a mutant passes a test suite we assume either a flawed test or the lack of a suitable test. Obviously, as the above suggests, this isn’t always the case. Mutation Testing can and often will report false positives.
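The 7-versus-9 case takes only a few lines to reproduce. In this Python sketch (all names hypothetical), the mutant survives, yet no test is missing: the mutation simply doesn’t change observable behaviour. The mutation testing literature usually calls this an equivalent mutant, and only a human can rule it out.

```python
LIMIT = 10


def accepts(value):
    """The expression under test: any integer below LIMIT is accepted."""
    return value < LIMIT


def original_caller():
    return accepts(7)


def mutant_caller():
    # Mutation: the constant 7 was swapped for 9.
    return accepts(9)


def suite(caller):
    """The test we have: the caller's value must be accepted."""
    return caller() is True
```

The suite passes for both callers, so the mutant is reported as undetected even though the test is perfectly adequate.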
Ruling out false positives, coupled with the need to improve test suites to detect more mutations, makes Mutation Testing a source of extra work. Who likes extra work least? Programmers, especially the lazy kind.
Mutation Testing is not a far-fetched idea. The principles are sound and it beats the pants off Code Coverage when it comes to measuring what confidence we can have in our testing suites. It is still hampered, as a methodology, by the lack of good implementations in other programming languages. Mutagenesis, by adopting implementation heuristics from Java’s Jumble, should avoid that fate and offer a decent framework in PHP that performs as well as can be expected.
Once it’s released…of course. Mutagenesis is in development but should see a fresh release in a couple of weeks alongside Mockery. I’ll be looking forward to seeing how people perceive it. Mutation Testing has zero presence in PHP to date, but having something to complement Code Coverage can’t do any harm!
Out With The Old, In With The New: Original MySQL Extension Heading For Retirement?
Jul 16th
When we use the term PHP, we are often silently associating it with the abbreviation LAMP (that’s Linux, Apache, MySQL and PHP, just in case you don’t recall). MySQL has been our bread and butter in PHP for over a decade; an old friend, accomplice and partner in crime. This was made possible by the MySQL extension. Indeed, you can scarcely find a basic nuts-and-bolts PHP tutorial that doesn’t use it. Which is probably why it’s a good idea to give it a huge going-away bash (and make sure it finds the exit afterwards and catches a cab to oblivion!). We’ve since seen replacements like the MySQL Improved extension (mysqli) and PHP Data Objects (PDO). These are simply better, from the additional features each offers to their integration into higher-level libraries such as Doctrine.
But, as with any basic change to a successful formula, there was bound to be some controversy at the mere suggestion of deprecating our old friend (even if preceded by an extended period of educating users on the well established replacements). Manuel Lemos and Lucas Darnell have both written blog posts indicating what a bad idea this could be. Their issues are understandable. Once the E_DEPRECATED notices start flying, applications that have existed for years (and years) will appear to implode, leaving behind a long line of irritated people who may need to hire a PHP programmer to fix stuff. This obviously imposes a cash cost across thousands (probably an underestimation) of businesses. This may lead to hosting services deferring adoption of the PHP version carrying the deprecation by months, if not years. Lucas also raised an interesting point: with so much literature, including books, carrying example after example of (often insecure, in my opinion) MySQL extension use, user adoption and education may suffer a great deal.
In a riposte to Manuel Lemos, Gregg Thomason perhaps best illustrates why even the feared disadvantages may be worth the cost. The original MySQL extension is a historical relic from a past PHP is trying to leave behind. It’s old, doesn’t do a lot to support security and it needs to go. I agree. Gregg says “…this is a forward-thinking business and our job is to invent the future.” Let’s go invent and improve that future; if nothing else, it might make Anonymous’ job of finding SQL injections at every company they squint at a little harder.
PHP is not a weirdo stagnant programming language used by amateurs who don’t have sufficient brain cells to learn Java, Ruby or Python. That’s the common misconception based largely on two obvious factors: PHP is so amazingly popular and easy to learn that any innocently ignorant person with half a brain cell can write a fabulously insecure application (the examples just keep coming and coming) and, secondly, PHP is a bit on the ugly side and not a “true object oriented language” because it uses functions instead of methods. PHP is actually used by hardcore professionals who build great secure applications and that community has left the original MySQL extension by the wayside in favour of object oriented solutions where MySQL related functions are buried deep behind a wall of classes in their preferred database interaction solution, such as PDO or Doctrine. It’s about time we brought everyone else up to speed with that reality.
While “deprecation” may attract all the attention, let’s remember that pushing the alternatives by any possible means is a great idea. Philip Olson’s proposal on how to encourage users to move away from the original MySQL extension has a lot of merit and is well worth pursuing. We need to let go of the past eventually to keep PHP moving into the future.
Zend Framework Contributors Mailing-List Summary; Edition #1 (June 2011)
Jun 27th
What’s this nonsense then? Well, a few weeks ago I shot myself in the foot (I was aiming for the cat who spilled coffee all over my desk) and before my sanity returned to normal, I found myself hoodwinked on IRC into writing up weekly summaries of what is discussed in Zend Framework land. The moral of the story is that the attempted murder of any ungrateful coffee-spilling animals sharing your home never ends well.
Let’s see how good a verbose meandering writer can be at summarising things. I decided to refer to myself by name throughout to avoid confusion.
Discussion Time: In ZF2, where do things go?
Ralph Schindler sprang this topic on us back in April and it has stubbornly continued on ever since. Ralph’s initial question boiled down to where should we put resource files, i.e. files utilised by PHP class files but not written in PHP themselves. The two options presented were to store them relative to the class files inside the library directory or store them in a completely separate parallel directory specifically for resources.
Opinions varied quite a bit, and Mike Willbanks opined that we should follow PEAR standards rather than doing our own thing, and seek to limit include_path performance issues. Matthew Weier O’Phinney noted that include_path performance concerns should be minimal using ZF2’s autoloader solution, which he has researched, and that the intention was to use PEAR or Pyrus. Pádraic Brady (I know that name from somewhere!) chipped in that any decision ought to be made independently of the packaging used, referencing possible weaknesses in how PEAR handles installation, unit tests and documentation viewing. Ralph responded to clarify how a separate resource directory might work, using simple constants and allowing users to selectively override them, noting the existence of the Assetic project (used by Symfony 2). Kevin McArthur added a vote for avoiding PEAR, citing the need for multi-version installation support in a final solution, and suggested the PHAR format for consideration.
Short version: Someone will make a choice…eventually.
How to Package ZF2
Pádraic spawned a new thread from the above topic, outlining the options available for packaging source code, including PEAR, Pyrus, Git and a Symfony-related project (now known as Composer). He also reiterated concerns previously raised regarding PEAR/Pyrus. There followed a side discussion on how individuals were actually deploying applications and managing QA and patches. Matthew raised an objection to the concept of centralised multi-version installs of Zend Framework, citing alternatives such as deploying applications that already contain the Zend Framework version they require, which eases maintenance and reduces uncertainty. He also asked Kevin McArthur to clarify the use of PHAR. Kevin responded with an answer as to why centralised multi-version installs were useful, citing reduced APC cache memory usage (a centralised library avoids caching identical copies of the same files for each application), and offered an example bootstrap script for such an architecture to manage version selection.
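Kevin’s bootstrap script isn’t reproduced here, but the core of centralised version selection is easy to sketch. In this Python sketch, the directory-name-as-version convention, purely numeric versions and `select_version` are all assumptions of mine. An application asks for a possibly partial version and gets the newest matching install. In PHP, the bootstrap would then point the include_path at the chosen directory.

```python
def parse_version(text):
    """Turn "1.11.9" into (1, 11, 9); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def select_version(installed, requested):
    """Pick the newest installed version whose leading components match `requested`.

    `requested` may be partial, e.g. "1.11" matches "1.11.9" but not "1.12.0".
    """
    wanted = parse_version(requested)
    candidates = [v for v in installed if parse_version(v)[:len(wanted)] == wanted]
    if not candidates:
        raise LookupError(f"no installed version matches {requested}")
    return max(candidates, key=parse_version)
```

Comparing tuples rather than strings matters here: as strings, "1.9" would sort after "1.11".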
Matthew also posted responses to points brought up in respect of Pyrus, noting, among other things, that it was closer to stable than suspected, that centralised multi-versioning was possibly not as popular as believed, that git support may be possible to add independently, and that XML package definitions had a number of advantages. The debate over centralised multi-version installations of Zend Framework continued for a large number of emails without resolution (too much to summarise, other than to note that each side is firmly convinced of the benefits of its particular approach and multi-versioning proponents seem more numerous than expected). No consensus was reached over the method of installation, with the best summaries of the respective opinions being emailed in by Matthew and Kevin McArthur. Pádraic chimed in briefly to urge adoption of PEAR in preference to Pyrus, on the basis that PEAR is already widely adopted, understood and easily manipulated at present. This was seconded, but there remained a lack of consensus. The topic ends with a suggestion that Pyrus may be accepted, recognising the absence of another realistic solution at the current time.
Short version: ZF2 may be distributed using Pyrus. Additional needs beyond that may be proposed to PEAR for Pyrus or via another tool. It’s clear Pyrus will crop up again in a future discussion.
ZF2’s View: Some thoughts for discussion
Pádraic Brady dropped an email offering his thoughts on the direction of ZF2’s View, which hadn’t seen much feedback on the Wiki. The short version was that Zend_View was a God Class, View Helpers were confusing, integration needed improvement and templates needed additional control over layouts/placeholders. He suggested a few steps, including elimination of the ViewRenderer helper, replacement of View Helpers with a Controller-oriented entity referred to as a “Cell”, and giving the base template of a View greater control over the rendering process, and he reiterated previously agreed changes. Marc Bennewitz added several additional concerns and posted a discussion he had with Matthew on the Zend\View\Variables class. Matthew responded with a number of points, including keeping the barrier to entry low, recognising that not all Views are HTML, and other areas for consideration. Nice to see everything in one place for discussion.
Short version: Not much in the way of disagreement. Seems like a topic that just needs sufficient code for someone to run off and write some proposals.
Proposal: Don’t implement BC requirement until ZF 2.1
Rob Allen emailed a proposal suggesting that backwards compatibility be deferred as a requirement until ZF 2.1. His reasoning focuses on the experience with ZF 1.0 where the frozen compatibility hurt ZF 1.x more than it helped. The proposal was quickly seconded by Ryan Mauger, Anthony Shireman, Rob Zienert (on condition of communicating this clearly to users), and H. Hatfield. Opposing views were aired by Till Klampaeckel on the grounds of keeping migrations between versions simpler. Tomáš Fejfar commented on this being a psychological proposal to increase adoption and raise feedback before the API is finally frozen. Matthew Weier O’Phinney noted his agreement that bigger features were required to increase early adoption.
Bradley Holt took the opportunity to propose alternative version/release strategies setting the context for the rest of the debate to date. His two points were to a) utilise an odd/even version system where odd numbered minor releases were considered betas and even numbered considered stable, similar to how the Apache HTTP server does things, and b) increase the pace of major releases to shorten the period between allowable compatibility breaks and speed up rolling out such improvements. The debate suggested Rob Allen would agree to faster major releases.
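Under Bradley’s first suggestion, a version number alone tells you whether a release is stable, much as Apache httpd’s 2.3 series was for development and 2.4 is stable. Here is a minimal Python sketch of that rule (the function name is hypothetical, and treating x.0 as stable is my assumption, since zero is even):

```python
def release_channel(version):
    """Odd/even scheme: an odd minor number marks a beta series, even marks stable."""
    minor = int(version.split(".")[1])
    return "beta" if minor % 2 else "stable"
```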
Short version: Implementing BC may be necessary. Might be better to shorten the release cycle and roll out compatibility breaking changes more regularly.
Proposal: Shorter Release Cycle for Major Versions
On the back of the previous topic, Bradley Holt elaborated on a proposal for shortening the release cycle for major versions. Pádraic Brady responded in agreement, noting that by the time ZF2 was released, there was a possibility that PHP 5.4, with potentially advantageous features, would be well on the way to a 2012 release. Based on this, he suggested that ZF3 development could be executed quickly, with a release date no later than the end of 2012 (i.e. 18 months away) and a maximum allowed period of 2 years. Kevin McArthur inquired into a reasonable minimum period between major releases, but this seems to be the number needing more discussion. There has been no input from the Zend guys to date, so this remains up in the air.
Short version: We want ZF3 relatively quickly and not in 4-5 years time.
Encouraging Usage of ZF 2.0 Beta
Another discussion opener from Bradley Holt. Bradley suggested an extended beta period, a communication campaign, treating all betas as regular GA releases, and highlighting applications built on ZF2 to encourage uptake. Kevin McArthur reiterated the need to maintain current versioning and noted his agreement with shortening the major version release cycle and having an extended alpha/beta period. Alessandro Pellizzari emailed in his thoughts from the perspective of a user and the difficulties that currently exist with checking the status of any one ZF2 component. Derek Miranda voiced his agreement with Alessandro’s thoughts.
Short version: Maybe we need a beta first?
New dev snapshot released
On the back of the work going into Zend\DI, Matthew announced the release of a new development snapshot for testing and feedback. Ralph Schindler subsequently posted links to Zend\DI examples. Feedback is ongoing. Anyone is free to check it out and offer some opinion!
Short version: Isn’t that short enough?
For those of you wondering where to go and track the inner thoughts of the Zend Framework developers, you can join us on the zf-contributors mailing list (available on Nabble here) or on IRC channel #zftalk.dev on Freenode.net. Until next time, remember, coffee + cat = bad.
How Would You Engineer A PEAR2/Pyrus Distribution Architecture?
Jun 20th
I was recently accused on the Zend Framework Contributors mailing list of having “strong feelings” towards Pyrus (i.e. the PEAR Group’s Installer/Packager for PEAR2) and not in a positive way. It’s a fair description. PEAR is, putting it lightly, a very old architecture which makes it very resistant to change. With the idea of PEAR2 and Pyrus, I had hoped to see a renewal – the advancement of a PEAR architecture for the 21st Century. Instead, and this is just my opinion, PEAR2/Pyrus were a relatively simple iteration on a very old theme.
A Ranting We Shall Go
Now, I may be biased since I gave up on PEAR becoming PHP’s core distribution mechanism after I found myself using alternative strategies for hosting and deployment. This is not to say PEAR is not useful for everyone. It is, just not in my specific case when developing/testing/deploying applications. It still remains a good means of distribution regardless, by virtue of being installed alongside PHP almost everywhere.
I surprised even myself, however, with my vehement outcry over the idea of adopting Pyrus as Zend Framework 2’s package distribution method, lambasting both it and the PEAR concept of distribution in equal measure while piling up questions on Pyrus’ status (currently released in alpha) and suitability in the near term. That thread showed a fairly divided sentiment. Once I jokingly threatened to mow down my zombified colleagues with a minigun, I figured it was time to go forth and rant (miniguns are too expensive for these recessionary times).
If the PEAR ecosystem has a failing, it is one of staggered evolution. Over time it has picked up additional features tacked on top of a base model. The classic example is the use of Channels (to support multiple repositories), which has more recently prompted calls for a Channel Aggregator to avoid the cost of locally managing a channel registry or even hosting a Channel. This is the way of many PEAR features. They each do something incredibly useful but do it in a way that has many developers looking for a better approach, usually only to discover that the better approach requires breaking compatibility.
My vehemence on the aforementioned mailing list came down to simple disappointment. We all deal with PEAR because we have it, we know it, and have done so for years. Seeing PEAR2 and Pyrus take the incremental improvement route without apparently doing anything to change the core experience seemed…pointless. It improved a lot of what PEAR already did without actually doing very much differently. All the same advantages, disadvantages, features and lack thereof were present and accounted for, with a handful of nice headline changes (e.g. we now have package signing capability). What exactly was the purpose of rewriting the entire toolchain if not to seize the opportunity to answer the accusations of those who doubt PEAR is even relevant these days – by making it the single most relevant development in PHP today?
One Possible Path Forward
Since this is a brain dump post, as much to gather my own thoughts in one place as anything else, feel free to call me bat shit crazy. There are days even I think that. Below I’ve raised what I perceive as problems in the PEAR/Pyrus system, obviously from a personal perspective, and possible solutions under the categories of Packaging, Distribution, Installation and Usage. I’ve tried to avoid getting into technical details – broad strokes will suffice for now. For your sanity, only the Packaging and Distribution areas are presented today. I will add a similar post for Installation and Usage later in the week. First one to mention “TL;DR” gets a minigun round to the head (I will have to make do with throwing it at you until I can scrape more cash together for the hardware). To avoid any confusion, I use the terms PEAR and Pyrus to refer to the entire workflow, from package generation to end usage, for each respectively.
Packaging
The packaging of source code for PEAR is performed using the PEAR/Pyrus Installer coupled with a Package Definition (i.e. package.xml) to create a distributable archive file. Pyrus utilises a slightly friendlier Package Definition by also allowing some elements of the definition to be defined in files other than package.xml (e.g. for setting up a changelog file or version numbers). The basic goal of the Package Definition is to have at least one XML file which tells PEAR/Pyrus which files to package, which role each file has (code/docs/tests), where each file goes (as a relative path), optionally each file’s MD5 hash, and a set of metadata like the package name, changelog, version, dependencies, etc. Pyrus additionally offers the ability to cryptographically sign packages, use a larger number of archive formats including PHAR, and bundle certain package dependencies internally.
Problems:
The main problem with the current Package Definition is that, being XML (that thing everyone used before discovering YAML/JSON), it often has to be generated by a separate tool, and it must explicitly list every file and piece of data, optionally with each file’s digest hash. Pyrus is a partial exception, allowing specifically formatted files to carry version and changelog information among other nuggets, but even its improvements still require specific files using specifically formatted text and/or file names. Using XML just imposes extra work to maintain package details, unless you are lucky enough to have a package.xml small and stable enough to be maintained by hand rather than constantly regenerated. A minor aesthetic point is that XML is harder to read.
Secondly, packages are therefore bound by their archiving constraints. Since package.xml generation is tied to a secondary process, installing from source code may not be feasible, whether from a local git clone or, similarly automated, from a remote source where the remote package.xml may well be out of sync with the actual source code or may not even exist.
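To make that maintenance burden concrete, here is a minimal PHP 8 sketch of the per-file data a package.xml-style definition has to carry (path, role and MD5), plus a check of how far a stored manifest has drifted from the files actually present. The helper names, and the convention of inferring a PEAR-style role from the top-level directory, are assumptions for illustration only, not anything PEAR or Pyrus provide.

```php
<?php
// Hypothetical helpers: not part of PEAR or Pyrus.

// Infer a PEAR-style file role from the top-level directory (assumed convention).
function roleFor(string $path): string
{
    return match (explode('/', $path)[0]) {
        'tests' => 'test',
        'docs' => 'doc',
        'scripts' => 'script',
        default => 'php',
    };
}

// Build a manifest of path => [role, md5] from an array of path => file contents.
function buildManifest(array $files): array
{
    $manifest = [];
    foreach ($files as $path => $contents) {
        $manifest[$path] = ['role' => roleFor($path), 'md5' => md5($contents)];
    }
    ksort($manifest);
    return $manifest;
}

// Report which files were added, removed or changed since $stored was generated.
function diffManifest(array $stored, array $current): array
{
    return [
        'added' => array_keys(array_diff_key($current, $stored)),
        'removed' => array_keys(array_diff_key($stored, $current)),
        'changed' => array_keys(array_filter(
            array_intersect_key($current, $stored),
            fn ($entry, $path) => $entry['md5'] !== $stored[$path]['md5'],
            ARRAY_FILTER_USE_BOTH
        )),
    ];
}
```

Every added, removed or edited file shows up in that diff, which is exactly the churn a hand-maintained or generated package.xml has to absorb before a working tree can be installed from.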
Possible Solutions:
The one solution that keeps occurring to me is to simply make the Package Definition programmable, i.e. a small, consumable, low-maintenance PHP script. Using native PHP, one can create either a generic array or a newfangled closure, which PHP executes to populate all the data a package installer needs for the Package Definition.
Since I’ve dabbled a bit, here’s what such a Package Definition could look like:
<?php
// Package Definition as a plain PHP closure: the installer supplies $s and
// the closure fills in the package metadata.
$package = function ($s) {
    $s->name = 'Overlord';
    $s->authors = 'Padraic Brady, Sauron[sauron@mordor.me]';
    $s->version = '0.0.1-dev';
    $s->api_version = '0.0.1-dev';
    $s->summary = 'Monitoring library for Hobbit Detector 1.0';
    $s->description = file_get_contents(__DIR__ . '/description.txt');
    $s->homepage = 'http://en.wikipedia.org/wiki/Sauron';
    $s->changelog = file_get_contents(__DIR__ . '/changelog.txt');
    // Files are selected by pattern, grouped by role
    $s->files['php'][] = 'library/**/*.php';
    $s->files['tests'][] = 'tests/**/*.*';
    $s->files['ignore'][] = '*.project';
    $s->files['bin'][] = 'scripts/overlord.bat';
    $s->include_path = 'Overlord/Monitor/';
    // Dependencies: Name[version constraint], or an extension as ext/name
    $s->dependencies[] = 'PHP[>=5.3.1]';
    $s->dependencies[] = 'Pear[>=1.6.5]';
    $s->dependencies[] = 'MutateMe[0.5.0]';
    $s->dependencies[] = 'ext/runkit';
    $s->optional_dependencies[] = 'ext/eyeofsauron';
    $s->license = 'New BSD';
};
I’ll assume PHP 5.4 will have some sort of short array notation to trim down the array syntax. Well, let’s hope so. It would be nice to reduce the line count further. Yes, I did indeed borrow the idea from elsewhere.
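How an installer would consume such a closure is an open question. Assuming it `include`s the definition file and calls `$package` with an object to fill in, a minimal sketch might look like the following. `PackageSpec`, `loadSpec()` and `parseDependency()` are hypothetical names, and the dependency parsing simply follows the `Name[constraint]` notation used in the definition above.

```php
<?php
// Hypothetical spec object; the properties mirror those set in the definition.
final class PackageSpec
{
    public string $name = '', $authors = '', $version = '', $api_version = '', $summary = '';
    public string $description = '', $homepage = '', $changelog = '', $include_path = '', $license = '';
    public array $files = [];
    public array $dependencies = [];
    public array $optional_dependencies = [];
}

// Run the definition closure against a fresh spec and check the essentials.
function loadSpec(callable $definition): PackageSpec
{
    $spec = new PackageSpec();
    $definition($spec);
    if ($spec->name === '' || $spec->version === '') {
        throw new InvalidArgumentException('A package needs at least a name and a version');
    }
    return $spec;
}

// Split 'Name[constraint]' into its parts; a bare name (e.g. 'ext/runkit') has no constraint.
function parseDependency(string $dep): array
{
    if (preg_match('/^([^\[\]]+)\[([^\]]*)\]$/', $dep, $m)) {
        return ['name' => $m[1], 'constraint' => $m[2]];
    }
    return ['name' => $dep, 'constraint' => null];
}
```

Because the definition is just a callable, the installer never has to parse a bespoke format; it only validates what the closure produced.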
This has a few advantages. No XML to maintain. No need to keep an XML Package Definition synced up with every file change in a VCS. No need for secondary XML generation tools or build tool plugins. It supports downloading files from remote hierarchical sources (including any VCS source), not just archives. Developers are already used to versioning build scripts for tools like Phing (just not the generated output, which is usually ignored by the VCS, whereas package.xml is not). Being plain old PHP, it can be as complex or as minimal as you want, and anyone with basic PHP knowledge can write one.
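Replacing explicit file lists with patterns such as `library/**/*.php` means something has to expand them. PHP’s built-in `glob()` has no recursive `**` support, so this sketch converts patterns to regular expressions itself. The semantics are an assumption: `**/` spans zero or more directories, while `*` and `?` stay within a single path segment. It works on a list of relative paths (e.g. collected from a checkout), so it applies equally to a VCS clone or an unpacked archive; both function names are hypothetical.

```php
<?php
// Convert a glob pattern with '**' support into an anchored regular expression.
function globToRegex(string $pattern): string
{
    $regex = '';
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $c = $pattern[$i];
        if ($c === '*' && ($pattern[$i + 1] ?? '') === '*') {
            if (($pattern[$i + 2] ?? '') === '/') {
                $regex .= '(?:.*/)?'; // '**/' : zero or more directories
                $i += 2;
            } else {
                $regex .= '.*';       // bare '**' : anything, including '/'
                $i += 1;
            }
        } elseif ($c === '*') {
            $regex .= '[^/]*';        // '*' : within one path segment
        } elseif ($c === '?') {
            $regex .= '[^/]';
        } else {
            $regex .= preg_quote($c, '#');
        }
    }
    return '#^' . $regex . '$#';
}

// Keep paths matching any include pattern and no ignore pattern, preserving order.
function selectFiles(array $paths, array $include, array $ignore = []): array
{
    $matches = function (string $path, array $patterns): bool {
        foreach ($patterns as $pattern) {
            if (preg_match(globToRegex($pattern), $path)) {
                return true;
            }
        }
        return false;
    };
    return array_values(array_filter(
        $paths,
        fn ($p) => $matches($p, $include) && !$matches($p, $ignore)
    ));
}
```

Under these rules `*.project` only matches at the top level; an ignore pattern meant to apply everywhere would be written `**/*.project`.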
One can still generate signable archive files using this approach – the point is to increase the kinds of installation sources that can be used rather than replace existing ones. Where packages aren’t signed, those requiring the security could restrict other package sources to downloads over HTTPS. For example, Github offers read-only git access over HTTPS for all repositories as standard.
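The HTTPS fallback boils down to a simple download policy. A minimal sketch, assuming signature verification of archives happens elsewhere (not shown): signed archives are accepted over any transport, and everything else must come over HTTPS. `isAcceptableSource()` is a hypothetical name.

```php
<?php
// Hypothetical download policy for an installer.
function isAcceptableSource(string $uri, bool $signed): bool
{
    if ($signed) {
        return true; // integrity comes from the (separately verified) signature
    }
    // Unsigned sources: only allow HTTPS, e.g. Github's read-only git over HTTPS.
    $scheme = parse_url($uri, PHP_URL_SCHEME);
    return is_string($scheme) && strtolower($scheme) === 'https';
}
```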
Distribution
In order to distribute source code using PEAR/Pyrus, you need to make use of either a PEAR Channel or a standalone archive download (i.e. a downloadable tarball). A Channel is basically a whole bunch of XML files served up for access as a REST API. Using a Channel, you can upload packages to the Channel host, update the XML files, and publicise your Channel URI so users can discover your Channel and install your packages.
Problems:
PEAR (PHP Extension and Application Repository) was originally founded to serve as a central package distribution channel. For various real and imagined reasons, the concept of a central repository did not succeed in PHP and developers instead insisted on using alternative means. This was aggravated even further by the arrival of frameworks like Zend Framework offering discrete components not originally served over a PEAR Channel at all. PEAR Channels were introduced, as one of those alternative means, to allow anyone to host their own distinct PEAR Channel.
The PEAR Installer has only ever shipped with the main PEAR Channels pre-registered. All other Channels needed to be located manually before use, usually by referring to the packaged library’s documentation. Since all Channels are independent entities, there is no global lookup point for querying package details, dependencies and availability. There is also no scope for true package name uniqueness: names are only unique within a Channel, which is handled by requiring all packages (except core PEAR ones) to be prefixed with a Channel alias, e.g. mychannel/MyPackage.
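To illustrate why uniqueness only holds per Channel, here is a small sketch that splits a Channel-prefixed name into its parts. Unprefixed names fall back to `pear.php.net`, PEAR’s default channel; the function itself is hypothetical. Two Channels can each publish a `DB` package, and only the prefix tells them apart.

```php
<?php
// Channel assumed for unprefixed (core PEAR) package names.
const DEFAULT_CHANNEL = 'pear.php.net';

// Split 'channel/Package' into its parts; the last '/' separates the package.
function parsePackageName(string $name): array
{
    $pos = strrpos($name, '/');
    if ($pos === false) {
        return ['channel' => DEFAULT_CHANNEL, 'package' => $name];
    }
    return ['channel' => substr($name, 0, $pos), 'package' => substr($name, $pos + 1)];
}
```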
The generation of the REST API, which is the backbone of a Channel, is also complex (the release of Pirum by Fabien Potencier has gone a long way towards simplifying this). Obviously, Channels are also tied to the concept of archive packages and cannot operate directly with a VCS like git. There is a possible workaround for Github users: hosting the REST API on Github Pages.
As alluded to, the REST API is itself a complex graph of XML files that requires a generation tool to manage initial setup and package updates.
Possible Solutions:
The best concept to gain early traction was that of a Channel Aggregator, proposed by Stuart Herbert. Sadly, I haven’t seen much more action on that front. In commenting on that idea, I considered it a move towards a decentralised distributed Channel mechanism (mouthful of gibberish, I know!). Here are a couple of thoughts on how this could work:
The players would include a Package Authority, a Channel Aggregator (any number of them), and Channels (optional).
The Package Authority would be a centralised location for reserving package names: a single point of reference and authority that prevents package name duplication and manages ownership of those names. It’s possible this could also be developed for additional purposes, but let’s keep it simple. This would help, primarily, in removing the need for Channel prefixes on package names and in preventing package name confusion. For security reasons, the Package Authority would associate a package name with a specific URI representing a download source (e.g. a PEAR Channel or Git URI).
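At its core, a Package Authority could start out as little more than a registry. This sketch (a hypothetical class, in-memory only, with authentication and persistence left out) reserves names case-insensitively, binds each name to one source URI, and only lets the current owner transfer a name.

```php
<?php
// Hypothetical in-memory Package Authority.
final class PackageAuthority
{
    /** @var array<string, array{name: string, owner: string, source: string}> */
    private array $records = [];

    // Reserve a name; 'Overlord' and 'OVERLORD' count as the same name.
    public function reserve(string $name, string $owner, string $source): void
    {
        $key = strtolower($name);
        if (isset($this->records[$key])) {
            throw new RuntimeException("Package name '$name' is already reserved");
        }
        $this->records[$key] = ['name' => $name, 'owner' => $owner, 'source' => $source];
    }

    // Hand a name over to a new maintainer; only the current owner may do this.
    public function transfer(string $name, string $from, string $to): void
    {
        $key = strtolower($name);
        if (($this->records[$key]['owner'] ?? null) !== $from) {
            throw new RuntimeException("$from does not own '$name'");
        }
        $this->records[$key]['owner'] = $to;
    }

    // Look up the record (owner and bound source URI) for a name, if reserved.
    public function lookup(string $name): ?array
    {
        return $this->records[strtolower($name)] ?? null;
    }
}
```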
Channel Aggregators are the more complex beasts. They may be utilised by Channel operators to distribute Package metadata to end-users on demand. The Aggregator would track available packages at source, their basic details, their available versions, and information on the location of host Channels, version control systems, and Package URIs and so forth. In effect, the Aggregator might well replace Channels for many purposes – and potentially eliminate one more source of work in distributing source code using PEAR/Pyrus.
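Much of an Aggregator’s job is bookkeeping plus a crosscheck. This hypothetical PHP 8 sketch indexes versions reported by sources, refuses any entry whose source URI differs from the one the Package Authority bound to that name (passed in here as a plain name-to-URI array), and resolves the latest version with PHP’s `version_compare()`.

```php
<?php
// Hypothetical Channel Aggregator.
final class Aggregator
{
    private array $index = [];

    /** @param array<string, string> $authority package name => source URI bound by the Authority */
    public function __construct(private array $authority) {}

    // Record a version reported by a source; reject unknown names or mismatched sources.
    public function add(string $name, string $version, string $source): bool
    {
        if (($this->authority[$name] ?? null) !== $source) {
            return false;
        }
        $this->index[$name]['source'] = $source;
        $this->index[$name]['versions'][] = $version;
        return true;
    }

    // Return the highest known version and where to fetch it, or null if unknown.
    public function latest(string $name): ?array
    {
        $versions = $this->index[$name]['versions'] ?? [];
        if ($versions === []) {
            return null;
        }
        usort($versions, 'version_compare');
        return ['version' => end($versions), 'source' => $this->index[$name]['source']];
    }
}
```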
The ideal scenario here is that any PEAR/Pyrus Installer would pre-register a couple of well-maintained Aggregators saving the users and package distributors the annoyance of dealing with Channels altogether. Hence, we’re back to a core Channel of sorts but with control of package/source hosting decentralised to individual developers. Again, Aggregators could easily repurpose themselves as package hosters if they wish (such as Pearfarm are doing) though this would be entirely optional.
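On the installer side, pre-registered Aggregators could simply be tried in order until one knows the package. In this minimal sketch each Aggregator is modelled as a callable returning package metadata or `null`; a real client would make HTTP requests instead, and `locatePackage()` is a hypothetical name.

```php
<?php
// Ask each pre-registered Aggregator in turn; the first answer wins.
/** @param list<callable(string): ?array> $aggregators */
function locatePackage(string $name, array $aggregators): ?array
{
    foreach ($aggregators as $aggregator) {
        $hit = $aggregator($name);
        if ($hit !== null) {
            return $hit;
        }
    }
    return null; // no registered Aggregator knows this package
}
```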
Channels, as suggested, could well be optional. Use an Aggregator instead and register a package URI, a git repository, or anything else, so long as it lets you download the package files (and the PHP programmable Package Definition). Painless hosting? Maybe.
I will point out this would require at least one point of authentication in the system. You’d need a Package Authority account to reserve a package name and perhaps transfer it between maintainers. The Aggregator may operate without authentication, since it acts much like any aggregator based on your source data (with, one would hope, a few simple crosschecks against the Package Authority to ensure it’s not unwittingly aggregating false data from hackers). Package/source hosters could ping the Aggregator as a hint to update its data in a more timely manner.
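The ping could be treated purely as a hint layered over ordinary polling. In this hypothetical PHP 8 sketch, a package is due for a refresh if it has never been fetched, if its host has pinged since the last fetch, or if the polling interval (`$ttl`, an assumed one hour by default) has elapsed; timestamps are passed in explicitly to keep it testable.

```php
<?php
// Hypothetical refresh scheduler for an Aggregator.
final class RefreshScheduler
{
    private array $lastFetched = [];
    private array $pinged = [];

    public function __construct(private int $ttl = 3600) {}

    // A source host hints that its data has changed.
    public function ping(string $name): void
    {
        $this->pinged[$name] = true;
    }

    // Never fetched, pinged since last fetch, or polling interval elapsed.
    public function due(string $name, int $now): bool
    {
        $last = $this->lastFetched[$name] ?? null;
        return $last === null || isset($this->pinged[$name]) || $now - $last >= $this->ttl;
    }

    // Record a completed fetch and clear any outstanding ping.
    public function markFetched(string $name, int $now): void
    {
        $this->lastFetched[$name] = $now;
        unset($this->pinged[$name]);
    }
}
```

Since the ping only shortens the wait, a missed or forged ping costs at most an extra fetch; the data itself still goes through the Authority crosscheck.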
I won’t touch the issue of who gets to reserve the package name “DB”. The Package Authority may need to enforce specific rules against overly generic names on a common sense basis.
I think that’s enough for a Monday read (you’ll all need enough brain capacity to finish out the week!). Feedback is, as usual, welcome. If anyone has a pre-existing solution or one in planning along these or similar lines, drop a comment!