I was recently accused on the Zend Framework Contributors mailing list of having “strong feelings” towards Pyrus (i.e. the PEAR Group’s Installer/Packager for PEAR2) and not in a positive way. It’s a fair description. PEAR is, putting it lightly, a very old architecture which makes it very resistant to change. With the idea of PEAR2 and Pyrus, I had hoped to see a renewal – the advancement of a PEAR architecture for the 21st Century. Instead, and this is just my opinion, PEAR2/Pyrus were a relatively simple iteration on a very old theme.

A Ranting We Shall Go

Now, I may be biased since I gave up on PEAR becoming PHP’s core distribution mechanism after I found myself using alternative strategies for hosting and deployment. This is not to say PEAR is not useful. It is – just not in my specific case when developing/testing/deploying applications. Regardless, it remains a good means of distribution by virtue of being installed with PHP practically everywhere.

I surprised even myself, however, with my vehement outcry over the idea of adopting Pyrus as Zend Framework 2’s package distribution method, lambasting both it and the PEAR concept of distribution in equal measure while piling up questions on Pyrus’ status (currently released in alpha) and suitability in the near term. That thread showed a fairly divided sentiment. Once I jokingly threatened to mow down my zombified colleagues with a minigun, I figured it was time to go forth and rant (miniguns are too expensive for these recessionary times).

If the PEAR ecosystem has a failing, it is one of staggered evolution. Over time it has picked up additional features tacked on top of a base model. The classic example is the use of Channels (to support multiple repositories), which has more recently prompted calls for a Channel Aggregator to spare users the cost of locally managing a channel registry or even hosting a Channel. This is the way of many PEAR features. They each do something incredibly useful but do it in a way that has many developers looking for a better approach – usually only to discover that the better approach requires breaking compatibility.

My vehemence in the aforementioned mailing list was down to a simple case of disappointment. We all deal with PEAR because we have it, we know it, and have done so for years. Seeing PEAR2 and Pyrus take the incremental improvement route without apparently doing anything to change the core experience seemed…pointless. It improved a lot of what PEAR already did without actually doing very much different. All the same advantages, disadvantages, features and lack thereof were present and accounted for with a handful of nice headline changes (e.g. we now have package signing capability). What exactly was the purpose of rewriting the entire toolchain if not to seize the opportunity to answer the accusations of those who doubt PEAR is even relevant these days – by making it the single most relevant development in PHP today?

One Possible Path Forward

Since this is a brain dump post, as much to gather my own thoughts in one place as anything else, feel free to call me bat shit crazy. There are days even I think that. Below I’ve raised what I perceive as problems in the PEAR/Pyrus system, obviously from a personal perspective, and possible solutions under the categories of Packaging, Distribution, Installation and Usage. I’ve tried to avoid getting into technical details – broad strokes will suffice for now. For your sanity, only the Packaging and Distribution areas are presented today. I will add a similar post for Installation and Usage later in the week. First one to mention “TL;DR” gets a minigun round to the head (will have to make do with throwing it at you until I can scrape more cash together for the hardware). To avoid any confusion, I use the terms PEAR and Pyrus to refer to the entire workflow from package generation to end usage for each respectively.

Packaging

The packaging of source code for PEAR is performed using the PEAR/Pyrus Installer coupled with a Package Definition (i.e. package.xml) to create a distributable archive file. Pyrus utilises a slightly more friendly Package Definition by also allowing for some elements of the definition to be defined in files other than package.xml (e.g. for setting up a changelog file or version numbers). The basic goal of this Package Definition is to have at least one XML file which tells PEAR/Pyrus which files to package, which role each file has (code/docs/tests), where each file goes in a relative filesystem, optionally the file’s MD5 hash, and a set of metadata like the package name, changelog, version, dependencies, etc. Using Pyrus offers the additional features of being able to cryptographically sign packages, use a larger number of archive formats including PHAR, and bundle certain package dependencies internally.

Problems:

The main problem with the current Package Definition is that it often must be generated by a separate tool since it’s XML (it’s that thing everyone used before discovering YAML/JSON), and must explicitly list every file and piece of data within that format (with the exception of Pyrus which allows specifically formatted files to carry version and changelog information among other nuggets) optionally with each file’s digest hash. Even the Pyrus improvements still require specific files using specific formatted text and/or file names. Using XML just ends up imposing extra work to maintain package details unless you are lucky enough to have a package.xml small and stable enough that it can be manually maintained rather than persistently needing generation. A minor aesthetic detail is that XML is harder to read.

Secondly, packages are bound to the constraints of their archives. Since package.xml generation is tied to a secondary process, installing from source code may not be feasible, whether performed on a local git clone or similarly automated from a remote source where the remote package.xml may well be out of sync with the actual source code or where it may not even exist.

Possible Solutions:

The one solution that keeps occurring to me is to simply make a Package Definition programmable, i.e. a small consumable low-maintenance PHP script. Using native PHP, one can create either a generic array, or a newfangled closure, which can be executed through PHP to populate all the necessary data for a Package Definition for consumption by a package installer.

Since I’ve dabbled a bit, here’s what such a Package Definition could look like:

    <?php

    $package = function ($s) {
        $s->name = 'Overlord';
        $s->authors = 'Padraic Brady, Sauron[sauron@mordor.me]';
        $s->version = '0.0.1-dev';
        $s->api_version = '0.0.1-dev';
        $s->summary = 'Monitoring library for Hobbit Detector 1.0';
        $s->description = file_get_contents(__DIR__ . '/description.txt');
        $s->homepage = 'http://en.wikipedia.org/wiki/Sauron';
        $s->changelog = file_get_contents(__DIR__ . '/changelog.txt');
        $s->files['php'][] = 'library/**/*.php';
        $s->files['tests'][] = 'tests/**/*.*';
        $s->files['ignore'][] = '*.project';
        $s->files['bin'][] = 'scripts/overlord.bat';
        $s->include_path = 'Overlord/Monitor/';
        $s->dependencies[] = 'PHP[>=5.3.1]';
        $s->dependencies[] = 'Pear[>=1.6.5]';
        $s->dependencies[] = 'MutateMe[0.5.0]';
        $s->dependencies[] = 'ext/runkit';
        $s->optional_dependencies[] = 'ext/eyeofsauron';
        $s->license = 'New BSD';
    };

I’ll assume PHP 5.4 will have some sort of short array notation to cut down the array size. Well, let’s hope so ;) . Would be nice to reduce the line count more. Yes, I did indeed borrow the idea from elsewhere ;) .

This has a few advantages. No XML to maintain. No need to keep an XML Package Definition synced up for every file change in a VCS. No need for secondary XML generation tools or build tool plugins. Supports downloading files from remote hierarchical sources and not just archives (including any VCS source). Developers are already used to versioning build scripts from tools like Phing (just not the end products which are usually ignored whereas package.xml is not). Being plain old PHP, it can be just as complex or as minimal as you want and anyone with basic PHP knowledge can write one.
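To make the "consumption by a package installer" part a little more concrete, here is a minimal sketch of how an installer might run such a definition. Everything named here (PackageSpec, parseDependency(), loadDefinition()) is hypothetical and invented for illustration; none of it is an existing PEAR or Pyrus API. The dependency syntax is simply read off the example definition above, and the spec class declares only the handful of fields this sketch touches.

```php
<?php
// Hypothetical consumer for a closure-based Package Definition.
// PackageSpec, parseDependency() and loadDefinition() are illustrative
// names only; they are not part of PEAR or Pyrus.

class PackageSpec
{
    public $name;
    public $version;
    public $files = array();
    public $dependencies = array();
}

// Split a string such as 'PHP[>=5.3.1]' into name, operator and version.
// A bare name such as 'ext/runkit' yields a null operator and version.
function parseDependency($dependency)
{
    $pattern = '/^([^\[\]]+)(?:\[(>=|<=|>|<|=)?([^\[\]]+)\])?$/';
    if (!preg_match($pattern, $dependency, $m)) {
        throw new InvalidArgumentException("Unparseable dependency: $dependency");
    }
    $hasVersion = isset($m[3]);
    return array(
        'name'     => $m[1],
        // Assumption: a version with no operator means an exact match.
        'operator' => $hasVersion ? ($m[2] !== '' ? $m[2] : '=') : null,
        'version'  => $hasVersion ? $m[3] : null,
    );
}

// Run the definition closure against an empty spec, then normalise
// the dependency strings into arrays an installer could act on.
function loadDefinition($definition)
{
    $spec = new PackageSpec();
    $definition($spec);
    $spec->dependencies = array_map('parseDependency', $spec->dependencies);
    return $spec;
}

// A trimmed-down definition in the same style as the one above.
$package = function ($s) {
    $s->name = 'Overlord';
    $s->version = '0.0.1-dev';
    $s->files['php'][] = 'library/**/*.php';
    $s->dependencies[] = 'PHP[>=5.3.1]';
    $s->dependencies[] = 'ext/runkit';
};

$spec = loadDefinition($package);
echo $spec->name, ' ', $spec->version, "\n";
```

The point of the sketch is that the installer never needs a generated file: it executes the definition, gets plain data back, and can resolve file patterns and dependencies against whatever source it happens to be reading from.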

One can still generate signable archive files using this approach – the point is to increase the kind of installation sources that can be used rather than replace existing ones. In place of signable packages, for those requiring the security, other package files could be limited to download over HTTPS. For example, Github offers git read-only access via HTTPS for all repositories as standard.

Distribution

In order to distribute source code using PEAR/Pyrus, you need to make use of either a PEAR Channel or a standalone archive download (i.e. a downloadable tarball). A Channel is basically a whole bunch of XML files served up for access as a REST API. Using a Channel, you can upload packages to the Channel host, update the XML files, and publicise your Channel URI so users can discover your Channel and install your packages.

Problems:

PEAR (PHP Extension and Application Repository) was originally founded to serve as a central package distribution channel. For various real and imagined reasons, the concept of a central repository did not succeed in PHP and instead developers insisted on using alternative means. This was aggravated even further by the arrival of frameworks like Zend Framework offering discrete components not originally served over a PEAR Channel at all. PEAR Channels were introduced to allow anyone to host their own distinct PEAR Channel as one of those means.

The PEAR Installer has only ever shipped with the main PEAR Channels pre-registered. All other Channels needed to be manually located before use – usually by referring to the packaged library’s documentation. Since all Channels are independent entities, there is no global lookup point for querying package details, dependencies and availability. There is also no scope for true package name uniqueness (technically, uniqueness is accomplished by requiring that all packages, except core PEAR ones, be prefixed with a Channel alias term, e.g. mychannel/MyPackage).

The generation of the REST API, which was the backbone of a Channel, was also complex (the release of Pirum by Fabien Potencier has gone a long way towards simplifying this). Obviously, Channels are also tied to the concept of archive packages and cannot operate directly with a VCS like git. There is a workaround possible for Github using Github Pages to host the REST API.

As alluded to, the REST API is itself a complex graph of XML files that requires a generation tool to manage initial setup and package updates.

Possible Solutions:

The best concept to gain early traction was that of a Channel Aggregator expressed by Stuart Herbert. Sadly, I haven’t seen much more action on that front. In commenting on that idea, I considered it a move towards a decentralised distributed Channel mechanism (mouthful of gibberish, I know!). Here’s a couple of thoughts on how this could work:

The players would include a Package Authority, a Channel Aggregator (any number of them), and Channels (optional).

The Package Authority would be a centralised location basically for reserving package names and ensuring there is a point of reference and authority to prevent package name duplication and to manage ownership of such. It’s possible this could also be developed with additional purposes but let’s keep it simple. This would help, primarily, in removing the need for Channel prefixes on package names and preventing package name confusion. For security reasons, the Package Authority would associate a package name to a specific URI representing a download source (e.g. a PEAR Channel or Git URI).
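In broad strokes, a Package Authority is little more than a registry of names. Here is a minimal in-memory sketch, assuming names are compared case-insensitively and that ownership is a plain string. The class name, method names and example.org URIs are all made up for illustration; no such service or API exists.

```php
<?php
// Illustrative in-memory Package Authority. Class and method names are
// assumptions made for this sketch, not an existing PEAR or Pyrus API.

class PackageAuthority
{
    private $registry = array();

    // Reserve a name for an owner and tie it to one download source.
    public function reserve($name, $owner, $sourceUri)
    {
        $key = strtolower($name); // assumption: names are unique regardless of case
        if (isset($this->registry[$key])) {
            throw new RuntimeException("Package name already reserved: $name");
        }
        $this->registry[$key] = array('owner' => $owner, 'source' => $sourceUri);
    }

    // Hand a name over to a new maintainer; only the current owner may do so.
    public function transfer($name, $currentOwner, $newOwner)
    {
        $key = strtolower($name);
        if (!isset($this->registry[$key]) || $this->registry[$key]['owner'] !== $currentOwner) {
            throw new RuntimeException("Only the current owner may transfer: $name");
        }
        $this->registry[$key]['owner'] = $newOwner;
    }

    // Return the registered download source, or null for an unknown name.
    public function sourceFor($name)
    {
        $key = strtolower($name);
        return isset($this->registry[$key]) ? $this->registry[$key]['source'] : null;
    }
}

$authority = new PackageAuthority();
$authority->reserve('Overlord', 'padraic', 'https://example.org/overlord.git');

// A second attempt at the same name, whatever the casing, is refused.
try {
    $authority->reserve('overlord', 'sauron', 'https://example.org/not-overlord.git');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Binding each name to exactly one source is what would let an installer drop Channel prefixes: the name alone is enough to find out where the package legitimately lives.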

Channel Aggregators are the more complex beasts. They may be utilised by Channel operators to distribute Package metadata to end-users on demand. The Aggregator would track available packages at source, their basic details, their available versions, and information on the location of host Channels, version control systems, and Package URIs and so forth. In effect, the Aggregator might well replace Channels for many purposes – and potentially eliminate one more source of work in distributing source code using PEAR/Pyrus.

The ideal scenario here is that any PEAR/Pyrus Installer would pre-register a couple of well-maintained Aggregators saving the users and package distributors the annoyance of dealing with Channels altogether. Hence, we’re back to a core Channel of sorts but with control of package/source hosting decentralised to individual developers. Again, Aggregators could easily repurpose themselves as package hosters if they wish (such as Pearfarm are doing) though this would be entirely optional.

Channels, as suggested, could well be optional. Use an Aggregator instead and register either a package URI, git repository, or anything else so long as it lets you download the package files (and the PHP programmable Package Definition ;) ). Painless hosting? Maybe.

I will point out this would require at least one authentication in the system. You’d need a Package Authority account to allow for reserving a package name and perhaps transferring it between maintainers. The Aggregator may operate without authentication since it acts much like any aggregator based on your source data (and one would hope a few simple crosschecks with the Package Authority to ensure it’s not unwittingly aggregating false data from the hackers ;) ). Package/source hosters could ping the Aggregator as a hint to update its data in a more timely manner.
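The ping and crosscheck idea could be sketched roughly as follows. This is an assumption-laden toy: the Aggregator class and its ping() and latest() methods are hypothetical names, the Package Authority is stood in for by a plain array of name to source URI, and the example.org URIs are invented. The only real library call doing any work is PHP's own version_compare().

```php
<?php
// Illustrative Aggregator. The $authoritySources array stands in for a
// lookup against a Package Authority; all names here are hypothetical.

class Aggregator
{
    private $authoritySources;
    private $packages = array();

    public function __construct(array $authoritySources)
    {
        $this->authoritySources = $authoritySources;
    }

    // Called when a host pings the Aggregator with fresh release data.
    // Data is only accepted from the source registered for that name.
    public function ping($name, $sourceUri, array $versions)
    {
        if (!isset($this->authoritySources[$name])
            || $this->authoritySources[$name] !== $sourceUri) {
            return false;
        }
        usort($versions, 'version_compare'); // oldest first
        $this->packages[$name] = array('source' => $sourceUri, 'versions' => $versions);
        return true;
    }

    // Return the newest known version, or null if nothing is tracked.
    public function latest($name)
    {
        if (empty($this->packages[$name]['versions'])) {
            return null;
        }
        $versions = $this->packages[$name]['versions'];
        return end($versions);
    }
}

$aggregator = new Aggregator(array(
    'Overlord' => 'https://example.org/overlord.git',
));

// The registered source is accepted; an impostor is ignored.
$accepted = $aggregator->ping('Overlord', 'https://example.org/overlord.git',
    array('0.6.0', '0.5.0', '0.6.0-dev'));
$rejected = $aggregator->ping('Overlord', 'https://example.org/evil.git',
    array('9.9.9'));

echo $aggregator->latest('Overlord'), "\n";
```

Note that the Aggregator itself needs no accounts in this arrangement. It trusts a ping only as far as the Package Authority's record for that name allows.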

I won’t touch the issue of who gets to reserve the package name “DB”. The Package Authority may need to enforce specific rules against overly generic names on a common sense basis.

I think that’s enough for a Monday read (you’ll all need enough brain capacity to finish out the week!). Feedback is, as usual, welcome. If anyone has a pre-existing solution or one in planning along these or similar lines, drop a comment!
