Archive for March, 2007
YAML is a machine-parsable data serialisation format for storing text, numerical data, arrays and more. It was designed for use with programming languages and has excellent support in Python, Perl and Ruby. Several C implementations exist, and at least one (Syck) is available as a PHP extension. If that explanation escapes you, it’s a bit like XML without the tag soup: nesting is expressed through indentation rather than opening and closing tags.
Now PHP has been slow (really slow) at adopting YAML in a serious way. As I previously noted, PHP’s support is limited to the PHP Spyc Library and a PHP extension which relies on the Syck C Library. The PHP extension has no PECL presence, so its visibility to PHP developers isn’t the best, but it is by far the fastest and most efficient YAML parser for PHP. It also benefits from the fact that the Syck Library has been standard in Ruby since 1.8, and is available for Perl (check CPAN) and Python.
So isn’t it time there was something native to PHP to allow everyone to work with the YAML format? Well, I think so… So last week I ran off and did some digging, coding, and sacrificed a few white pigeons to the dark gods of inspiration. Once the pigeon blood and other…eh…bits were mopped up, I wrote the proposal for the Zend Framework – read it here. It’s nothing fancy, just a general outline and some sample use cases.
In the meantime I spent some time going over the cool pyyaml reference parser. Zend_Yaml will take its cue from pyyaml, with a pinch of personal OO improvement (not a Python “import from *” fanboy I’m afraid – too many methods floating between classes!), and a dash of PHP5 for added flavour. Keeping track of indentation will be painful to say the least – there is lots of column, pointer and index messiness to get confused by, but in general it’s straightforward enough. Just time-consuming.
It’s funny sometimes how you end up being involved in an open source contribution.
A few days ago I was on the PHP Developer’s Network forums – not so surprising since I like being involved there, and the problems you run across can get you thinking about new ideas and learning new things. I was responding to a theory post about adding a filter/validation chain to the Zend Framework and suggested the end result should be configurable – i.e. if I have a configuration file, a Factory class should be able to easily generate a filter chain without manual tinkering. Manually setting up filters and validation is something I absolutely hate doing, since it’s a pain in the ass to modify by hand later. In any case, being configurable would be a good pointer to the simplicity (KISS) of setting up a chain, if nothing else.
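To make the idea of a configurable chain concrete, here is a minimal sketch of a factory that turns configuration data into a filter chain. It is written in Python purely for illustration and is not the Zend Framework API: the filter names, the `FILTER_BUILDERS` registry and `build_chain` are all hypothetical, and the `config` list stands in for whatever a parsed configuration file would produce.

```python
# Hypothetical filters; the names and behaviour are illustrative only.
def trim(value):
    return value.strip()


def lowercase(value):
    return value.lower()


def make_max_length(limit):
    # Returns a filter that truncates its input to `limit` characters.
    def max_length(value):
        return value[:limit]
    return max_length


# Maps a name used in configuration to a callable that builds the filter.
FILTER_BUILDERS = {
    "trim": lambda options: trim,
    "lowercase": lambda options: lowercase,
    "max_length": lambda options: make_max_length(options["limit"]),
}


def build_chain(config):
    """Turn a list of {"filter": name, ...options} entries into one callable."""
    steps = []
    for entry in config:
        options = {k: v for k, v in entry.items() if k != "filter"}
        steps.append(FILTER_BUILDERS[entry["filter"]](options))

    def chain(value):
        # Apply each configured filter in order.
        for step in steps:
            value = step(value)
        return value

    return chain


# The kind of structure a parsed configuration file might yield (assumed shape).
config = [
    {"filter": "trim"},
    {"filter": "lowercase"},
    {"filter": "max_length", "limit": 5},
]
clean = build_chain(config)
```

Changing the chain then means editing the configuration rather than the code that wires the filters together, which is the whole point of the suggestion.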
Since I’m used to the format from Ruby and Perl, and since it’s used for a similar purpose in the PHP Symfony framework, I also suggested using YAML configuration files, since they are far more readable than XML and remain machine readable. I believe this went over quite well. Later on I realised the Zend Framework has no YAML support, and after some searching noticed the only (apparent) PHP parser was the Spyc library. Now kudos to the Spyc developers because it works – but it’s not a full grammar parser (as far as I could tell), so I’m not sure exactly how compliant it is with the full YAML 1.1 specification. I haven’t tested it, so I’m not judging.
So I fired off a few emails to the fw-general mailing list of the Zend Framework to see if there was any interest. I’d like to thank Matthew and Gavin for their feedback and I’ve since decided to have a go at implementing a YAML parser in PHP.
Now before folk go off the deep end, there is a PHP5 extension which is based on a YAML parser and emitter in C called “Syck”. Those of you in other languages are likely at least vaguely familiar with the Syck bindings for Ruby, Python, Perl, etc. But since YAML is overlooked in PHP to such a huge extent, it’s pretty hard to pin the extension down for PHP5. Gavin Vess posted a link to Gentoo Linux in a reply which shows it’s available and being maintained.
So, YAML Parser – where to start? YAML has a few defining format characteristics – it’s so usable because it’s written in a typical western style, top to bottom and left to right. Nesting of YAML nodes is based on indentation levels (likely the main “problem issue” of sorts). In addition, the syntax is fairly simple to learn, though those single-character indicators call for having a reference card posted over your PC unless you’re a total YAML fiend.
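Indentation-based nesting is usually handled by keeping a stack of the indentation columns currently open and turning changes in column into explicit "block opened" and "block closed" events. The following is a simplified sketch of that idea in Python, not the approach of any particular parser: the function name and event names are made up for illustration, and it assumes spaces only, one node per line, and none of YAML's flow syntax or multi-line scalars.

```python
def indentation_events(text):
    """Yield ("indent",), ("dedent",) and ("line", content) events.

    Simplified on purpose: spaces only, blank lines skipped, no flow
    collections, no multi-line scalars.
    """
    levels = [0]  # stack of indentation columns that are currently open
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        column = len(raw) - len(raw.lstrip(" "))
        if column > levels[-1]:
            # Deeper than the current block: a new nested block starts.
            levels.append(column)
            yield ("indent",)
        while column < levels[-1]:
            # Shallower: close every block that sits deeper than this line.
            levels.pop()
            yield ("dedent",)
        if column != levels[-1]:
            # The line dedented to a column that was never opened.
            raise ValueError("inconsistent indentation: %r" % raw)
        yield ("line", raw.strip())
    # Close whatever is still open at the end of the input.
    while len(levels) > 1:
        levels.pop()
        yield ("dedent",)


sample = "a:\n  b: 1\n  c:\n    d: 2\ne: 3\n"
events = list(indentation_events(sample))
```

Once indentation has been converted into events like these, the rest of the parser no longer needs to care about columns at all; it only sees blocks opening and closing.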
A quick look around the web at the most recent YAML parsers pretty much confirms that LL(1) parsers have gained widespread adoption for YAML. It’s easy to see why, since the grammar is short and an LL(1) parser (recursive descent) is simple enough to implement and test. I see no reason not to take a similar approach. I’m going to dig around at putting together a Parser structure over the weekend – something to get the ball rolling and provide fodder for a formal proposal.
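For anyone who hasn't written one, a recursive descent parser has one method per grammar rule, and each method decides what to do by looking at a single upcoming token. Here is a minimal sketch in Python over a toy grammar that is far smaller than YAML's: nested mappings with scalar values only. The token names and the `Parser` class are assumptions for illustration and say nothing about how the eventual PHP parser will be structured.

```python
class Parser:
    """Recursive descent over a token list with one token of lookahead.

    Toy grammar (not YAML's real grammar):
        document := mapping END
        mapping  := entry*
        entry    := KEY ( SCALAR | INDENT mapping DEDENT )
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # The single lookahead token; ("end",) once the input is exhausted.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("end",)

    def expect(self, kind):
        # Consume the next token, insisting that it is of the given kind.
        token = self.peek()
        if token[0] != kind:
            raise SyntaxError("expected %s, found %s" % (kind, token[0]))
        self.pos += 1
        return token

    def parse_document(self):
        result = self.parse_mapping()
        self.expect("end")
        return result

    def parse_mapping(self):
        mapping = {}
        while self.peek()[0] == "key":
            key, value = self.parse_entry()
            mapping[key] = value
        return mapping

    def parse_entry(self):
        key = self.expect("key")[1]
        if self.peek()[0] == "scalar":
            # One token of lookahead is enough to pick this branch.
            return key, self.expect("scalar")[1]
        self.expect("indent")
        value = self.parse_mapping()
        self.expect("dedent")
        return key, value


tokens = [
    ("key", "a"), ("indent",),
    ("key", "b"), ("scalar", "1"),
    ("key", "c"), ("indent",),
    ("key", "d"), ("scalar", "2"),
    ("dedent",), ("dedent",),
    ("key", "e"), ("scalar", "3"),
]
tree = Parser(tokens).parse_document()
```

Each rule maps directly onto one method, which is what makes this style easy to test: every method can be fed a short token list and checked on its own.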