PHP, Zend Framework and Other Crazy Stuff
Zend_Feed_Reader promoted to Zend Framework trunk (watch out for ZF 1.9!)
I’m happy to say that Zend_Feed_Reader has made it through its two-week cleanup effort and emerged from Matthew’s review to join the Zend Framework trunk. Once the Zend Framework 1.9 release process spins up, I look forward to more feedback on how this component has turned out. Thanks to Jurriën Stutterheim (my co-conspirator) and to Matthew Weier O’Phinney for shepherding this through; a special mention goes to Kawsar Saiyeed, whose feedback while using Zend_Feed_Reader to build a feed aggregator over the past week was invaluable.
Zend_Feed_Reader grew out of my need for something that is not just capable of reading feeds, but also capable of understanding and interpreting them. If you’ve used Zend_Feed, you know that getting something simple like content, or a creation date, is a task that requires a bit of work. Feeds come in three distinctly different forms: RSS, RDF/RSS and Atom, all with multiple versions. Each has its own way of presenting information. Each can also utilise extensions like RSS’s popular Dublin Core 1.1 module or Atom’s Threaded Extensions RFC. Getting a simple point of data can mean sifting through feeds to see what type and version they are, what elements to look for, what alternatives exist, and which alternatives should be prioritised over others. It’s work that has led developers to write long classes designed to handle the task. Zend_Feed is also not without its flaws. Its API is inconsistent, its namespace handling questionable, and extending it is not as easy as it looks.
So, is Zend_Feed_Reader any different? Well, I hope so. But you’ll have to dig a little deeper to see it. Here’s a quick example from the documentation showing off the API briefly.
[geshi lang=php]$feed = Zend_Feed_Reader::import('http://www.planet-php.net/rdf/');
$data = array(
    'title'        => $feed->getTitle(),
    'link'         => $feed->getLink(),
    'dateModified' => $feed->getDateModified(),
    'description'  => $feed->getDescription(),
    'language'     => $feed->getLanguage(),
    'entries'      => array(),
);
foreach ($feed as $entry) {
    $edata = array(
        'title'        => $entry->getTitle(),
        'description'  => $entry->getDescription(),
        'dateModified' => $entry->getDateModified(),
        'author'       => $entry->getAuthor(),
        'link'         => $entry->getLink(),
        'content'      => $entry->getContent()
    );
    $data['entries'][] = $edata;
}[/geshi]
Here’s a similar example using Zend_Feed (based on its manual Introduction).
[geshi lang=php]// Assign the imported feed so the calls below have something to work on
$slashdotRss = Zend_Feed::import('http://rss.slashdot.org/Slashdot/slashdot');
$channel = array(
    'title'       => $slashdotRss->title(),
    'link'        => $slashdotRss->link(),
    'description' => $slashdotRss->description(),
    'items'       => array()
);
foreach ($slashdotRss as $item) {
    $channel['items'][] = array(
        'title'       => $item->title(),
        'link'        => $item->link(),
        'description' => $item->description()
    );
}[/geshi]
Wow, I hear you say, that is so different! Bloody marvellous! Stow the sarcasm as I explain a bit further though.
Zend_Feed_Reader and Zend_Feed are siblings – there is little doubt there. But what the example doesn’t show is what happens behind the scenes. Zend_Feed’s API allows you to run literal queries against the underlying XML. Essentially, the API is tied to the structure of the feed’s XML document, making Zend_Feed a class with a mutable API. A description() method looks for description elements, in other words. The API is tied to element names, not the concept of what you’re trying to extract. This is great when RSS and Atom are kind enough to agree on a description element in their respective namespaces, but it goes off course when they don’t. For example, RSS 2.0 has the pubDate element, whereas Atom 1.0 has the published and updated elements (and the older Atom 0.3 had created and modified). What does this mean with Zend_Feed?
[geshi lang=php]$feed = Zend_Feed::import('http://www.example.com/feed/');
$entry = $feed->current();
$dateModified = $entry->pubDate();
if (!$dateModified) {
    $dateModified = $entry->published();
}
if (!$dateModified) {
    $dateModified = $entry->modified();
}
if (!$dateModified) {
    $dateModified = $entry->updated();
}
if (!$dateModified) {
    $dateModified = $entry->created();
}
// Oh, crap. I forgot about Dublin Core… (there's DC 1.0 and DC 1.1)
// Load up Zend_Date and some detection so we can normalise these dates after[/geshi]
With Zend_Feed, we need to run a few (7 or so) literal requests – we could add in feed type detection, but that would mean we’d be moving towards a Strategy Pattern (watch the code grow then, and the unit tests). Finally, all feed types vary in how they present dates. These would have to be normalised from RFC 822, RFC 2822, ISO 8601 or that W3C standard, assuming the feeds used one of these. Now rinse and repeat for anything you want from that feed…and add unit tests to ensure it all works for various feed types and versions. Ouch! You will also need to learn RSS and Atom in detail along with their commonly used extensions like Dublin Core, Atom Threaded, yada, yada, yada…
Zend_Feed_Reader, on the other hand, is not a gateway to the feed’s DOM. It just offers a simple, lean API which is identical for every type and version of feed.
[geshi lang=php]$feed = Zend_Feed_Reader::import('http://www.example.com/feed/');
$entry = $feed->current();
$dateModified = $entry->getDateModified();[/geshi]
In this case, the $dateModified result is a Zend_Date object. You can format the date any way you want from here. Behind the scenes, getDateModified() does a ton of work (all unit tested) to decipher the current feed and locate the data you seek from all the possible alternatives that might exist. Once it’s located, it’s imported into Zend_Date using the correct standard.
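To give a feel for the kind of sifting described above, here is a minimal sketch using PHP's bundled DOM extension rather than Zend Framework. It is not Zend_Feed_Reader's actual implementation: the function name findDateModified(), the priority order of the queries, and the use of DateTimeImmutable instead of Zend_Date are all assumptions made for illustration, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, NOT Zend_Feed_Reader's real code: try several
// alternative elements in turn and normalise the first usable date.
function findDateModified(DOMElement $entry): ?DateTimeImmutable
{
    $xpath = new DOMXPath($entry->ownerDocument);
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
    $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

    // Alternatives in an assumed priority order.
    $queries = array(
        'string(atom:updated)',
        'string(atom:published)',
        'string(pubDate)',
        'string(dc:date)',
    );

    foreach ($queries as $query) {
        $value = trim($xpath->evaluate($query, $entry));
        if ($value === '') {
            continue; // element missing or empty, try the next alternative
        }
        try {
            // The constructor accepts RFC 822/2822 and ISO 8601 style strings.
            return new DateTimeImmutable($value);
        } catch (Exception $e) {
            // Unparseable date: fall through to the next alternative.
        }
    }

    return null; // nothing usable found
}

// Usage: the same call works for an RSS 2.0 item and an Atom 1.0 entry.
$rss = new DOMDocument();
$rss->loadXML('<rss version="2.0"><channel><item><title>Hi</title>'
    . '<pubDate>Wed, 15 Jul 2009 11:34:00 +0000</pubDate></item></channel></rss>');
$rssItem = $rss->getElementsByTagName('item')->item(0);

$atom = new DOMDocument();
$atom->loadXML('<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Hi</title>'
    . '<updated>2009-07-15T11:34:00Z</updated></entry></feed>');
$atomEntry = $atom->getElementsByTagName('entry')->item(0);

echo findDateModified($rssItem)->format('Y-m-d H:i'), "\n";
echo findDateModified($atomEntry)->format('Y-m-d H:i'), "\n";
```

Both calls resolve to the same moment even though the two feeds use different element names and date formats, which is the whole point of asking for a concept rather than an element.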
Of course, this all means you’ll have to endure a tiny, easy-to-memorise API since you can only call getDateModified() – not pubDate(), created(), etc. Sorry for that. You’ll also have to put up with the extra 660+ unit tests Zend_Feed_Reader has added to the Zend Framework suite to cover every feed type and version, every common combination of versions and XML extensions, and all the niggly things like normalisation and consistent API returns. My apologies. We even fantasised that RSS might implement the entire Atom 1.0 spec as an RSS module. Insanity.
The result is that we have two sets of methods. One operates at the feed level, the other at the entry level. Once you boil down all the possible alternative elements, that means we implement 12 methods at the feed level, and 13 methods at the entry level (seriously, that is all you need in most cases!). We also provide methods for accessing the current object’s DOMElement, DOMDocument and DOMXPath objects for those times you need something not provided by the current API, or you can use the Extension system (see below).
As for comparing the internal workings, Zend_Feed_Reader uses only one method for querying feeds. XPath. XPath might seem a surprising choice but, since I’m a lazy programmer, it was the option that required the least code, made it easy to see what methods did internally since XPath queries (if you speak the query language) are easy to follow, and assisted in enforcing a uniform approach to parsing feeds. This means Zend_Feed_Reader may be (and probably is) slower than Zend_Feed. Remember we also sift across alternatives to get what you’re looking for – so it could be many XPath queries per API call. We’ve added internal class caches so repetitive queries can skip XPath, and you can set your own persistent cache to improve performance overall by minimising network requests. Zend_Feed_Reader also supports HTTP 1.1 Conditional GET requests, which save bandwidth and avoid reprocessing feeds that haven’t changed.
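For anyone unfamiliar with Conditional GET, the idea can be sketched in a few lines of plain PHP. This is only an illustration of the HTTP mechanism, not Zend_Feed_Reader's code: the function names conditionalGetHeaders() and feedIsUnchanged() are hypothetical, and the ETag and Last-Modified values are assumed to have been saved from an earlier response. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers, not part of Zend_Feed_Reader: build the request
// headers for an HTTP/1.1 Conditional GET from previously stored validators.
function conditionalGetHeaders(?string $etag, ?string $lastModified): array
{
    $headers = array();
    if ($etag !== null && $etag !== '') {
        // Ask the server to skip the body if the ETag still matches.
        $headers[] = 'If-None-Match: ' . $etag;
    }
    if ($lastModified !== null && $lastModified !== '') {
        // Ask the server to skip the body if nothing changed since this date.
        $headers[] = 'If-Modified-Since: ' . $lastModified;
    }
    return $headers;
}

// A 304 Not Modified response means the cached copy is still current.
function feedIsUnchanged(int $statusCode): bool
{
    return $statusCode === 304;
}

// Usage with made-up validator values from an earlier fetch.
$headers = conditionalGetHeaders('"abc123"', 'Wed, 15 Jul 2009 11:34:00 GMT');
foreach ($headers as $header) {
    echo $header, "\n";
}
```

When the server answers 304, there is no body to download or parse, so the previously cached feed can be reused as is.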
Internally, Zend_Feed_Reader also implements (it’ll be much improved later after some refactoring) a plugin system where you can write custom “Extensions” that add methods to query feeds or entries for RSS/Atom module data not already bundled with Zend_Feed_Reader. This will cover cases where a custom RSS or Atom module (for example, weather, podcasting or geolocation data) is included in a feed and you want access to that data easily. I hope to bundle some of the more popular standards with Zend_Feed_Reader – but I’ll see what time I have left before ZF 1.9.
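To make the idea of an extension adding methods more concrete, here is a rough sketch in plain PHP of one way such a plugin mechanism could work. Every name in it (PodcastExtension, ExtensibleEntry, registerExtension(), getPodcastDuration()) is hypothetical and chosen for illustration; Zend_Feed_Reader's real Extension API is described in its documentation and may look quite different. The code assumes PHP 7.1 or later.

```php
<?php
// All names here are hypothetical; this is NOT Zend_Feed_Reader's Extension API.
// An extension knows how to read one module's data from an entry.
class PodcastExtension
{
    private $entry;
    private $xpath;

    public function __construct(DOMElement $entry)
    {
        $this->entry = $entry;
        $this->xpath = new DOMXPath($entry->ownerDocument);
        $this->xpath->registerNamespace(
            'itunes',
            'http://www.itunes.com/dtds/podcast-1.0.dtd'
        );
    }

    public function getPodcastDuration(): ?string
    {
        $value = trim($this->xpath->evaluate('string(itunes:duration)', $this->entry));
        return $value === '' ? null : $value;
    }
}

// The entry forwards unknown method calls to whichever extension has them.
class ExtensibleEntry
{
    private $extensions = array();

    public function registerExtension($extension): void
    {
        $this->extensions[] = $extension;
    }

    public function __call($method, array $args)
    {
        foreach ($this->extensions as $extension) {
            if (method_exists($extension, $method)) {
                return call_user_func_array(array($extension, $method), $args);
            }
        }
        throw new BadMethodCallException("Unknown method: $method");
    }
}

// Usage: the podcast method appears on the entry once the extension is registered.
$doc = new DOMDocument();
$doc->loadXML('<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">'
    . '<channel><item><title>Episode 1</title>'
    . '<itunes:duration>12:34</itunes:duration></item></channel></rss>');
$item = $doc->getElementsByTagName('item')->item(0);

$entry = new ExtensibleEntry();
$entry->registerExtension(new PodcastExtension($item));
echo $entry->getPodcastDuration(), "\n";
```

Delegating through __call keeps the core entry class small while letting module-specific methods sit alongside the built-in ones.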
In the meantime – enjoy the new component. It’s led a quiet existence to date so I’m eager to get more feedback on usage. Kawsar has allegedly sacrificed fingers to complete massive emails containing feedback. Personally, I find the most valuable QA tool available at this point in time is warm bodies sitting at a desk, using this component, and complaining to me when it doesn’t do what they expected!
Now, I only covered some basics – for a full overview of what Zend_Feed_Reader does and is capable of, you should read the documentation. It’s currently not in the HTML online manual, but the Docbook XML source is still quite readable:
http://framework.zend.com/svn/framework/standard/trunk/documentation/manual/en/module_specs/Zend_Feed_Reader.xml
This entry was posted by Pádraic Brady on July 15, 2009 at 11:34 am, and is filed under PHP General, PHP Security, Zend Framework.
