PHP, Zend Framework and Other Crazy Stuff
Posts tagged atom
Zend_Feed_Writer and Zend_PubSubHubbub In Proposal Queue
Jul 19th
I have a few proposals with Zend Framework. I also have an established record of not finishing them very reliably, yay! Ok, so that’s not a good thing. I seem to have established a weird tradition of finding myself in just the right scenario at the completely wrong time to hold me up. Luckily (it’s a bit like thinking THIS year will see a real Summer in Ireland), I do have oodles of time to burn right now. The ZF book is finally climbing Reboot Hill (the translated versions are progressing, and I will get to the next chapter soon - lots has happened in ZF since the drafts were written). Zend_Feed_Reader is in the ZF trunk, and definitely will be in 1.9. Zend_Oauth has been reviewed by me and patches are incoming in the next day to finish it (that should make its way into 1.10 along with Zend_Crypt).
Before this all goes sour and I a) wind up hospitalised after a freak accident involving Stephen Fry running me down while sending a Tweet on an iPhone, b) see Eircom attacked by mysterious hackers intent on the downfall of Brian Cowen because he shows his appreciation for free speech by introducing legislation with the new criminal act of “blasphemy” defined or c) get abducted by Kerrymen (just because…well…they’re all mad down there), I’ve started pounding frantically on my keyboard for a new proposal called Zend_Feed_Writer.
ZFW is the counterpart to Zend_Feed_Reader (as if that wasn’t obvious). Its purpose is to, once again, offer an alternative to the current Zend_Feed component using similar principles to those applied in Zend_Feed_Reader:
1. Use a simple, intuitive and limited API to eliminate guesswork and uncertainty.
2. Utilise PHP’s DOM to handle the complex internal construction of feeds.
3. Adhere to standards: RSS 2.0 (based on the RSS Advisory Board 2.0.11 spec) and Atom 1.0 (RFC 4287)
4. Behind the scenes, implement support for commonly used RSS/Atom modules like Dublin Core/Slash/Atom Threads
5. Allow users to implement/register Extensions (i.e. plugins) to add support for other modules
6. Liberally throw tests at every conceivable (including the possibly insane) scenario for use.
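To make principle 2 a little more concrete, here is a rough sketch of what building a feed with PHP's DOM extension can look like. It is not the Zend_Feed_Writer API (that is still being drafted); the function name buildAtomFeed() and the array keys it expects are made up purely for illustration, and the output covers only a handful of Atom 1.0 elements.

```php
<?php
// Hypothetical helper (NOT part of Zend Framework): builds a minimal
// Atom 1.0 document using only PHP's DOM extension.
function buildAtomFeed(array $feed, array $entries)
{
    $atomNs = 'http://www.w3.org/2005/Atom';
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true;

    $root = $dom->createElementNS($atomNs, 'feed');
    $dom->appendChild($root);

    // Values go in as text nodes, so DOM escapes &, < and > for us.
    $addText = function (DOMElement $parent, $name, $value) use ($dom, $atomNs) {
        $el = $dom->createElementNS($atomNs, $name);
        $el->appendChild($dom->createTextNode($value));
        $parent->appendChild($el);
        return $el;
    };

    $addText($root, 'title', $feed['title']);
    $addText($root, 'id', $feed['id']);
    $addText($root, 'updated', $feed['updated']);
    $author = $dom->createElementNS($atomNs, 'author');
    $root->appendChild($author);
    $addText($author, 'name', $feed['author']);

    foreach ($entries as $entry) {
        $entryEl = $dom->createElementNS($atomNs, 'entry');
        $root->appendChild($entryEl);
        $addText($entryEl, 'title', $entry['title']);
        $addText($entryEl, 'id', $entry['id']);
        $addText($entryEl, 'updated', $entry['updated']);

        $link = $dom->createElementNS($atomNs, 'link');
        $link->setAttribute('rel', 'alternate');
        $link->setAttribute('href', $entry['link']);
        $entryEl->appendChild($link);
    }

    return $dom->saveXML();
}
```

The point of leaning on DOM is that escaping, namespaces and well-formedness are handled by the extension rather than by string concatenation, which is where hand-rolled feed generators tend to go wrong.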
Just like Zend_Feed_Reader, there is obviously a question mark. Do we even need an alternative to Zend_Feed? ZFR had the major advantage that it was conceived of not only as a major simplification making developers’ lives significantly easier, but also as something that understood feeds - it was able to sift through numerous alternatives for commonly queried data points until it found a match, removing the need for developers to take on that role with custom abstraction layers and interpretive work. Zend_Feed_Writer does something similar, only in reverse. It creates feeds based on the most commonly inputted data points which contain the most logical or specification defined elements. It removes the guesswork, the need to cram up on the RSS/Atom standards, the specifications and the specs for all the different modules used. It eliminates work - and that’s always been the main goal. If you want to learn a bit more - look at the Zend Framework 1.9 Preview, and compare the documentation for Zend_Feed with that for Zend_Feed_Reader. It should highlight where both diverge in a meaningful way.
I’m currently drafting the Zend_Feed_Writer proposal over at Zend_Feed_Writer - Padraic Brady.
Which brings us to upcoming proposal number two, the colourfully named Zend_PubSubHubbub! Or Zend_Feed_PubSubHubbub depending on which name is most appropriate.
To explain, PubSubHubbub defines a protocol where any subscriber (that’s you) can subscribe to a hub server which notifies you when any feed you want to follow has changed. Wait for it… Publishers, the origin of the RSS/Atom feeds you want to follow, can then notify their hub server which in turn notifies you. It’s a push system. Publisher adds new content, and notifies Hub immediately. The Hub can then push the update to you right away. There is no polling every 30 mins for new content - it’s delivered to you, or you are told where to fetch it from, right away. It eliminates the delay between source and polling, the lack of which has given services like Twitter a major advantage in getting news/articles/videos out to hundreds of people almost instantaneously and seen traditional feeds lose some of their attraction. As the PubSubHubbub team put it:
A simple, open, server-to-server web-hook-based pubsub (publish/subscribe) protocol as an extension to Atom.
Parties (servers) speaking the PubSubHubbub protocol can get near-instant notifications (via webhook callbacks) when a topic (Atom URL) they’re interested in is updated.
The protocol in a nutshell is as follows:
* An Atom URL (a “topic”) declares its Hub server(s) in its Atom XML file, via <link rel="hub" ...>. The hub(s) can be run by the publisher of the Atom, or can be a community hub that anybody can use. (RSS feeds are also supported!)
* A subscriber (a server that’s interested in a topic), initially fetches the Atom URL as normal. If the Atom file declares its hubs, the subscriber can then avoid lame, repeated polling of the URL and can instead register with the feed’s hub(s) and subscribe to updates.
* The subscriber subscribes to the Topic URL from the Topic URL’s declared Hub(s).
* When the Publisher next updates the Topic URL, the publisher software pings the Hub(s) saying that there’s an update.
* The hub efficiently fetches the published feed and multicasts the new/changed content out to all registered subscribers.
The protocol is decentralized and free. No company is at the center of this controlling it. Anybody can run a hub, or anybody can ping (publish) or subscribe using open hubs.
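As a rough illustration of the steps above, the sketch below finds the hubs a feed declares and builds the form-encoded bodies a subscriber and a publisher would POST to a hub. The function names are invented for this example, and the parameter names (hub.mode, hub.topic, hub.callback, hub.verify, hub.url) reflect my reading of the 0.1 draft, so check them against the spec before relying on them. No HTTP request is actually sent here.

```php
<?php
// Hypothetical helpers (NOT the proposed Zend_PubSubHubbub API).

// Step 1-2: the subscriber fetches the Atom feed and looks for declared hubs.
function discoverHubs($atomXml)
{
    $dom = new DOMDocument();
    $dom->loadXML($atomXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

    $hubs = array();
    foreach ($xpath->query('/atom:feed/atom:link[@rel="hub"]/@href') as $href) {
        $hubs[] = $href->nodeValue;
    }
    return $hubs;
}

// Step 3: body of the subscription request POSTed to a hub
// (parameter names assumed from the 0.1 draft).
function buildSubscribeBody($topicUrl, $callbackUrl)
{
    return http_build_query(array(
        'hub.mode'     => 'subscribe',
        'hub.topic'    => $topicUrl,
        'hub.callback' => $callbackUrl,
        'hub.verify'   => 'sync',
    ), '', '&');
}

// Step 4: body of the publisher's "new content" ping to a hub.
function buildPublishBody($topicUrl)
{
    return http_build_query(array(
        'hub.mode' => 'publish',
        'hub.url'  => $topicUrl,
    ), '', '&');
}
```

Either body would be sent as application/x-www-form-urlencoded to one of the URLs returned by discoverHubs(), after which the hub takes over the fetching and fan-out.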
Note, while Atom is prominently mentioned - the protocol supports RSS also (be kind of stupid if it didn’t!). Atom however is a basic unit in its operation, just like it’s an excellent basic unit to utilise in any web service dealing with collections of items defined in an XML syntax.
The reason a PubSubHubbub proposal is interesting to me (besides always being game for a challenge) is that like OpenID and OAuth, it’s another decentralised open protocol that operates over HTTP. Also, the basic units are already or will soon be implemented/released! Zend_Feed_Reader (ZF 1.9), Zend_Oauth (ZF 1.10), Zend_Crypt (ZF 1.10) and Zend_Feed_Writer (ZF 1.10 with a little luck). Putting the protocol on top of those ready to go components will save a lot of time and effort at the end of the day.
To be honest though, I have a few doubts on this one because PubSubHubbub is so new that it is only starting to seep into implementations in the wild. So getting it into the Zend Framework right now might not happen (as an early first spec, its implementation will be continually evolving/growing over many months). I’ll see what a review brings from the community once the proposal is written up this week.
That said, a week ago the spanking new PubSubHubbub Core 0.1 Specification (July 8, 2009) was implemented in at least one meaningful way - initial support has been implemented in FeedBurner. Then we have a WordPress plugin in progress, and several reference implementations including the one for Google App Engine. Still early days though. Of course, PubSubHubbub was created by Google engineers (Google run FeedBurner too) but it’s really a brilliant protocol, in my opinion, compared to something using Jabber/XMPP or worse which is overly complex (with a few exceptions) for this use case (HTTP+REST FTW!). I can see this easily taking off in a big way in the future once a number of full stream uses exist - maybe Google Reader will come next and that’s hugely popular.
Zend_Feed_Reader promoted to Zend Framework trunk (watch out for ZF 1.9!)
Jul 15th
I’m happy to say that Zend_Feed_Reader has made it through its two week cleanup effort and emerged from Matthew’s review to join the Zend Framework trunk. Once the Zend Framework 1.9 release process spins up I look forward to more feedback on how this component has turned out. Thanks to Jurriën Stutterheim (my co-conspirator), Matthew Weier O’Phinney for shepherding this through, and a special mention goes to Kawsar Saiyeed whose feedback while using Zend_Feed_Reader to build a feed aggregator over the past week was invaluable.
Zend_Feed_Reader grew out of my need to have something that is not just capable of reading feeds, but is capable of understanding and interpreting them. If you’ve used Zend_Feed, you know that getting something simple like content, or a creation date, is a task that requires a bit of work. Feeds come in three distinctly different forms: RSS, RDF/RSS and Atom, all with multiple versions. Each has its own way of presenting information. Each can also utilise extensions like RSS’s popular Dublin Core 1.1 module or Atom’s Threaded Extensions RFC. Getting a simple point of data can mean sifting through feeds to see what type and version they are, what elements to look for, what alternatives exist, and what alternatives should be prioritised over others. It’s work that has led developers to write long classes designed to handle the task. Zend_Feed is also not without its flaws. Its API is inconsistent, its namespace handling questionable, and extending it is not as easy as it looks.
So, is Zend_Feed_Reader any different? Well, I hope so. But you’ll have to dig a little deeper to see it. Here’s a quick example from the documentation showing off the API briefly.
[geshi lang=php]$feed = Zend_Feed_Reader::import('http://www.planet-php.net/rdf/');
// Feed-level data
$data = array(
    'title'        => $feed->getTitle(),
    'link'         => $feed->getLink(),
    'dateModified' => $feed->getDateModified(),
    'description'  => $feed->getDescription(),
    'language'     => $feed->getLanguage(),
    'entries'      => array(),
);
// Entry-level data
foreach ($feed as $entry) {
    $edata = array(
        'title'        => $entry->getTitle(),
        'description'  => $entry->getDescription(),
        'dateModified' => $entry->getDateModified(),
        'author'       => $entry->getAuthor(),
        'link'         => $entry->getLink(),
        'content'      => $entry->getContent()
    );
    $data['entries'][] = $edata;
}[/geshi]
Here’s a similar example using Zend_Feed (based on its manual Introduction).
[geshi lang=php]$slashdotRss = Zend_Feed::import('http://rss.slashdot.org/Slashdot/slashdot');
// Channel-level data
$channel = array(
    'title'       => $slashdotRss->title(),
    'link'        => $slashdotRss->link(),
    'description' => $slashdotRss->description(),
    'items'       => array()
);
// Item-level data
foreach ($slashdotRss as $item) {
    $channel['items'][] = array(
        'title'       => $item->title(),
        'link'        => $item->link(),
        'description' => $item->description()
    );
}[/geshi]
Wow, I hear you say, that is so different! Bloody marvellous! Stow the sarcasm as I explain a bit further though.
Zend_Feed_Reader and Zend_Feed are siblings - there is little doubt there. But what the example doesn’t show is what happens behind the scenes. Zend_Feed’s API allows you to run literal queries against the underlying XML. Essentially, the API is tied to the structure of the feed’s XML document making Zend_Feed a class with a mutable API. A description() method looks for description elements, in other words. The API is tied to element names, not the concept of what you’re trying to extract. This is great when RSS and Atom are kind enough to agree on a description element in their respective namespaces, but it goes off course when they don’t. For example, RSS 2.0 has the pubDate element whereas Atom 1.0 has the published and updated elements (Atom 0.3 used created and modified). What does this mean with Zend_Feed?
[geshi lang=php]$feed = Zend_Feed::import('http://www.example.com/feed/');
$entry = $feed->current();
// Try each element name in turn until one returns something
$dateModified = $entry->pubDate();
if (!$dateModified) {
    $dateModified = $entry->published();
}
if (!$dateModified) {
    $dateModified = $entry->modified();
}
if (!$dateModified) {
    $dateModified = $entry->updated();
}
if (!$dateModified) {
    $dateModified = $entry->created();
}
// Oh, crap. I forgot about Dublin Core… (there’s DC 1.0 and DC 1.1)
// Load up Zend_Date and some detection so we can normalise these dates after[/geshi]
With Zend_Feed, we need to run a few (7 or so) literal requests - we could add in Feed type detection but that would mean we’d be moving towards a Strategy Pattern (watch the code grow then, and the unit tests). Finally, all feed types vary in how they present dates. These would have to be normalised from RFC822, RFC2822, ISO 8601 or that W3C standard. Assuming the feeds used one of these. Now rinse and repeat for anything you want from that feed…and add unit tests to ensure it all works for various feed types and versions. Ouch! You will also need to learn RSS and Atom in detail along with their common use extensions like Dublin Core, Atom Threaded, yada, yada, yada…
Zend_Feed_Reader, on the other hand, is not a gateway to the feed’s DOM. It just offers a simple lean API which is identical for every type and version of feed.
[geshi lang=php]$feed = Zend_Feed_Reader::import('http://www.example.com/feed/');
$entry = $feed->current();
$dateModified = $entry->getDateModified();[/geshi]
In this case, the $dateModified result is a Zend_Date object. You can format the date any way you want from here. Behind the scenes, getDateModified() does a ton of work (all unit tested) to decipher the current feed and locate the data you seek from all the possible alternatives that might exist. Once it’s located, it’s imported into Zend_Date using the correct standard.
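To give a feel for the kind of sifting described above, here is a much simplified sketch. It is not Zend_Feed_Reader's actual code: findDateModified() and its list of candidate queries are invented for illustration, it covers only a few of the possible elements, and it returns PHP's built-in DateTime as a stand-in for Zend_Date.

```php
<?php
// Hypothetical illustration (NOT Zend_Feed_Reader internals): try a
// prioritised list of XPath queries and normalise the first hit to a date.
function findDateModified(DOMElement $entry)
{
    $xpath = new DOMXPath($entry->ownerDocument);
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
    $xpath->registerNamespace('atom03', 'http://purl.org/atom/ns#');
    $xpath->registerNamespace('dc11', 'http://purl.org/dc/elements/1.1/');
    $xpath->registerNamespace('dc10', 'http://purl.org/dc/elements/1.0/');

    // Order expresses priority; each query is relative to the entry/item.
    $candidates = array(
        'string(atom:updated)',    // Atom 1.0
        'string(atom03:modified)', // Atom 0.3
        'string(pubDate)',         // RSS 2.0
        'string(dc11:date)',       // Dublin Core 1.1
        'string(dc10:date)',       // Dublin Core 1.0
    );

    foreach ($candidates as $query) {
        $value = trim($xpath->evaluate($query, $entry));
        if ($value === '') {
            continue; // element absent, try the next alternative
        }
        try {
            // DateTime copes with both RFC 822 and ISO 8601 style strings.
            return new DateTime($value);
        } catch (Exception $e) {
            continue; // unparseable date, keep looking
        }
    }
    return null;
}
```

Calling code passes in the entry or item element and gets back the same type whatever the feed format was, which is the whole idea: the caller never needs to know which element supplied the value.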
Of course, this all means you’ll have to endure a tiny easy to memorise API since you can only call getDateModified() - not pubDate(), created(), etc. Sorry for that. You’ll also have to put up with the extra 660+ unit tests Zend_Feed_Reader has added to the Zend Framework suite to cover every feed type and version, every common combination of versions and XML extensions, and all the niggly things like normalisation and consistent API returns. My apologies. We even fantasised that RSS might implement the entire Atom 1.0 spec as an RSS module. Insanity.
The result is that we have two sets of methods. One operates at the feed level, the other at the entry level. Once you boil down all the possible alternative elements, that means we implement 12 methods at the feed level, and 13 methods at the entry level (seriously, that is all you need in most cases!). We also provide methods for accessing the current object’s DOMElement, DOMDocument and DOMXPath objects for those times you need something not provided by the current API, or you can use the Extension system (see below).
As for comparing the internal workings, Zend_Feed_Reader uses only one method for querying feeds. XPath. XPath might seem a surprising choice but, being a lazy programmer, it was the option that required the least code, made it easy to see what methods did internally since XPath queries (if you speak the query language) are easy to follow, and assisted in enforcing a uniform approach to parsing feeds. This means Zend_Feed_Reader may be (and probably is) slower than Zend_Feed. Remember we also sift across alternatives to get what you’re looking for - so it could be many XPath queries per API call. We’ve added internal class caches so repetitive queries can skip XPath, and you can set your own persistent cache, to improve performance overall by minimising network requests. Zend_Feed_Reader also supports HTTP 1.1 Conditional GET requests to save on bandwidth, processing of unchanged feeds and network use.
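For anyone unfamiliar with Conditional GET, the idea can be sketched in a few lines. This is only an illustration of the HTTP mechanics, not how Zend_Feed_Reader implements it: the function names and the shape of the $cached array (keys etag, lastModified and body) are assumptions made up for this example, and the actual HTTP request is left out.

```php
<?php
// Hypothetical helpers (NOT Zend_Feed_Reader's implementation).

// Build the extra request headers from whatever validators we stored
// the last time the feed was fetched.
function conditionalHeaders(array $cached)
{
    $headers = array();
    if (!empty($cached['etag'])) {
        $headers[] = 'If-None-Match: ' . $cached['etag'];
    }
    if (!empty($cached['lastModified'])) {
        $headers[] = 'If-Modified-Since: ' . $cached['lastModified'];
    }
    return $headers;
}

// Decide which body to parse once the response comes back.
function selectFeedBody($statusCode, $responseBody, array $cached)
{
    if ((int) $statusCode === 304 && isset($cached['body'])) {
        // 304 Not Modified: the server sent no body, reuse our cached copy.
        return $cached['body'];
    }
    return $responseBody;
}
```

When the feed has not changed, the server answers with a bodyless 304, so both the transfer and the re-parsing of an unchanged feed can be skipped.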
Internally, Zend_Feed_Reader also implements (it’ll be much improved later after some refactoring) a plugin system where you can write custom “Extensions” that add methods to query feeds or entries for RSS/Atom module data not already bundled with Zend_Feed_Reader. This will cover cases where a custom RSS or Atom module (for example, weather, podcasting or geolocation data) is included in a feed and you want access to that data easily. I hope to bundle some of the more popular standards with Zend_Feed_Reader - but I’ll see what time I have left before ZF 1.9.
In the meantime - enjoy the new component. It’s led a quiet existence to date so I’m eager to get more feedback on usage. Kawsar has allegedly sacrificed fingers to complete massive emails containing feedback. Personally, I find the most valuable QA tool available at this point in time is warm bodies sitting at a desk, using this component, and complaining to me when it doesn’t do what they expected!
Now, I only covered some basics - for a full overview of what Zend_Feed_Reader does and is capable of you should read the documentation. It’s currently not in the HTML online manual, but the Docbook XML source is still quite readable:
http://framework.zend.com/svn/framework/standard/trunk/documentation/manual/en/module_specs/Zend_Feed_Reader.xml