After an email exchange with Dave Winer, I took up his suggestion to write down some thoughts about how one might integrate rssCloud notification services with Google’s PubSubHubbub solution. I took a first shot at getting the basic issues down in a Google Doc that is shared at http://tinyurl.com/rsscloudPSHB

It’s certainly NOT a specific proposal on how to implement this. Rather, it should give everyone some common terminology and concepts to discuss how this might actually be achieved.

I think the two solutions are close enough that full interoperability only requires a tiny amount of work. Check out the document and feel free to leave comments here.

  • http://grack.com Matt Mastracci

    I disagree about the driving force for the standard going forward. Since rssCloud has some pretty glaring holes in its API that prevent it from being used in virtual-hosted and other virtualized datacenters, it just won’t be adopted by feed readers.

    Nick Lothian made an interesting point today. Not only does the IP address endpoint requirement break virtualized datacenters like AppEngine that route inbound and outbound HTTP traffic differently, it also *entirely* breaks for vhosted HTTP sites. Many sites are not accessible over HTTP/1.0 and require a specific Host: header. rssCloud makes no accommodation for this, locking those sites entirely out of the ecosystem.

    Since there’s no way to guarantee that every feed reader will support rssCloud, it makes more sense as a hub admin or a publisher to start with PSHB which at least has a chance of being implemented on every host.

  • http://www.shotton.com/ cshotton

    I think the protocol issues are important to address, perhaps as part of the “normalization” effort. Today’s TechCrunch “review”, while obviously biased, does lay out a couple of issues that could quickly be addressed to overcome any perceived difference. http://www.techcrunch.com/2009/09/09/rsscloud-vs-pubsubhubbub-why-the-fat-pings-win/

    I think it’s likely that the ultimate measure of these two solutions is going to come from the breadth of support at the hub level. This is the piece of technology that neither publishers nor subscribers are going to create themselves, at least not in any production-quality form that scales. So it’s important to think about how implementors of hubs are going to view the APIs and the hub-specific implementation requirements.

  • http://grack.com Matt Mastracci

    One additional point that I didn’t see reflected in the document:

    rssCloud needs some API work to bring it up to par with PSHB’s subscription API. It’s missing any sort of validation of subscription requests: any URL that returns 200 OK is considered successful. This lets you do fun stuff like subscribing an rssCloud hub to another rssCloud hub’s subscription address (repeat ad nauseam). Dave Winer’s own rssCloud endpoint failed to validate that the rssCloud parameters were provided in the body of the POST, allowing someone to subscribe another hub to it as an endpoint.

    Pubsubhubbub uses a verification key to ensure that the endpoint is actually the one that requested the subscription. Not having this sort of verification in rssCloud opens the system up to exploits, one of which I have already reported to the author of WP rssCloud (fixed in 0.3)!

    Additionally, rssCloud requires that your endpoint match your IP address, which is not always possible in hosted web environments. This limitation also prevents a third party from subscribing on your behalf, which is useful in the desktop feed aggregator case. PSHB has no such restriction, and I’ve already used that to build a PSHB->XMPP gateway (available here http://pubsubhubbub-xmpp.appspot.com/) that would enable it on the desktop.
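
The verification step described above can be sketched from the subscriber's side. This is a minimal illustration, not code from any hub or plugin: the parameter names (`hub.mode`, `hub.topic`, `hub.challenge`, `hub.verify_token`) follow the PSHB drafts as I understand them, while `PENDING`, `handle_verification`, the example topic URL, and the token are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical record of subscriptions this subscriber actually requested:
# topic URL -> the verify token it sent along with the subscription request.
PENDING = {"http://example.com/feed.xml": "s3cret-token"}


def handle_verification(query_string, pending=PENDING):
    """Return (status, body) for a hub's verification GET request."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    topic = params.get("hub.topic")
    token = params.get("hub.verify_token")
    challenge = params.get("hub.challenge")

    # Only subscribe/unsubscribe confirmations are expected here.
    if params.get("hub.mode") not in ("subscribe", "unsubscribe"):
        return 404, ""
    # Reject topics we never asked for, or requests with the wrong token.
    if challenge is None or topic not in pending or pending[topic] != token:
        return 404, ""
    # Echoing the challenge proves this endpoint really wanted the subscription.
    return 200, challenge
```

A request for a topic the subscriber never asked for gets a 404, so a hub following this handshake would not start delivering to an endpoint that did not opt in.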

  • http://grack.com Matt Mastracci

    Awesome, this is a useful resource to have available.

    Note that PSHB isn’t concerned with “new” data, but rather changes in the document overall. If a post is modified, the updated content is included in the notification ping. This means that it can be used to deliver updates from a liveblog post to subscribers in real-time, for instance.

    Pubsubhubbub notifications to the hub, as well as from the hub to the subscriber, would happen under the same circumstances as with rssCloud. It’s not even a superset or subset of rssCloud’s notifications: they will always happen in tandem.

    The big advantage for PSHB is that it does the grunt work of detecting any feed change so that every subscriber isn’t required to fetch the full feed and perform the operation that the hub would be performing on its behalf. If there’s a use case that isn’t covered by PSHB, there’s nothing stopping the subscriber from hitting the feed specified in the notification update and performing its own checks every time (effectively falling back to the same semantics as rssCloud).

    From what I can tell, PSHB is somewhat influenced by Atom’s structure, in that it assumes there is a stable set of IDs for it to base the feed diff on. I’ve peered into the source and they have a significant number of test cases to deal with RSS, which is notoriously unreliable when it comes to permanent identifiers. The PSHB protocol works fine using any type of RSS, however: it just delivers a subset of the RSS document, in RSS format, on every change.

    FWIW, the notifications in PSHB are basically just HTTP POSTed versions of the feeds that it subscribes to, with all the unchanged feed items stripped out. There’s no magic format here: unchanged content is simply removed.
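
The "strip out the unchanged items" idea can be sketched as a per-entry digest comparison. This is an illustration under assumptions, not the hub's actual code: entries are modeled as plain dicts with hypothetical `id`, `title`, and `content` keys, SHA-1 is an arbitrary choice of digest, and `entry_digest` and `changed_entries` are made-up names.

```python
import hashlib


def entry_digest(entry):
    """Digest one entry; the dict keys used here are hypothetical."""
    payload = "\n".join([entry["id"], entry["title"], entry["content"]])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def changed_entries(cache, entries):
    """Return new or modified entries and update cache (id -> digest) in place."""
    changed = []
    for entry in entries:
        digest = entry_digest(entry)
        if cache.get(entry["id"]) != digest:
            changed.append(entry)
            cache[entry["id"]] = digest
    return changed


cache = {}
feed = [
    {"id": "1", "title": "Hello", "content": "first post"},
    {"id": "2", "title": "Liveblog", "content": "starting soon"},
]
first = changed_entries(cache, feed)   # both entries are new to the hub
feed[1] = {"id": "2", "title": "Liveblog", "content": "we are live"}
second = changed_entries(cache, feed)  # only the edited liveblog entry
```

The second call returns just the modified entry, which matches the liveblog case: an edit to an existing post is delivered, not only brand-new posts. The scheme depends on entry IDs being stable, which is where RSS feeds without reliable identifiers get awkward.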

  • http://www.shotton.com/ cshotton

    Good points. I’ll fix the doc. The “update-only” mode would be a good addition for PSHB. It really does restrict some of the options otherwise available to subscribers by taking on the job of attempting to deliver only “new” content within the hub.

    I’m curious how much of that behavior is dependent on attributes within the Atom format (of which I am fairly ignorant).

  • http://grack.com Matt Mastracci

    I think there’s a mistake on page 6: “PSHB hub implementations must cache published content while rssCloud hubs explicitly do not.”

    rssCloud hubs cache the digest of the *whole feed*, while PSHB hubs cache the digests of the individual RSS/Atom entries. The document as it stands implies that rssCloud doesn’t keep any state about the feed that it is tracking.

    Here’s the relevant part of the PSHB spec: “The hub caches minimal metadata (id, data, entry digest) about each topic’s previous state. When the hub re-fetches a topic feed (on its own initiative or as a result of a publisher’s ping) and finds a delta, it enqueues a notification to all registered subscribers.”

    It seems like Pubsubhubbub could add a note that hubs may send “update-only” notifications, and thereby absorb the only real functional difference between PSHB and rssCloud. With that change, you could port over the current WP rssCloud plugin by replacing the HTTP POST parameters and checking the result of subscription requests.
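
For contrast with per-entry digests, an "update-only" hub only needs one digest per feed. The following is a rough sketch under assumptions, not taken from either implementation: `feed_digest` and `update_only_ping` are hypothetical names, SHA-1 is an arbitrary digest, and the single `url` field in the ping is my reading of what an rssCloud-style notification carries.

```python
import hashlib


def feed_digest(feed_bytes):
    """Digest of the whole feed document, with no per-entry detail."""
    return hashlib.sha1(feed_bytes).hexdigest()


def update_only_ping(cache, feed_url, feed_bytes):
    """Return a content-free ping if the feed changed since last seen, else None."""
    digest = feed_digest(feed_bytes)
    if cache.get(feed_url) == digest:
        return None  # nothing changed, so nobody is notified
    cache[feed_url] = digest
    # The ping only names the feed; the subscriber fetches and diffs it itself.
    return {"url": feed_url}
```

The trade-off is the one discussed above: the hub stores and sends less, but every subscriber has to fetch the full feed and work out what changed on its own.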