
I’ve been working with Dave Winer’s “feedhose” system for the past several days and have run into a problem that will likely face everyone trying to leverage this technology.

Since the RSS spec is fairly lenient, things like title and description tags can have all sorts of mark-up in them. (Don’t get me started on the issue of mark-up in RSS titles; it’s a major pet peeve of mine!) Anyway, it’s obviously possible to stick JSON syntax into the content of an RSS feed.

This makes interpreting the JSON data coming from Dave’s RSS-to-JSON conversion process a bit complicated. So here are the questions at hand:

When is the proper time to escape/encode JSON syntax that may be present in RSS, what specific tokens need to be escaped, and is it ever OK to send the feedhose JSON straight to JavaScript’s eval() function?

The answers will help the feedhose stack properly and securely pass RSS up to clients for display, so please give freely of your painful past experience.
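
To make the "when" and "what" parts of the question concrete, here is a minimal Python sketch of one common approach: escape at the moment the JSON text is produced, and let a real JSON encoder do the escaping instead of building the output by string concatenation. Inside a JSON string, the characters that must be escaped are the double quote, the backslash, and the control characters below U+0020; the encoder handles all of them. The function name `item_to_json`, the field names, and the extra HTML hardening step are assumptions made for illustration, not feedhose's actual code.

```python
import json


def item_to_json(title: str, description: str, link: str) -> str:
    """Serialize one feed item to JSON text (hypothetical helper, not feedhose code)."""
    payload = {"title": title, "description": description, "link": link}
    # json.dumps escapes double quotes, backslashes, and control characters
    # inside strings; ensure_ascii=True also escapes every non-ASCII
    # character, including U+2028 and U+2029.
    text = json.dumps(payload, ensure_ascii=True)
    # Optional hardening (an assumption about how the output is used): escape
    # characters that matter if the JSON is ever inlined into an HTML page.
    # These characters can only occur inside JSON strings, so the result is
    # still valid JSON that decodes to the same values.
    for char, escaped in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026")):
        text = text.replace(char, escaped)
    return text


if __name__ == "__main__":
    hostile_title = '"}); alert("injected"); ({"x": "'
    encoded = item_to_json(
        hostile_title, "<b>Hello</b> & goodbye", "http://example.com/post"
    )
    print(encoded)
    # A strict parser sees the hostile title as plain string data.
    decoded = json.loads(encoded)
    assert decoded["title"] == hostile_title
```

With the escaping done at serialization time, whatever was in the RSS title arrives at the client as ordinary string data. On the client side, handing the text to a strict JSON parser instead of eval() means that anything that is not valid JSON is rejected as a parse error and never executed.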

  • Anonymous

    Chuck, this Google search turns up some interesting stuff.

  • Anonymous

    Let me clarify. Yes, I think mark-up in title elements is a bad thing. The problem (as evidenced by this post) is that mark-up is no longer limited to HTML, which is what we normally see in the title (and description) tags.

    Now bad things can happen if JSON is embedded in these fields, because an automated conversion of RSS to JSON might allow someone to do something bad with injected JSON.

    I don’t want to be in the business of parsing on the client side for HTML tags, JSON syntax, and anything else that might work its way in there. (That doesn’t mean I don’t have to be, just that I don’t want to be.)

    The gist of my question is to identify the earliest point in the pipeline where it is most appropriate and easiest to catch the JSON. This is a very narrow issue, and it only arises because we are taking RSS and converting it to JSON, which makes a JSON injection attack possible. What do we have to encode in that embedded JSON to prevent such an attack?

  • Anonymous

    Chuck, you can’t raise your peeve like that without explaining?

    Would you rather titles didn’t have markup? My suggestion: just strip the markup.

    I’m even more radical in River2. I also strip the markup from descriptions and truncate them to 500 characters. I got tired of longwinded bloggers and people who put ads in their feeds. I just strip it all out. Goodbye! 🙂

    I also linked to this post from my Twitter feed. If that doesn’t get us some good info, I’ll write a blog post on