The rich in RSS | Nick Freear’s blog

Last month my colleagues and I had a team “hackday”: an opportunity to work together (we often work individually on projects) and rapidly develop some software prototypes. We had a few ideas beforehand, did a brainstorm, then got down to business in the Digilab. It was a general success. However, we aren’t ready to show the results just yet; I’ll post an update when we are ;). Richard, Juliette, Patrick and Will worked with Twitter. Sam and I put together an event feed aggregator, using Yahoo Pipes.

We used RememberTheMilk and Google Calendar feeds as examples, and I was struck again: why don’t people use existing standards? Specifically, why don’t the feeds provided by RTM and Google use the RSS 1.0 Event module? What they do instead is mark up (or not) the data for the event (start date, location etc.) as HTML embedded in RSS or Atom. So, for RememberTheMilk we have,

<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt">&lt;entry&gt;</span>
  ...
  <span class="nt">&lt;id&gt;</span>tag:rememberthemilk.com,1999:tasks-nfre ...<span class="nt">&lt;/id&gt;</span>
  <span class="nt">&lt;content</span> <span class="na">type=</span><span class="s">"xhtml"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">xmlns=</span><span class="s">""</span><span class="nt">&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"rtm_due"</span><span class="nt">&gt;&lt;span</span> <span class="na">class=</span><span class="s">"rtm_due_title"</span><span class="nt">&gt;</span>Due: <span class="nt">&lt;/span&gt;</span>
        <span class="nt">&lt;span</span> <span class="na">class=</span><span class="s">"rtm_due_value"</span><span class="nt">&gt;</span>Thu 10 Apr 08 at 10:00AM<span class="nt">&lt;/span&gt;&lt;/div&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"rtm_priority"</span><span class="nt">&gt;&lt;span</span> <span class="na">class=</span><span class="s">"rtm_priority_title"</span><span class="nt">&gt;</span>Priority: <span class="nt">&lt;/span&gt;</span>
        <span class="nt">&lt;span</span> <span class="na">class=</span><span class="s">"rtm_due_value"</span><span class="nt">&gt;</span>none<span class="nt">&lt;/span&gt;&lt;/div&gt;</span>
    ...</code></pre></figure>

And for Google Calendar,

<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt">&lt;entry</span> <span class="na">xmlns=</span><span class="s">"http://www.w3.org/2005/Atom"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;id&gt;</span>http://www.google.com/calendar/feeds/d4 ... <span class="nt">&lt;/id&gt;</span>
  <span class="nt">&lt;published&gt;</span>2009-02-18T18:08:04.000Z<span class="nt">&lt;/published&gt;</span>
  <span class="nt">&lt;category</span> <span class="na">scheme=</span><span class="s">"http://schemas.google.com/g/2005#kind"</span> <span class="na">term=</span><span class="s">"http://schemas.google.com/g/2005#event"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;title</span> <span class="na">type=</span><span class="s">"html"</span><span class="nt">&gt;</span>Quick ...<span class="nt">&lt;/title&gt;</span>
  <span class="nt">&lt;summary</span> <span class="na">type=</span><span class="s">"html"</span><span class="nt">&gt;</span>
   When: Wed 18 Feb 2009 18:00 to 18:15 GMT<span class="nt">&lt;br&gt;</span>
   <span class="nt">&lt;br&gt;</span>Event Status: confirmed
  <span class="nt">&lt;/summary&gt;</span>
  <span class="nt">&lt;author&gt;&lt;name&gt;</span>Sam ...<span class="nt">&lt;/name&gt;&lt;/author&gt;</span>
...</code></pre></figure>


Now, the examples above are useful for consumption by humans in a feed reader. However, they are a pain to machine-parse. The HTML ‘divs’ in RTM are easier, but you still have to do something special for each calendar provider (regular expressions for Google, yuck!).
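To make the "yuck" concrete, here is a minimal sketch of the kind of provider-specific code the Google Calendar summary forces on you. It assumes every entry follows exactly the "When: ..." shape shown in the sample above and that month names are in English; the function name `parse_google_when` is made up for illustration, and this is not the code we used in Yahoo Pipes.

```python
import re
from datetime import datetime

# Assumption: the summary always contains a line shaped like
# "When: Wed 18 Feb 2009 18:00 to 18:15 GMT", as in the sample above.
WHEN_RE = re.compile(
    r"When:\s+\w{3}\s+(\d{1,2}\s+\w{3}\s+\d{4})"   # weekday (ignored), then the date
    r"\s+(\d{2}:\d{2})\s+to\s+(\d{2}:\d{2})"       # start and end times
    r"\s+(\w+)"                                    # time zone abbreviation
)


def parse_google_when(summary):
    """Return (start, end, tz_name), or None if the text does not match."""
    match = WHEN_RE.search(summary)
    if match is None:
        return None
    date_part, start_time, end_time, tz_name = match.groups()
    # %b relies on English month names (the default C locale).
    start = datetime.strptime(f"{date_part} {start_time}", "%d %b %Y %H:%M")
    end = datetime.strptime(f"{date_part} {end_time}", "%d %b %Y %H:%M")
    return start, end, tz_name


summary = "When: Wed 18 Feb 2009 18:00 to 18:15 GMT<br>\n<br>Event Status: confirmed"
start, end, tz_name = parse_google_when(summary)
print(start.isoformat(), end.isoformat(), tz_name)
# prints: 2009-02-18T18:00:00 2009-02-18T18:15:00 GMT
```

Any change to the wording, the language or the date layout breaks the pattern, and events that span more than one day are not handled at all.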

The RSS 1.0 Event module was published in 2001. It defines the elements startdate and enddate (both in W3CDTF format), location, organizer (person or body) and type (fixed taxonomy ??). So the Google Calendar entry above becomes something like,

<figure class="highlight"><pre><code class="language-xml" data-lang="xml">&lt;entry xmlns="..." xmlns:ev="http://purl.org/rss/1.0/modules/event/"&gt;
  &lt;id&gt;http://www.google.com/calendar/feeds/d4 ... &lt;/id&gt;
  &lt;published&gt;2009-02-18T18:08:04.000Z&lt;/published&gt;
  &lt;category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/g/2005#event"/&gt;
  &lt;title type="html"&gt;Quick ...&lt;/title&gt;
  &lt;ev:startdate&gt;2009-02-18T18:00+00:00&lt;/ev:startdate&gt;
  &lt;ev:enddate&gt;2009-02-18T18:15+00:00&lt;/ev:enddate&gt;
  &lt;ev:location&gt; ... &lt;/ev:location&gt;
  &lt;ev:organizer&gt;Sam ...&lt;/ev:organizer&gt;
  &lt;summary type="html"&gt;
   When: Wed 18 Feb 2009 18:00 to 18:15 GMT &lt;br&gt;
   &lt;br&gt;Event Status: confirmed
  &lt;/summary&gt;
  &lt;author&gt;&lt;name&gt;Sam ...&lt;/name&gt;&lt;/author&gt;
...</code></pre></figure>
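By contrast, an entry that carries the ev: elements can be read with a generic XML parser and no provider-specific text matching. The following is a rough sketch using Python's standard library. The entry is a cut-down, well-formed stand-in for the sample above (the title and location values are invented), `parse_event` is a hypothetical helper, and only the one W3CDTF granularity used in the sample is handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace URIs as they appear in the samples above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "ev": "http://purl.org/rss/1.0/modules/event/",
}

# Stand-in entry: the title and location text are made up for illustration.
ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:ev="http://purl.org/rss/1.0/modules/event/">
  <title type="html">Example event</title>
  <ev:startdate>2009-02-18T18:00+00:00</ev:startdate>
  <ev:enddate>2009-02-18T18:15+00:00</ev:enddate>
  <ev:location>Example room</ev:location>
</entry>
"""

# Assumption: dates use the "YYYY-MM-DDThh:mmTZD" form of W3CDTF only.
W3CDTF_MINUTES = "%Y-%m-%dT%H:%M%z"


def parse_event(xml_text):
    """Extract title, start, end and location from one Atom entry."""
    entry = ET.fromstring(xml_text)

    def text(path):
        node = entry.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "title": text("atom:title"),
        "start": datetime.strptime(text("ev:startdate"), W3CDTF_MINUTES),
        "end": datetime.strptime(text("ev:enddate"), W3CDTF_MINUTES),
        "location": text("ev:location"),
    }


event = parse_event(ENTRY)
print(event["start"].isoformat())     # prints: 2009-02-18T18:00:00+00:00
print(event["end"] - event["start"])  # prints: 0:15:00
```

The same function would work unchanged for any provider that emitted the module's elements, which is the whole point of using a shared vocabulary.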

So now we have data that is easily accessible to humans (via generic feed readers), and to machines (specialist event parsers) - simple? (The code samples above are cut-down for illustration purposes.) [26 March, 3 April]