June 06, 2008
SearchMonkey Data Services
by Marco Vitanza

In a previous post, I talked about my initial experience with Yahoo SearchMonkey. In this second post in the series, I will discuss how I set up the data service for my Blogspot app. Let's get to it.

The first thing you need for any SearchMonkey app is some data. In my case, I needed to pull the RSS feed for a Blogspot blog. SearchMonkey lets you create custom data services for your apps that can either query a web service (such as the Digg API) or extract data from a plain ol' webpage.

Another key point is that apps must select a URL trigger pattern to activate on. Currently, Yahoo supports globbing for matching URLs. I chose *.blogspot.com/* so that my app would be activated on any search result associated with a Blogspot blog. The URL that triggers an app is then passed to each of the app's data services.
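To get a feel for what the glob pattern will and won't match, here is a rough sketch in Python. It uses the standard library's fnmatch as a stand-in; how Yahoo actually evaluates trigger patterns isn't documented here, so treat the matching rules (and the helper name `triggers`) as assumptions.

```python
from fnmatch import fnmatchcase

# The trigger pattern chosen for the Blogspot app.
TRIGGER_PATTERN = "*.blogspot.com/*"

def triggers(url, pattern=TRIGGER_PATTERN):
    """Return True if the URL matches the glob pattern.

    Assumption: '*' matches any run of characters, including '/' and the
    'http://' prefix. SearchMonkey's real matcher may differ.
    """
    return fnmatchcase(url, pattern)

if __name__ == "__main__":
    for candidate in (
        "http://myblog.blogspot.com/2008/06/mypost.html",
        "http://example.com/about-blogspot.html",
    ):
        print(candidate, triggers(candidate))
```

Under these assumptions, any post or archive page on a Blogspot subdomain matches, while a page that merely mentions Blogspot in its path does not.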

Since my data service needs to grab a blog's RSS feed, querying a web service is the way to go. But suppose the URL that gets passed to the data service is something like this:

http://myblog.blogspot.com/2008/06/mypost.html

I need to take that and query the RSS feed at:

http://myblog.blogspot.com/feeds/posts/default

Currently, SearchMonkey only allows you to insert fields from the Yahoo index into your web service query. The most complicated query URL you can build is something like:

http://webservice.com/givemedata?url={TRIGGER_URL}

There is no way to manipulate strings or variables. What to do?
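For reference, the rewrite itself is tiny. A minimal sketch in Python of the transformation the query template can't express (the function name `feed_url` is made up for illustration, and it assumes the feed always lives at /feeds/posts/default on the blog's host, as in the example above):

```python
from urllib.parse import urlsplit

def feed_url(post_url):
    """Map any URL on a Blogspot blog to that blog's default posts feed.

    Keeps the scheme and host, and replaces the path with the feed path
    shown in the example above.
    """
    parts = urlsplit(post_url)
    return f"{parts.scheme}://{parts.netloc}/feeds/posts/default"

if __name__ == "__main__":
    print(feed_url("http://myblog.blogspot.com/2008/06/mypost.html"))
```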

The answer came when I remembered a brief conversation that I had with top SearchMonkey engineer Paul Tarjan after a SearchMonkey presentation at UCLA. In two words: Yahoo Pipes. I simply created a Yahoo Pipe that takes a Blogspot URL as an input, does some fancy string manipulation, queries the feed, and outputs the XML. After that, all that was necessary was to paste the pipe's URL into my data service and append the Blogspot trigger URL to the end of the pipe's query string. The actual query looks like this:

http://pipes.yahoo.com/pipes/pipe.run?_id=ukt_3V4p3RGkMZRFxAnzeQ
&_render=rss&url={dc:identifier}
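To make the substitution concrete, here is a small Python sketch of what the final request might look like once {dc:identifier} is filled in. It only builds the URL string and makes no request. Whether SearchMonkey percent-encodes the substituted value is not stated above, so the encoding step (and the helper name `pipe_query`) is an assumption.

```python
from urllib.parse import urlencode

PIPE_RUN = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "ukt_3V4p3RGkMZRFxAnzeQ"  # the pipe id from the query above

def pipe_query(trigger_url):
    """Build the pipe URL with the trigger URL as the 'url' parameter.

    Assumption: the trigger URL is percent-encoded when substituted.
    """
    params = {"_id": PIPE_ID, "_render": "rss", "url": trigger_url}
    return PIPE_RUN + "?" + urlencode(params)

if __name__ == "__main__":
    print(pipe_query("http://myblog.blogspot.com/2008/06/mypost.html"))
```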

To summarize: my data service queries the Pipe, which in turn queries the appropriate RSS feed and returns a chunk of XML. Now I can start building an actual app with this data, right? Wrong. Data services must translate their results (using XSLT) into dataRSS, Yahoo's custom format for describing data.

The dataRSS specification is relatively brief and contains only a few tags, the most important of which are <item> and <meta>. These tags can be annotated with standard microformats and Dublin Core metadata. Here is the relevant section of my XSL code:

<xsl:for-each select="//item">
    <item rel="rel:Posting">
        <meta property="dc:title"><xsl:value-of select="title"/></meta>
        <meta property="dc:identifier"><xsl:value-of select="link"/></meta>
        <meta property="dc:creator"><xsl:value-of select="author"/></meta>
        <meta property="pubDate"><xsl:value-of select="pubDate"/></meta>
    </item>
</xsl:for-each>

For those of you who are unfamiliar with XSL, this code simply looks for any <item> tags in the RSS, then extracts the text from the <title>, <link>, <author>, and <pubDate> tags that are inside the <item>. The <item> with the rel attribute and the <meta>s are dataRSS (the output of the data service).
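If XSL is new to you, it may help to see the same mapping written procedurally. The Python sketch below is only a stand-in for the stylesheet, not something SearchMonkey runs: the sample feed is made up, and the names `FIELD_MAP` and `rss_to_items` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item>
    <title>My Post</title>
    <link>http://myblog.blogspot.com/2008/06/mypost.html</link>
    <author>someone@example.com</author>
    <pubDate>Fri, 06 Jun 2008 12:00:00 +0000</pubDate>
  </item>
</channel></rss>"""

# RSS child tag -> property name, mirroring the XSL above.
FIELD_MAP = [
    ("title", "dc:title"),
    ("link", "dc:identifier"),
    ("author", "dc:creator"),
    ("pubDate", "pubDate"),
]

def rss_to_items(rss_text):
    """Return one <item rel="rel:Posting"> element per RSS <item>."""
    root = ET.fromstring(rss_text)
    items = []
    for rss_item in root.iter("item"):  # same idea as select="//item"
        item = ET.Element("item", {"rel": "rel:Posting"})
        for tag, prop in FIELD_MAP:
            meta = ET.SubElement(item, "meta", {"property": prop})
            # Like xsl:value-of: text of the child, or empty if it is missing.
            meta.text = rss_item.findtext(tag, default="")
        items.append(item)
    return items

if __name__ == "__main__":
    for element in rss_to_items(SAMPLE_RSS):
        print(ET.tostring(element, encoding="unicode"))
```

Each RSS <item> becomes one output <item> with four <meta> children, which is the same shape the stylesheet produces.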

And with that, we have a working data service, ready to be used by any SearchMonkey app. In my next post I will discuss how to incorporate this data into an app, and how SearchMonkey's presentation layer works.
