
Design by Committee

Jonathan Marsh keeps an eye out for the sublime and the absurd
May 09

Mashup Server Webinar May 13th

I've got another free Webinar coming up, again an introduction to the WSO2 Mashup Server and to mooshup.com, on 13 May from 9-10AM PDT.  Join me if you:

  • Are curious about mashups, Mashup Server products, and want an overview of the capabilities of WSO2's offering.
  • Have services, web pages, or other information sources available but you want smarter ways to use those services.
  • Have always wanted an application to do (x) on the Web, but it was too costly to develop.
  • Know Javascript and want to see what it can do outside the browser.

    Register now!

    May 05

    Am I the last to know we're cool?!

    It seems the WSO2 crew has been blogging about WSO2 appearing among the "cool 5" companies in a recent Gartner report (paid subscribers only).  What intrigued the Gartner analysts about the WSO2 Mashup Server was its support for the hitherto paradoxical "lightweight but enterprise-oriented" services.

    And here I am a couple of days late.  I guess for breaking news and the real skinny on "cool" you would do well to add the feeds of Paul, Sanjiva, Daniel, Glen, and Keith to your blogroll.

    April 21

    I'm on YouTube...

    Just noticed this interview I gave at Mashup Camp posted on YouTube.  (Does everyone hate hearing themselves talk as much as I do?)

      
    April 08

    Webinar - A New Approach to Web Service Composition

    I'm giving a free Webinar on 15 April 2008, 9-10AM PDT, about the approach we took to service composition in the WSO2 SOA Platform.  Instead of a declarative approach, which my XSLT days showed can be powerful yet also limiting in many ways compared with a full programming language, the WSO2 Mashup Server uses a "scriptable Web Services" metaphor: it can consume and produce Web Services using simple Javascript expressions.  Add to that the ability to script non-Web Service materials like Web pages and feeds, and you've got a powerful yet accessible platform for creating Web Service mashups.  Register now!

    March 31

    Sri Lankan Incident Mashup

    I just posted a finished version of a Mashup designed to help answer the question "is Sri Lanka getting safer or not?"  This is a question we on the global WSO2 team ask each time we arrange travel to that unfortunately troubled country.  Despite a spate of violence early this year, designed to coincide with the formal dissolution of the cease fire that has done little to prevent violence, it seemed to me things were getting a little better.  But I needed some facts to back that up.

    The mashup plots bombings and other incidents over the last 18 months as a bar chart, with each bar measuring an incident's severity (how many were killed and wounded).  The idea was to see whether there was a clear overall trend in the violence, something not readily apparent from a Google map (see right).

    The mashup service itself consists of several items, each one a simple task accomplished in half a page of Javascript:

    • Scrape the search results page at globalincidentmap.com to pull the essential data out of an HTML table for the country of interest, and put it into a simple XML structure.
    • Cache the scraped page, so that repeat accesses within 24 hours are served from the cache (scraping is expensive; the first access in any 24-hour period will be pretty slow).
    • Parse the headline for patterns such as "eight killed" or "injures 7" and turn them into killed and wounded counts (this isn't perfect, but we can tolerate a few errors since we're trying to present an overall sense of the problem rather than perfectly accurate statistics).  Also filter out, as much as possible, killings of LTTE members, as those are more a measure of war than of terrorism.
    • Provide a helper method to look up details and get a link to a news story for any item.
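    The headline parsing is the fiddliest of these steps.  As a rough sketch only, here is one way to pull a count out of a headline by matching a number next to a casualty verb.  It is written in plain modern JavaScript rather than the Mashup Server's E4X dialect, and the patterns and names such as parseHeadline are invented for illustration, not taken from the actual mashup.  The LTTE filtering is not attempted here.

```javascript
// Hypothetical sketch: patterns and names are illustrative, not the mashup's.
const NUMBER_WORDS = new Map(Object.entries({
  one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, dozen: 12
}));

// Digits ("7") or a spelled-out number ("eight"); null for any other word.
function toNumber(token) {
  const t = token.toLowerCase();
  if (/^\d+$/.test(t)) return parseInt(t, 10);
  return NUMBER_WORDS.get(t) ?? null;
}

const NUM = "(\\d+|[a-z]+)";
const NOUN = "(?:(?:people|civilians|soldiers|policemen)\\s+)?";
const KILLED = "(?:kill(?:s|ed)?|dead|die[sd]?)";
const WOUNDED = "(?:injur(?:es|ed|e)?|wound(?:s|ed)?|hurt)";

function countFor(headline, verb) {
  const patterns = [
    // number before the verb: "eight killed", "3 soldiers wounded"
    new RegExp("\\b" + NUM + "\\s+" + NOUN + verb + "\\b", "gi"),
    // number after the verb: "kills eight", "injures 7"
    new RegExp("\\b" + verb + "\\s+" + NUM + "\\b", "gi")
  ];
  for (const re of patterns) {
    for (const m of headline.matchAll(re)) {
      const n = toNumber(m[1]);
      if (n !== null) return n;   // first word that really is a number wins
    }
  }
  return 0;                        // no recognizable count: tolerate the miss
}

function parseHeadline(headline) {
  return {
    killed: countFor(headline, KILLED),
    wounded: countFor(headline, WOUNDED)
  };
}

console.log(parseHeadline("Bomb blast kills eight, injures 7")); // { killed: 8, wounded: 7 }
```

    A headline with no recognizable count simply scores zero, in keeping with the goal of an overall sense of the trend rather than exact statistics.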

    Using this service (called internationalincident) I created a custom HTML UI to format Sri Lanka-specific results into the bar chart and to make it interactive (click on a bar to see more info about the incident).  Then I used "share this mashup" to upload the service to mooshup.com so others could try it out (or copy the code).

    The rough version took a couple of hours, mostly figuring out the details of scraping the page and coming up with the headline patterns to look for, but then I spent a couple more polishing the HTML UI so I wouldn't be embarrassed to share it.  In the process I demonstrated some of the powerful aspects of using mashup technologies in your development arsenal:

    • Instead of spending 30-60 minutes on Google News for the latest information each time we plan a trip, I can browse this site in a few minutes, see the trend, and get details of any recent incidents I'm interested in.  This will pay for itself in terms of my own personal productivity before long.
    • A small user group (really, just the handful of WSO2 employees based outside Sri Lanka) can also benefit from this micro-application, increasing their productivity as well.
    • The WSO2 Mashup Server makes the data available as a service, so others can reuse it too, for alternate displays or to generate displays for other parts of the world.
    • And, it's just fun!

    So ... is Sri Lanka getting safer?  I'll have to let you be the judge of that.  Go to http://mooshup.com/services/jonathan/internationalincident/ to see for yourself!

    March 25

    Mashup Camp 6

    Just returned from some travel which included 2 days of Mashup University and a day of Mashup Camp.  A few thoughts:

    • Maps are still well represented.  This surprised me a bit, but I recognize I've been well immersed in the WSO2 Mashup Server, which supports a wide variety of user interactions (HTML pages including maps, but also feeds, email, and instant messaging).  I predict non-map mashups will start to erode the dominance of map-based mashups over the next year.
    • There's lots of interest in consuming various Web APIs.  Some vendors were promoting their APIs, others tools for consuming those APIs.  I think the WSO2 Mashup Server can tap into an underserved market here, since it's the easiest way I know of to deliver a comprehensive Web API on top of a bit of Javascript logic, which in turn can front information sources as diverse as databases and scraped web pages.  I think the Mashup Server can become a "design your own API" tool that can have great appeal to the mashup developer.
    • A lot of Javascript was in evidence, further validating our choice to use Javascript for mashup logic in the WSO2 Mashup Server.
    • For the first time I ran into a few people with serious interest in WADL.  Has its time finally come?  More on that in a subsequent post.
    • The "unconference" style was quite interesting and successful, especially if you're like me and are more interested in connecting with interesting individuals than in hearing yet another vendor pitch (mine excepted of course!) ;-)  What was sorely lacking was any kind of organization of the conference wiki, and there was surprisingly little ability to record much of the activity there.  I couldn't even find who won the Best Mashup contest...

    I can't wait till next year!

    February 22

    Paul asks "Mashup or Integration?"

    Paul Downey poses some interesting considerations for what a mashup consists of.  I think I'd list a pretty different set of criteria, but for now I'll just start by comparing a mashup hosted by the WSO2 Mashup Server against his list to see how we stack up:

    • Zero touch: +1.  Each mashup is accessible, documented, try-able, and so forth right out of the box.  The admin console makes them discoverable, and if you have a user account (email-validated guests are supported too) you can tag, rate, and comment on mashups, building a community around them that makes any touch beyond "zero" positive and reciprocal.
    • Safe: +.5.  The Mashup Server supports and encourages safe interactions, but it does not prevent users from hanging themselves if they so choose.  That is, I can have an operation exposed through GET, even one explicitly marked as safe by the author, that has seriously detrimental side effects.
    • Cool URIs: +1.  Each mashup has neat URIs for the endpoints, for the metadata and tooling associated with it, for the admin capabilities associated with it, and even for accessing the operations over HTTP.
    • Open data formats: +1.  We're partial to XML and use Web Services under the hood, but also provide bridges to HTML, JSON, Feeds, etc.
    • Eschew RIA: +1.  By default we don't provide any rich interface.
    • HTML form access: +1.  By default we do provide an HTML try-it form, and stubs to quickly build your own HTML pages.
    • Accessible URIs: +1.  All endpoints are available through both http and https by default.  Mashups can be easily migrated outside local contexts so their visibility is greater.  E.g. mooshup.com.
    • SOAP/WSDL: -1.  Everything we do has rich metadata and SOAP 1.1 and SOAP 1.2 bindings along with the REST/POX binding, and we also make it easy to consume web services in these formats.  However, you don't have to know anything about these formats to write a successful mashup - they're just valuable parts of the plumbing.  So maybe a -1 is too stingy.
    • Authentication: +0.  We support username/password and Infocard to access the Mashup admin site, but there isn't a drop-dead simple way to restrict access to the service based on these controls.  Not too hard to provision higher levels of security using WS-Security, but I'm not sure Paul would agree that's sufficient.
    • Scratch your itch: +1.  Subjective, but the whole product is designed to serve the needs of individual developers to hack up services as easily as they can hack up a simple web page.
    • Fun: +2!!  As long as you're willing to hack a bit of Javascript.

    So I think the Mashup Server stacks up pretty well using Paul's criteria, which I think can be summed up as "has a nice web interface" and "fun and easy".  But some of Paul's criteria don't fit with that summary and those are unsurprisingly the ones I take some issue with:

    RIA: I don't see any reason why a rich interface to a mashup is always bad; it's bad only when it locks up the data in a way that's impractical to reuse.  In our model, we support the separation of content and presentation in order to lift the limits on the kinds of presentation environments the user prefers.  A single mashup can (and ideally should) support as many presentation media as are appropriate, including simple web pages, RIAs, feeds, notifications and messages, portlets, widgets, whatever.  If an RIA "scratches my itch" then what's wrong with it?

    For example, my iPod Touch comes with a YouTube app, which is an RIA for the YouTube site optimized for the screen limitations of the device.  I can also point Safari right at YouTube, but the optimized version actually is easier for simple access.  Is this bad?  It really depends on what the consumer wants.  I agree dead-ending in an RIA may be inappropriate for some users of that service, but providing an RIA front end to a clean and well-documented interface that can be repurposed in other ways seems like a good user-centered feature that even Paul would support.  I think he probably meant this item to mean "Does the site rely solely on so-called 'rich user experience' technologies in a way that precludes data from being accessed through a nice web interface?"

    SOAP/WSDL: I understand why Paul would add this to the list.  Until the Mashup Server came along, Web Services were just too hard to consume to stand on a list with "scratch your itch" and "fun".  However, I think we've turned the corner on that.  A Mashup Server author rarely has to even consider whether SOAP is involved in delivering a Web Service; we can take care of that for them.  Right now it's simply an alternative to the REST interface.

    As for WSDL, in the Mashup Server it strongly supports the other goals of zero touch, cool URIs, and open data formats.  I'm pretty tired of doing one-off hand coding to access REST sites or sucking up their hand-crafted and varying-quality stubs.  WSDL has a big role to play in simplifying these interactions for the developer, in a way that I think Paul would appreciate.  It's a primary artifact in providing the "nice web interface" that we all agree is invaluable.

    Anyway, thanks Paul for a provocative post!

    February 14

    Music at Pachamama's, take 2

    Bro Jason and I again will improvise around a few sets at PachaMama's Organic Cafe (map) this Friday, February 15 7:00 – 9:00 PM.  Drop in for a listen!

    February 07

    Upcoming Webinar, conference

    I'm giving the first of a regular series of free Webinars to introduce people to the WSO2 Mashup Server and to mooshup.com on 12 Feb from 9-10AM PST.   Register now!

    I'll also be talking about mashups at next Monday's Web Services on Wall Street conference, starting with the opening panel: "Enterprise Mashups For Wall Street – Leveraging SOA and Web 2.0".  If you're there, come say hi!

    January 28

    WSO2 Mashup Server 1.0

    At last - we've shipped the 1.0 release of the WSO2 Mashup Server!  This project has been in the making since I joined WSO2 over a year ago.  A lot of hard work goes into a project like this, but it's amazing to me how much we've accomplished in so little time with so modest resources.

    I'll be talking more about the Mashup Server from here on out - how it provides a powerful platform for consuming and exposing Web information of many formats, but focused on Web services (REST and WS-*).  But for now you can get a flavor of it from the press release.

    At the same time we've launched an on-line site for hosting and sharing mashups, at mooshup.com.  I think it will be a pretty fun site, and shows some of the cool community features we've built in from the ground up.  Also you might want to subscribe to the Mooshup.com blog, where we'll have some great discussions about how to use the Mashup Server and mooshup.com, and what's next for the product (we'll need your input here!)

    Kudos to the team, esp. Keith, Channa, Tyrell, Thilina, and Yumani, with the help of many others.

    But now, I think I've earned a little nap ;-).

    December 27

    Christmas in Singapore

    See the photos here.

    We had a great time exploring Singapore over the last few days.  Here are just a few of the highlights:

    • Arriving at the hotel well after midnight and reuniting with our German friends as we were checking in.
    • The Shangri-La hotel breakfast buffet.  Each morning we power up for the day with a buffet that includes a build-your-own noodle soup section, Indian fare, dim sum, Japanese and Korean fare, fruits and pastries in abundance in addition to the full western buffet.  Three plates each usually fattens us up enough to skip lunch, not to mention discourage an early dinner, but not enough to taste everything that looks delicious.
    • Double-decker hop-on-hop-off bus tour of central Singapore, introducing us to the city, a mix of colonial buildings, traditional two-story storefronts with upper-story shutters in a rainbow of colors, and glass and steel skyscrapers.  Laced through with impeccably maintained greenery.  I didn't expect Singapore to feel so spacious and gracious.  It doesn't have that intensely urban feel like Hong Kong or New York.
    • Wandering through Little India, a maze of shops and eateries heady with burning incense.
    • Dinner at the home of a colleague of our friends, getting a picture of home life and how east meets west.
    • Trawling the many malls on Orchard Road.  A little of that goes a long way for me, unless followed by...
    • A daily afternoon dip in the pool, before, during or after the brief but sometimes heavy daily rainstorm.  An hour with a good book under a beach umbrella, listening to children in the pool or the patter of rain just beyond the shelter.
    • The Night Safari - a cross between a zoo and a wild animal park, but in a dark and rain-glistening tropical jungle lit by a full moon and artistically placed lighting not much brighter than the moon itself.  Never, though, have I seen such an active and alert collection of animals - Malaysian tigers, jackals, tapirs, capybaras, elephants, babirusas (a lumpy wild pig with upturned tusks sticking right through the roof of its snout), giant anteaters, sloth bears, bat-eared foxes, giraffes, hippos, water buffalo, antelope...
    • A boat loop on the Singapore river, starting at Boat Quay, a quaint strip of eateries tucked up next to the polished spires of the financial district.
    • Chinatown - a section of quaint and colorful storefronts and street vendors - though very few hawking cheap Chinese merchandise.  For some reason Indian trinkets dominate.  The Hindu temple is festooned with a layer cake of blue characters and a menagerie of animals, and the courtyard walls are topped with images of lounging sacred white cows.
    • Meeting with our friends in the vaulted lobby for Christmas Eve present sharing, at the foot of a 30-foot tree, as a live choir performs intricately harmonized Christmas songs nearby.
    • A new iPod touch ;-).
    • Indochine, a trendy southeast Asian eatery housed in a wing of the Singapore Asian Civilization Museum on the waterfront looking towards Boat Quay and the financial district, provided the perfect Christmas Eve dinner venue.  My top choices - a beef and prawn salad, steamed cod in lemon sauce, green mussels in coconut curry, lemongrass crème brûlée, mango and sticky rice.  Wow!
    • The Singapore Art Museum, housed in a colonial former school, boasts an interesting collection of Asian contemporary art, some of which seems rather primitive to me, others quite sophisticated.
    • Perfectly manicured botanical gardens brimming with more kinds of palm tree than you had ever imagined existed.  The fantastic orchid garden.  My favorite specimen is the spindly and bizarrely twisting brownish-purple "Margaret Thatcher".
    • Club Chinois on Orchard Road for Christmas dinner - upscale, trendy, light oriental.  Featured scallops on a slab of silken tofu, foie gras on crispy duck skin, chicken drumstick on a Chinese sweet radish salsa, minced 5-spice chicken on a disk of silken tofu, cod braised in a clay pot with baby bok choy, and cubes of tenderloin stir-fried with ginger and green onion in a crispy noodle basket.  Chased down with apple pie, chocolate lava, peanut-encrusted rice balls filled with bean paste with a honey-ginseng tea, and a warm creamy almond "soup" in a new coconut shell.  One of the best dinners ever!
    • Strolling through the Christmas street party on Orchard Road, with elaborate lights and decorations, bizarre floats (my favorite - the jumbo sliced panettone loaf with fern-like shoots springing from its top and dotted with bread loaves in case you didn't get the "Jesus is the bread of life" theme), a concert.  But mostly filled with shutterbugs milling around.  At any one time, 1/3 were taking photos, 1/3 were posing for photos, and 1/3 were waiting to take a photo or pose in one.  I'm not exaggerating!
    • Taking a cable car to Mount Faber (only 100 meters high or so), and back down to Sentosa Island, which is gradually turning from a beach resort and golf course into an island-sized theme park.  We caught a computer-rendered 3D chair-hurtling "log ride" down a mountain.  Then we strolled through Undersea World, which held a number of interesting specimens such as the tiny red-hearted but otherwise translucent sea angels, giant Japanese Spider crabs, and a long underwater tunnel where we could watch scuba divers feeding the manta rays and hordes of other fish.  Even a dugong.
    • The Pink Dolphin show in the lagoon - standard fare with tricks and petting from volunteers.  With quite a jostling crowd and corny tourist patter, it was remarkable only in that we actually saw the pink dolphins.
    • Dinner at The Banana Leaf Apolo in Little India, where dinner is served on a banana leaf mat that acts both as a placemat and plate.  Gen was the only one who ate the whole meal of samosas, Tandoori chicken, chicken tikka masala, paneer in a creamy turmeric sauce, and a paneer/potato kofta in saag.  Chased down with mugs of limeade.

    November 09

    Central Park Studio website update

    Just launched last weekend's project - a restyling of Deanna's web site at http://www.central-park-studio.com.  The best new feature this round is the addition of a feed for her events and announcements, replacing a mishmash of events, news items, and pullouts that was hard to maintain.

    Pachamama's gig

    Work keeps me too busy to play much piano any more - regular playing at church and occasional background music at an art opening is about it.  But brother Jason and I are going to do a simple gig at the local organic cafe on Nov. 16th (7-9PM).

    Should be a fun and somewhat unusual mixture of new age, jazz, and ethnic and fiddle music.  If you're in the area, drop in!

    October 26

    Mashing up a National Geographic Photo of the Day Feed

    (This article first appeared a few days ago on the WSO2 Oxygen Tank.)

    I recently wrote a neat little mashup which demonstrates a little of the power of the WSO2 Mashup Server to flow information from one place to another, and from one format to another.  I had a simple set of requirements:

    1. I use the Google Photos Screensaver to show a slideshow of interesting photographs when the family-room computer isn’t being used.  Since the family room computer includes the 37-inch LCD screen as a separate monitor, high quality photos come out really clear and make for a nice, constantly changing design element for the room.  It works best if the set of photographs changes before it gets old.
    2. I recently found the National Geographic site’s “photo of the day” section as an interesting source of high-quality photographs that updates on a daily basis.  However, National Geographic doesn’t provide a feed for the photo of the day.

    Essentially then the task was to scrape the URLs from the photo of the day, and package them into a feed.  The complication comes from the fact that there doesn’t seem to be a list of photos of the day available on the National Geographic web site – just links from a particular photo to the one for the previous (or next) day.  Because a feed of 30 photos requires 30 different pages to be scraped, some caching really becomes necessary to improve performance, especially since feed readers can be expected to bombard the service if it proves popular.
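    To see why caching pays off here, consider a sketch that follows "previous" links through a memoizing cache.  It is plain JavaScript, and fetchPage, walkBack, and the numeric page ids are hypothetical stand-ins for the real scrape and page URLs.  The first 30-item feed costs 30 fetches, but the next day's feed needs only one new page.  (The undated "current" photo-of-the-day URL changes daily, so in practice only the dated pages would be safe to cache indefinitely.)

```javascript
// Hypothetical sketch: fetchPage(id) stands in for scraping one dated
// photo-of-the-day page and returns { id, previous }, where previous is the
// id of the prior day's page (or null at the start of the archive).
function makeCachedFetcher(fetchPage) {
  const cache = new Map();
  let fetches = 0;
  return {
    lookup(id) {
      if (!cache.has(id)) {
        fetches++;                     // only cache misses hit the site
        cache.set(id, fetchPage(id));
      }
      return cache.get(id);
    },
    get fetchCount() {
      return fetches;
    }
  };
}

// Follow "previous" links from startId until count pages are collected.
function walkBack(fetcher, startId, count) {
  const pages = [];
  let id = startId;
  while (id != null && pages.length < count) {
    const page = fetcher.lookup(id);
    pages.push(page);
    id = page.previous;
  }
  return pages;
}

// Usage with a fake archive where page n links back to page n - 1:
const fakeSite = id => ({ id, previous: id > 1 ? id - 1 : null });
const fetcher = makeCachedFetcher(fakeSite);
walkBack(fetcher, 100, 30);            // day 100: 30 cache misses
walkBack(fetcher, 101, 30);            // day 101: only page 101 is new
console.log(fetcher.fetchCount);       // 31
```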

    I initially broke down the task into three parts:

    1. Scraping a photo of the day page to extract the useful metadata, including the date, title, photographer’s credit, and description of the photo, a set of links to the actual image in various sizes, and links to the page being scraped (so one can return there easily) and to the previous page in the photo stream.  Since this metadata shouldn’t vary, cache it locally for faster retrieval.
    2. Searching the cache, or going to the web site (and thus populating the cache), to acquire the metadata for a particular date.
    3. Formatting the metadata for a particular range of dates into a feed.
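    For a feel of what the third part involves, here is a sketch in plain JavaScript that enumerates a range of xs:date keys and renders whatever metadata is found for them as RSS 2.0.  The field names (title, credit, page, image), the channel text, and the lookup callback are assumptions made for illustration; it is not the Mashup Server's own feed code.

```javascript
// Hypothetical sketch of part 3. The metadata field names (title, credit,
// page, image) and the channel text are assumptions for illustration.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// xs:date strings for endXsDate and the n-1 days before it, newest first.
function datesBack(endXsDate, n) {
  const [y, m, d] = endXsDate.split("-").map(Number);
  const dates = [];
  for (let i = 0; i < n; i++) {
    // Date.UTC rolls day 0 or negative days back into the previous month.
    dates.push(new Date(Date.UTC(y, m - 1, d - i)).toISOString().slice(0, 10));
  }
  return dates;
}

function itemXml(photo) {
  // The description's HTML is escaped once as HTML, then once more so it
  // travels as text inside the XML <description> element.
  const html = `<img src="${escapeXml(photo.image)}"/> ${escapeXml(photo.credit)}`;
  return [
    "    <item>",
    `      <title>${escapeXml(photo.title)}</title>`,
    `      <link>${escapeXml(photo.page)}</link>`,
    `      <guid>${escapeXml(photo.page)}</guid>`,
    `      <description>${escapeXml(html)}</description>`,
    "    </item>"
  ].join("\n");
}

// lookup(date) returns cached metadata for that date, or null if unavailable.
function buildFeed(dates, lookup) {
  const items = dates.map(lookup).filter(photo => photo != null).map(itemXml);
  return [
    '<rss version="2.0">',
    "  <channel>",
    "    <title>Photo of the Day (unofficial)</title>",
    "    <link>http://photography.nationalgeographic.com/photography/photo-of-the-day</link>",
    "    <description>Feed built from scraped photo-of-the-day pages</description>",
    ...items,
    "  </channel>",
    "</rss>"
  ].join("\n");
}

// Usage: only one of the three requested days is in the (fake) cache.
const sampleCache = new Map([["2008-02-29", {
  title: "Leap Day", credit: "Photo by A. Photographer",
  page: "http://example.com/leap.html", image: "http://example.com/leap.jpg"
}]]);
console.log(buildFeed(datesBack("2008-03-01", 3), date => sampleCache.get(date) ?? null));
```

    Days that can't be resolved are simply skipped, so a temporary scraping failure shortens the feed rather than breaking it.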

    Here’s how I approached each of these tasks.

     

    Scraping a photo of the day page

    The first order of business for scraping a page like this is simply to fetch the page and tidy it into XML so we can navigate it using tools like XPath.  The WSO2 Mashup Server provides a “Scraper” object that accepts an XML language describing the steps involved in scraping.  This configuration language is defined by the Web Harvest component that we use for scraping.  I usually start a scraping mashup with a simple function that configures and performs the scrape, and returns the results:

     
    function scrape_picture_page() {
      var config =
        <config>
          <var-def name='response'>
            <html-to-xml>
              <http method='get'
                url="http://photography.nationalgeographic.com/photography/photo-of-the-day" />
            </html-to-xml>
          </var-def>
        </config>;
     
      var scraper = new Scraper(config);
     
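      // E4X's new XML() can't parse an XML declaration, so strip it off
      // first, but only if the response actually starts with one.
      var response = String(scraper.response);
      var bodyWithoutXMLDecl = response;
      if (response.indexOf('<?xml') == 0)
        bodyWithoutXMLDecl = response.substring(response.indexOf('?>') + 2);
      var result = new XML(bodyWithoutXMLDecl);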
     
      return result;
    }

    The config language itself is pretty straightforward once you learn to read it inside out: the <http> element fetches the requested URL, and <html-to-xml> does just what it sounds like, tidying the result, which is put into a variable named “response”.  The scrape is performed by initializing a new “Scraper” object with the config, and the result is made available through the “response” property of that Scraper object, corresponding to the “response” variable we defined within the config.  One trick though: the result is a stream of XML text, including an XML declaration.  The E4X extensions can parse this into XML (new XML()), but can’t handle the XML declaration, so we have to strip off the declaration ourselves using string manipulation.
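    The declaration-stripping step is easy to get subtly wrong.  Searching for '?>' with indexOf assumes a declaration is present; if there isn't one, indexOf returns -1 and substring(1) quietly drops the first character.  A slightly more defensive sketch in plain JavaScript (the name stripXmlDeclaration is made up here) removes a declaration only when one actually starts the text:

```javascript
// Hypothetical helper: remove a leading XML declaration (plus an optional
// byte-order mark and surrounding whitespace) so the remainder can be handed
// to a parser that rejects declarations, as E4X's XML() constructor does.
// The \s after "xml" keeps other processing instructions such as
// <?xml-stylesheet ...?> from being stripped by mistake.
function stripXmlDeclaration(text) {
  return text.replace(/^\uFEFF?\s*<\?xml\s[^>]*\?>\s*/, "");
}

console.log(stripXmlDeclaration('<?xml version="1.0" encoding="UTF-8"?>\n<html/>')); // <html/>
console.log(stripXmlDeclaration("<html/>"));                                        // <html/>
```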

    By placing the above function in a file named “nationalgeographic.js” in the “scripts” directory of the Mashup Server, a Web service with a scrape_picture_page operation will be deployed.  We can get to it through the try-it page (http://localhost:7762/services/jonathan/nationalgeographic?tryit) and see what the tidied HTML looks like for the page.

    Extracting the data from the page can be a tedious process, involving looking at HTTP request-response pairs and trawling through the HTML source of a page.  Fortunately the National Geographic site’s HTML is simple and straightforwardly structured, with a number of well-placed identifiers to help us zero in on the interesting content.  I usually end up using Firebug (a Firefox debugging extension) to navigate the live HTML of the page and develop some XPath expressions that extract the desired metadata.  I’ve also found that, since Web Harvest communicates between components using strings rather than parsed XML, defining a lot of XPath filters to extract information one element at a time during a scrape can perform poorly.  Instead it seems much faster to wrap a series of XPath expressions into a simple XSLT stylesheet, so the XML can be parsed once, queried as much as needed, and an XML structure containing the results returned in one action.  To do that, I added an XSLT stylesheet to the above configuration:

      var config =
        <config>
          <var-def name='response'>
            <xslt>
              <xml>
                <html-to-xml>
                  <http method='get' url={url} />
                </html-to-xml>
              </xml>
              <stylesheet><![CDATA[
               <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                  <xsl:output method="xml" omit-xml-declaration="yes"/>
                  <xsl:template match="/">
                   <photo>
                      <xsl:for-each select="//*[@id='content-center-well']">
                        <date><xsl:value-of select="div[@class='date']"/></date>
                        <previous>http://photography.nationalgeographic.com<xsl:value-of select="div[@class='slide-navigation'][1]/p/a/@href"/></previous>
                        <xsl:for-each select="div[@class='image-viewer clearfix']">
                          <xsl:for-each select="table/tbody/tr[1]/td/a">
                            <page>http://photography.nationalgeographic.com/photography/photo-of-the-day/<xsl:value-of select="substring-before(substring-after(@href,'enlarge/'), '_pod_image.html')"/>.html</page>
                            <xsl:variable name="href"
    select="concat('http://photography.nationalgeographic.com',
    substring-before(img/@src, '-ga.jpg'))"/>
                            <location type='small'>
    <xsl:value-of select="$href"/>-ga.jpg</location>
                            <location type='medium'>
    <xsl:value-of select="$href"/>-sw.jpg</location>
                            <location type='large'>
    <xsl:value-of select="$href"/>-lw.jpg</location>
                            <location type='wide'>
    <xsl:value-of select="$href"/>-xl.jpg</location>
                          </xsl:for-each>
                                                  
                          <xsl:for-each select="div[@class='summary']">
                            <title><xsl:value-of select="h3"/></title>
                            <credit><xsl:value-of select="p[@class='credit']"/></credit>
                            <description>
                              <xsl:copy-of select="div[@class='description']/node()"/>
                            </description>
                          </xsl:for-each>
                        </xsl:for-each>
                      </xsl:for-each>
                    </photo>
                </xsl:template>
                </xsl:stylesheet>
              ]]></stylesheet>
            </xslt>
          </var-def>
        </config>;

    Again, fairly straightforward: the <xslt> task has two inputs, <xml> and <stylesheet>.  The stylesheet unfortunately has to be enclosed in a CDATA section rather than embedded as straight XML.  One other nice trick though: since the output now comes from an XSLT stylesheet, the “omit-xml-declaration” attribute on <xsl:output> can be used to strip off the XML declaration, so we don’t have to do it through text manipulation, simplifying and accelerating our Javascript code.

    So we’re almost there with this capability.  Some minor improvements and adding caching are all we need:

    1. Add an optional “url” parameter to let this operation work on any photo-of-the-day page URL.  Using E4X’s curly braces we can substitute this value right into the config.
    2. If the result was successful (i.e. the <photo/> element has children), calculate the date in yyyy-mm-dd format and use the storexml service to cache it, choosing a path unlikely to conflict with other users of the storexml service.  To make the storexml service easy to call, we import its stub, which I got from http://localhost:7762/services/system/storexml?stub&lang=e4x&localhost=true and saved into the nationalgeographic.resources folder, which serves as the sandbox for this service.  It’s important to save a copy because when the Mashup Server boots up, the nationalgeographic service might be deployed before the storexml service; attempts to generate the stub at that time will fail and cause the nationalgeographic service to fail too.  The Mashup Server doesn’t yet track these dependencies (and we’re still thinking about whether this is a tractable problem or not).
    3. Since this operation isn’t really supposed to be called by end-users of the feed, I could make it private using scrape_picture_page.visible = false, but instead I’ve just used the “operationName” property to rename it, indicating to users that it really is just for test purposes.
    4. Add type annotations.
    5. Add documentation annotations (not shown below.)
    system.include("storexml.stub.js");
    var cachePath = "nationalgeographic/cache/";
     
    scrape_picture_page.operationName = "test_scrape_picture_page";
    scrape_picture_page.inputTypes = {"url" : "xs:string?"};
    scrape_picture_page.outputType = "xml";
    function scrape_picture_page(url) {
      if (url == null)
        url = "http://photography.nationalgeographic.com/photography/photo-of-the-day";
      var config =
        <config>
          <var-def name='response'>
            <xslt>
              <xml>
                <html-to-xml>
                  <http method='get' url={url} />
                </html-to-xml>
              </xml>
              <stylesheet>
    ...
              </stylesheet>
            </xslt>
          </var-def>
        </config>;
     
      var scraper = new Scraper(config);
     
      var result = new XML(scraper.response);
      
      if (result.hasComplexContent()) {
        var date = xsDate(new Date(result.date.toString()));
        storexml.store(cachePath + date, result);
      }
     
      return result;
    }
    xsDate.visible = false;
    function xsDate(d)
    {
      return d.getUTCFullYear() + "-" +
             (d.getUTCMonth() < 9 ? "0" : "") + (d.getUTCMonth() + 1) + "-" +
             (d.getUTCDate() < 10 ? "0" : "") + d.getUTCDate();
    }
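The zero-padding in xsDate is easy to misread: getUTCMonth() is zero-based, so the test “< 9” on the raw month is the same as asking whether the one-based month is below 10. Here is a sketch of the same function in plain JavaScript (no E4X), converting the month first so both padding tests read the same way; cacheKey is a hypothetical helper that simply mirrors cachePath + date from scrape_picture_page:

```javascript
const cachePath = "nationalgeographic/cache/";

// Format a Date as an xs:date string (yyyy-mm-dd) using its UTC fields.
function xsDate(d) {
  const month = d.getUTCMonth() + 1; // getUTCMonth() is zero-based: 0 = January
  const day = d.getUTCDate();
  return d.getUTCFullYear() + "-" +
    (month < 10 ? "0" : "") + month + "-" +
    (day < 10 ? "0" : "") + day;
}

// Hypothetical: the storexml path under which a day's photo metadata is cached.
function cacheKey(d) {
  return cachePath + xsDate(d);
}
```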

    As an aside, this exposes a few items on my wish list:

    1. <xml> is a reserved element name in XML (the XML specification reserves all names beginning with “xml”); it’s unfortunate that Web Harvest doesn’t use something else.
    2. Web Harvest’s requirement that the stylesheet be enclosed in a CDATA section is unfortunate: it means well-formedness errors can’t be caught at Javascript/E4X compile time, only at runtime.  This slows down the development process.  I could put the stylesheet in a separate file, but that just makes it harder to share the service and see what’s going on.
    3. I’d prefer a way to get E4X XML back from Web Harvest directly so I wouldn’t have to parse it myself, worry about the XML declaration, and so forth.  Maybe we can do something about this in a future release.
    4. Managing date formats becomes a bit of a chore.  I prefer the operations and cache to work in xs:date format (yyyy-mm-dd), but the page metadata is in the form “Month day, year” (straight from the scraped page), and Javascript prefers to manipulate dates as its own Date objects.  Soon we’ll see that the RSS Profile defines a date format that is a subset of Javascript’s date serialization, which means a fourth conversion.
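Two of those conversions can be kept explicit, and independent of the local time zone, with small helpers like the ones below. This is a sketch under assumptions: longDateToXsDate assumes the scraped text is exactly English “Month day, year”, and parseXsDate is a hypothetical stand-in for the parseDate function used in the next section (whose code isn’t shown here). Working in UTC matches xsDate, which reads the UTC fields of a Date, and avoids relying on new Date(string), whose handling of non-ISO strings is implementation-specific.

```javascript
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// "October 9, 2007" -> "2007-10-09" (hypothetical helper).
function longDateToXsDate(text) {
  const m = /^\s*([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})\s*$/.exec(text);
  if (!m) throw new Error("unrecognized date: " + text);
  const month = MONTHS.indexOf(m[1]) + 1; // 0 means the name wasn't found
  if (month === 0) throw new Error("unknown month: " + m[1]);
  return m[3] + "-" + String(month).padStart(2, "0") + "-" + m[2].padStart(2, "0");
}

// "2007-10-09" -> Date at midnight UTC of that day (hypothetical helper).
function parseXsDate(text) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!m) throw new Error("not an xs:date: " + text);
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}
```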

    Finding a picture for a particular date

    Now that we have a function that can scrape a page given a URL, and since the data it returns and caches contains a link to the previous day’s page, we can walk around in the cache to find the data for a particular date.  That’s what this function does.

    First, we look in the cache for a photo’s metadata.  If it’s there, we can simply return it – we’re done.  Otherwise we need to find the URL for the page representing that date and call the scrape_picture_page operation.

    If I can’t find the requested date in the cache, I look at the following day, and so on, until I do find a photo in the cache (or I pass today’s date).  That’s the first while loop.  Then, using the <previous> page URL, I work backward again, incidentally populating the cache as I go, until I’m back to the date I was looking for.  The two “if” statements handle exceptional conditions: the first covers the case where I’ve looked all the way forward to today and still haven’t found anything in the cache (so I start from today’s page), and the second makes sure that if a page can’t be scraped for some reason, we give up and return what little we have before we dig ourselves in any deeper.

    picture_for_date.inputTypes = {"date" : "xs:string"};
    picture_for_date.outputType = "xml";
    function picture_for_date(date) {
      try {
        return storexml.retrieve(cachePath + date);
      } catch (e) {
        print("failed to find cached photo for date " + date);
        var photo;
        var startDate = parseDate(date);
        var today = new Date();
        // work forwards in the cache until we find something (or hit today)
        while (startDate <= today) {
          try {
            photo = storexml.retrieve(cachePath + xsDate(startDate));
            break;
          } catch (e) {
            startDate.setUTCDate(startDate.getUTCDate() + 1);
          }
        }
        // start with the most current thing in the cache (if any) and work
        // backwards to the requested date, filling in the cache as we go...
        var targetDate = parseDate(date);
        while (startDate > targetDate) {
          var previousPageUrl;
          if (photo == null) previousPageUrl = null;
          else previousPageUrl = photo.previous;

          print("fetching photo for " + startDate);
          photo = scrape_picture_page(previousPageUrl);
          if (!photo.hasComplexContent())
            break;
          startDate.setUTCDate(startDate.getUTCDate() - 1);
        }

        return photo;
      }
    }
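To check the logic of the two loops without a running Mashup Server, here is a self-contained JavaScript model of the same walk. Everything in it is hypothetical: Maps stand in for the storexml cache and for the scraped pages, a missing page stands in for a failed scrape, and the current time is passed in so the result is deterministic:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const toXs = (d) => d.toISOString().slice(0, 10); // yyyy-mm-dd in UTC

// `cache` maps an xs:date string to a photo record {date, previous};
// `pages` maps a page URL to the record that scraping it would produce.
function makeFinder(cache, pages, todayUrl, now) {
  // Stand-in for scrape_picture_page: a null URL means today's page,
  // and every successful scrape is cached as a side effect.
  function scrape(url) {
    const photo = pages.get(url === null ? todayUrl : url);
    if (!photo) return null;
    cache.set(photo.date, photo);
    return photo;
  }

  return function pictureForDate(date) {
    if (cache.has(date)) return cache.get(date);
    const target = new Date(date + "T00:00:00Z");
    let day = target;
    let photo = null;
    // Walk forwards until a cached photo turns up (or we pass `now`).
    while (day <= now) {
      if (cache.has(toXs(day))) {
        photo = cache.get(toXs(day));
        break;
      }
      day = new Date(day.getTime() + DAY_MS);
    }
    // Walk backwards along `previous` links, caching each page on the way.
    while (day > target) {
      photo = scrape(photo === null ? null : photo.previous);
      if (photo === null) break; // scrape failed: give up rather than dig deeper
      day = new Date(day.getTime() - DAY_MS);
    }
    return photo;
  };
}
```

As in picture_for_date, when nothing is cached the forward loop overshoots to the first midnight after now, so the backward loop begins by fetching today’s page (the null URL) and then follows previous links down to the target date.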

    Generating the feed

    Now we have all the pieces in place to aggregate the data and generate a feed as output.  The picture_of_the_day operation does that for us.

    The function has some parameters controlling aspects of the feed – whether to link to the small, medium, large, or wide aspect ratio images, and how many items to include.  If no number is specified, we generate a feed of the latest 30 photos – just long enough to enjoy the photo but not so long we get tired of it.

    The WSO2 Mashup Server has a Feed object to help construct feeds, but because I’m targeting this feed at the Google Photos Screensaver, I need to include some feed extensions that aren’t supported in the 0.2 release (though they’ve just been added to the nightly build!).  It’s not hard to create an RSS feed by hand, though, so that’s what I chose to do.  First I prepopulate the channel with the title, link, and description, and then loop through the photos, adding an item for each.  The first time through the loop, I also add a <pubDate> reflecting the date of today’s photo.

    Again, this isn’t rocket science – the hardest thing is simply to format the dates appropriately.  During the loop I use Javascript Date objects to increment days and tick over at the end of the month.  I convert that to an xs:date to access the cache, to an RSS Profile-conformant string for the <pubDate>, and to an xs:dateTime for use in the <atom:published/> element, which seems useful for the subscription page displayed in Internet Explorer 7.

    picture_of_the_day.inputTypes = 
    {"size" : "small | medium | large | wide", "numPhotos" : "number?"};
    picture_of_the_day.outputType = "#raw";
    function picture_of_the_day(size, numPhotos) {
      if (numPhotos == null) numPhotos = 30;
     
      var feed =
        <rss version="2.0">
          <channel>
            <title>National Geographic Picture-of-the-day (from WSO2 Mashup Server)</title>
            <link>http://mashups.wso2.org/services/nationalgeographic/picture_of_the_day?size={size}</link>
            <description>WSO2 Mashup Server mashup acquiring and caching links to the
              National Geographic Picture of the Day
              (http://photography.nationalgeographic.com/photography/picture-of-the-day),
              and exposing them as a feed.  Sizes of "small", "medium", "large", and "wide"
              are available. A max number of photos can be specified with the "numPhotos"
              parameter.</description>
          </channel>
        </rss>;
     
      var startDate = new Date();
      var photo, photoDate, url, urlsmall, entry;
      for (var i = 0; i < numPhotos; i++) {
        photo = picture_for_date(xsDate(startDate));
        if (photo.hasComplexContent()) {
          url = photo.location.(@type == size).toString();
          urlsmall = photo.location.(@type == 'small').toString();
          photoDate = new Date(photo.date.toString());
          if (i == 0) {
            feed.channel.appendChild(<pubDate>{rssDate(photoDate)}</pubDate>);
          }
          entry = <item xmlns:media="http://search.yahoo.com/mrss/">
                    {photo.title}
                    <description>
                      &lt;a href='{url}'>&lt;img src='{urlsmall}'/>&lt;/a>
                      {photo.description.*.toXMLString()}
                    </description>
                    <pubDate>{rssDate(photoDate)}</pubDate>
                    <link>{photo.page.toString()}</link>
                    <guid isPermaLink='false'>{photo.page.toString()}</guid>
                    <media:content url={url} type="image/jpeg" />
                    <media:thumbnail url={urlsmall} />
                    <atom:published xmlns:atom="http://www.w3.org/2005/Atom">{xsDate(photoDate)}T00:00:00Z</atom:published>
                  </item>;
          feed.channel.appendChild(entry);
        }
        startDate.setUTCDate(startDate.getUTCDate() - 1);
      }
      return feed;
    }
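rssDate isn’t shown above, so here is one possible version as a plain JavaScript sketch (an assumption, which may differ from the one the service uses), together with the midnight xs:dateTime used in <atom:published>. RSS 2.0 dates follow RFC 822, with a four-digit year preferred; building the string from the UTC fields keeps it independent of the server’s locale and time zone:

```javascript
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
const pad2 = (n) => String(n).padStart(2, "0");

// RFC 822 date with a four-digit year, in GMT,
// e.g. "Tue, 09 Oct 2007 00:00:00 GMT" (hypothetical rssDate).
function rssDate(d) {
  return DAYS[d.getUTCDay()] + ", " + pad2(d.getUTCDate()) + " " +
    MONS[d.getUTCMonth()] + " " + d.getUTCFullYear() + " " +
    pad2(d.getUTCHours()) + ":" + pad2(d.getUTCMinutes()) + ":" +
    pad2(d.getUTCSeconds()) + " GMT";
}

// xs:dateTime for midnight UTC of the same day, as used in <atom:published>.
function xsDateTimeMidnight(d) {
  return d.getUTCFullYear() + "-" + pad2(d.getUTCMonth() + 1) + "-" +
    pad2(d.getUTCDate()) + "T00:00:00Z";
}
```

Current engines’ Date.prototype.toUTCString() produces the same shape, but older engines did not guarantee that format, which is one reason to spell it out.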

    You can access this operation through the try-it page at http://localhost:7762/services/jonathan/nationalgeographic?tryit and see that it returns a feed.  However, under the covers the try-it uses SOAP by default, which isn’t terribly friendly to feed readers like the Google Photos Screensaver.  No problem: the Mashup Server also exposes its operations through a REST interface.  By accessing the URL http://localhost:7762/services/jonathan/nationalgeographic/picture_of_the_day?size=wide, you can see the feed directly in the browser, point the screensaver at it, subscribe to it, and so on.  By adjusting the “size” and “numPhotos” parameters you can generate variants of the feed that suit your purpose.
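If you want to generate those variants programmatically, a small helper can assemble the HTTP GET URL. This is a sketch: feedUrl is hypothetical, and it assumes the service base URL shown above plus the size values declared in picture_of_the_day’s inputTypes; it uses the standard URL class, which takes care of query-string encoding:

```javascript
const SIZES = ["small", "medium", "large", "wide"];

// Hypothetical helper: build the REST (HTTP GET) URL for picture_of_the_day.
function feedUrl(serviceBase, size, numPhotos) {
  if (!SIZES.includes(size)) {
    throw new Error("size must be one of " + SIZES.join(", "));
  }
  // Drop a trailing slash so we don't produce "//picture_of_the_day".
  const url = new URL(serviceBase.replace(/\/$/, "") + "/picture_of_the_day");
  url.searchParams.set("size", size);
  if (numPhotos !== undefined) url.searchParams.set("numPhotos", String(numPhotos));
  return url.toString();
}
```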

     

    Publishing the feed

    Once I had written the service and tried it for a day or two to ensure it was stable (fixing a couple of edge cases as a result), I used the administrative UI in the Mashup Server to publish it to http://mooshup.com, which hosts the service live on the internet for others to use.  The publishing process is simple: click the share button, confirm that http://mooshup.com is the destination, and click OK.  While we have lots to do to make this site an attractive and useful place for members of the mashup community to hang out, it does give me a stable internet URL for the feed (for example http://mooshup.com/services/jonathan/nationalgeographic/picture_of_the_day?size=wide) so others can enjoy it.  You can exercise the try-it page live from there, look at the metadata, or download the service to your local installation of the Mashup Server and run it there.

     

    Last Word

    Hopefully this helps you get a feel for the Mashup Server in action.  We did some screen scraping, implemented fairly sophisticated caching by invoking an external storexml Web service, formulated an RSS feed, and made it (and the intermediate functions) available through a Web service with SOAP 1.2, SOAP 1.1, and HTTP bindings, including an HTTP GET binding amenable to RSS agents.  Although we didn’t look at them in detail in this article, the Mashup Server also generated a try-it page for debugging and exercising the service, WSDL, Schema, stubs for accessing the service simply from Javascript or E4X environments, and even some human-readable documentation for the mashup.  We ran the service locally, then published it live onto the internet.  It also wouldn’t be hard to generate a custom HTML interface providing (for example) a slideshow of these photos, but in this case I wanted to show that user interfaces can go beyond HTML pages, so I used the Google Photos Screensaver as my ultimate user interface.

    So what’s next for this service?  The main improvement I can think of is rewriting the code to use the Feed object once it can handle the images.  It took me a while to figure out which RSS extensions were necessary, and it would be nice not to worry about the representation of dates.  Maybe I could even offer an Atom feed in parallel.  Another idea, for performance, would be to experiment with a different (perhaps additional) caching strategy: cache the entire feed to disk and periodically refresh it using the recurrence capabilities of the Mashup Server.  But those are perhaps good topics for future articles!
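The whole-feed caching idea could look something like this. It is a plain JavaScript sketch under assumptions: FeedCache is hypothetical, it keeps the feed in memory rather than on disk, generate would be something like a call to picture_of_the_day, and it assumes the Mashup Server’s setInterval and clearInterval behave like their browser counterparts:

```javascript
// Hypothetical whole-feed cache: serve the last generated feed and
// regenerate it on a timer instead of on every request.
class FeedCache {
  constructor(generate) {
    this.generate = generate; // function returning the feed text
    this.feed = null;
    this.timer = null;
  }

  refresh() {
    try {
      this.feed = this.generate();
    } catch (e) {
      // Keep serving the previous feed (possibly null) if regeneration fails.
    }
    return this.feed;
  }

  get() {
    // Generate on first use; afterwards serve the cached copy.
    return this.feed === null ? this.refresh() : this.feed;
  }

  start(intervalMs) {
    this.stop();
    this.timer = setInterval(() => this.refresh(), intervalMs);
  }

  stop() {
    if (this.timer !== null) clearInterval(this.timer);
    this.timer = null;
  }
}
```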

    Until then, enjoy the great photos available from National Geographic!

    [Updated 6 Feb 2008 - added "jonathan" user to endpoint urls as required by the Mashup Server 1.0 release, and changed the online links to point to http://mooshup.com.]

    October 09

    WSO2 Mashup Server 0.2 Released

    The WSO2 Mashup Server 0.2 release is now available for download!  It arrives right on schedule, three months after our 0.1 release.

    As I said when we released 0.1, the approach we've taken to the Web Service composition space is simple:

    • Provide a platform for easy invocation of Web Services from within a JavaScript environment.  This enables you to grab data in XML format from a variety of Web Services and manipulate it fluidly, using XML as a native datatype courtesy of E4X support.
    • Expose JavaScript functions as Web Service operations, complete with WSDL 2.0 or WSDL 1.1 and XML Schema descriptions, expose them through SOAP 1.2, SOAP 1.1, and REST, and generate a host of artifacts including Javascript/DOM and Javascript/E4X stubs, a try-it page, and so forth.
    • Provide bridges to information not in strict Web Service format (i.e. not described by WSDL), including screen scraping tools (for HTML and other non-XML dialects), feeds (RSS and Atom), and files.

    The end result is a scriptable Web Services composition platform.

    We didn't make much noise around the 0.1 release, as we were still working on some of the fundamentals.  But I'm very proud of the 0.2 release and encourage you to give it a whirl.  This release brings major improvements in a number of areas:

    • Revamped and expanded support for describing and converting types.
      • In addition to annotating a Javascript parameter with XML Schema built-in types, you can now "declare" the Javascript types themselves.  We've developed mappings (and conversions) between Javascript types and XML Schema types.
      • Support for declaring optional parameters.
      • Support for declaring repeated parameters (arrays).
      • Support for declaring enumerations.
      • Support for declaring simple object structures.
      • Run-time type annotations that enable Javascript values to be serialized as XML and reconstituted at the client, even in the absence of declared types.
    • Support for long-running services, simple workflows, and periodic invocation based on simple and familiar browser scheduling constructs:
      • setTimeout()
      • setInterval()
    • Improved Atom/RSS and Atom Publishing Protocol (APP) support
    • Lots of bug fixes and improvements.
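To make the optional and enumeration declarations concrete, here is a hypothetical checker for just the two annotation forms that appear in this post: a trailing “?” for an optional parameter (as in "xs:string?" and "number?"), and "a | b | c" for an enumeration (as in the size parameter of picture_of_the_day). It is a sketch of what such declarations constrain, not the Mashup Server's implementation, and it assumes values have already been converted from the wire format:

```javascript
// Hypothetical: does `value` satisfy the type annotation string?
function checkParam(annotation, value) {
  let type = annotation.trim();
  const optional = type.endsWith("?"); // trailing "?" = optional parameter
  if (optional) type = type.slice(0, -1).trim();
  if (value === undefined || value === null) return optional;
  if (type.includes("|")) {
    // Enumeration: the value must be one of the listed strings.
    return type.split("|").map((s) => s.trim()).includes(String(value));
  }
  switch (type) {
    case "number": return typeof value === "number" && !Number.isNaN(value);
    case "xs:string": return typeof value === "string";
    default: return true; // types not modelled by this sketch are accepted
  }
}
```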

    One area where we didn't innovate was our user interface, which still has a number of usability and functionality issues.  We held off on incremental improvements between 0.1 and 0.2 in order to focus on a significant revamp in 0.3.

    I'll be talking more about specific features and use cases of the WSO2 Mashup Server in weeks to come.
