Archive for 2009

YQL mashups for libraries

Wednesday, December 9th, 2009

In October GovHack was held in Canberra. I went along as a participant, but also to advise teams on using the National Library of Australia’s APIs. One of the things I spent my time doing there was making some YQL Open Data Tables for some of the Library’s services. Why is this interesting? Let’s go back a few steps.

YQL is a service from Yahoo that provides an SQL-like environment for querying, filtering and joining web services. So instead of having to write a complex URL to access data from a website, we can use YQL to write a statement similar to the SQL query we might use to obtain data from a MySQL database, except that instead of querying a database we are querying a web service. As an example, you can enter the following into the YQL console to extract photos of the Sydney Harbour Bridge from Flickr:

SELECT * FROM flickr.photos.search WHERE text="sydney harbour bridge";

When YQL was launched it initially had options to query only Yahoo’s services. If you wanted to query a web service outside of Yahoo’s you were out of luck. Since then Yahoo has allowed developers to build YQL Open Data Tables. An Open Data Table is an XML file that acts as a bridge between your API and the YQL language: in it you describe how your API is structured in terms that YQL can understand.

If we wish to use an API to return data from one of the Library’s services, say Picture Australia, we can query it using the following URL:

http://librariesaustralia.nla.gov.au/apps/kss?action=OpenSearch&targetid=pictaust&searchTerms=Sydney+Harbour+Bridge&startPage=1

As you can see, this quickly becomes a fairly complex URL, with a lot of query string values pointing towards where we need to extract the data from.

Now let’s create that same query using YQL. Firstly I created an Open Data Table for Picture Australia. This is the key component that ties Picture Australia and YQL together. If you now enter the following into the YQL console you’ll get back an XML feed from Picture Australia for the pictures of the Sydney Harbour Bridge.

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT * FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1";
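
Under the hood, an Open Data Table is just a small XML file. The sketch below shows roughly what the Picture Australia table might look like; it’s a simplified illustration rather than the exact table I created, but the shape is the same: the url element points at the OpenSearch endpoint shown earlier, and the searchTerms and startPage parameters become keys that you can use in a WHERE clause.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified sketch of an Open Data Table for Picture Australia (not the exact table) -->
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <author>Paul Hagon</author>
    <description>Search Picture Australia via its OpenSearch interface</description>
    <sampleQuery>SELECT * FROM {table} WHERE searchTerms="sydney harbour bridge"</sampleQuery>
  </meta>
  <bindings>
    <!-- Results come back as an RSS feed, so point YQL at each rss.channel.item -->
    <select itemPath="rss.channel.item" produces="XML">
      <urls>
        <url>http://librariesaustralia.nla.gov.au/apps/kss?action=OpenSearch&amp;targetid=pictaust</url>
      </urls>
      <inputs>
        <!-- paramType="query" means YQL appends these as query string parameters -->
        <key id="searchTerms" type="xs:string" paramType="query" required="true"/>
        <key id="startPage" type="xs:string" paramType="query"/>
      </inputs>
    </select>
  </bindings>
</table>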

Alternatively you can query the National Library of Australia’s catalogue for pictures of the Sydney Harbour Bridge by using this Open Data Table and entering the following query into the YQL console:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor="sydney harbour bridge {format:Online AND format:Picture}";

So how is this interesting? Can’t all of this information already be gathered from our standard APIs? There are a couple of advantages to using YQL. One is being able to extract just portions of the data. Say you want to extract just the title, description and persistent URL of each record, and you only want the first 3 items; you can just enter:

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT title,description,link FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 3;

or you could just extract a link to where the most relevant original item is stored.

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT enclosure.url FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 1;

This starts to give you a bit of flexibility in the fields and amount of data returned, and limits the amount of parsing that you have to do. All the hard work is being done by the servers at Yahoo.

But the really fun stuff starts when you try to create a little mashup by combining data from different services. Let’s use YQL to find the current number 1 artist at Yahoo’s music service:

SELECT name FROM music.artist.popular LIMIT 1;

We can now easily combine this search with a search for the top 5 items from or about that artist in the National Library’s catalogue:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor IN (SELECT name FROM music.artist.popular LIMIT 1) LIMIT 5;

Once we have constructed this query, we can access it using a JSON-P call and use a little bit of JavaScript to display the results within a web page (see example 1).

<div id="nla"></div>
<script type="text/javascript">
function nlabooks(o){
  // JSON-P callback: YQL calls this with the query results once the script loads
  var f = document.getElementById('nla');
  var out = '<ul>';
  var books = o.query.results.item;
  // Build a list item linking to each catalogue record
  for(var i=0,j=books.length;i<j;i++){
    var cur = books[i];
    out += '<li><a href="' + cur.link + '">'+ cur.title +'</a></li>';
  }
  out += '</ul>';
  f.innerHTML = out;
}
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=USE%20%22http%3A%2F%2Fwww.paulhagon.com%2Fyql%2Fnla.xml%22%20AS%20nla%3B%0ASELECT%20*%20FROM%20nla%20WHERE%20lookfor%20IN%20(SELECT%20name%20FROM%20music.artist.popular%20LIMIT%201)%20limit%205%3B&format=json&diagnostics=false&callback=nlabooks"></script>

We’ve now got a little widget that we can use inside any page to dynamically mash up two separate data sources.

If we were to do that in a traditional manner we would have to write two separate calls to the web services and possibly parse the results in different ways. By using YQL, all that hard work can be carried out with a minimal amount of code.

Building these tables was as much about learning a bit more about YQL and the possibilities it can offer as anything else. What I’ve shown here is a simple demonstration of the ease with which you can use services like YQL to open your data up to a wider audience.

Note: Please don’t build any mission-critical applications using these data tables – they are only there for demonstration purposes. I hope to make them more permanent and have them hosted on the National Library’s servers.

Immediate sharing

Sunday, September 27th, 2009

This week the east coast of Australia was blanketed in a dust storm. The worst day was Wednesday the 23rd, when Sydney was shrouded in eerie red dust. The social networks were bombarded with people’s accounts of the events.

I decided to do a little analysis of how quickly people reacted to the event and how quickly they shared their experiences of it. Using the Flickr API I exported all the photos that had been taken on the 23rd of September and tagged with Sydney and dust, then looked at how long it took people to upload them. I removed those photos where the user wasn’t displaying the EXIF metadata and those where the camera time was obviously set incorrectly (where the time the photo was taken was later than the time it was uploaded).
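
For anyone curious, the query itself is straightforward. The sketch below shows roughly the kind of call involved, using Flickr’s flickr.photos.search method with the date_taken and date_upload extras; the API key is a placeholder and the timezone handling is glossed over, so treat it as an illustration rather than the exact process I used.

// Rough sketch only: fetch photos tagged sydney + dust taken on 23 September 2009
// and work out how long each one took to be uploaded. Requires a Flickr API key
// (placeholder below) and a runtime with fetch, e.g. Node 18+ run as an ES module.
const params = new URLSearchParams({
  method: 'flickr.photos.search',
  api_key: 'YOUR_API_KEY',               // placeholder
  tags: 'sydney,dust',
  tag_mode: 'all',                       // photos must carry both tags
  min_taken_date: '2009-09-23 00:00:00',
  max_taken_date: '2009-09-23 23:59:59',
  extras: 'date_taken,date_upload',
  per_page: '500',
  format: 'json',
  nojsoncallback: '1'
});

const response = await fetch('https://api.flickr.com/services/rest/?' + params);
const data = await response.json();

// date_upload is a Unix timestamp; date_taken is the camera's local time.
const delaysInHours = data.photos.photo
  .map(p => (p.dateupload * 1000 - Date.parse(p.datetaken.replace(' ', 'T'))) / 3600000)
  .filter(hours => hours >= 0);          // drop photos with an obviously wrong camera clock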

Time to upload days

The bulk of the photos were uploaded to Flickr within 24 hours of being taken, with very few uploaded 2 or 3 days later. It was an immediate action. I then looked in more detail at what happened with those photos that were uploaded within 24 hours.

Time to upload hours

51% of photos were uploaded to Flickr within 4 hours of being taken. Given that the dust storm hit as people were going to work, there is also a small increase in the number of photos uploaded 10-15 hours later, which corresponds to people uploading images that evening when they arrived home from work, quite possibly their first opportunity to do so.

I also did some analysis of the photos that were uploaded within the first 4 hours of being taken. Did this immediacy relate to the type of camera used?

Camera type

24% of images didn’t have the model of camera recorded in their EXIF metadata. What was surprising was that only 6% of these rapidly uploaded images came from mobile devices such as iPhones and Nokia camera phones. Over 50% of images came from digital SLR cameras, while the remainder were mostly compact cameras.

This demonstrates a desire to immediately share what is happening in our environment with a wider audience, but we aren’t yet doing that sharing from our mobile devices.

Common Ground

Wednesday, September 23rd, 2009

I’m really excited to be playing a small part in the upcoming Common Ground meet up to be held on the 2nd-3rd October 2009. Common Ground is a global meet up celebrating the Commons on Flickr, hosted by as many of the institutions in the Commons as possible. The institutions will be projecting images from the Commons onto their buildings at night. In keeping with the community-based spirit of Flickr and The Commons, the images have all been chosen by the public.

I’ve cast my vote for the images I would like to see and will head to the Powerhouse Museum in Sydney to sit with friends and watch the slideshow. On the night I’ll be giving a brief presentation on some of the work I’ve been doing using The Commons. On Sunday 4th October, I’ll be giving a presentation with Paula Bray at the Powerhouse Museum’s Talks After Noon, where I’ll talk about what Flickr Commons means to me, show some of the things I’ve done with it and, most importantly, discuss the power of the community.

Come along, it will be a great night.

Using VLC for screen capture

Monday, July 27th, 2009

I recently had to make a screen capture for embedding into a Keynote presentation. As this sort of thing isn’t something I do often, I didn’t really want to pay money for a program to do it. Luckily, something totally unrelated happened and I found the solution. VLC, the open source media player, was updated to version 1.0. It was one of those programs I had always had installed, but only ever used to play media files. As I was exploring the menus to see what was new, I discovered it was easy to record your screen. Here are some instructions on how to do it.

First you’ll need to download and install VLC.

Open VLC and select File -> Streaming/Exporting Wizard…

1-file menu

A wizard appears. You will want to select Transcode/Save to file.
2-streaming wizard

Choose Select a stream. As we want to record the screen, in the input box enter screen://
3-choose input

Choose the format and the bit rate. As I’m using a Mac I’ll choose H.264 and select a suitable bit rate. If you have audio you wish to include you can select the audio option as well.
4-choose format

Select the method of encapsulation – for H.264 it’s good to select MPEG 4/MP4.
5-encpsulated

Select where to save your movie.
6-save to

You’ll then be presented with a summary screen so you can check all the settings. If they are OK, click on Finish and VLC will start to record your screen (including the mouse movements).
7-summary

Once you have finished recording, switch back to VLC and click the stop button. The recording will stop and the file is saved.
8-stop recording
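
As an aside, the same capture can be started without the wizard by launching VLC from the command line. Something along these lines should work (an untested sketch; the exact option names can vary between VLC versions, so check vlc --help if it complains):

# Capture the screen at 15 fps, encode to H.264 and save it as capture.mp4
# On a Mac the binary lives at /Applications/VLC.app/Contents/MacOS/VLC
vlc screen:// --sout "#transcode{vcodec=h264,vb=1024,fps=15}:std{access=file,mux=mp4,dst=capture.mp4}"

Quitting VLC stops the recording and saves the file, much like clicking the stop button does when using the wizard.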

You will probably need to trim the start and the end of the screen capture in another program, as the capture will record you switching back to VLC to hit the stop button. For instance, if you are using the screen capture in a Keynote presentation you can import the movie into your presentation and then move the Start and Stop markers in the QuickTime inspector to remove these portions.
9-Trim

You’ll now have a beautifully created screen capture movie for your presentation. I hope you find this tip useful.

Building location aware websites

Friday, July 24th, 2009

On the 24th of July I gave a presentation to the Canberra Web Standards Group on Building location aware websites. Here are the slides and notes from my presentation.

Slides 1-2
Welcome. I’m Paul Hagon, a web developer at the National Library of Australia. This is my Twitter handle if you are twittering about my presentation while I talk.

Slide 3

Slides 4-5
Traditionally websites have required the user to make a choice about their location. This is stored in a cookie or within the user login.

Slides 6-9
There are applications where I don’t want to have to make that choice. If I am travelling and in a different location, I want the information that is relevant to my current environment. A perfect example of this is the weather: I’m primarily interested in the weather and forecast for where I am.

Slide 10
The W3C geolocation group released their first working draft in late 2008. Their final recommendation is due to be released at the end of 2009. Their goal is to:

define a secure and privacy-sensitive interface for using client-side location information in location-aware Web applications

Slide 11
Location detection takes a variety of forms. The first is an IP address lookup. If you are lucky this might give you the user’s location to the nearest town or state, but it is generally fairly inaccurate. The next option is to determine the location of the user’s wi-fi router. If the user is on a cellular network, their location can be triangulated using the tower IDs. These methods can be very accurate (to within a couple of hundred metres). The final method is to use a dedicated GPS chip and obtain a satellite fix. This is accurate to within a few metres.

Slide 12-13
Mobile phones started to have built-in GPS chips, but it was really the iPhone that opened up the possibilities in this area. The problem was, the location sensors could only be accessed through dedicated iPhone applications written in Objective-C. We are web developers and like angle brackets rather than square brackets; it’s a bit of a leap to go to a ‘proper’ programming language.

Slide 14-15
Recently two developments took place. Firstly, Firefox 3.5 was released. In amongst the newer JavaScript engine and native HTML5 audio and video support, it also featured native geolocation functions. The iPhone operating system was also upgraded to OS 3.0 and with it, access to the iPhone location sensors was made available to mobile Safari. Both of these implementations followed the draft W3C guidelines. Native geolocation is also available within development builds of Opera and Fennec (mobile Firefox).

Slide 16
So where does this leave Internet Explorer (and the desktop version of Safari)? Users of these browsers can download Google Gears. This is typically used to give offline access to things like Gmail and Google Docs, but it also makes available some geolocation functions, although they are slightly different to the W3C recommendations.

Slide 17
A user can also use a service such as Fire Eagle to update their location, and this web service has an API that allows the data to be shared between sites (for example automatically updating your Twitter location).

Slide 18-20
Privacy is a major concern. A user has to opt in to sharing their location with a web site. These services store IP addresses, access point information and a unique identifier for a period of 2 weeks. No identifiable information is passed to or stored by these services. You probably already have something in your privacy policy to cover storing log files, and we tend to know a fair bit of general location information about our users anyway from things like Google Analytics reports.

Slide 21
Users are starting to broadcast their location through services like Google Latitude or brightkite. This raises many more privacy issues, and these services have options to allow a user to decide just how much information they wish to share.

Slide 22-30
The code to make it happen. Create a function that we can call from an event like a page load or a click. Make a location call. If the call is successful, extract the latitude and longitude. If it is unsuccessful (you may not be able to get a signal, or the service may not be able to resolve your location) do something else.
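
The slides walk through the actual code; as a rough sketch of the same pattern (not the exact code from the slides), it looks something like this, with a Google Gears fallback for browsers without native support:

// A rough sketch of the pattern described above (not the exact code from the slides).
function whereAmI() {
  if (navigator.geolocation) {
    // Native W3C geolocation (Firefox 3.5, mobile Safari on iPhone OS 3.0)
    navigator.geolocation.getCurrentPosition(success, failure);
  } else if (window.google && google.gears) {
    // Google Gears fallback - its position object exposes latitude/longitude
    // directly rather than under coords, one of the differences from the W3C draft
    var geo = google.gears.factory.create('beta.geolocation');
    geo.getCurrentPosition(function(position) {
      useLocation(position.latitude, position.longitude);
    }, failure);
  } else {
    failure();
  }
}

function success(position) {
  useLocation(position.coords.latitude, position.coords.longitude);
}

function failure(error) {
  // Couldn't get a fix (the user declined, there's no signal, or the lookup failed),
  // so fall back to something else, such as asking the user to pick a location
}

function useLocation(lat, lon) {
  // Do something useful with the coordinates, e.g. run a search or centre a map
}

// Call it from an event, for example when the page loads
window.onload = whereAmI;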

Slide 31
Reverse geocoding is the process of turning a latitude and longitude into a human-readable form.

Slide 32-37
An example of a location aware application: a mashup searching for photos in a particular location. Firstly, in Safari (a browser without native geolocation) the user has to pick the location. In Firefox 3.5 (a browser with native geolocation) the user can ask to be taken directly to their location. The browser asks for their permission before making the call. The location is accurate to a few hundred metres. Now, some of the results aren’t totally accurate: the search uses a place name, as there is very little location data in the records.

Slide 38
There are 3 instances of Parkes on the page – Parkes ACT, Parkes NSW and a name, Henry Parkes. It can’t differentiate between them.

Slide 39
There is a service called Yahoo Placemaker where you can pass in data and it will return any geographic information contained in that data.

Slide 40
Passing in “Parkes Australia” we get the relevant geographic information for both locations called Parkes.
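
A call to Placemaker looks roughly like the sketch below. The endpoint and parameter names are from my reading of the Placemaker documentation and should be treated as assumptions (and you need to register for a Yahoo appid), but it gives a feel for how little is involved:

// Rough sketch of a Placemaker request; the endpoint and parameter names are assumptions
// based on the Placemaker documentation, and YOUR_APP_ID is a placeholder.
const body = new URLSearchParams({
  documentContent: 'Parkes Australia',   // or documentURL: '...' to geoparse a web page
  documentType: 'text/plain',
  appid: 'YOUR_APP_ID'
});

const response = await fetch('http://wherein.yahooapis.com/v1/document', {
  method: 'POST',
  body: body
});
// Placemaker returns XML describing each place it found, including coordinates
const xml = await response.text();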

Slide 41-43
Placemaker also accepts a URL as input. Let’s pass some information from Open Australia into it. Open Australia is an application that allows users to see what their members of parliament have been doing. We could add location aware services to this to instantly select the senator for the area we are currently in, or to find all the references to the area we are in, to see what decisions have been made that may have an effect on us.

Slide 44
Placemaker extracts the location names from the text of the page and returns any associated location data.

Slides 45-48
Is this usable or is it still too cutting edge? iPhone usage is small as a proportion of overall website usage, but those users update quickly and have the capability to use location aware services. Firefox 3.5 usage is also small, but as it has only just been released it will take a little time to build up a user base, and Firefox users tend to update rapidly. More than 95% of our visitors have browsers that would be capable of using location based services if they installed Google Gears.

Slides 49-50
I expect to see many more location based websites in the future. This presentation is available on Slideshare and the references I’ve used are up on Delicious. Thank you.