Archive for the ‘Blog’ Category

Everything I know about cataloguing I learned from watching James Bond

Thursday, March 11th, 2010

At VALA2010 I did a presentation titled ‘Everything I know about cataloguing I learned from watching James Bond’. What I was trying to explore was how searching for objects is changing. We are now so used to full text search for books, journals and newspapers that the traditional forms of metadata, such as title, author and date, have become secondary research items.

For other collection formats like images or audio recordings, this traditional metadata is still the main method of discovering items. What I wanted to look at was the concept of a full text search for images. To do this I carried out some experiments in facial recognition and colour analysis over the photographic collection of the National Library of Australia.

Here are the slides of my presentation and a link to the search by colour application I developed as part of my research.

YQL mashups for libraries

Wednesday, December 9th, 2009

In October GovHack was held in Canberra. I went along as a participant, but also to advise any teams on the use of the National Library of Australia’s APIs. One of the things I spent my time doing there was making some YQL Open Data Tables for some of the Library’s services. Why is this interesting? Let’s go back a few steps.

YQL is a service from Yahoo that provides an SQL-like environment for querying, filtering and joining web services. Instead of having to write a complex URL to access data from a website, we can use YQL to write a statement similar to the SQL query we might use to obtain data from a MySQL database, except that we are querying a web service rather than a database. As an example, you can enter the following into the YQL console to extract photos of the Sydney Harbour Bridge from Flickr:

SELECT * FROM flickr.photos.search WHERE text="sydney harbour bridge";
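
Outside the console, the same statement can be sent to YQL's public endpoint, the one the widget script later in this post points at. Here is a minimal sketch of building that URL; buildYqlUrl is a helper name I made up for illustration, not part of any Yahoo library.

```javascript
// Sketch: turn a YQL statement into a request URL for the public endpoint.
// buildYqlUrl is an illustrative helper name, not part of any Yahoo library.
function buildYqlUrl(statement, callbackName) {
  var url = 'http://query.yahooapis.com/v1/public/yql' +
    '?q=' + encodeURIComponent(statement) +
    '&format=json&diagnostics=false';
  // Adding a callback name asks for JSON-P instead of plain JSON.
  if (callbackName) {
    url += '&callback=' + encodeURIComponent(callbackName);
  }
  return url;
}

var flickrUrl = buildYqlUrl(
  'SELECT * FROM flickr.photos.search WHERE text="sydney harbour bridge";'
);
```

encodeURIComponent leaves characters such as * and parentheses alone, which is why they appear unescaped in the widget URL further down.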

When YQL was launched it initially had options to query only Yahoo’s services. If you wanted to query a web service that was outside of Yahoo’s services you were out of luck. Since then Yahoo has allowed developers to build YQL Open Data Tables. An Open Data Table is an XML file that acts as a bridge between your API and the YQL language: in it you describe how your API is structured in terms that YQL can understand.

If we wish to use an API to return data from one of the Library’s services, say Picture Australia, we can query it using the following URL:

http://librariesaustralia.nla.gov.au/apps/kss?action=OpenSearch&targetid=pictaust&searchTerms=Sydney+Harbour+Bridge&startPage=1

As you can see, it starts to become a fairly complex URL with a lot of querystring values to point towards where we need to extract the data from.
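
To make the moving parts of that URL explicit, here is a small sketch that assembles it from its querystring values. The helper name pictureAustraliaUrl is mine, and it assumes an environment that provides URLSearchParams (current browsers and Node.js do).

```javascript
// Sketch: assemble the Picture Australia OpenSearch URL from its parts.
// pictureAustraliaUrl is an illustrative helper name.
function pictureAustraliaUrl(searchTerms, startPage) {
  var params = new URLSearchParams({
    action: 'OpenSearch',
    targetid: 'pictaust',
    searchTerms: searchTerms,
    startPage: String(startPage)
  });
  // URLSearchParams encodes spaces as "+", matching the URL shown above.
  return 'http://librariesaustralia.nla.gov.au/apps/kss?' + params.toString();
}

var bridgeUrl = pictureAustraliaUrl('Sydney Harbour Bridge', 1);
```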

Now let’s create that same query using YQL. Firstly I created an Open Data Table for Picture Australia. This is the key component that ties Picture Australia and YQL together. If you now enter the following into the YQL console, you’ll get back an XML feed from Picture Australia for the pictures of the Sydney Harbour Bridge.

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT * FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1";

Alternatively you can query the National Library of Australia’s catalogue for pictures of the Sydney Harbour Bridge by using this Open Data Table and entering the following into the YQL console:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor="sydney harbour bridge {format:Online AND format:Picture}";

So how is this interesting? Can’t all of this information already be gathered from our standard APIs? There are a couple of advantages to using YQL. One advantage is being able to extract just portions of the data. Say you want to extract just the title, description and persistent URL of the records and you only want to return the first 3 items; you can just enter:

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT title,description,link FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 3;

or you could just extract a link to where the most relevant original item is stored.

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT enclosure.url FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 1;

This starts to give you a bit of flexibility in the fields and amount of data that is returned, and limits the amount of parsing that you have to do. All the hard work is being done by the servers at Yahoo.
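
For comparison, this is roughly the trimming you would otherwise do yourself once the full feed had been parsed. It is only a sketch: pickFields is a name I made up, and it assumes the items have already been turned into plain JavaScript objects.

```javascript
// Sketch: field selection and LIMIT done by hand on already-parsed items.
// pickFields is an illustrative helper; items are plain objects.
function pickFields(items, fields, limit) {
  return items.slice(0, limit).map(function (item) {
    var picked = {};
    fields.forEach(function (field) {
      // Copy only the requested fields that the item actually has.
      if (field in item) {
        picked[field] = item[field];
      }
    });
    return picked;
  });
}
```

Calling pickFields(items, ['title', 'description', 'link'], 3) mirrors the SELECT title,description,link ... LIMIT 3 statement, except that the whole feed has to be downloaded and parsed first.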

But the really fun stuff starts when you try to create a little mashup by combining data from different services. Let’s use YQL to find the current number 1 artist at Yahoo’s music service:

SELECT name FROM music.artist.popular LIMIT 1;

We can now easily combine this search with a search for the top 5 items from or about that artist in the National Library’s catalogue:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor IN (SELECT name FROM music.artist.popular LIMIT 1) LIMIT 5;

Once we have constructed this query, we can access that using a JSON-P call and use a little bit of JavaScript to display the results within a web page (see example 1).

<div id="nla"></div>
<script type="text/javascript">
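// JSON-P callback: YQL calls this with the response for the query below
function nlabooks(o){
  var f = document.getElementById('nla');
  // results is null when nothing matches, and item is a single object
  // rather than an array when only one record comes back
  var results = o.query.results;
  var books = results ? [].concat(results.item) : [];
  var out = '<ul>';
  for(var i=0,j=books.length;i<j;i++){
    var cur = books[i];
    // one list entry per record, linking the title to the catalogue page
    out += '<li><a href="' + cur.link + '">'+ cur.title +'</a></li>';
  }
  out += '</ul>';
  f.innerHTML = out;
}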
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=USE%20%22http%3A%2F%2Fwww.paulhagon.com%2Fyql%2Fnla.xml%22%20AS%20nla%3B%0ASELECT%20*%20FROM%20nla%20WHERE%20lookfor%20IN%20(SELECT%20name%20FROM%20music.artist.popular%20LIMIT%201)%20limit%205%3B&format=json&diagnostics=false&callback=nlabooks"></script>

We’ve now got a little widget that we can use inside any page to dynamically mashup 2 separate data sources.

If we were to do that in a traditional manner we would have to write two separate calls to the web services and possibly parse the results in different ways. By using YQL, all that hard work can be carried out in a minimal amount of code.
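
To give a feel for the traditional version, here is a sketch of the two chained calls. Both fetchers are hypothetical and passed in as arguments: fetchTopArtist(callback) would call the music service and hand back an artist name, and searchCatalogue(term, callback) would call the Library's API and hand back an array of records. The parsing each one needs is left out entirely.

```javascript
// Sketch of the two-step approach without YQL.
// fetchTopArtist and searchCatalogue are hypothetical functions supplied by
// the caller; each takes a callback that receives its result.
function topArtistInCatalogue(fetchTopArtist, searchCatalogue, limit, done) {
  // Call 1: ask the music service for its current number 1 artist.
  fetchTopArtist(function (artistName) {
    // Call 2: search the catalogue for that artist's name.
    searchCatalogue(artistName, function (records) {
      done({ artist: artistName, records: records.slice(0, limit) });
    });
  });
}
```

With YQL the nested SELECT replaces both calls, and only one response format has to be handled.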

Building these tables was as much a case of learning a bit more about YQL and the possibilities that it can offer as anything else. What I’ve shown here is a simple demonstration of the ease with which you can use services like YQL to expand your data to a wider audience.

Note: Please don’t build any mission-critical applications using these data tables – they are only there for demonstration purposes. I’ll hopefully make them more permanent and host them on the National Library’s servers.

Immediate sharing

Sunday, September 27th, 2009

This week the east coast of Australia was blanketed in a dust storm. The worst day was Wednesday the 23rd, when Sydney was covered in eerie red dust. The social networks were bombarded with people’s accounts of the events.

I decided to do a little analysis of how quickly people reacted to the event and how quickly they shared their experiences of it. Using the Flickr API I exported all the photos that had been taken on the 23rd of September and tagged with Sydney and dust. I then looked at how long it took people to upload the photos. From this set I removed the photos where the user wasn’t displaying the EXIF metadata and those where the camera time was obviously set incorrectly (where the time the photo was taken was later than the time it was uploaded).
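
The clean-up step can be sketched like this. The field names are my own assumptions for the example rather than what the Flickr API returns: takenAt and uploadedAt are Unix timestamps in seconds, and takenAt is null when a photo exposes no EXIF capture time.

```javascript
// Sketch of the clean-up step. Field names are assumptions for illustration:
// takenAt and uploadedAt are Unix timestamps in seconds, and takenAt is
// null (or missing) when the photo exposes no EXIF capture time.
function uploadDelaysInHours(photos) {
  return photos
    .filter(function (p) {
      // Drop photos without a capture time.
      return p.takenAt !== null && p.takenAt !== undefined;
    })
    .filter(function (p) {
      // Drop photos "taken" after they were uploaded: the camera clock is wrong.
      return p.takenAt <= p.uploadedAt;
    })
    .map(function (p) {
      // Time between taking and uploading the photo, in hours.
      return (p.uploadedAt - p.takenAt) / 3600;
    });
}
```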

Time to upload days

The bulk of the photos were uploaded to Flickr within 24 hours of being taken, with very few photos being uploaded 2 or 3 days after being taken. It was an immediate action. I then looked in more detail at what happened with those photos that were uploaded within 24 hours of being taken.

Time to upload hours

51% of photos were uploaded to Flickr within 4 hours of when they were taken. Given the time of day when the dust storms were happening, as people were going to work, there is also a small increase in the number of photos uploaded 10-15 hours later. This corresponds to people uploading images that evening when they arrived home from work, quite possibly the first opportunity they would have had to upload their images.
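
Figures like these come from simple counting. As a sketch, given a list of upload delays in hours (the function names here are mine), the share uploaded within a cut-off and the hour-by-hour counts for the first day could be computed as:

```javascript
// Sketch: percentage of photos uploaded within maxHours of being taken.
function shareWithin(delaysInHours, maxHours) {
  if (delaysInHours.length === 0) {
    return 0;
  }
  var count = delaysInHours.filter(function (d) {
    return d <= maxHours;
  }).length;
  return 100 * count / delaysInHours.length;
}

// Sketch: count of photos per whole hour of delay over the first 24 hours.
// Index 0 holds delays under 1 hour, index 1 holds 1 to under 2 hours, etc.
function hourlyHistogram(delaysInHours) {
  var counts = [];
  for (var h = 0; h < 24; h++) {
    counts.push(0);
  }
  delaysInHours.forEach(function (d) {
    var bucket = Math.floor(d);
    if (bucket >= 0 && bucket < 24) {
      counts[bucket] += 1;
    }
  });
  return counts;
}
```

A bump in the histogram around indexes 10 to 15 is the kind of evening upload pattern described above.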

I also did some analysis on those photos that were uploaded in the first 4 hours of being taken. Did this immediacy relate to the type of camera used?

Camera type

24% of images didn’t have the model of camera recorded in their EXIF metadata. What was surprising was that only 6% of these rapidly uploaded images came from mobile devices like iPhone and Nokia mobile phone cameras. Over 50% of images came from digital SLR cameras while the remainder were mostly compact cameras.

This demonstrates that we want to immediately share what is happening in our environment with a wider audience, but we aren’t sharing it using our mobile devices.

Common Ground

Wednesday, September 23rd, 2009

I’m really excited to be playing a small part in the upcoming Common Ground meet up to be held on the 2nd-3rd October 2009. Common Ground is a global meet up celebrating the Commons on Flickr to be held by as many of the institutions in the Commons as possible. The institutions will be projecting images from the Commons onto their buildings at night. In keeping with the community based spirit of Flickr & The Commons, the images have all been chosen by the public.

I’ve cast my vote on the images I would like to see and will head to Sydney to the Powerhouse Museum to sit with friends and watch the slideshow. On the night I’ll be giving a brief presentation on some of the work I’ve been doing using The Commons. On Sunday 4th October, I’ll be giving a presentation with Paula Bray at the Powerhouse Museum’s Talks After Noon, where I’ll talk about what Flickr Commons means to me, show some of the things I’ve done with it and most importantly discuss the power of the community.

Come along, it will be a great night.

Using VLC for screen capture

Monday, July 27th, 2009

I recently had to make a screen capture for embedding into a Keynote presentation. As this sort of thing isn’t something I do often, I didn’t really want to pay money for a program to do it. Luckily, something totally unrelated happened and I found the solution. VLC, the open source media player, was updated to version 1.0. It was one of those programs I had always had installed, but only ever used to play media files. As I was exploring the menus to see what was new, I discovered how easy it was to record your screen. Here are some instructions on how to do it.

First you’ll need to download and install VLC.

Open VLC and select File -> Streaming/Exporting Wizard…

1-file menu

A wizard appears. Select Transcode/Save to file.
2-streaming wizard

Choose Select a stream. As we want to record the screen, in the input box enter screen://
3-choose input

Choose the format and the bitrate. As I’m using a Mac I’ll choose H.264 and select a suitable bit rate. If you have audio you wish to include you can select the audio option as well.
4-choose format

Select the method of encapsulation – for H.264 it’s good to select MPEG 4/MP4.
5-encpsulated

Select where to save your movie.
6-save to

You’ll then be presented with a summary screen so you can check all the settings. If they are OK, click on Finish and VLC will start to record your screen (including the mouse movements).
7-summary

Once you have finished recording, switch back to VLC and click the stop button. The recording will stop and the file is saved.
8-stop recording

You will probably need to trim the start and the end of the screen capture in another program, as the capture will record you switching away from VLC at the start and back to it to hit the stop button. For instance, if you are using the screen capture in a Keynote presentation you can import the movie into your presentation and then move the Start and Stop markers in the QuickTime inspector to remove these portions.
9-Trim

You’ll now have a beautifully created screen capture movie for your presentation. I hope you find this tip useful.