Archive for the ‘Libraries’ Category

Everything I know about cataloguing I learned from watching James Bond

Thursday, March 11th, 2010

At VALA2010 I did a presentation titled ‘Everything I know about cataloguing I learned from watching James Bond’. What I was trying to explore was how searching for objects is changing. We are now so used to full text search for books, journals and newspapers that the traditional forms of metadata, such as title, author and date, have become secondary research items.

For other collection formats like images or audio recordings, this traditional metadata is still the main method of discovering items. What I wanted to look at was the concept of a full text search for images. To do this I carried out some experiments in facial recognition and colour analysis over the photographic collection of the National Library of Australia.

Here are the slides of my presentation and a link to the search by colour application I developed as part of my research.

YQL mashups for libraries

Wednesday, December 9th, 2009

In October, GovHack was held in Canberra. I went along as a participant, but also to advise teams on the use of the National Library of Australia’s APIs. One of the things I spent my time on there was making some YQL Open Data Tables for some of the Library’s services. Why is this interesting? Let’s go back a few steps.

YQL is a service from Yahoo that provides an SQL-like environment for querying, filtering and joining web services. So instead of having to write a complex URL to access data from a website, we can use YQL to write a statement similar to the SQL query we might use to obtain data from a MySQL database, except that instead of querying a database we are querying a web service. As an example, you can enter the following into the YQL console to extract photos of the Sydney Harbour Bridge from Flickr:

SELECT * FROM flickr.photos.search WHERE text="sydney harbour bridge";

When YQL was launched it could only query Yahoo’s own services. If you wanted to query a web service outside of Yahoo’s you were out of luck. Since then Yahoo has allowed developers to build YQL Open Data Tables. An Open Data Table is an XML file that acts as a bridge between your API and the YQL language: in it you describe how your API is structured in terms that YQL can understand.

If we wish to use an API to return data from one of the Library’s services, say Picture Australia, we can query it using the following URL:

http://librariesaustralia.nla.gov.au/apps/kss?action=OpenSearch&targetid=pictaust&searchTerms=Sydney+Harbour+Bridge&startPage=1

As you can see, it starts to become a fairly complex URL with a lot of querystring values to point towards where we need to extract the data from.
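The same query string can be assembled programmatically rather than typed by hand. This is a minimal sketch, assuming a JavaScript runtime that provides the standard `URLSearchParams` class; the helper name `pictureAustraliaUrl` is made up for illustration, and the parameter names are simply those visible in the URL above.

```javascript
// Base endpoint and parameter names are copied from the URL shown above.
const BASE = 'http://librariesaustralia.nla.gov.au/apps/kss';

// Hypothetical helper: builds the OpenSearch URL for a Picture Australia query.
function pictureAustraliaUrl(searchTerms, startPage) {
  const params = new URLSearchParams({
    action: 'OpenSearch',
    targetid: 'pictaust',
    searchTerms: searchTerms,
    startPage: String(startPage)
  });
  // URLSearchParams encodes spaces as '+', matching the hand-written URL.
  return BASE + '?' + params.toString();
}

console.log(pictureAustraliaUrl('Sydney Harbour Bridge', 1));
```

Even with a helper like this, every service needs its own set of parameter names, which is the repetition YQL is meant to hide.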

Now let’s create that same query using YQL. Firstly I created an Open Data Table for Picture Australia. This is the key component that ties Picture Australia and YQL together. If you now enter the following into the YQL console, you’ll get back an XML feed from Picture Australia of pictures of the Sydney Harbour Bridge.

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT * FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1";

Alternatively, you can query the National Library of Australia’s catalogue for pictures of the Sydney Harbour Bridge by using this Open Data Table and entering the following into the YQL console:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor="sydney harbour bridge {format:Online AND format:Picture}";

So how is this interesting? Can’t all of this information already be gathered from our standard APIs? There are a couple of advantages to using YQL. One advantage is being able to extract just portions of the data. Say you want just the title, description and persistent URL of the records, and only the first 3 items. You can enter:

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT title,description,link FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 3;

Or you could extract just a link to where the most relevant original item is stored:

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT enclosure.url FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 1;

This gives you some flexibility in which fields, and how much data, are returned, and it limits the amount of parsing you have to do. All the hard work is being done by the servers at Yahoo.
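For comparison, here is roughly what trimming the feed yourself would involve once the full response had been fetched and parsed. It is a sketch only: `pickFields` is a hypothetical helper, and the sample records are made up to stand in for parsed Picture Australia items.

```javascript
// Hypothetical helper: keep only the named fields of the first `limit` items,
// which is the work "SELECT title,description,link ... LIMIT 3" pushes to Yahoo.
function pickFields(items, fields, limit) {
  return items.slice(0, limit).map(function (item) {
    var out = {};
    fields.forEach(function (name) {
      if (name in item) { out[name] = item[name]; }
    });
    return out;
  });
}

// Made-up sample data standing in for a parsed feed.
var sample = [
  { title: 'A', description: 'first', link: 'http://example.org/1', extra: 'w' },
  { title: 'B', description: 'second', link: 'http://example.org/2', extra: 'x' },
  { title: 'C', description: 'third', link: 'http://example.org/3', extra: 'y' },
  { title: 'D', description: 'fourth', link: 'http://example.org/4', extra: 'z' }
];

console.log(pickFields(sample, ['title', 'description', 'link'], 3));
```

With YQL, neither the unwanted fields nor the extra records ever reach your code.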

But the really fun stuff starts when you try to create a little mashup by combining data from different services. Let’s use YQL to find the current number 1 artist at Yahoo’s music service:

SELECT name FROM music.artist.popular LIMIT 1;

We can now easily combine this search with a search for the top 5 items from or about that artist in the National Library’s catalogue:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor IN (SELECT name FROM music.artist.popular LIMIT 1) LIMIT 5;

Once we have constructed this query, we can access it using a JSON-P call and use a little bit of JavaScript to display the results within a web page (see example 1).

<div id="nla"></div>
<script type="text/javascript">
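// JSON-P callback: YQL calls this function with the query response.
function nlabooks(o){
  var f = document.getElementById('nla');
  var out = '<ul>';
  // results can be empty when nothing matches, and a single match
  // may arrive as one object rather than an array, so normalise first
  var books = (o.query.results && o.query.results.item) || [];
  if (!(books instanceof Array)) { books = [books]; }
  // build one linked list item per catalogue record
  for(var i=0,j=books.length;i<j;i++){
    var cur = books[i];
    out += '<li><a href="' + cur.link + '">'+ cur.title +'</a></li>';
  }
  out += '</ul>';
  f.innerHTML = out;
}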
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=USE%20%22http%3A%2F%2Fwww.paulhagon.com%2Fyql%2Fnla.xml%22%20AS%20nla%3B%0ASELECT%20*%20FROM%20nla%20WHERE%20lookfor%20IN%20(SELECT%20name%20FROM%20music.artist.popular%20LIMIT%201)%20limit%205%3B&format=json&diagnostics=false&callback=nlabooks"></script>

We’ve now got a little widget that we can use inside any page to dynamically mash up two separate data sources.
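The long URL in the script tag above does not have to be encoded by hand. A small sketch of how it could be generated from the YQL statement: the endpoint and the `format`, `diagnostics` and `callback` parameters are taken from that script tag, while the helper name `buildYqlUrl` is my own invention for this example.

```javascript
// Endpoint and query parameters are taken from the script tag above.
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Hypothetical helper: turns a YQL statement into a JSON-P request URL.
function buildYqlUrl(statement, callbackName) {
  return YQL_ENDPOINT +
    '?q=' + encodeURIComponent(statement) +
    '&format=json&diagnostics=false' +
    '&callback=' + encodeURIComponent(callbackName);
}

// The same two-line statement used in the mashup.
var statement =
  'USE "http://www.paulhagon.com/yql/nla.xml" AS nla;\n' +
  'SELECT * FROM nla WHERE lookfor IN ' +
  '(SELECT name FROM music.artist.popular LIMIT 1) LIMIT 5;';

console.log(buildYqlUrl(statement, 'nlabooks'));
```

The resulting string can be used as the `src` of a script element, with `nlabooks` being the callback function defined in the page.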

If we were to do that in a traditional manner we would have to write two separate calls to the web services and possibly parse the results in different ways. By using YQL, all that hard work can be carried out in a minimal amount of code.

Building these tables was as much about learning a bit more about YQL and the possibilities it offers as anything else. What I’ve shown here is a simple demonstration of the ease with which you can use services like YQL to open up your data to a wider audience.

Note: Please don’t build any mission critical applications using these data tables – they are only there for demonstration purposes. I’ll hopefully make them more permanent and hosted on the National Library’s servers.

DigitalNZ location search

Thursday, June 18th, 2009

Over the past couple of months I’ve been building a little application using the APIs from the DigitalNZ project. DigitalNZ is a collaboration between government departments, publicly funded organisations, the private sector, and community groups to expose and share their combined digital content. Part of their plan to expose their data is to provide a publicly available API so that developers can present their content in ways they may not have thought about.

Typically, a large dataset has a search box as its main interface. I wanted to get right away from that approach and create an engaging interface. The application uses a map to allow the user to freely explore the content.

It currently uses a combination of APIs from Google and Flickr to convert a latitude and longitude from the map into a place name. It then displays a shapefile from Flickr to approximate the area being searched, and returns a list of relevant results from DigitalNZ. Since I started work on this, the data returned from both of these APIs has been released under a Creative Commons license (Yahoo have released their geoplanet data and Flickr have released their shapefile data). I’ll end up incorporating these releases into the application rather than relying on the APIs for the functionality.

Explore the contents of DigitalNZ.


How libraries can learn from Twitter

Friday, May 29th, 2009

This morning an interesting Tweet arrived on a subject that I’ve been thinking about quite a lot lately:

there seem to be more people using twitter apps than twitter web. What is twitter doing wrong?
@katykat

In April 2008, ReadWriteWeb carried out a study, ‘How We Tweet: The Definitive List of the Top Twitter Clients’, which showed that only 56% of Twitter users used the web interface. My gut feeling tells me that figure is lower now, given the growth in use of devices like the iPhone.

This is a perfectly valid question to be asking in the context of a traditional website. But Twitter isn’t a website. It’s more than that: it’s a service, like email. You are not restricted to interacting with your email via one particular method. Likewise, because developers can build upon Twitter using their APIs, you are not restricted to using the service in one and only one way; you have a choice in how you interact with it. The important thing isn’t the website, it is the service. Twitter.com could basically become a one page website and, as long as the APIs were maintained, the service would continue as normal for much of the Twitter community. Users have a variety of choices in interacting with the service based upon their personal preferences. They can choose an application based upon the interface they like and the features they are going to use.

Flickr, despite having a far greater number of APIs available, hasn’t followed the same path as Twitter. Most people still interact with Flickr via the standard web interface. This is mostly due to their terms of use, which forbid replicating the user experience of Flickr. Among the things you may not do:

Use Flickr APIs for any application that replicates or attempts to replace the essential user experience of Flickr.com

Rev Dan Catt, who up until recently worked at Flickr, said:

I’ve often joked that I could probably get more stuff done working with the Flickr API outside of Flickr than inside.

So to answer the question, I really don’t think Twitter is doing anything wrong. They are doing everything right.

What can libraries learn from what Twitter, and to a lesser degree Flickr, are doing? Can we start to think about our catalogue (or other core services) not as a website, but as a service? The website version of the catalogue may just be one aspect of the delivery mechanism for the information we wish to distribute. Why can’t we look at providing our services to our users in any way they wish to be able to interact with them?

Why can’t we provide specialised access to our catalogues to specific user groups, so they (or anyone) can create:

  • a simplified interface for high school users without all the complex features they don’t use
  • an application for a historical society, built around its own needs
  • a complex view of the catalogue for academics or librarians
  • a visual or geographic based search
  • a social network based around the catalogue

Institutions like the Brooklyn Museum and collaborative efforts such as DigitalNZ are providing their content to developers to do exactly this sort of thing. It’s very early days still and it will be interesting to see what starts to develop.

Let’s start thinking about interacting with the service, not the website.

New York then and now

Tuesday, January 6th, 2009

I’ve been playing around with yet another Flickr Commons then and now project, this time using the images of New York from 1935-1938 from the New York Public Library. The process for this has been a little bit different to the previous then and now demonstrations. The images that have been posted don’t have any geo-location metadata (a latitude or longitude) so they can’t be placed directly on a map in the same manner as other Commons photographs. What they do have instead are very good street addresses in their titles.

The Google Maps API has a geocoding call that translates a human readable address into a latitude and longitude. So if we pass the title of a photo into the API, let’s say “Willow Street, No. 113, Brooklyn”, it returns the latitude and longitude of “40.6978614, -73.9955804”.

For the demonstration I’m using a KML file. Generating this file is now a 2 step process: import the data from Flickr using their API, then pass the title of each photo into the Google Maps API to get the latitude and longitude. Both results are then merged into a KML file.

Of course some of the titles provide ambiguous addresses or don’t provide enough information, and so don’t automatically return a result. For some of the images I’ve manually tweaked the data that I’ve passed into the geocoding API to obtain a result. The results are by no means perfect, but it’s a pretty good demonstration of what can be achieved from very little data by automating everything.
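The final merge step can be sketched in a few lines of JavaScript. This is an illustration rather than the script I actually ran: `toKml` and `escapeXml` are hypothetical helper names, the photo URL is a placeholder, and the input is assumed to be Flickr records that have already been combined with the geocoder’s latitude and longitude.

```javascript
// Escape the characters that are special in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: photos is an array of {title, url, lat, lng} objects,
// i.e. Flickr data already merged with the geocoding result.
function toKml(photos) {
  var placemarks = photos.map(function (p) {
    return '  <Placemark>\n' +
      '    <name>' + escapeXml(p.title) + '</name>\n' +
      '    <description>' + escapeXml(p.url) + '</description>\n' +
      // KML expects longitude first, then latitude.
      '    <Point><coordinates>' + p.lng + ',' + p.lat + '</coordinates></Point>\n' +
      '  </Placemark>';
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n' +
    placemarks.join('\n') +
    '\n</Document>\n</kml>';
}

// The example address from above; the URL is a made-up placeholder.
var kml = toKml([
  { title: 'Willow Street, No. 113, Brooklyn',
    url: 'http://example.org/photo/1',
    lat: 40.6978614, lng: -73.9955804 }
]);
console.log(kml);
```

The one detail that is easy to get wrong is the coordinate order: the geocoder reports latitude then longitude, while a KML `coordinates` element wants longitude first.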

Please explore my New York then and now mashup and let me know what you think.
