Archive for the ‘Libraries’ Category

Trove and the Australian Women’s Weekly

Tuesday, November 23rd, 2010

Yesterday the National Library of Australia launched fully searchable digitised versions of the Australian Women’s Weekly. It’s a fantastic resource for searching, but I find the interface a little time-consuming for just browsing through issues & looking at the stories, images and advertisements. Luckily there is a very simple alternative.

Step 1: When you are viewing an issue there is an option to download the issue as a PDF.

Screen shot showing the Australian Women's Weekly

Step 2: Once the PDF has downloaded drag it to iTunes so it is added to your books (of course you can edit the metadata to something more appropriate than the blank default version).

iTunes Books

Step 3: Sync your iPad with iTunes & now you have a copy of the issues in iBooks on your iPad (or iPhone). A much nicer browsing experience. You can swipe from left to right to change pages and pinch to zoom in and out using all those lovely interactions we’ve become so used to.

iPad

It feels like a bit of a hack, but if the option is there to do it, and if it’s so easy to do, why not?

Colours of a tag

Friday, May 14th, 2010

I’ve been expanding upon the experiments I presented at VALA earlier this year where I built a search by colour application for the National Library of Australia. Out of curiosity I built the same search by colour application using approximately 35,000 images from Flickr Commons.

Since building these applications I’ve been wondering, do certain topics (or tags) also relate to a colour? Does a search for Paris return the colourful images your imagination expects? Are images tagged with red really red?

With a bit of help from the Flickr API, I’ve built an application that queries the 50 most interesting Flickr Commons images for a particular tag, and displays the colours of these images. It also attempts to create a definitive colour for the tag by averaging the colours out.

As you explore the tags more & more you tend to find that most tags return an average muddy brown colour. I suspect this is partly to do with many of the images being black & white & skewing the process.
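The averaging step can be sketched roughly as follows. This is a minimal illustration rather than the code behind the application: it assumes each image has already been reduced to a six-digit hex colour, and the function names (hexToRgb, rgbToHex, averageColour) are made up for this example. It also hints at why mixed sets of images drift towards a muddy middle tone: averaging each channel pulls strong colours towards grey.

```javascript
// Convert a six-digit hex colour such as '#ff8800' into its red, green and blue channels.
function hexToRgb(hex) {
  const value = parseInt(hex.replace('#', ''), 16);
  return { r: (value >> 16) & 255, g: (value >> 8) & 255, b: value & 255 };
}

// Convert red, green and blue channels (0-255) back into a six-digit hex colour.
function rgbToHex({ r, g, b }) {
  return '#' + [r, g, b].map((c) => c.toString(16).padStart(2, '0')).join('');
}

// Average a list of hex colours channel by channel.
// Returns null for an empty list, since there is nothing to average.
function averageColour(hexColours) {
  if (hexColours.length === 0) return null;
  const total = { r: 0, g: 0, b: 0 };
  for (const hex of hexColours) {
    const { r, g, b } = hexToRgb(hex);
    total.r += r;
    total.g += g;
    total.b += b;
  }
  const n = hexColours.length;
  return rgbToHex({
    r: Math.round(total.r / n),
    g: Math.round(total.g / n),
    b: Math.round(total.b / n),
  });
}

// Pure red, green and blue average out to a dark grey.
console.log(averageColour(['#ff0000', '#00ff00', '#0000ff']));
```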

It’s really interesting to explore a few different subjects and see what results appear.

Formats

Can we find a colour gamut for a format?

Cities and countries

Do different cities or countries have different colours associated with them?

Objects

Do objects have particular colours associated with them? Take a bridge. Why do bridges exist? They exist to allow us to go over a river or a valley. With that logic we should expect photos tagged with bridge to have a reasonably large amount of green or blue in the image.

Sure enough, we get quite a few images with green and blue in them.

Colours

Of course colours are a natural subject to test.

Blue

Green

Red

Yellow

Have a go

Feel free to explore the application and find some interesting results. The URL is totally hackable if the tag you want to test isn’t part of the initial tag cloud.

Gallipoli Twitter

Friday, April 23rd, 2010

For the past few months I’ve been following a blog set up by the Australian War Memorial where they are recreating the diary of Herbert Vincent Reynolds by posting the entries from his diary on the days they were written. Herbert Vincent Reynolds enlisted in the First World War with the 4th Field Ambulance and went on to serve at Gallipoli.

One thing I’ve noticed about reading the blog posts is how similar they are to Twitter posts. Many of the entries are very short and the manner in which they are written is typical of what you would find in a tweet. I went back through the diary entries to analyse their content and measure the number of characters in each entry. The average number of characters per diary entry between 2nd Feb 1915 and 21st April 1915 was 342 characters. The longest diary entry so far has been 4066 characters long, but many of the entries are less than 250 characters, and really are just short snippets of information about the events of the day. They aren’t beautifully written entries.
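The measurement itself is simple arithmetic. A rough sketch of it, assuming the diary entries are available as an array of strings (the function name entryStats and the sample strings are placeholders for illustration, not real diary text or the script I actually used):

```javascript
// Summarise the length of a list of text entries.
// shortLimit is the character count below which an entry counts as "short".
function entryStats(entries, shortLimit = 250) {
  // String length counts UTF-16 code units, which is close enough for plain English text.
  const lengths = entries.map((entry) => entry.length);
  const total = lengths.reduce((sum, n) => sum + n, 0);
  return {
    count: lengths.length,
    average: lengths.length ? Math.round(total / lengths.length) : 0,
    longest: lengths.length ? Math.max(...lengths) : 0,
    shortCount: lengths.filter((n) => n < shortLimit).length,
  };
}

// Placeholder entries, invented for this example.
const sample = ['abc', 'abcde', 'a'];
console.log(entryStats(sample, 4));
```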

Reading through the diary I’m convinced that if Herbert Vincent Reynolds had had access to Twitter back in 1915, he would have used it to post his diary entries. The similarities in writing style and structure between two methods of communication nearly a hundred years apart are uncanny. It’s also interesting to note that the Australian War Memorial is using its Twitter feed to promote the diaries.

Everything I know about cataloguing I learned from watching James Bond

Thursday, March 11th, 2010

At VALA2010 I did a presentation titled ‘Everything I know about cataloguing I learned from watching James Bond’. What I was trying to explore was the notion of how searching for objects is changing. We are now so used to full text search for books, journals and newspapers that the traditional forms of metadata, such as title, author and date, have become secondary research items.

For other collection formats like images or audio recordings, this traditional metadata is still the main method of discovering items. What I wanted to look at was the concept of a full text search for images. To do this I carried out some experiments in facial recognition and colour analysis over the photographic collection of the National Library of Australia.

Here are the slides of my presentation and a link to the search by colour application I developed as part of my research.

YQL mashups for libraries

Wednesday, December 9th, 2009

In October GovHack was held in Canberra. I went along as a participant, but also to advise any teams on the use of the National Library of Australia’s APIs. One of the things I spent my time doing there was making some YQL Open Data Tables for some of the Library’s services. Why is this interesting? Let’s go back a few steps.

YQL is a service from Yahoo that provides an SQL-like environment for querying, filtering and joining web services. So instead of having to write a complex URL to access data from a website, we can use YQL to write a statement that is similar to an SQL query that we might use to obtain data from a MySQL database, except that instead of querying a database, we are querying a web service. As an example, you can enter the following into the YQL console to extract photos of the Sydney Harbour Bridge from Flickr:

SELECT * FROM flickr.photos.search WHERE text="sydney harbour bridge";

When YQL was launched it initially had options to query only Yahoo’s services. If you wanted to query a web service that was outside of Yahoo’s services you were out of luck. Since then Yahoo has allowed developers to build YQL Open Data Tables. An Open Data Table is an XML file that acts as a bridge between your API and the YQL language: in it you describe how your API is structured in terms that YQL can understand.

If we wish to use an API to return data from one of the Library’s services, say Picture Australia, we can query it using the following URL:

http://librariesaustralia.nla.gov.au/apps/kss?action=OpenSearch&targetid=pictaust&searchTerms=Sydney+Harbour+Bridge&startPage=1

As you can see, it starts to become a fairly complex URL with a lot of querystring values to point towards where we need to extract the data from.
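To make the moving parts of that URL explicit, here is a small sketch that assembles it from its querystring values using the standard URL and URLSearchParams objects. The function name pictureAustraliaUrl is made up for this example, and the parameter names are simply those visible in the URL above; nothing here is taken from official API documentation.

```javascript
// Build the Picture Australia OpenSearch URL shown above from its individual parts.
// Only the search terms and the page number vary; the other values are fixed.
function pictureAustraliaUrl(searchTerms, startPage = 1) {
  const url = new URL('http://librariesaustralia.nla.gov.au/apps/kss');
  url.search = new URLSearchParams({
    action: 'OpenSearch',
    targetid: 'pictaust',
    searchTerms: searchTerms,
    startPage: String(startPage),
  }).toString();
  return url.toString();
}

// URLSearchParams encodes spaces as '+', matching the hand-written URL.
console.log(pictureAustraliaUrl('Sydney Harbour Bridge'));
```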

Now let’s create that same query using YQL. Firstly I created an Open Data Table for Picture Australia. This is the key component that ties Picture Australia and YQL together. If you now enter the following into the YQL console, you’ll get back an XML feed from Picture Australia for the pictures of the Sydney Harbour Bridge:

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT * FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1";

Alternatively you can query the National Library of Australia’s catalogue for pictures of the Sydney Harbour Bridge by using this Open Data Table and entering the following statement into the YQL console:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor="sydney harbour bridge {format:Online AND format:Picture}";

So how is this interesting? Can’t all of this information already be gathered from our standard APIs? There are a couple of advantages to using YQL. One advantage is being able to extract just portions of the data. Say you want to extract just the title, description and persistent URL of the records, and you only want to return the first 3 items. You can just enter:

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT title,description,link FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 3;

or you could just extract a link to where the most relevant original item is stored.

USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT enclosure.url FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 1;

This starts to give you a bit of flexibility in the fields and amount of data that are returned, and limits the amount of parsing that you have to do. All the hard work is being done by the servers at Yahoo.

But the really fun stuff starts when you try to create a little mashup by combining data from different services. Let’s use YQL to find the current number 1 artist at Yahoo’s music service:

SELECT name FROM music.artist.popular LIMIT 1;

We can now easily combine this search with a search for the top 5 items from or about that artist in the National Library’s catalogue:

USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor IN (SELECT name FROM music.artist.popular LIMIT 1) LIMIT 5;

Once we have constructed this query, we can access that using a JSON-P call and use a little bit of JavaScript to display the results within a web page (see example 1).

<div id="nla"></div>
<script type="text/javascript">
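// JSON-P callback: render the returned catalogue items as a list of links
function nlabooks(o){
  var f = document.getElementById('nla');
  var out = '<ul>';
  // results can be null when nothing matches, and a single match may arrive as one object rather than an array
  var books = (o.query.results && o.query.results.item) || [];
  if (!(books instanceof Array)) { books = [books]; }
  for(var i=0,j=books.length;i<j;i++){
    var cur = books[i];
    out += '<li><a href="' + cur.link + '">'+ cur.title +'</a></li>';
  }
  out += '</ul>';
  f.innerHTML = out;
}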
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=USE%20%22http%3A%2F%2Fwww.paulhagon.com%2Fyql%2Fnla.xml%22%20AS%20nla%3B%0ASELECT%20*%20FROM%20nla%20WHERE%20lookfor%20IN%20(SELECT%20name%20FROM%20music.artist.popular%20LIMIT%201)%20limit%205%3B&format=json&diagnostics=false&callback=nlabooks"></script>

We’ve now got a little widget that we can use inside any page to dynamically mash up two separate data sources.
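The long src attribute in that last script tag is just the YQL statement, URL-encoded and attached to Yahoo’s public endpoint along with the name of the callback function. A small sketch of how such a URL can be assembled (the helper name yqlUrl is made up for this example; the endpoint and parameters are copied from the script tag above, and the code only builds a string without making any request):

```javascript
// Build a public YQL request URL for a statement, optionally as a JSON-P call.
function yqlUrl(statement, callback) {
  const base = 'http://query.yahooapis.com/v1/public/yql';
  const parts = [
    'q=' + encodeURIComponent(statement), // spaces become %20, quotes %22, and so on
    'format=json',
    'diagnostics=false',
  ];
  // With a callback name the response is wrapped in a call to that function (JSON-P).
  if (callback) {
    parts.push('callback=' + encodeURIComponent(callback));
  }
  return base + '?' + parts.join('&');
}

console.log(yqlUrl('SELECT name FROM music.artist.popular LIMIT 1;', 'nlabooks'));
```

Generating the URL this way means the YQL statement stays readable in your source, and only the encoded form ends up in the page.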

If we were to do that in a traditional manner we would have to write two separate calls to the web services and possibly parse the results in different ways. By using YQL, all that hard work can be carried out in a minimal amount of code.

Building these tables was as much a case of learning a bit more about YQL and the possibilities it offers as anything else. What I’ve shown here is a simple demonstration of the ease with which you can use services like YQL to open your data up to a wider audience.

Note: Please don’t build any mission critical applications using these data tables – they are only there for demonstration purposes. I’ll hopefully make them more permanent and hosted on the National Library’s servers.