
The setup wrap up

Thursday, June 22nd, 2017

Wow! Yesterday I asked everyone who was participating in #blogjune to answer 4 questions about what they use to get their jobs done. There was a fantastic response to my request. I hope I got to read everybody’s posts; I did my best following along on Twitter & seeing who linked back to my request.

I really enjoyed stopping and thinking about the tools I use. I kept my answers to those things I use on a very regular basis. There’s a mass of utilities that I use for specific purposes, but I didn’t start listing them as I thought it could go on and on. Maybe they are the really interesting things?

From the feedback I got, it seemed like you all enjoyed the task too. We ended up with a really good overview and insight into how we get things done. I feel like I know a little bit more about everyone, things that you don’t pick up in 140 characters. Thanks everyone!

Trove zone relevancy bubbles

Monday, June 19th, 2017

I often set myself little challenges to come up with a method that solves a problem or improves something (usually related to something at work, or something from the GLAM sector). It will usually involve some technique or programming feature that I’m trying to learn. Practical learning. In this case, I was looking at dynamically generating SVG files for some visualisation work and it took me on a bit of an unexpected journey. I thought I would talk through where this ended up: exploring the relevancy ranking of result zones in Trove, and resulting in my Trove bubbles.

Bubble chart for Sydney Harbour Bridge

Some background to where this came from

When you go to Amazon and undertake a search, like most sites these days, you start to get autocomplete suggestions as you type. In the example below, when I search for headphones, there’s some clever mathematics going on behind the scenes that, along with suggesting product titles for my term, suggests the most relevant subject areas that relate to it. In this case there’s higher relevance for headphones in Electronics than in Cell phones & accessories (or maybe a clothing option where there might be prints of headphones on a T-shirt).

Amazon autocomplete search for headphones

This search suggestion serves exactly the same purpose as a traditional website structure in trying to deliver the user to the right area of content on the site as quickly and easily as possible.

In Trove terms, let’s relate this back to zones. When we undertake a search, we’re presented with results for each zone and given a number of results for each zone. What we aren’t given is how relevant each of these zones is. Each zone is presented with the same level of importance as every other zone, regardless of the search term. As a designer, how can I change this so that I can present the most relevant zone for a search term to the user, potentially structuring the page differently to do so & hopefully lead the user in the right direction?

Let me walk through a little experiment that shows how I might come to a solution to this problem.

When querying Trove through the API, one of the responses that is returned for each record in a result is a relevance score.

...
relevance: {
  score: "8.01584",
  value: "very relevant"
},
...

In the simplest terms, we could plot this relevancy score for the top results in each zone (by default, 20 results per zone) on a chart to easily compare the difference between various search terms.
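Here’s a minimal sketch in JavaScript of pulling those scores out for one zone, assuming a Trove API v2 key (the API_KEY placeholder and function name are mine, and the response structure is assumed to follow the excerpt above for a “work” based zone like picture or book):

// Fetch the top 20 results for a zone and extract their relevance scores.
// A sketch only: API_KEY is a placeholder, and the response structure is
// assumed to follow the excerpt above.
const API_KEY = 'YOUR_TROVE_API_KEY';

async function zoneRelevanceScores(query, zone) {
  const url = 'https://api.trove.nla.gov.au/v2/result' +
    '?key=' + API_KEY + '&encoding=json&n=20' +
    '&zone=' + encodeURIComponent(zone) +
    '&q=' + encodeURIComponent(query);
  const response = await fetch(url);
  const data = await response.json();
  const works = data.response.zone[0].records.work || [];
  return works.map(work => parseFloat(work.relevance.score));
}

// zoneRelevanceScores('Sydney Harbour Bridge', 'picture').then(console.log);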

Relevance chart for a search for “Paul Hagon”

Relevance chart for a search for “Sydney Harbour Bridge”

It becomes obvious that different searches deliver very different types of content as their top results. We’re starting to get an indicator as to what might be the most relevant zone for a query.

If we look at the chart above for the search on my name, what would the most relevant zone be? Would it be the article zone that has 1 very relevant result and then very little, or the picture zone that isn’t quite as highly ranked in relevance, but has a lot more content that appears relevant?

We could look at basic statistical measures such as means and standard deviations to come up with a figure. For my purposes, I’m going to stay with the chart I’ve generated and measure the area under the line to make my determination. This can easily be measured by calculating the area of a trapezoid for each pair of adjacent results as they’re plotted and adding these together: (x + y)/2 * w, where x and y are the two relevance scores and w is the width between them.

So we could use the following formula to calculate the area under the line (assuming the width of each trapezoid is 1):

(result 1 relevance score + result 2 relevance score)/2 + 
(result 2 relevance score + result 3 relevance score)/2 +
(result 3 relevance score + result 4 relevance score)/2 +
...

and so on until we get to result 20.
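As a quick JavaScript sketch (the function name is mine), that running sum is only a few lines:

// Area under the relevance line using the trapezoidal rule,
// with a width of 1 between consecutive results.
function areaUnderLine(scores) {
  let area = 0;
  for (let i = 0; i < scores.length - 1; i++) {
    area += (scores[i] + scores[i + 1]) / 2;
  }
  return area;
}

// areaUnderLine([8, 6, 4]) returns 12: (8 + 6)/2 + (6 + 4)/2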

Plot showing the areas of trapezoids used to calculate the area under a line

If we return to the search for “Paul Hagon”, we get the following areas:

  1. picture: 47.425
  2. article: 44.195

We now have an answer: for this search, the most relevant zone is pictures, the zone with less relevant but more results, compared with the articles zone that has one highly relevant result and not a lot of other relevant results.

We could tailor the display of results to provide an emphasis on pictures and deliver the most relevant result.

Moving beyond the maths

We live in an age of visualisations, so in addition to tailoring the display in a certain manner, we can expose these calculations to a visitor without bombarding them with the maths behind the result.

I love the UTS ribbon, which lives on the catalogue of the library at UTS. It’s a rainbow of Dewey classifications for a result. It enhances your search results without taking away from the results themselves.

UTS ribbon

Could something similar be used to enhance the zones for Trove? We’ve already done all the maths for each item in the results: we know the averages of relevancy scores, the area under the graph, the standard deviation. Let’s combine some of this and turn it into something interesting. This is where my initial purpose of generating some dynamic SVGs to visualise something came to life.

By plotting the average relevance of the zone on the x-axis and making the area of the bubble the same as the area under the line chart, we can create a simple little visualisation of the zone relevance breakdown. This gives a user an indication of which zones are likely to provide the most relevant results for their search term. You can click through the sample searches below to see all the details for a search term, & click on a bubble.
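Here’s roughly how one of those bubbles can be generated as a dynamic SVG. A sketch only: the scale factor, layout values and average relevance figures are invented for the example, while the two areas are the figures calculated above for the “Paul Hagon” search:

// Build an SVG circle for a zone: x position from the zone's average
// relevance, bubble area equal to the area under its relevance line.
function zoneBubble(zone, averageRelevance, areaUnderLine) {
  const cx = averageRelevance * 40;             // illustrative x scale
  const r = Math.sqrt(areaUnderLine / Math.PI); // area = pi * r^2
  return '<circle cx="' + cx + '" cy="50" r="' + r.toFixed(2) + '">' +
    '<title>' + zone + '</title></circle>';
}

// The averages here are invented; the areas are from the "Paul Hagon" search.
const svg = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 100">' +
  zoneBubble('picture', 5.2, 47.425) +
  zoneBubble('article', 4.8, 44.195) +
  '</svg>';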

Bubble charts for the sample searches:

  • Harry Potter
  • Frank Hurley
  • Sydney Harbour Bridge
  • Paul Hagon

Summary

So those are my Trove bubbles. Starting off looking at how to generate some SVG files led to looking for something to visualise, which in turn led to looking at Trove zone results. Sometimes it’s a very strange path that you take to learn something, but in the end it’s not necessarily about the end result, it’s about the journey. The random discoveries you make along the way can be really fascinating.

Lost diggers

Monday, June 5th, 2017

In my previous post I spoke about owning your own content and controlling how it’s published. This post demonstrates the value of having your content online. I’ve never written about this before (I don’t know why), as it was very interesting to be part of.

At the start of 2010 I gave a presentation at VALA titled Everything I know about cataloguing I learned from watching James Bond (PDF). The presentation explored using facial recognition (with a sample set of photos of our Prime Ministers) and colour analysis as different ways to match and explore items within a collection. At the time it was pretty advanced; now, 8 years since the research took place, some of it seems pretty primitive, as there have been massive advances, particularly in facial recognition and other forms of computer vision and artificial intelligence.

A sample of facial detection from my original Prime Ministers data set

About 18 months after the presentation I had a phone call at work from the journalist Ross Coulthart. He had come across my paper from VALA and had an interesting opportunity on offer. Kerry Stokes AC had just purchased the Thuillier photographic collection of World War One soldiers that was discovered in a French farmhouse attic (now more commonly known as the lost diggers of Vignacourt). This collection consisted of over 3000 glass plate negatives of Australian, British, American, Canadian and other allied soldiers from the First World War.

Ross got in touch because they had these beautiful images but no record of who the soldiers in the pictures were. They had built a website to try & crowdsource details, but he thought there had to be a better way. Having come across my paper, & seeing as I worked at the library, he wanted to see if we could use the techniques I had experimented with to match these images to any images from our collections that might have names associated with them. Of course, I jumped at the opportunity!

Being at the library was perfect, as I had access to Trove; it wasn’t just the National Library’s photo collection, it was essentially an Australian collection of photos. To extract a selection of images I looked at images dated from 1910-1929, taking the dates wider than just the First World War as many images were taken prior to the war starting or after soldiers returned. This sample was then refined by extracting images with titles or descriptions that matched a list of military ranks:

  • general
  • brigadier
  • colonel
  • major
  • captain
  • lieutenant
  • cadet
  • sergeant
  • warrant officer
  • corporal
  • private
  • signalman
  • gunner
  • trooper
  • sapper
  • recruit
  • portrait
  • cpl
  • digger

Images were then run through a simple face detection algorithm to generate some level of accuracy indicator. This simply returned the number of faces detected in an image. Many of the images were group portraits of units (so lots of faces detected), where getting usable facial detail was going to be much harder than in a head and shoulders portrait (where only 1 face would be detected). This also had the added bonus of eliminating false positive landscape photos that contained terms like “general view of…” in the title. In addition to military groups, the keyword matching also brought back a lot of sporting group photos (as they listed the captain of the team in the description). I grouped these into portraits, groups (military) and groups (sporting).
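As a rough sketch of that filtering and grouping logic (the keyword lists are abbreviated, the sporting terms are my guesses, and the face counts would come from whatever detection library is used):

// Filter records by rank keywords, then group them by the number of
// faces a detector found. Keyword lists are abbreviated examples.
const ranks = ['general', 'colonel', 'major', 'captain', 'lieutenant',
  'sergeant', 'corporal', 'private', 'gunner', 'sapper', 'portrait'];
const sportingTerms = ['team', 'club', 'football', 'cricket'];

function matchesRank(record) {
  const text = (record.title + ' ' + (record.description || '')).toLowerCase();
  return ranks.some(rank => text.includes(rank));
}

function classify(record, faceCount) {
  if (faceCount === 0) return 'discard';   // likely a landscape false positive
  if (faceCount === 1) return 'portrait';
  const text = record.title.toLowerCase();
  if (sportingTerms.some(term => text.includes(term))) return 'group (sporting)';
  return 'group (military)';
}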

A single portrait of a soldier: more likely to get a good result

A group portrait of soldiers: much less likely to get a good result

I then used this dataset to crawl the relevant institutions’ sites and download the highest resolution image available online (in many cases just a 450 or 600 pixel wide/tall image). Due to time constraints it would have been impractical to attempt to order high resolution scans of all the images.

The resulting dataset contained just over 5,100 images.

At this stage, my contribution was done. Once I passed these images & data on to Seven, they used facial recognition services (from, I believe, a security firm) to compare and match these images to the glass plate negatives. Unfortunately, this process didn’t match any of the unknown soldiers with known soldiers. From what I can recall there were issues at the time with using only screen-resolution images, as many of the images lacked the detail required for matching.

Despite not getting a match, it was a fascinating process to go through & a really nice outcome that validated what I set out to do when undertaking the research for my paper.

It’s been so exciting to see what happened since I made my little contribution; the project went on to great things: a feature on the Sunday Night TV show, a book, an exhibition and a permanent home for the collection at the Australian War Memorial.

Now, 6 years on from when the initial work was done, doing it again might provide different results. Facial recognition software has improved so much, and many institutions are also delivering much higher resolution images on their sites than they were back then. A combination of just these 2 things might prove more successful for other projects embarking on solving a similar problem. If I were to undertake the same problem today, I would probably just load the data set onto my phone & let the photos app analyse everything. I could simply do a search for “soldier”.

Object recognition in the photos app on an iPhone

The moral of the story: put your ideas and thoughts online. You never know who will discover them and what they might lead to.

Own your own content

Friday, June 2nd, 2017

One of the housekeeping issues I need to address on my site is fixing my presentations…

Once upon a time, SlideShare was the go-to place for hosting presentations. It was a great idea, worked really well and allowed you to share and access presentations from conferences you couldn’t attend & in the process learn so much. But then it became more popular (with, I’m guessing, increasing running costs). With this popularity came advertising. Then there was more advertising along with reduced functionality, or to be more correct, functionality only available if you became a paid member. Then the final straw: it got bought by LinkedIn.

LinkedIn is one of those services that never provided any benefit to me at all. Endless emails, worthless recommendations, and it just became creepier and creepier. So, despite having an account with them that was essentially inactive (you often sign up for things because it’s a good idea at the time & it might prove to be useful), I decided to remove myself from their system (trying to rid yourself from their system is a story in its own right). Doing this also meant deleting my SlideShare account, because (a) I wanted out from that company & (b) the SlideShare of now was no longer the same SlideShare I signed up for.

It wasn’t a decision taken lightly. I had embedded presentations in this site. Others had embedded my presentations in their sites or linked to my presentations.

Personally, the presentations I’ve given have been some of the most rewarding activities I’ve undertaken in my professional life. Taking this step broke everything. Links were gone, embeds were gone. There was no record anymore of my work. I’m so sorry.

The web was once a great place to create a distributed identity for yourself. You could host images on one site, videos on another, presentations on yet another site, and easily bring all of this back into one place. This is still the case, but it’s come at a cost. The cost is advertising. Your content gets topped and tailed, or overlaid with advertising that you have no control over. Privacy policies get changed. I’m over all that. It’s my content. I want to control how it’s delivered and presented to the world.

It’s time to own my content again.

So I have some work to do on my site to migrate all my presentations back to their rightful home: here. Even though many of them are now dated, they are still me; they have helped to define my career. I need to do the right thing by them.

I could take the easy option and just put up a PDF of them, but I know there are better ways of doing it than that. I know what has to be done. I just need to do it. I’ve already started the process by changing the hosting of a few explanation videos I’ve used in posts to be served from this site rather than embedded from YouTube.

Stay tuned as I rebuild my digital presence right here.

It’s not an advanced search, it’s an advanced interface

Friday, November 6th, 2015

The ability to converse with computers has long been the realm of science fiction: 2001 and HAL 9000 (or, if you were a child of the 80’s, maybe the Knight Industries Two Thousand). In the past few years we’ve started to see speech interaction become much more common thanks to services like Siri on iOS devices, Dictation in OS X and Cortana on Windows. When you consider that we’re increasingly accessing the web via mobile devices, all of which have a microphone built into them, it makes sense that speech should be a natural form of input compared to typing on a tiny keyboard.

Recently browsers have started to allow developers access to the Web Speech API. The Web Speech API is a JavaScript API that allows developers to incorporate speech recognition or provide text to speech features within their web based applications. At the moment it’s still relatively experimental, so it doesn’t have thorough browser support. Currently, Chrome is the only major browser to support this feature.

You may have noticed that when you visit the Google homepage in Chrome there is a microphone in the corner of the search box.

Comparison between Google’s homepage in Firefox and Chrome

Clicking on the microphone allows you to dictate a query to Google, rather than typing it. The speech recognition doesn’t happen within the browser. The API takes your speech, sends it to Google for analysis & returns a string of text as to what it has interpreted the speech as. It’s a similar process if you’ve enabled Dictation in OS X: your speech is sent to Apple for analysis (there are options for enabling offline recognition). The need for a third party speech recognition service is one reason why browser support is limited. At the moment there isn’t a universal recognition system that these browsers can point to; the recognition is tied to wherever the browser maker decides it takes place.
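For the curious, the recognition side of the API only takes a few lines of JavaScript. A minimal sketch (at the time of writing the constructor is vendor-prefixed in Chrome):

// Minimal speech recognition with the Web Speech API (Chrome only).
// The recognised speech comes back as a plain string of text.
const recognition = new webkitSpeechRecognition();
recognition.lang = 'en-AU';

recognition.onresult = function (event) {
  const transcript = event.results[0][0].transcript;
  console.log('Heard: ' + transcript);
  // Feed the transcript into the same pipeline as typed input.
};

recognition.start(); // the browser asks for microphone permission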

Faking natural language processing

This is the bit where all those researching natural language processing start to roll their eyes & laugh at me. This section is called faking it because it takes a very simplistic approach to natural language processing. It’s by no means perfect, but it demonstrates what you can do with hardly any effort.

For this demonstration I’ve built an application that uses the Web Speech API to see if we can make searching a collection easier by using speech and ‘faking’ some natural language processing. The application queries Trove using its API.

How does this work?

Trove provides a perfect example for searching as it’s all essentially fielded queries. To start with, there are the various zones that can be searched upon (pictures, books, maps etc). In addition to the zones, we can target searches to fielded data like titles, dates and creators. Compared to a broad search interface like Google’s, our search interfaces deal with a limited number of combinations, restricted to these fields.

In our library catalogues or museum collection searches we typically try to make sense of the multitude of fields by grouping them in a related manner.

Trove advanced search screenshot

This is clear, but still quite complex. I’ve previously presented about the difficulties users can encounter trying to successfully navigate these interfaces. Take a sample search: “Paintings by Sidney Nolan between 1946 and 1948”. To successfully submit the search requires the user to select a zone to search (Pictures), enter queries into 3 different sections of the form (once in the creator text field and twice in the date entries), and interact with 2 drop-down menus (the creator field and the format). It’s not a simple task; however, the search term itself isn’t exactly complex. What if we could programmatically break that query down into the components that make up these fields?

  • Paintings
  • Sidney Nolan
  • 1946-1948

This can be achieved by passing the query through a set of filters to match patterns that exist in a term. These filters are known as regular expressions. Let’s take just the way we express dates in English & look at how we can detect these patterns and convert them into a query that the Trove API will understand.

English phrase → Regular expression → Trove API speak

in 1993 → (in|from) ([1-2][0-9]{3}) → date:[1993 TO 1993]
from 1933 → (in|from) ([1-2][0-9]{3}) → date:[1933 TO 1933]
before 1962 → (before|pre) ([1-2][0-9]{3}) → date:[* TO 1962]
pre 1918 → (before|pre) ([1-2][0-9]{3}) → date:[* TO 1918]
after 2001 → (after|post) ([1-2][0-9]{3}) → date:[2001 TO *]
post 1945 → (after|post) ([1-2][0-9]{3}) → date:[1945 TO *]
in the 1960s → (in|from) the ([1-2][0-9]{2}[0][’]?[s]) → decade:196
from the 1960’s → (in|from) the ([1-2][0-9]{2}[0][’]?[s]) → decade:196
between 1932 and 1956 → (between|from) ([1-2][0-9]{3}) (and|to) ([1-2][0-9]{3}) → date:[1932 TO 1956]
from 1939 to 1945 → (between|from) ([1-2][0-9]{3}) (and|to) ([1-2][0-9]{3}) → date:[1939 TO 1945]

In regular expressions the pipe character “|” indicates OR. A year can (roughly) be expressed as a first character of 1 or 2 followed by 3 characters between 0 and 9, e.g. ([1-2][0-9]{3}). By testing a query against these patterns, it’s relatively easy to extract date information from it.
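Strung together, the date filters from the table above become a small function. A sketch: the patterns are checked from most specific to least so that “between 1932 and 1956” isn’t swallowed by the single-year pattern:

// Convert English date phrases into Trove API date syntax using the
// patterns from the table above. Returns null when no pattern matches.
function extractDateQuery(query) {
  let m;
  if ((m = query.match(/(between|from) ([1-2][0-9]{3}) (and|to) ([1-2][0-9]{3})/i))) {
    return 'date:[' + m[2] + ' TO ' + m[4] + ']';
  }
  if ((m = query.match(/(before|pre) ([1-2][0-9]{3})/i))) {
    return 'date:[* TO ' + m[2] + ']';
  }
  if ((m = query.match(/(after|post) ([1-2][0-9]{3})/i))) {
    return 'date:[' + m[2] + ' TO *]';
  }
  if ((m = query.match(/(in|from) the ([1-2][0-9]{2})0['’]?s/i))) {
    return 'decade:' + m[2];
  }
  if ((m = query.match(/(in|from) ([1-2][0-9]{3})/i))) {
    return 'date:[' + m[2] + ' TO ' + m[2] + ']';
  }
  return null;
}

// extractDateQuery('Maps of Sydney before 1850') -> "date:[* TO 1850]"
// extractDateQuery('Photos from the 1920’s')     -> "decade:192"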

Likewise, we can look at the start of the query to determine what type of search a user is after: books by, pictures of, sound recordings of, photos by, photos taken by, paintings by, maps made by, braille copy of, etc. By matching these we can determine the “major zone” a query might be taking place in, e.g. book, picture, map, and possibly a format that is a subset of these major zones, e.g. art work, sound, audiobook, braille.

In addition to the zone to search on, it’s also possible to break down the type of search. The terms “of”/“about” versus “by” indicate intent: a search for “photos of” is a subject search, while a search for “photos by” is searching for a creator.
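The same approach works for zones and search types. Another sketch, where the pattern-to-zone mappings are my illustrative guesses rather than a complete list:

// Detect the likely zone and type of search from the query's wording.
// The mappings are illustrative, not a complete list.
const zonePatterns = [
  { pattern: /^(photos?|pictures?|paintings?)\b/i, zone: 'picture' },
  { pattern: /^(books?|braille)\b/i, zone: 'book' },
  { pattern: /^maps?\b/i, zone: 'map' },
  { pattern: /^(sound recordings?|recordings?|audio ?books?)\b/i, zone: 'music' },
];

function detectZone(query) {
  const match = zonePatterns.find(z => z.pattern.test(query));
  return match ? match.zone : null;
}

// "by" suggests a creator search; "of" or "about" a subject search.
function detectSearchType(query) {
  if (/\b(by|taken by|made by)\b/i.test(query)) return 'creator';
  if (/\b(of|about)\b/i.test(query)) return 'subject';
  return 'keyword';
}

// detectZone('Pictures of Canberra from 1926') -> "picture"
// detectSearchType('Books by J.K. Rowling')    -> "creator"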

Let’s take a brief look at some common terms that people might use when asking a question and look at how we can analyse these sentences to turn them into a query that a service like Trove would understand. We would typically ask a question along the lines of:

  • Pictures of Sydney Harbour Bridge
  • Pictures of Sydney Harbour Bridge before 1930
  • Pictures of the Sydney Harbour Bridge between 1985 and 1992
  • Photos of the Sydney Harbour Bridge from the 1920’s
  • Books by J.K. Rowling
  • Audio books of Harry Potter
  • Braille version of Harry Potter and the philosopher’s stone
  • Pictures of Canberra from 1926
  • Pictures taken in 1926 (or Pictures from 1926)
  • Maps of Sydney before 1850
  • Recordings of Hubert Opperman
  • ISBN equals 0747545723

2 methods of input for the price of 1

It’s not just about speech. Remember that not every browser supports speech input. Luckily, since the result returned from the speech recognition service is a string of text, it is identical to what could be typed into a search box. This simplistic natural language processing also works when you type a phrase in English, making this process available to any user in any browser.

Not seeking perfection

This really is a demonstration and only uses a selected portion of possible query combinations, mostly format and date based. There are obvious issues with false positives. If you were looking for a book titled “Photographs of Sydney”, you would get photographs rather than the book. However, we could display other results and list books with this term as a title in facets. There are ways around this.

Maybe with a bit more refining and experimenting, these techniques could greatly assist in providing a simpler interface for interacting with our collections. Have a play with my Ask Trove application and think about how this concept might be able to be incorporated into other applications.