Next Generation Library Catalogues

These are notes from a talk by Eric Lease Morgan of the University of Notre Dame at the Libraries Australia Forum 2008.

The environment is changing: cheap, globally connected computers have changed the way libraries work and what they are about.

When items are analogue, it is important to create surrogates for them. Libraries had to create catalogues to describe their holdings because there was no way to access physical items directly. Now that items are often born digital, surrogates are not as necessary as they used to be, and things like full-text indexing can supplement a catalogue. Libraries ignored full-text indexing for decades; then Google came along and proved it could be done. Because items are born digital, a person no longer has to come to a library and access an item in a specific physical space; it can be accessed from anywhere. Enormous amounts of information can be held on devices such as USB drives (all of WorldCat can be stored on an iPod), and storage is cheaper than it was in the past.

Librarianship consists of 4 processes:

  1. Collection: done by bibliographers and can be supplemented through the use of databases
  2. Preservation: done by archivists; the most challenging process in the current environment
  3. Organisation: done by cataloguers, supplemented by databases and XML
  4. Re-distribution: done by reference librarians

Technology won’t make these processes obsolete; it will just change the way they are done. If you think about books, you don’t have much of a future, but if you think about what is in books, then you have a future.

There are two services the user can interact with:

  1. query against the index
  2. query against the content
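To make the difference concrete, here is a minimal Python sketch. The record and text are simplified, made-up examples (not real catalogue data), and query_index and query_content are hypothetical helpers: the first searches only the surrogate (title and subject headings), the second scans the content itself.

```python
# Simplified, made-up examples: a catalogue record (the surrogate)
# and a fragment of the text it describes.
record = {
    "title": "Moby-Dick",
    "subjects": ["Whaling", "Sea stories"],
}
full_text = "Call me Ishmael."


def query_index(record, term):
    """Search only the surrogate: the title and subject headings."""
    term = term.lower()
    fields = [record["title"]] + record["subjects"]
    return any(term in field.lower() for field in fields)


def query_content(text, term):
    """Search the content itself with a naive full-text scan."""
    return term.lower() in text.lower()


print(query_index(record, "Ishmael"))       # False: the record never mentions him
print(query_content(full_text, "Ishmael"))  # True: the text does
```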

In the past, users could only query an index. Now they can query the content directly, for example by running a full-text search on a book or a newspaper. The real future lies in the growth of services against the content, which let users do things like:

  • annotating it
  • creating tag clouds
  • quoting and citing it
  • saving it to ‘my favourites’
  • working out how often words occur, or which words are unique across a collection
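The last two services are easy to sketch. The following Python assumes a tiny, made-up collection held in memory; word_frequencies produces the counts a tag cloud could be sized from, and unique_words finds the words that occur in one document but in no other. Both helper names are hypothetical.

```python
import re
from collections import Counter

# A tiny, made-up collection standing in for digitised texts.
collection = {
    "doc1": "The whale surfaced. The crew watched the whale.",
    "doc2": "The crew mended the sails.",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs (raw data for a tag cloud)."""
    return Counter(tokenize(text))


def unique_words(collection):
    """Map each document to the words that appear in no other document."""
    vocab = {doc_id: set(tokenize(text)) for doc_id, text in collection.items()}
    result = {}
    for doc_id, words in vocab.items():
        others = set().union(*(w for d, w in vocab.items() if d != doc_id))
        result[doc_id] = sorted(words - others)
    return result


print(word_frequencies(collection["doc1"]).most_common(2))
# [('the', 3), ('whale', 2)]
print(unique_words(collection))
# {'doc1': ['surfaced', 'watched', 'whale'], 'doc2': ['mended', 'sails']}
```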

Libraries are always part of a larger host community. Learn how to take advantage of this fact and put searches against the catalogue into the user’s context. This used to be done face to face: you built a relationship with the librarian. Why is it so impersonal on the web? You can replicate this relationship with a computer, but not replace it.

You need to know your user. For example, if a user searches for nuclear physics, the results you return should differ depending on whether the user is a physicist or a high school student.
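One simple way to act on this is to re-rank the same result set according to who is asking. This sketch assumes each record carries a hypothetical audience field and that the user’s audience is already known (for example, from a login profile); neither is something a standard catalogue record necessarily provides, and rank_for_user is an illustrative name.

```python
# Hypothetical results for "nuclear physics"; the "audience" field is an
# assumption for illustration, not a standard catalogue element.
records = [
    {"title": "Introductory nuclear physics for schools", "audience": "school"},
    {"title": "Nuclear physics: a research monograph", "audience": "research"},
    {"title": "Nuclear physics lecture notes", "audience": "undergraduate"},
]


def rank_for_user(records, user_audience):
    """Put records aimed at the user's audience first.

    False sorts before True, so matching records come first; sorted() is
    stable, so the other records keep their original relative order.
    """
    return sorted(records, key=lambda r: r["audience"] != user_audience)


for r in rank_for_user(records, "research"):
    print(r["title"])
```

A real system would more likely fold a boost like this into its relevance score than sort on it outright.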

Databases are great for organising and maintaining content, but they are lousy at search: you have to know the structure of the database in order to search it. Indexes are the opposite. An index is a list of words with pointers to where each word can be found. You don’t need to know the structure of the underlying data, and you can do things like relevance ranking.
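Here is a minimal sketch of such an index in Python, with no database involved: each word points to the documents containing it, and a simple TF-IDF score stands in for relevance ranking. TF-IDF is used here only for illustration; production indexers such as Lucene use more refined scoring (recent Lucene versions default to BM25). The class and document ids are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class InvertedIndex:
    """A list of words, each pointing to the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # word -> {doc_id: occurrences}
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document; doc_id is assumed to be unique."""
        self.doc_count += 1
        for word in tokenize(text):
            counts = self.postings[word]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query):
        """Return matching doc ids, best first, by a simple TF-IDF sum."""
        scores = defaultdict(float)
        for word in tokenize(query):
            docs = self.postings.get(word)
            if not docs:
                continue
            # Rarer words get a higher weight (inverse document frequency).
            idf = math.log(self.doc_count / len(docs)) + 1.0
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = InvertedIndex()
index.add("a", "Nuclear physics for beginners")
index.add("b", "Physics of the atomic nucleus")
index.add("c", "Nuclear power")
print(index.search("nuclear physics"))  # ['a', 'b', 'c']
```

Document "a" ranks first because it matches both query words; the searcher never needs to know how the documents are stored.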

“Next-generation” catalogues such as VuFind, Evergreen, Primo and AquaBrowser are all very, very similar, with the exception of Evergreen, which is an integrated library system. These discovery systems take in MARC records, EAD and other XML and normalise them to create an index; most of them use Lucene, an open-source indexer, with their own layer on top.
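The normalisation step can be sketched with Python’s standard-library XML parser. The two records below are simplified, made-up examples of MARCXML and EAD (the EAD omits its namespace for brevity), and the flat title/creator/format shape is an assumed common schema, not the one any of these products actually uses.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Simplified, made-up MARCXML record (245 $a = title, 100 $a = personal name).
marcxml = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby-Dick</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman</subfield>
  </datafield>
</record>"""

# Simplified, made-up EAD finding aid (namespace omitted for brevity).
ead = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Papers of Jane Doe</unittitle>
      <origination><persname>Doe, Jane</persname></origination>
    </did>
  </archdesc>
</ead>"""


def from_marcxml(xml_text):
    """Pull title and creator out of a MARCXML record."""
    root = ET.fromstring(xml_text)

    def subfield(tag, code):
        for field in root.iter(MARC_NS + "datafield"):
            if field.get("tag") == tag:
                for sf in field.iter(MARC_NS + "subfield"):
                    if sf.get("code") == code:
                        return sf.text
        return None

    return {"title": subfield("245", "a"),
            "creator": subfield("100", "a"),
            "format": "MARC"}


def from_ead(xml_text):
    """Pull title and creator out of an EAD collection-level description."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("archdesc/did/unittitle"),
            "creator": root.findtext("archdesc/did/origination/persname"),
            "format": "EAD"}


# Both formats now share one flat shape, ready to hand to an indexer.
for rec in (from_marcxml(marcxml), from_ead(ead)):
    print(rec)
```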


The library catalogue isn’t really YOUR catalogue. Include everything related to your audience in the index, not just the things you own. Make sure everything in it is accessible via Google, Yahoo and MSN. Put in as much open access content as possible: gather it and include it in your own index. You can’t rely on others to do it, and it’s easier to search and work with the data when you control it. Apply a librarian’s eye to incoming queries (e.g. munge the query into a phrase search to enrich it). We need fewer library standards and more W3C standards. Repurpose the system by exploiting SOA (service-oriented architecture) and RESTful computing techniques.
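As one possible reading of “munge the query into a phrase search”, this sketch rewrites a multi-word query so that documents containing the exact phrase are boosted, while documents containing all the words still match. It emits Lucene’s classic query-parser syntax (quoted phrases, ^ boosts, AND/OR); munge_query, escape and the boost value of 2 are illustrative assumptions, not part of any catalogue product.

```python
import re

# Characters with special meaning in Lucene's classic query syntax.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&/])')


def escape(term):
    """Backslash-escape Lucene special characters in a single term."""
    return _SPECIAL.sub(r"\\\1", term)


def munge_query(user_query, phrase_boost=2):
    """Turn 'nuclear physics' into a boosted phrase search that still
    falls back to matching documents containing all the words."""
    words = [escape(w) for w in user_query.split()]
    if len(words) < 2:
        # A single word cannot be a phrase, so pass it through unchanged.
        return " ".join(words)
    phrase = '"' + " ".join(words) + '"'
    return f"{phrase}^{phrase_boost} OR ({' AND '.join(words)})"


print(munge_query("nuclear physics"))
# "nuclear physics"^2 OR (nuclear AND physics)
```

The fallback clause matters: a strict phrase search alone would miss relevant records that happen to word things differently.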


How we do things is changing. This requires retraining and a shift in attitude, so that we investigate ways to exploit the current environment.