Posts Tagged ‘HiP’

Iteration Roundup

Posted on April 21st, 2011 by Nick Jackson

Another week, another iteration down. Here’s the summary for last week:

  • JournalTOCs Licensing: This seems to be a CC-BY licence, but we’re double-checking with Heriot-Watt whether it covers the API itself or the data that comes from it.
  • Journal entries in search now sport availability dates in a nice human-readable format (e.g. “From 1982 to now”, “From 1996 to 6 months ago”).
  • If you add Jerome to an iOS device home screen it now has a slick new icon.
  • Item pages for catalogue items include availability of current stock in most cases. We’re aware of some journals which are catalogued as books where this isn’t the case, and we’re working on it.
  • Fixed a bug where our import script was importing empty records for books which didn’t exist. Blamed Horizon Information Portal for returning pages of empty content rather than an HTTP 404.
  • Journal search results now point to an individual Jerome item page, rather than directly to our OpenURL resolver. OpenURL now lives in the bright orange “Online” box in the top right of an item page.
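
As a rough illustration of the human-readable availability dates mentioned above, here is a minimal Python sketch. It is not Jerome’s actual code: the function name `describe_availability` and its parameters (`start_year`, `embargo_months`, `end_year`) are assumptions made up for this example, and it only reproduces the two phrasings quoted in the list.

```python
from typing import Optional


def describe_availability(
    start_year: int,
    embargo_months: int = 0,
    end_year: Optional[int] = None,
) -> str:
    """Turn raw holdings data into a phrase like "From 1982 to now".

    All parameter names are hypothetical; the real data model may differ.
    """
    if end_year is not None:
        # Coverage stopped in a known year.
        end = str(end_year)
    elif embargo_months > 0:
        # Coverage is ongoing but the newest issues are embargoed.
        unit = "month" if embargo_months == 1 else "months"
        end = f"{embargo_months} {unit} ago"
    else:
        # Coverage is ongoing with no embargo.
        end = "now"
    return f"From {start_year} to {end}"


print(describe_availability(1982))
print(describe_availability(1996, embargo_months=6))
```

The two calls at the end correspond to the two examples quoted in the list: a journal available up to the present, and one with a six-month embargo.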

In other news, we’ve spent a fair bit of time starting to boost our ‘master plan’ mind map, and have moved a lot of the development points into the iterative model so that we’ll get round to them eventually. At some point next week I’ll be trying to get them into some kind of rough order out of the icebox so we can start to forecast iterations. Don’t forget that you can follow our current progress on our tracker if you’re really interested in the inner workings.

Progress, progress…

Posted on April 11th, 2011 by Nick Jackson

It’s end-of-iteration time, and that means a round-up of what’s been going on in the Jerome world. I’m going to kick off by announcing the exciting news that we’ve standardised on Pivotal Tracker as our agile workflow manager. This means that we can all see exactly what’s planned, what needs tweaking and what’s coming next as well as getting immediate updates on the state of the current iteration. Since this is a JISC project we’ve made the whole thing public so you can see how we’re getting on.

So what’s been happening recently? Here’s a quick breakdown.

  • Catalogue import now uses the full MARC record instead of a stripped-down ‘friendly’ version, giving higher quality data and metadata.
  • Individual items in the catalogue and journals collections now have their own information pages. The same for Repository items is coming soon.
  • Item pages sport COinS metadata for ease of referencing and OpenURL lookups. Give it a whirl with any COinS-compatible browser plugin, like Zotero in Firefox.
  • Item pages all have a huge set of social media baked right in, allowing easy sharing and bookmarking.
  • We’re now aware of where an e-book is available, and highlight it accordingly in search and on the item page.
  • Cover images are now available for both books and a limited (Elsevier) set of journals.
  • Catalogue item pages have links to Google Books previews where they are available.
  • We understand different media types (we’re still adding some more), which now highlight things like videos in search results. Soon they’ll be adjustable in search.
  • Tweaked default search weightings to provide slightly more accurate default results.
  • Pulling data from OpenLibrary for items with valid ISBNs provides a richer experience in the “Other Resources” section.
  • Book cover images are now coming from OpenLibrary, giving a higher quality and generally wider range.
  • Search data sets are now much cleaner in terms of character encoding and special character escaping, giving much richer international/foreign character support.
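
For anyone curious what the COinS metadata on an item page looks like, here is a minimal Python sketch of building one. COinS embeds an OpenURL ContextObject (key/value pairs, URL-encoded) in the `title` attribute of an empty `span` with the class `Z3988`. The function name `coins_span` and the sample book details are assumptions for illustration only, not Jerome’s own code or data, and the ISBN is a placeholder.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title: str, author: str, isbn: str) -> str:
    """Build a COinS <span> for a book (illustrative helper, not Jerome's)."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                    # ContextObject version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # "book" metadata format
        ("rft.genre", "book"),
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.isbn", isbn),
    ]
    # URL-encode the key/value pairs, then HTML-escape the result so the
    # ampersands are safe inside an attribute value.
    context_object = urlencode(fields)
    return f'<span class="Z3988" title="{escape(context_object, quote=True)}"></span>'


# Sample values are made up for the example.
print(coins_span("Three Men in a Boat", "Jerome K. Jerome", "9780000000002"))
```

A COinS-aware plugin such as Zotero looks for spans with the `Z3988` class and decodes the `title` attribute to recover the citation.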

What’s coming next? We’re meeting soon to decide the key points for the next iteration but in the meantime I can reveal that we’ll be busting out our list-fu (reading lists, citation lists, pick lists and more), our first user-aware tools (custom search weighting sets and history!), journal contents, richer subject data, browsing by subject, similar books and more.

Stay tuned!

We Love All Our Data

Posted on February 27th, 2011 by Nick Jackson

It’s been a while since our last update, so I’m going to go over the key data sets which we’ll be using to drive the Jerome project. These are the collections of data which we lump together into the unified search index, as well as those which provide the supporting metrics to drive the intelligent generation of results.

The obvious one to include is the contents of our library catalogue, currently being scraped from our Horizon LMS using the HiP[1] system. This contains information on titles available within our own collections, including both physical books and ebooks as well as ‘auxiliary’ collections such as reference and dissertations. This data is supplemented (where available) from sources such as Open Library and LibraryThing to provide as rich an experience as possible.

On top of the catalogue we’re also including the contents of our institutional repository. This is a collection of papers, datasets and other useful bits and pieces of academic importance from the depths of the University. It’s also harvestable through the initially horrific-seeming but actually delightfully sensible OAI-PMH[2] standard. It’s a little slower than I expected, but it allows us to cleanly extract all the data we want regarding titles, authors, summaries and access URIs. The OAI-PMH harvester also has the handy side effect of being compatible with archiving software that the Library is proposing acquiring, so we reduce the workload required to add other sources.
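
To give a flavour of why OAI-PMH is sensible once you get past the first impression, here is a minimal Python sketch of the two halves of a harvest: building a `ListRecords` request and pulling titles, authors, summaries and URIs out of a Dublin Core (`oai_dc`) response. This is not our harvester. The function names, the `repository.example.org` address and the sample XML are all assumptions for illustration; only the `verb`, `metadataPrefix` and `resumptionToken` arguments and the two XML namespaces come from the OAI-PMH standard.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, next_token) from one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "authors": [e.text or "" for e in record.iterfind(".//dc:creator", NS)],
            "summary": record.findtext(".//dc:description", default="", namespaces=NS),
            "uris": [e.text or "" for e in record.iterfind(".//dc:identifier", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# A made-up, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Example, A.</dc:creator>
          <dc:description>A made-up abstract.</dc:description>
          <dc:identifier>http://repository.example.org/1/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_records(SAMPLE)
print(records[0]["title"], next_token)
print(list_records_url("http://repository.example.org/oai", next_token))
```

A real harvest would fetch each URL in turn and keep going until the response contains no resumption token; fetching one page at a time like this is one plausible reason harvesting feels slow.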

Journals are up next, and this is a tricky one since many publishers (being of the old-skool “we don’t understand this ‘Internet’ thing” and “why would you want our data?”) don’t tell us anything remotely useful about their journals or their contents. Fortunately for us, help is at hand from the people at Heriot-Watt in the form of JournalTOCs, a service which provides information on journals and what’s in them based purely on ISSNs[3]. Since we have a gigantic list of the ISSNs of all the journals we can access it’s a fairly simple matter to loop through them and extract all the data we can.

JournalTOCs also – as the name suggests – provides us with tables of contents for some of these journals meaning that we can even provide searching down to the individual journal articles.
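
Before looping through a gigantic list of ISSNs it can be worth throwing out any that are malformed, since a mistyped ISSN will never match anything. Here is a minimal Python sketch of the standard ISSN check digit test (weights 8 down to 2 on the first seven digits, modulo 11, with “X” standing for 10). The function names are made up for this example, and the sketch deliberately stops short of calling JournalTOCs, since the details of that API aren’t covered here.

```python
import re

# Four digits, optional hyphen, three digits, then a check character (digit or X).
ISSN_PATTERN = re.compile(r"([0-9]{4})-?([0-9]{3})([0-9Xx])")


def is_valid_issn(issn: str) -> bool:
    """Check the shape and check digit of an ISSN such as "0378-5955"."""
    match = ISSN_PATTERN.fullmatch(issn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the seven digits 8, 7, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3).upper() == expected


def valid_issns(issns):
    """Keep only the well-formed ISSNs, preserving their order."""
    return [issn for issn in issns if is_valid_issn(issn)]


print(valid_issns(["0378-5955", "0378-5954", "2434-561X", "not an issn"]))
```

Only the entries that survive this filter would then be worth sending off for a lookup.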

So those are the four big things we’re initially launching Jerome’s integrated search with: catalogue, repository, journals, and journal contents.

  1. Horizon Information Portal
  2. Open Archives Initiative Protocol for Metadata Harvesting – A standard way of getting information about the contents of an archive collection (although not the contents themselves)
  3. International Standard Serial Number – like ISBNs, but for serial publications