Posted on April 21st, 2011 by Nick Jackson
Another week, another iteration down. Here’s the summary for last week:
- JournalTOCs Licensing: This seems to be a CC-BY licence, but we’re just double checking with Heriot-Watt whether this licenses the API itself or the data that comes from it.
- Journal entries in search now sport availability dates in a nice human-readable format (e.g. “From 1982 to now”, “From 1996 to 6 months ago”).
- If you add Jerome to an iOS device home screen, it now has a slick new icon.
- Item pages for catalogue items include availability of current stock in most cases. We’re aware of some journals which are catalogued as books where this isn’t the case, and we’re working on it.
- Fixed a bug where our import script was importing empty records for books which didn’t exist. Blamed Horizon Information Portal for returning pages of empty content rather than an HTTP 404.
- Journal search results now point to an individual Jerome item page, rather than directly to our OpenURL resolver. OpenURL now lives in the bright orange “Online” box in the top right of an item page.
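For the curious, the human-readable availability dates mentioned in the list above could be produced by something along these lines. This is only a rough sketch and not Jerome's actual code: the function name `format_availability` and the idea of storing an embargo as a number of months are assumptions made for illustration.

```python
from typing import Optional


def format_availability(start_year: int, embargo_months: Optional[int] = None) -> str:
    """Describe a journal's coverage as a short human-readable phrase.

    start_year:     first year we have access to (assumed field).
    embargo_months: how far behind the present our access stops;
                    None or 0 means coverage runs up to now (assumed field).
    """
    if not embargo_months:
        end = "now"
    elif embargo_months == 1:
        end = "1 month ago"
    elif embargo_months % 12 == 0:
        # Whole years read more naturally than "24 months ago".
        years = embargo_months // 12
        end = f"{years} year{'s' if years != 1 else ''} ago"
    else:
        end = f"{embargo_months} months ago"
    return f"From {start_year} to {end}"


print(format_availability(1982))
print(format_availability(1996, 6))
```

The two calls at the bottom reproduce the examples quoted in the list.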
In other news, we’ve spent a fair bit of time starting to boost our ‘master plan’ mind map, and have moved a lot of the development points into the iterative model so that we’ll get round to them eventually. At some point next week I’ll be trying to get them into some kind of rough order out of the icebox so we can start to forecast iterations. Don’t forget that you can follow our current progress on our tracker if you’re really interested in the inner workings.
Posted on February 27th, 2011 by Nick Jackson
It’s been a while since our last update, so I’m going to go over the key data sets which we’ll be using to drive the Jerome project. These are the collections of data which we lump together into the unified search index, as well as those which provide the supporting metrics to drive the intelligent generation of results.
The obvious one to include is the contents of our library catalogue, currently being scraped from our Horizon LMS using the HiP system. This contains information on titles available within our own collections, including both physical books and ebooks as well as ‘auxiliary’ collections such as reference and dissertations. This data is supplemented (where available) from sources such as Open Library and LibraryThing to provide as rich an experience as possible.
On top of the catalogue we’re also including the contents of our institutional repository. This is a collection of papers, datasets and other useful bits and pieces of academic importance from the depths of the University. It’s also harvestable through the initially horrific-seeming but actually delightfully sensible OAI-PMH standard. It’s a little slower than I expected, but it allows us to cleanly extract all the data we want regarding titles, authors, summaries and access URIs. The OAI-PMH harvester also has the handy side effect of being compatible with archiving software that the Library is proposing to acquire, so we reduce the workload required to add other sources.
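To give a flavour of why OAI-PMH turns out to be sensible, here is a minimal harvesting sketch. It is not Jerome's harvester: the function names and the `base_url` argument are placeholders, and it assumes the repository exposes the standard `oai_dc` (Dublin Core) metadata format. The `ListRecords` verb, the `metadataPrefix` argument and the `resumptionToken` paging mechanism are part of the protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def _texts(record, tag):
    """Collect the non-empty text of every Dublin Core element named tag."""
    return [e.text.strip() for e in record.iter(DC + tag) if e.text and e.text.strip()]


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        records.append({
            "titles": _texts(record, "title"),
            "authors": _texts(record, "creator"),
            "summaries": _texts(record, "description"),
            "uris": _texts(record, "identifier"),
        })
    token = root.find(f".//{OAI}resumptionToken")
    token_text = token.text.strip() if token is not None and token.text else None
    return records, token_text


def harvest(base_url):
    """Yield every record from a repository, one page at a time."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        with urlopen(f"{base_url}?{urlencode(params)}") as response:
            records, token = parse_list_records(response.read())
        yield from records
        if not token:
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Each request returns one page of records, which goes some way to explaining the speed: a full harvest is a chain of sequential requests, each waiting on the token from the one before.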
Journals are up next, and this is a tricky one since many publishers (being of the old-skool “we don’t understand this ‘Internet’ thing” and “why would you want our data?” persuasion) don’t tell us anything remotely useful about their journals or their contents. Fortunately for us, help is at hand from the people at Heriot-Watt in the form of JournalTOCs, a service which provides information on journals and what’s in them based purely on ISSNs. Since we have a gigantic list of the ISSNs of all the journals we can access, it’s a fairly simple matter to loop through them and extract all the data we can.
JournalTOCs also – as the name suggests – provides us with tables of contents for some of these journals, meaning that we can even provide searching down to the individual journal articles.
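The loop over ISSNs can be sketched roughly as follows. This is an illustration, not our import script: `collect_journals` and the `lookup` callable are hypothetical names, and the actual JournalTOCs request is deliberately left to whatever function you pass in, since the details of that API aren't covered here. The only part that is standard is the ISSN check digit calculation, which is a cheap way of skipping mistyped entries before making a request.

```python
from typing import Callable, Dict, Iterable, Optional


def is_valid_issn(issn: str) -> bool:
    """Check the ISSN check digit (weights 8 down to 2, modulo 11)."""
    digits = issn.replace("-", "").upper()
    if len(digits) != 8 or not all(c in "0123456789" for c in digits[:7]):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def collect_journals(
    issns: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
) -> Dict[str, dict]:
    """Call lookup for each well-formed ISSN and keep whatever comes back.

    lookup is a hypothetical stand-in for the real service call; it should
    return a dict of journal data, or None if the service knows nothing.
    """
    results = {}
    for issn in issns:
        if not is_valid_issn(issn):
            continue  # skip obviously mistyped ISSNs
        record = lookup(issn)
        if record is not None:
            results[issn] = record
    return results
```

Passing the lookup in as a function also makes it easy to try the loop against canned data before pointing it at the live service.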
So those are the four big things we’re initially launching Jerome’s integrated search with: catalogue, repository, journals, and journal contents.