This is only the initial, bare-bones JSON-only service. A complete (and fully-documented) API will be released in stages over the next month, providing data in a range of output formats. We’re keeping all API and open institutional data documentation in the one place, on our open data site.
It was a particularly useful event, especially so for being packed into 2½ hours (and worth learning to drive an automatic in order to get there!), with a presentation from Loughborough about their project to select a next-generation OPAC system; group discussions around some of the factors involved in launching such services; and our own contribution, which led to some interesting conversations about the benefits and risks of experimentation in libraries.
One of our key aims for Jerome is for the whole thing to be fast. Not “the average search should complete in under a second” fast, but “your application should be fine to hit us with 50 queries a second” fast.
This requirement was one of the key factors in our decision to use MongoDB as our backend database, and provide search using Sphinx. We’ll have another blog post fairly soon with more detail on how we’re using Mongo and Sphinx to store, search and retrieve data but for now I’d like to share some preliminary numbers on how close we are to our goal of speed.
First of all, getting data in. This is a pain in the backside because the MARC-21 specification is so complex, and because we need to perform several repetitive checks on the data to make sure we’re importing it right. Even so, we’re importing in the region of 150 MARC records a second, including parsing, filtering, mapping fields and finally getting the data into the database. This is done using the File_MARC PEAR library to manage the actual parsing of the MARC data into a set of arrays, then some custom PHP to extract information like title, author, publisher etc. into a more readily understood format. This information extraction isn’t complete yet, so it’s likely that there’ll be a bit of a slowdown as we add more translation rules, but equally it hasn’t been optimised for speed yet.
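To give a flavour of the "mapping fields" step, here is a minimal sketch of turning a parsed MARC record into a simpler title/author/publisher record. It is an illustration only, not our import code: it is written in Python rather than PHP, it assumes the parser has already produced a dictionary of tags to lists of subfield dictionaries (a made-up shape, not File_MARC's actual output), and the names `FIELD_MAP`, `first_subfield` and `simplify` are hypothetical. The tag choices (245 $a for title, 100 $a for the main personal author, 260 $b for publisher) follow MARC 21 conventions, and the sample record is invented.

```python
# Hypothetical translation rules: friendly field name -> (MARC tag, subfield code).
FIELD_MAP = {
    "title": ("245", "a"),      # title statement, title proper
    "author": ("100", "a"),     # main entry, personal name
    "publisher": ("260", "b"),  # publication details, name of publisher
}


def first_subfield(record, tag, code):
    """Return the first non-empty value for tag/code, or None if absent.

    `record` is assumed to look like {"245": [{"a": "...", "c": "..."}], ...}.
    """
    for field in record.get(tag, []):
        value = field.get(code)
        if value:
            # Trim the trailing cataloguing punctuation MARC values often carry.
            return value.strip().rstrip(" /:;,")
    return None


def simplify(record):
    """Apply every rule in FIELD_MAP to one parsed record."""
    return {
        name: first_subfield(record, tag, code)
        for name, (tag, code) in FIELD_MAP.items()
    }


# Invented sample record, in the assumed parsed shape.
sample = {
    "245": [{"a": "Dracula /", "c": "Bram Stoker."}],
    "100": [{"a": "Stoker, Bram,", "d": "1847-1912."}],
    "260": [{"a": "London :", "b": "Penguin,", "c": "2003."}],
}

simplified = simplify(sample)
print(simplified)
```

Keeping the rules in a single table like this is one way to make "adding more translation rules" cheap: each new rule is one more entry, although every extra lookup adds a little to the per-record cost.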
Today’s Talis Linked Data and Libraries open day has motivated me to make a list of some of the external data tools, web services and APIs that could well end up being sucked into Jerome’s vortex of general awesomeness.
I was inspired (possibly through drinking too much SPARQL-themed coffee) by the thought that 2010 is effectively ‘year 1’ for library-themed Linked Data. (But I promise I’ll try and keep the ‘Lincoln’/‘linking’ puns to a minimum after this post…)
So, which of these will make their way into the Jerome toolkit? (I’ll say now, before I get in trouble, that they’re not all purely Linked Data!) …compiled in part from these other lists, and from discussions/examples at the Talis event: