Jerome began in the summer of 2010 as an informal ‘un-project’, with the aim of radically integrating data available to the University of Lincoln’s library services and offering a uniquely personalised service to staff and students through the use of new APIs, open data and machine learning. Jerome addresses many of the challenges highlighted in the Resource Discovery Taskforce report, including the need to develop scale at the data and user levels, the use of third-party data and services, and a better understanding of ‘user journeys’.
Here, we propose to formalise Jerome as a project, consolidating the lessons we have learned over the last few months by developing a sustainable, institutional service for open bibliographic metadata, complemented with well-documented APIs and an ‘intelligent’, personalised interface for library users.
The proposed metadata collection will comprise the entirety of our library catalogue of ~250,000 bibliographic records (with the possibility of different levels of machine access and permitted re-use, to allow for records in which a third party holds copyright), along with constantly expanding data about our ~60,000 available e-journals and their contents, augmented by the JournalTOCs API, and ~3,000 additional records from our EPrints repository. All records will be available through a dedicated browser-based search interface and, via APIs, as JSON, XML and reference management formats (.ris). We will expose underlying Linked Data where possible and appropriate.
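As a rough illustration of what serving one record in several formats might involve, the sketch below renders a single bibliographic record as JSON and as RIS. The field names (`title`, `authors`, `year`, `publisher`) and the example values are assumptions made for illustration, not Jerome's actual schema; the RIS tags used (TY, AU, TI, PY, PB, ER) are standard ones.

```python
import json

# Hypothetical record; field names are illustrative, not Jerome's real schema.
record = {
    "type": "book",
    "title": "An Example Title",
    "authors": ["Smith, Jane", "Jones, Sam"],
    "year": 2009,
    "publisher": "Example Press",
}


def to_json(rec):
    """Serialise the record as a JSON string with a stable key order."""
    return json.dumps(rec, sort_keys=True)


def to_ris(rec):
    """Render the record as RIS: one 'TAG  - value' line per field."""
    ris_type = {"book": "BOOK", "article": "JOUR"}.get(rec.get("type"), "GEN")
    lines = ["TY  - " + ris_type]
    for author in rec.get("authors", []):
        lines.append("AU  - " + author)
    if "title" in rec:
        lines.append("TI  - " + rec["title"])
    if "year" in rec:
        lines.append("PY  - " + str(rec["year"]))
    if "publisher" in rec:
        lines.append("PB  - " + rec["publisher"])
    lines.append("ER  - ")  # every RIS record ends with an ER tag
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_json(record))
    print(to_ris(record))
```

Keeping each output format as a small function over one internal representation means a new format can be added without touching the stored data.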
Jerome will particularly focus on the ways in which our own data can interact (through our own and third-party APIs) with external datasets: for instance, using ISSN data derived from Lincoln’s catalogue and e-journals knowledgebase software, to create scalable e-journal search and discovery services using the JournalTOCs APIs (and building on the work of the WattJournals project at Heriot-Watt University).
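ISSNs drawn from two different systems are likely to vary in formatting, so some normalisation would probably be needed before they are used as lookup keys against an external service. The following is a minimal sketch of that step only, using the standard ISSN check-digit rule; the function name is hypothetical, and no call to the JournalTOCs API is shown because its request format is not described here.

```python
import re


def normalise_issn(raw):
    """Return the ISSN as 'NNNN-NNNC', or None if it is malformed
    or fails the check-digit test."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits, weights 8 down to 2.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return compact[:4] + "-" + compact[4:]


def unique_issns(raw_values):
    """Normalise a batch of ISSNs, dropping invalid entries and duplicates."""
    valid = (normalise_issn(value) for value in raw_values)
    return sorted({issn for issn in valid if issn is not None})


if __name__ == "__main__":
    print(unique_issns(["0378-5955", "03785955", "1050-124x", "1234-5678"]))
```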
Specifically, we will deliver:
- Openly licensed bibliographic data, including books, repository records and e-journal tables of contents
- Attractive, documented, supported APIs for all data, with a timeline of data refresh cycles
- A sustainable public-facing search portal service, integrating third-party data via appropriate external bibliographic APIs
- A semantic sitemap of aggregated data
- Analytics on data use
- Documented technical user case studies (i.e. a ‘cookbook’)
- Documented ‘user journeys’ case studies
- Documented use of infrastructure: MongoDB as the MARC data store, Sphinx for horizontally scaling search
- Documented machine learning/personalisation engine
- Contributions to community events, workshops, training as and when requested.
The University of Lincoln recently led the HEFCE-funded Learning Landscapes project, which looked closely at the design and use of space for research, teaching and learning across several universities, including our own. An outcome of that project was the design of a tool to help investigate the three fundamental qualities of good spatial design: efficiency, effectiveness and expression. The project clearly recognised the role of technology in creating an ‘edgeless university’ and the use of the web as integral to the Learning Landscape of the university and beyond. Just as the physical space can benefit from a re-evaluation of its efficiency, effectiveness and expression, so can the virtual space; in doing so, the openness, flexibility and contribution of our online information services should be valued in a similar way to our physical assets.
In assessing our own Learning Landscape, we recognise that experimentation and innovation in our library services are essential to remaining relevant and useful to our researchers, teachers and learners.
The Jerome un-project began in June 2010 using a modest amount of otherwise unallocated ‘windfall’ funding, building on what we had learned about our own data infrastructure through Lincoln’s successful contribution to the JISC-funded MOSAIC project with the University of Huddersfield. Jerome has been an informal collaboration between Library and ICT staff with a ‘just do it’ approach. We wanted to surprise each other with what could be achieved by a few people dedicated to reimagining library services. Since June, we have converted MARC records from our SirsiDynix catalogue into JSON representations stored in MongoDB,[1] creating a meta-catalogue which we can query very quickly and search in full text using a Sphinx search server.[2]
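The MARC-to-JSON step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the conversion Jerome actually uses: the input is a simplified stand-in for a parsed MARC record, the output field names (`isbn`, `author`, `title`, `subtitle`, `subjects`) are hypothetical, and the example values are made up.

```python
import json

# A simplified stand-in for a parsed MARC record: (tag, subfields) pairs.
# Real MARC also carries a leader, indicators and repeatable subfields.
marc_fields = [
    ("020", {"a": "9780000000002"}),
    ("100", {"a": "Smith, Jane"}),
    ("245", {"a": "An example title :", "b": "with a subtitle /"}),
    ("650", {"a": "Libraries"}),
    ("650", {"a": "Metadata"}),
]


def clean(value):
    """Strip the trailing ISBD punctuation that MARC subfields often carry."""
    return value.rstrip(" /:;,.")


def marc_to_document(fields):
    """Flatten (tag, subfields) pairs into a JSON-friendly dict."""
    doc = {"subjects": []}
    for tag, sub in fields:
        if tag == "020":      # ISBN
            doc["isbn"] = sub.get("a")
        elif tag == "100":    # main entry, personal name
            doc["author"] = clean(sub.get("a", ""))
        elif tag == "245":    # title statement
            doc["title"] = clean(sub.get("a", ""))
            if "b" in sub:
                doc["subtitle"] = clean(sub["b"])
        elif tag == "650":    # topical subject, repeatable
            doc["subjects"].append(clean(sub.get("a", "")))
    return doc


if __name__ == "__main__":
    print(json.dumps(marc_to_document(marc_fields), indent=2))
```

A dictionary of this shape maps directly onto a document in a document store, which is the property that makes the JSON representation convenient.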
Because of the simplicity of storing JSON, we have been able to develop fast APIs that avoid unnecessarily complex queries; even the complex queries performed through the API are returned in a matter of tens of milliseconds. Sphinx scales horizontally by creating resilient distributed indexes. This means that we have been able to create a non-homogeneous ‘universal search’ service including book and journal records, EPrints records (via OAI-PMH), university WordPress blog posts, and more.[3] In particular, our work on the JISCPress project has furthered our thinking around the use of WordPress to provide Open and Linked Data, including that of bibliographic records.[4] We also aim to apply attention-intelligence data about our users to a personalisation ‘engine’ through machine learning techniques. So far, this has involved the development of a geo-location API for location data and an OAuth API for application-level authentication,[5] and will draw heavily on our work and experience on Total ReCal.
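To give a sense of how repository records harvested over OAI-PMH might be brought into a shared index, the sketch below parses a `ListRecords` response in the `oai_dc` format into plain dictionaries. The sample response, the repository identifier and the output field names are invented for illustration; a real harvest would fetch the XML from the repository's OAI-PMH endpoint and would also need to follow resumption tokens, which this sketch does not do.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to the elements used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2010-07-30</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Smith, Jane</dc:creator>
          <dc:creator>Jones, Sam</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Turn an OAI-PMH ListRecords response into a list of plain dicts."""
    root = ET.fromstring(xml_text.strip())
    docs = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        dc_path = "oai:metadata/oai_dc:dc/"
        docs.append({
            "source": "eprints",  # hypothetical label for the search index
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(dc_path + "dc:title", namespaces=NS),
            "creators": [c.text for c in rec.findall(dc_path + "dc:creator", NS)],
        })
    return docs


if __name__ == "__main__":
    for doc in parse_records(SAMPLE_RESPONSE):
        print(doc)
```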
All of these innovations have been driven by a vision of bringing to the university’s library services what we now know and love about the web: its open standards and technologies, and our growing understanding of user behaviour. With this project proposal, we feel strongly that we can contribute to the programme’s aims of developing more “flexible, efficient and effective ways to support resource discovery and access to resources for research and learning” through the sustainable provision of open bibliographic data and of supported, convenient APIs which are attractive to developers, and by laying the groundwork for a relevant and personalised suite of library services for staff and students at the University of Lincoln.
- MongoDB is a NoSQL database used by a number of organisations, such as the New York Times, bit.ly and Foursquare.
- Sphinx is used to power sites such as Craigslist.
- A demonstration of our work on Jerome was given at the Chips and Mash event at Huddersfield: http://jerome.blogs.lincoln.ac.uk/2010/07/30/the-slides/
- On WordPress and Open Data, see http://jiscpress.blogs.lincoln.ac.uk/2009/11/18/open-data-what-have-we-got/ In terms of the use of WordPress for bibliographic data, we prepared a funding application to Talis, which made it through to the second round and was highly praised, but was deemed too ambitious for their fund: http://joss.blogs.lincoln.ac.uk/2010/01/26/opacpress-our-talis-incubator-proposal/
In addition to the aggregation of our own data, we have also been looking at the integration and use of third-party services such as Google Books, LibraryThing and MOSAIC. Our choice of a non-relational database has made cross-service integration much easier, and we have found that we can store (and deliver) a large amount of data associated with items without first having to tailor that data to Jerome.
Also over the summer period, the Library tagged its entire stock of books with RFID chips. The Jerome un-project saw this as an opportunity to link physical items in our library collections with their online representations, and began mapping the physical library spaces in 3D to show how shelf locations could be visualised for library users. The provision of a more intelligent, personalised experience for users of our library services has been an interest throughout our work on Jerome, and we have begun work on ways to determine where the user is, who they are and what they require (see also our discussion of the ‘relevancy engine’).
- Since writing this post, the OAuth API has been written and is now in use on selected university services.