Jerome began in the summer of 2010, as an informal ‘un-project’, with the aim of radically integrating data available to the University of Lincoln’s library services and offering a uniquely personalised service to staff and students through the use of new APIs, open data and machine learning. Jerome addresses many of the challenges highlighted in the Resource Discovery Taskforce report, including the need to develop scale at the data and user levels, the use of third-party data and services and a better understanding of ‘user journeys’.
Here, we propose to formalise Jerome as a project, consolidating the lessons we have learned over the last few months by developing a sustainable, institutional service for open bibliographic metadata, complemented with well documented APIs and an ‘intelligent’, personalised interface for library users.
The proposed metadata collection will comprise the entirety of our library catalogue of ~250,000 bibliographic records (with the possibility of different levels of machine access and permitted re-use, to allow for records in which a third party has copyright), along with constantly expanding data about our ~60,000 available e-journals and their contents, augmented by the JournalTOCs API, and ~3,000 additional records from our EPrints repository. All records will be available through a dedicated browser-based search interface, and (via APIs) as JSON, XML, and in reference management formats (.ris). We will expose underlying Linked Data where possible and appropriate.
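As a rough illustration of the reference management export mentioned above, the sketch below serialises one record as an RIS entry. The record is assumed to be a plain dictionary; the key names (`title`, `authors`, `year`, `publisher`, `isbn`) and the sample values are hypothetical and are not Jerome's actual schema. Only the RIS tags themselves (`TY`, `AU`, `TI`, `PY`, `PB`, `SN`, `ER`) come from the format.

```python
def record_to_ris(record):
    """Serialise one bibliographic record (a plain dict) as an RIS entry.

    The dictionary keys are assumptions made for this sketch.
    """
    # Every RIS entry opens with a type tag and closes with an ER tag.
    lines = ["TY  - " + record.get("type", "BOOK")]
    for author in record.get("authors", []):
        lines.append("AU  - " + author)  # one AU line per author
    if "title" in record:
        lines.append("TI  - " + record["title"])
    if "year" in record:
        lines.append("PY  - " + str(record["year"]))
    if "publisher" in record:
        lines.append("PB  - " + record["publisher"])
    if "isbn" in record:
        lines.append("SN  - " + record["isbn"])
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    print(record_to_ris({
        "title": "An Example Book",
        "authors": ["Doe, Jane"],
        "year": 2010,
    }))
```

Keeping the serialiser separate from the stored record means the same underlying document could feed the JSON, XML and RIS outputs without being duplicated.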
Jerome will particularly focus on the ways in which our own data can interact (through our own and third-party APIs) with external datasets: for instance, using ISSN data derived from Lincoln’s catalogue and e-journals knowledgebase software, to create scalable e-journal search and discovery services using the JournalTOCs APIs (and building on the work of the WattJournals project at Heriot-Watt University).
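Because ISSNs act as the join key between our own records and external services, it helps to normalise and validate them before any lookup. The sketch below is one possible approach and not a description of Jerome's existing code; the function names are hypothetical. It uses the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10).

```python
import re


def issn_check_digit(first_seven):
    """Compute the ISSN check character for the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def normalise_issn(raw):
    """Return the ISSN as 'NNNN-NNNC', or None if it is malformed or invalid."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    if issn_check_digit(compact[:7]) != compact[7]:
        return None
    return compact[:4] + "-" + compact[4:]


if __name__ == "__main__":
    for candidate in ["03785955", " 2049-3630 ", "0378-5954", "not an issn"]:
        print(repr(candidate), "->", normalise_issn(candidate))
```

Rejecting malformed identifiers at this stage avoids sending requests to a third-party API that could never match.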
Specifically, we will deliver:
- Openly licensed bibliographic data, including books, repository records and e-journal tables of contents
- Attractive, documented, supported APIs for all data, with timeline of data refresh cycles
- A sustainable public-facing search portal service, integrating third-party data via appropriate external bibliographic APIs
- A semantic sitemap of aggregated data
- Analytics on data use
- Documented technical user case studies (i.e. a ‘cookbook’)
- Documented ‘user journeys’ case studies
- Documented use of infrastructure: MongoDB as the MARC data store, Sphinx for horizontally scaling search
- Documented machine learning/personalisation engine
- Contributions to community events, workshops, training as and when requested.
Some background…
The University of Lincoln recently led the HEFCE-funded Learning Landscapes project, which looked closely at the design and use of space for research, teaching and learning across several universities, including our own. An outcome of that project was the design of a tool to help investigate the three fundamental qualities of good spatial design: efficiency, effectiveness and expression. The project clearly recognised the role of technology in creating an ‘edgeless university’ and the use of the web as integral to the Learning Landscape of the university and beyond. Just as the physical space can benefit from a re-evaluation of its efficiency, effectiveness and expression, so can the virtual space. In doing so, the openness, flexibility and contribution of our online information services should be valued in a similar way to our physical assets.
In assessing our own Learning Landscape, we recognise that experimentation and innovation in our library services are essential to remaining relevant and useful to our researchers, teachers and learners.
The Jerome un-project began in June 2010 using a modest amount of otherwise unallocated ‘windfall’ funding, building on what we had learned about our own data infrastructure through Lincoln’s successful contribution to the JISC-funded MOSAIC project with the University of Huddersfield. Jerome has been an informal collaboration between Library and ICT staff with a ‘just do it’ approach. We wanted to surprise each other with what could be achieved by a few people dedicated to reimagining library services. Since June, we have converted MARC records from our SirsiDynix catalogue into JSON representations stored in MongoDB, creating a meta-catalogue which we can query at remarkably fast speeds and on which we can perform fast full-text searches using a Sphinx search server.
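To make the MARC-to-JSON step more concrete, the following is a minimal sketch of how a parsed MARC record might be flattened into a JSON document suitable for a document store. It is an assumption-laden illustration, not our production converter: the input shape (a list of `(tag, subfields)` pairs), the `FIELD_MAP` table and the output key names are all hypothetical. The MARC 21 tags used (245 title, 100 author, 020 ISBN, 022 ISSN, 260 publication details) are standard, and `_id` is the key MongoDB uses as a document's primary identifier.

```python
import json

# Hypothetical mapping from (MARC tag, subfield code) to a JSON key.
FIELD_MAP = {
    ("245", "a"): "title",
    ("100", "a"): "author",
    ("020", "a"): "isbn",
    ("022", "a"): "issn",
    ("260", "b"): "publisher",
    ("260", "c"): "year",
}
# Keys that may legitimately occur more than once in a record.
REPEATABLE = {"isbn", "issn"}


def marc_to_document(control_number, fields):
    """Flatten parsed MARC fields into a JSON-serialisable dict."""
    doc = {"_id": control_number}
    for tag, subfields in fields:
        for code, value in subfields.items():
            key = FIELD_MAP.get((tag, code))
            if key is None:
                continue  # unmapped fields are skipped in this sketch
            # Crude removal of trailing ISBD punctuation such as " /".
            value = value.strip(" /:,;")
            if key in REPEATABLE:
                doc.setdefault(key, []).append(value)
            else:
                doc.setdefault(key, value)  # keep the first occurrence
    return doc


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = [
        ("245", {"a": "An example title /"}),
        ("100", {"a": "Doe, Jane,"}),
        ("020", {"a": "0000000000"}),
    ]
    print(json.dumps(marc_to_document("rec-1", sample), indent=2))
```

A document in this shape can be stored as-is and returned through an API with little further transformation, which is part of what keeps the queries simple.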
Because of the simplicity of storing JSON, we have been able to develop fast APIs that avoid unnecessarily complex queries. However, even complex queries performed through the API are being returned in a matter of tens of milliseconds. Sphinx scales horizontally by creating resilient distributed indexes. This means that we have been able to create a non-homogeneous ‘universal search’ service including book and journal records, EPrints records (via OAI-PMH), university WordPress blog posts, and more. In particular, our work on the JISCPress project has furthered our thinking around the use of WordPress to provide Open and Linked Data, including that of bibliographic records. We also aim to apply attention-intelligence data to a personalisation ‘engine’ through machine learning techniques. So far, this has involved the development of a geo-location API for location data and an OAuth API for application-level authentication, and it will draw heavily on our work and experience on Total ReCal.
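As an illustration of the OAI-PMH route by which repository records can reach a shared index, the sketch below builds a `ListRecords` request URL and extracts identifiers and Dublin Core titles from a response. It is a simplified example under stated assumptions: the base URL and sample response are invented, the function names are hypothetical, and resumption tokens (used by OAI-PMH for paging) are not handled. The verb, the `oai_dc` metadata prefix and the XML namespaces are those defined by the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_list_records(xml_text):
    """Extract the identifier and Dublin Core titles from each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Invented sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_list_records(SAMPLE))
```

Reducing each harvested record to a small dictionary like this is one way that records from different sources could be brought into a common shape before indexing.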
All of these innovations have been driven by a vision of bringing to the university’s library services what we now know and love about the web: its open standards and technologies, and our growing understanding of user behaviour. With this project proposal, we feel strongly that we can contribute to the programme’s aims of developing more “flexible, efficient and effective ways to support resource discovery and access to resources for research and learning” through the sustainable provision of open bibliographic data, through supported and convenient APIs which are attractive to developers, and by laying the groundwork for a relevant and personalised suite of library services for staff and students at the University of Lincoln.