Posts Tagged ‘open data’

What did it cost and who benefits?

Posted on July 27th, 2011 by Paul Stainthorp

This is going to be one of the hardest project blog posts to write…

The costs of getting Jerome to this stage are relatively easy to work out. Under the Infrastructure for Resource Discovery programme, JISC awarded us the sum of £36,585 which (institutional overheads aside) we used to pay for the following:

  • Developer staff time: 825 hours over six months.
  • Library and project staff time: 250 hours over six months.
  • The cost of travel to a number of programme events and relevant conferences at which we presented Jerome.

As all the other aspects of Jerome—hardware, software etc.—either already existed or were free to use, that figure represents the total cost of getting Jerome to its current state.

The benefits (see also section 2.4 of the original bid) of Jerome are less easily quantified financially, but we ought to consider these operational benefits:

1. The potential for using Jerome as a ‘production’ resource discovery system by the University of Lincoln. As such it could replace our current OPAC web catalogue as the Library’s primary public tool of discovery. The Library ought also to consider Jerome as a viable alternative to the purchase of a commercial, hosted next-generation resource discovery service (which it is currently reviewing), with the potential for replacing the investment it would make in such a system with investment in developer time to maintain and extend Jerome. In addition, the Common Web Design (on which the Jerome search portal is based) is inherently mobile-friendly.

2. Related: even if the Jerome search portal is not adopted in toto, there’s real potential for using Jerome’s APIs and code (open sourced) to enhance our existing user interfaces (catalogues, student portals, etc.) by ‘hacking in’ additional useful data and services via Jerome (similar to the Talis Juice service). This could lead to cost savings: a modern OPAC would not have to be developed in isolation, nor tools bought in. And these enhancements are just as available to other institutions and libraries as they are to Lincoln.

3. The use of Jerome as an operational tool for checking and sanitising bibliographic data. Jerome can already be used to generate lists of ‘bad’ data (e.g. invalid ISBNs in MARC records); this intelligence could be fed back into the Library to make the work of cataloguers, e-resources admin staff, etc., easier and faster (efficiency savings) and again to improve the user experience. (A sketch of this kind of ISBN check follows at the end of this list.)

4. Benefits of Open Data: in releasing our bibliographic collections openly Jerome is adding to the UK’s academic resource discovery ‘ecosystem’, with benefits to scholarly activity both in Lincoln and elsewhere. We are already working with the COMET team at Cambridge University Library on a cross-Fens spin-off miniproject(!) to share data, code, and best practices around handling Open Data. Related to this are the ‘fuzzier’ benefits of associating the University of Lincoln’s name with innovation in technology for education (which is a stated aim in the University’s draft institutional strategy).

5. Finally, there is the potential for the university to use Jerome as a platform for future development: Jerome already sits in a ‘suite’ of interconnecting innovative institutional web services (excuse the unintentional alliteration!) which include the Common Web Design presentation framework, Total ReCal space/time data, lncn.eu URL shortener and link proxy, a university-wide open data platform, and the Nucleus data storage layer. Just as each of these (notionally separate) services has facilitated the development of all the others, so it’s likely that Jerome will itself act as a catalyst for further innovation.
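A concrete illustration of point 3: the checksum rules for ISBN-10 and ISBN-13 are standard, so a ‘bad ISBN’ report takes very little code. The sketch below is illustrative only: the record structure is invented, and this is not Jerome’s actual code.

```python
# Illustrative sketch: flag records whose ISBNs fail the standard
# ISBN-10/ISBN-13 checksums. The record structure here is invented;
# it is not Jerome's actual data model.

def isbn10_valid(isbn: str) -> bool:
    """Weighted checksum for ISBN-10 ('X' counts as 10, check digit only)."""
    if len(isbn) != 10:
        return False
    total = 0
    for i, ch in enumerate(isbn):
        if ch == "X" and i == 9:
            digit = 10
        elif ch.isdigit():
            digit = int(ch)
        else:
            return False
        total += (10 - i) * digit
    return total % 11 == 0

def isbn13_valid(isbn: str) -> bool:
    """Alternating 1/3-weighted checksum for ISBN-13."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(isbn))
    return total % 10 == 0

def bad_isbns(records):
    """Yield (record id, ISBN) pairs whose ISBN fails both checks."""
    for record in records:
        isbn = record["isbn"].replace("-", "").replace(" ", "").upper()
        if not (isbn10_valid(isbn) or isbn13_valid(isbn)):
            yield record["id"], record["isbn"]

# The second record's check digit is wrong, so it gets flagged.
sample = [{"id": 1, "isbn": "978-0-306-40615-7"},
          {"id": 2, "isbn": "0-306-40615-3"}]
print(list(bad_isbns(sample)))  # -> [(2, '0-306-40615-3')]
```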

Following in Jerome’s footsteps

Posted on July 27th, 2011 by Paul Stainthorp

As the formal end of the Jerome project looms, this blog post is aimed at other people who might want to take a similar approach to breaking data out of their library systems.

From the start, Jerome took an approach that can be summed up as:

  • Use the tools and abilities that you have to hand. Go for low-hanging fruit and demonstrate value early on.
  • Follow the path of least resistance. If it works, it’s fine. Route around problems rather than fighting against them.
  • Consider the benefits of a ‘minimally invasive’ approach to library systems. Use the web to passively gather copies of your data without having to make changes to your existing root systems (such as LMSes).

What tools would people need to take this approach?

1. Technical ability. Jerome would have got nowhere without the coding abilities of University of Lincoln developers Nick Jackson and Alex Bilbie, and the approach they take to development (see below). It’s fair to say that their involvement has been something of a culture-shift for the Library: they have brought a fresh approach to dealing with our systems and data.

2. An agile, rapid, iterative project methodology and a suite of (often free; always web-based) software tools to support that way of working. This probably can’t be overstated: a ‘traditional’ project management methodology just wouldn’t have worked for Jerome.

3. An understanding of where data resides in your current systems. We’ve had to become uncomfortably au fait with the structure of SirsiDynix’s internal data tables, MARCXML, OAI-PMH, RIS and all sorts of other unpleasantness. Also an awareness of the rich ecosystem of third-party library/bibliographic/other useful data that exists on the web. (A sketch of an OAI-PMH harvest follows at the end of this list.)

4. Related: a willingness to try different approaches to getting hold of this data (the absolutely-anything-goes, mashup-heavy approach): APIs: great. Screen-scraping: yeah, if we must. SQL querying, .csv dumps, proprietary admin interfaces: all fine. Don’t be precious about finding a way in. By far the most important thing is to provide the open data service in the first place. Things can always be tidied up, rationalised, at a later date.

5. A box to run things on. This doesn’t have to be a large institutional server: we’ve successfully run Jerome on a Mac Mini.

6. Finally, the use of blogs—such as this one—and social media to engage a (self-selecting, admittedly) community of potential users and fellow-travellers.
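To illustrate points 3 and 4: harvesting MARCXML over OAI-PMH needs nothing more exotic than an HTTP client and an XML parser. Here’s a rough sketch; the base URL is a placeholder rather than a real Lincoln endpoint, and the metadata prefix your repository advertises may well differ.

```python
# Rough sketch: harvest records over OAI-PMH, following resumption
# tokens until the repository runs dry. The base URL is a placeholder;
# the verbs and namespace are standard OAI-PMH.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://catalogue.example.ac.uk/oai"  # placeholder endpoint

def harvest(base_url, metadata_prefix="marcxml"):
    """Yield every <record> element from a ListRecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)
        for record in tree.iter(OAI + "record"):
            yield record
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no resumption token: that was the last page
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Print the OAI identifiers of the first five harvested records.
for i, record in enumerate(harvest(BASE_URL)):
    if i >= 5:
        break
    identifier = record.find(OAI + "header/" + OAI + "identifier")
    print(identifier.text if identifier is not None else "(no identifier)")
```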

Priorities:

  • Use tools such as Pivotal Tracker and mind maps to capture requirements and turn them into a plan for development.
  • Meet regularly to review the plan and push the ideas on to the next stage.
  • Decide what’s ‘out of scope’ early on so you can concentrate on maximum value.
  • Realise that there’s value in releasing even part of your data—for example, a list of ISSNs (see the sketch after this list)—which you can exploit immediately without having to worry about issues (e.g. third-party copyright in records) that might affect your complete dataset.
  • Blog about what you’re doing. Little and often is the way to go.
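On that ISSN example: ISSNs conventionally live in MARC field 022, subfield ‘a’, so pulling a bare list out of a pile of MARCXML takes only a few lines. A sketch (the file name is made up):

```python
# Sketch: extract a deduplicated list of ISSNs (MARC field 022,
# subfield 'a') from a MARCXML file. 'records.xml' is a made-up name.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

issns = set()
for datafield in ET.parse("records.xml").iter(MARC + "datafield"):
    if datafield.get("tag") == "022":
        for subfield in datafield.iter(MARC + "subfield"):
            if subfield.get("code") == "a" and subfield.text:
                issns.add(subfield.text.strip())

for issn in sorted(issns):
    print(issn)
```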

What to avoid:

While it’s important to be aware of the bigger picture, don’t get too distracted by the way things are being done elsewhere.

A lot of the open data movement seems to be closely tied up with providing access to large quantities of Linked Data (the dreaded RDF triple store!), and initially I was worried that because we were not taking that approach we were somehow out of step. (In fact, I think that our not concentrating on Linked Data has allowed Jerome to explore the open data landscape in a different and valuable way, i.e. the provision of open bibliographic data services via API – see Paul Walk’s slides on providing data vs. providing a data service. I know which side my bread’s buttered: Open Data ≠ merely open; Open Data = open and usable.)
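To make that distinction concrete: a ‘data service’ in this sense can be as small as an HTTP endpoint that answers queries with JSON, rather than a bulk dump of triples. Below is a minimal sketch using Flask; the routes and records are invented for illustration and are not Jerome’s actual API.

```python
# Minimal sketch of 'open and usable': a queryable JSON endpoint
# rather than a bulk dump. Routes and records are invented; this is
# not Jerome's actual API.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a real bibliographic store.
RECORDS = {
    "1": {"id": "1", "title": "The Name of the Rose", "issn": None},
    "2": {"id": "2", "title": "Ariadne", "issn": "1361-3200"},
}

@app.route("/bib/<record_id>")
def get_record(record_id):
    """Return a single record as JSON, or a 404 if it doesn't exist."""
    record = RECORDS.get(record_id)
    if record is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(record)

@app.route("/search")
def search():
    """Case-insensitive title search, e.g. /search?q=ariadne"""
    q = request.args.get("q", "").lower()
    hits = [r for r in RECORDS.values() if q in r["title"].lower()]
    return jsonify({"query": q, "results": hits})

if __name__ == "__main__":
    app.run()
```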

Similarly, while a lot of the discussion around third-party intellectual property rights in data has been phrased in terms of negotiating with those third parties to release the data openly (or taking a risk-assessment approach to releasing it anyway!), Jerome took a different approach, which has been first to release openly those (sometimes minimal) bits of data which we know are free from third-party interest, then to use existing open data sources to enhance the minimal set: what we end up with are data that may differ from the original bibliographic record, but which are inherently open. It’s not a better approach, just a different one.

Who you will need to have ‘on side’:

  • Your systems librarian, or at least someone who has a copy of the manual to your library catalogue!
  • A cataloguer, to explain the intricacies of MARC.
  • A university librarian / head of service who is willing to take risks and sanction the release of open data.
  • A small-but-thriving community of developers, mashup-enthusiasts and shambrarians (see above), including people on Twitter who can act as a sounding board for ideas.
  • Ideally, your project team should include as many people as possible who do not have a vested interest in the way libraries have historically got things done. Much of Jerome’s value comes from its taking an approach that’s different from the library norm.

#discodev: worldwide software development competition using open library data

Posted on July 5th, 2011 by Paul Stainthorp

Copied verbatim (and under licence!) from the UK Discovery website:


UK Discovery (http://discovery.ac.uk/) and the Developer Community Supporting Innovation (DevCSI) project based at UKOLN are running a global Developer Competition throughout July 2011 to build open source software applications / tools, using at least one of our 10 open data sources collected from libraries, museums and archives.

…and one of the 10 open data sources is the Jerome API we announced last week!

Enter simply by blogging about your application and emailing the blog post URI to joy.palmer@manchester.ac.uk by the deadline of 2359 (your local time) on Monday 1 August 2011.

Full details of the competition, the data sets and how to enter are at http://discovery.ac.uk/developers/competition/

Follow #discodev on Twitter to see what people are up to.

Is that a Jerome open data API I spy?

Posted on June 28th, 2011 by Paul Stainthorp

Yes. Yes, it is.

http://data.online.lincoln.ac.uk/documentation.html#bib

This is only the initial, bare-bones JSON-only service. A complete (and fully-documented) API will be released in stages over the next month, providing data in a range of output formats. We’re keeping all API and open institutional data documentation in the one place, on our open data site.
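Because it’s plain JSON over HTTP, consuming the service needs no special tooling. A sketch (the endpoint path below is a placeholder: check the documentation linked above for the real paths and parameters):

```python
# Sketch of consuming a JSON-only bibliographic API. The endpoint
# path is a placeholder; see the documentation linked above for the
# real paths and parameters.
import json
import urllib.request

URL = "http://data.online.lincoln.ac.uk/bib/example"  # placeholder path

with urllib.request.urlopen(URL) as resp:
    record = json.load(resp)

print(json.dumps(record, indent=2))
```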