One of the main things we’re trying to do with Jerome is open up all of our Library data for you to do things with. By this we mean that we want to expose as much as possible of what we know using a properly open licence such as CC0. Unfortunately, we’re hitting a few snags with this. None are insurmountable but we’re having to tread carefully, particularly with issues of licensing.
The licensing of data is an interesting one, since we run into a whole bunch of questions around who actually owns the information in our catalogue. It’s all factual information (and you can’t copyright a fact), so surely it’s a free-for-all? Not quite: EU law throws a curve ball in the form of database right. Broadly speaking, this provides specific protection for collections of records, but not for the records themselves.
Jerome at this point begins to play merry hell with licensing since, as far as catalogue data is concerned, we never touch the database: we only ask for catalogue information on a specific record and then scrape it back into our own collection. Since we’re only dealing with facts about a work (and not the work itself, which clearly is copyrighted[1]), theoretically we may (or may not) be in the clear to do what we want. Personally, I’m waiting for some definitive legal advice before going any further on this one.
So, assuming for a moment that all the legal wrangling over licensing has been sorted out, we still hit the problem of how to expose that data in a useful way. The beautiful thing about standards is that there are so many to choose from, and our list of ways to expose data includes MARC, XML, JSON, CSV, BibTeX, RIS, RDF, COinS and more. These are then mixed up even further by the various standards for metadata, although we seem to have decided that Dublin Core is the way to go.
Fortunately, since we’ve built our own data abstraction layer which (loosely) follows Dublin Core, this is going to be slightly easier, but still ultimately a pain. We’re also making sure that we talk to other people engaged in open library data so that we can agree on at least a few basic things, ensuring data can be effortlessly collated from multiple sources.
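To give a rough idea of why an abstraction layer helps here, this is a minimal sketch in Python of one record being pushed out in a few of the formats listed above. It is not Jerome’s actual code: the `Record` class, its field names (borrowed loosely from Dublin Core element names) and the three export functions are all hypothetical, and treating every item as a book in the RIS output is a simplifying assumption.

```python
from __future__ import annotations

import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Record:
    """Hypothetical record; field names loosely follow Dublin Core elements."""
    title: str
    creator: list[str] = field(default_factory=list)
    date: str = ""
    publisher: str = ""
    identifier: str = ""  # for example an ISBN


def to_json(record: Record) -> str:
    """Serialise a single record as a JSON object."""
    return json.dumps(asdict(record), sort_keys=True)


def to_csv(records: list[Record]) -> str:
    """Serialise records as CSV with a header row; creators are joined with '; '."""
    buffer = io.StringIO()
    fieldnames = ["title", "creator", "date", "publisher", "identifier"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["creator"] = "; ".join(record.creator)
        writer.writerow(row)
    return buffer.getvalue()


def to_ris(record: Record) -> str:
    """Serialise a record as RIS, assuming (for this sketch) that it is a book."""
    lines = ["TY  - BOOK", f"TI  - {record.title}"]
    lines.extend(f"AU  - {name}" for name in record.creator)
    if record.date:
        lines.append(f"PY  - {record.date}")
    if record.publisher:
        lines.append(f"PB  - {record.publisher}")
    if record.identifier:
        lines.append(f"SN  - {record.identifier}")
    lines.append("ER  - ")
    return "\n".join(lines)


if __name__ == "__main__":
    example = Record(
        title="Example Title",
        creator=["A. Author", "B. Writer"],
        date="2011",
        identifier="0000000000",
    )
    print(to_json(example))
    print(to_csv([example]))
    print(to_ris(example))
```

The point is simply that once every record lives in one internal shape, each extra output format is one more small function rather than another trip back to the catalogue.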
We’ll have data with you as soon as possible; just let us work out the kinks of licensing and formatting first.
[1] Unless it’s public domain, in which case it isn’t.
