Posts Tagged ‘#rdtf’

In the background at Discovery event

Posted on May 26th, 2011 by Paul Stainthorp

A few of the Jerome project team are at the JISC/RLUK event in London: ‘Discovery – building a UK metadata ecology’. Our slides are running on a screen in the foyer; I’ll be hanging around to talk about them.

How commercial next-generation library discovery tools have *nearly* got it right

Posted on May 17th, 2011 by Paul Stainthorp

In Huddersfield (again – I’m barely away from the place!), yesterday, at a CILIP UC&R (University, College and Research Group) Yorkshire & Humberside [catchy name] training event on ‘Discovering Discovery Tools’. Librarians from four different UK universities gave practical, pros-and-cons descriptions of how they implemented and are now running four different commercial next-gen resource-discovery tools:

Five (count ‘em!) people from Lincoln were in the audience. I was wearing two hats: one for project Jerome for thinking about design concepts in resource discovery tools; the other for my day job – Lincoln is in the middle of a strategic review of Library ICT systems, which may well end up recommending that we buy one of these products.

It was all good stuff. First off, libraries need to hear the honest, warts and all counterpoint to the glowing terms in which each discovery product is described by its vendor. Secondly, it’s useful to subject all four* resource discovery platforms to the same amount of daylight, and see where the common problems lie, as well as where one tool outperforms another. Thirdly—and even though there’s a lot of resource discovery hyperbole to be heard—this is still a big shift for academic libraries, and I think we should discuss implications that are wider than the costs/benefits for an individual institution.

(*Yes, I know there are a few other tools. But they weren’t in the room yesterday.)

[Photo: ‘Lockside’. What’s stopping us? (Canal lock gate at the University of Huddersfield.)]

Things that jumped out at me:

Commercial resource discovery has reached a level of maturity that was absent a couple of years ago. That’s not to say that all next-gen resource discovery tools are perfect (because they aren’t), or that there aren’t any problems (because there are; see below), but academic libraries do now have a genuine choice between several different, viable commercial products.

Here’s a heresy: the differences between these four products are not that significant. I think that anyone who went away from yesterday’s event thinking that out of the four discovery tools on display there are some ‘good’ and some ‘bad’ …is probably wrong. It’s not really about the product, it’s about the willingness of the vendor to overcome problems, and about their attitude to their customers. Do you buy a slightly-less slick product, but from a company you feel you can have a more productive relationship with?

In fact, most of the real problems with resource discovery seem to be common to all four of the products on show yesterday. De-duping via FRBR seems to be a bit of an Achilles’ heel. (A shame. FRBRisation is one of those things you either need to get right, or not do at all. A half-arsed attempt is worse than not bothering.)

Also broken: known-item search. This ought to be trivial to fix, and it needs to be sorted now now now. I find it particularly sinister that some commercial resource-discovery tools rank their search results according to secret, proprietary algorithms that can’t be inspected or challenged by their users, let alone altered/improved. This is a problem. What’s the point of a library that can’t justify how its resource discovery system actually works? Are we just here to sign the cheques?

Libraries still have a tendency to overcomplicate things for their users. Sometimes they do this because they have no choice (perhaps their shiny new discovery tool doesn’t quite work the way it should); but often they seem just too ready to accept a situation where users are inconvenienced sooner than address an underlying problem. Lincoln included in this sweeping generalisation.

There’s no point pretending that a library can make two independent decisions to purchase [a] a next-gen resource discovery platform, and [b] a journals knowledgebase/link resolver. The two things are all tied up together. To pick a random example: you want Summon, you’d better want 360.

Why can’t we just buy access to a search index? If I want to pay to provide my users with the benefits of a lovely big central index of content, why do I have to buy into your discovery algorithm and web front-end as well? (Whither JISC collections?)

Related, and finally – we really shouldn’t have to replace our search and discovery interfaces every time we want/need to use a different content provider, and we shouldn’t be placed in the situation of having to make collection/subscription decisions in order to ‘feed’ our discovery tool. It may be temptingly easy, cost aside, to pick up and put down different next-gen discovery products (“…it’s just a subscription!”) but there’s too much at stake for our users.

Notes from my ‘personal pitch’ (#rdtf in Manchester)

Posted on April 20th, 2011 by Paul Stainthorp

At the JISC/RLUK Opening Data – Opening Doors event in Manchester on Monday I was asked to deliver a five-minute ‘personal pitch’ relating to why the Open Data approach is important/relevant to people/institutions/communities, based around the philosophy driving work at Lincoln.

I didn’t use slides, but here is a verbatim transcription of my handwritten notes (original on Google Docs):

  1. Lincoln has mixture of internal + JISC-funded projects including Jerome, needs two pages of flipchart paper to list all projects —> leading to a project ‘ecology’.
  2. We’re developing platforms for access to space/time (location, room bookings, calendaring), asset, bibliographic, activity, user, course, research data.
  3. It’s less about open data per se (though we are opening up our data!) – more about building openly-accessible platforms for manipulating that data.
  4. ‘Nucleus’ – one platform for services on all opened institutional data. Documented APIs. Inherently rights-based.
  5. ‘Eating our own dog food’. New institutional apps are built on the Nucleus (rather than by exporting and copying data between back-office systems); internal SOA – ‘hearts and minds’ to be won in uni data teams to this approach, but ICT are committed.
  6. Easier migration. Flexible. Integration with third-party services on the same basis.
  7. Concept of Student as Producer – students as active participants in teaching and learning, research, AND in institutional service development & delivery. Conscious rejection of student as passive consumer.
  8. Students building some of the first applications of Lincoln’s open data services – we didn’t ask them to! – stuff we’d never have thought of or not had time to do.
  9. Related: the way we develop open data platforms and services in the first place. Rapid innovation. Joss Winn has approval to establish a new free-floating technology & pedagogy group; will have responsibility to develop + embed new systems.
  10. Benefits – new tools; new methods of working. Quick responses to changes in HE (essential agility!). Partnerships. Active students.
  11. Challenges – licensing (complex history of institution. Many of our MARC records are older than we are!). Too many possibilities? Where do we start?! How to communicate the benefits of this approach succinctly and convincingly. Technical challenges not trivial, but “the great thing about library data standards is that there are so many of them…”

An elastic bucket down the data well (#rdtf in Manchester)

Posted on April 20th, 2011 by Paul Stainthorp

I was in Manchester on Monday for Opening Data – Opening Doors, a one-day “advocacy workshop” hosted by JISC and RLUK under their Resource Discovery Taskforce (#rdtf) programme. I delivered a five-minute ‘personal pitch’ about Jerome, open data, and the rapid-development ethos that’s developing at Lincoln.

Ken Chad is writing up a report from the day and Helen Harrop is producing a blog, both of which will be signposted from the website: http://rdtf.mimas.ac.uk/

The big data question

All the presentations can be viewed on slideshare, but there were some particular moments that I think are worth picking out:

The JISC deputy, Prof. David Baker, was first up. His presentation, ‘A Vision for Resource Discovery’, should be compulsory reading for university librarians. See, in particular, slides #6 (guiding principles of the RDTF), #8 (a future state of the art by 2012), and #11 (key themes).

[Three slides from David Baker's presentation]

Following this introduction, there were three ‘perspectives’, short presentations “reflecting on the real world motivations and efforts involved in opening up bibliographic, archival and museums data to the wider world”: from the National Maritime Museum, the National Archives

…and from Ed Chamberlain of (Jerome’s ‘sister project‘) COMET (Cambridge Open METadata), the perspective from Cambridge University Library on opening up access to their non-inconsiderable bibliographic data. N.B. slides #4 (what does COMET entail?), #9 (licensing) and—more than anything else—slide #16 (“beyond bibliography”).

[Three slides from Ed Chamberlain's presentation]

The first breakout/discussion session I sat in on looked at technical and licensing constraints on opening up access to [bib] data. This was the point at which the tortured business metaphors started to pile up. ‘Buckets’ of data. ‘Elastic’ buckets that can expand to include any kind of data. And (my personal contribution, continuing the wet theme): data often exist at the bottom of a ‘well’. Just because a well is open at the top, it doesn’t necessarily make it easy to get the water out! You need another kind of bucket – a service bucket that makes it possible to extract and make use of the water. Sorry, data. What were we talking about again?

Then a series of 5-minute ‘personal pitches’, including mine just after lunch. I didn’t use slides, but I’m typing up my handwritten notes on Google Docs and I’ll post them as a separate blog post when I get a chance.

David Kay (SERO), Paul Miller (Cloud of Data) and Owen Stephens delivered the meat of the afternoon session in their presentation, ‘The Open Bibliographic Data Guide – Preparing to eat the elephant’. The website containing the Open Bib Data Guide (which has not been formally launched until now) can be found at: http://obd.jisc.ac.uk/

The site itself is going to be invaluable in hand-holding and guiding institutions through the possibilities in opening up access to their own bibliographic data (OBD). Slides from the presentation that are particularly worth noting are #8 (which shows the colour-coding used to distinguish the different OBD use-cases) and #14 (examples of existing OBD).

[Two slides from the OBD presentation]

Paul Walk’s presentation, ‘Technical standards & the RDTF Vision: some considerations’, is the source of the slide which I photographed (at the top of this blog post). Paul talked about ‘safe bets’; aspects of the Web that we can rely on playing a part in allowing us to create a distributed environment for resource discovery: including “ROASOADOA” (Resource- / Service- / Data-Oriented Architecture), persistent identifiers, and a RESTful approach. See also this blog post.

In the second breakout/discussion session, we discussed technical approaches. One of the themes which we kept coming back to was that of two approaches (encapsulated by Paul’s slide) which—while not mutually exclusive—may require different business cases or different explanations in order to be taken up by institutions. We characterised the two approaches as:

  • Raw open data vs Data services
  • Triple store vs RESTful APIs
  • Jerome vs COMET (bit of a caricature, this one, but not entirely unjustified!)
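
To make the first of those contrasts a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption made up for illustration: the three records, the function names and the `/records/<id>` path layout do not come from Jerome, COMET or any real API. It only shows the difference in where the work happens: with raw open data the consumer downloads everything and filters it themselves; with a data service the consumer asks for one resource and gets just that back.

```python
import json

# Hypothetical bibliographic records, standing in for a library catalogue.
RECORDS = [
    {"id": "b1", "title": "Middlemarch", "author": "Eliot, George", "year": 1871},
    {"id": "b2", "title": "Silas Marner", "author": "Eliot, George", "year": 1861},
    {"id": "b3", "title": "Bleak House", "author": "Dickens, Charles", "year": 1853},
]


def raw_dump() -> str:
    """Raw open data: publish the whole dataset; the consumer does the work."""
    return json.dumps(RECORDS)


def service_lookup(path: str) -> str:
    """Data service: answer one resource-shaped request, e.g. '/records/b2'.

    The path layout is invented for this sketch, not taken from a real API.
    """
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "records":
        for record in RECORDS:
            if record["id"] == parts[1]:
                return json.dumps(record)
    return json.dumps({"error": "not found"})


# Raw data: parse the entire dump, then filter it locally.
eliot_titles = [
    r["title"] for r in json.loads(raw_dump()) if r["author"] == "Eliot, George"
]

# Data service: ask for exactly the record you want.
one_record = json.loads(service_lookup("/records/b2"))
```

Neither approach rules out the other: a service like `service_lookup` can sit on top of the same data that is also published as a dump, which is roughly why the two were described as not mutually exclusive.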

I was gratified that Lincoln’s approach to rapid development and provision of open services was also referred to in non-ungratifying terms, as a model which could be valuable for the HE sector as a whole.

Finally, we heard what’s next for the #rdtf programme. It’s going to be rebranded as ‘Discovery’ and formally re-launched under the new name at another event: ‘Discovery – building a UK metadata ecology’ on Thursday, 26 May 2011, in London. See you there?
