
The Re-Architecting of Jerome

Posted on July 12th, 2011 by Nick Jackson

Over the past few days I’ve been doing some serious brain work about Jerome and how best to build our API layer to make it simultaneously awesomely cool and insanely fast whilst maintaining flexibility and clarity. Here’s the outcome.

To start with, we’re merging a wide variety of individual tables¹ (one for each type of resource offered) into a single table which handles multiple resource types. We’ve opted to use all the fields in the RIS format as our ‘basic information’ fields, although each individual resource type can extend this with its own data if necessary. This has a few benefits. First of all, we can interface with our data more easily than before, without needing to write type-specific code which translates things back to our standardised search set. As a byproduct of this we can optimise our search algorithms even further, making them far more accurate and bringing them in line with generally accepted algorithms for this sort of thing. Of course, you’ll still be able to fine-tune how we search in the Mixing Deck.
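To make the single-table idea concrete, here is a minimal sketch, assuming each resource is one record whose shared fields are named after common RIS tags (TY type, TI title, AU authors, PY year, AB abstract, KW keywords) plus an `extra` dictionary for type-specific data. The `Resource` class, the `search` helper and the sample records are hypothetical; the footnote says Jerome keeps this in a Mongo collection, and plain Python objects stand in for those documents here. The point is that search only ever reads the shared fields, so it needs no per-type code.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical record shape: shared 'basic information' fields mirror RIS tags.
# Which RIS fields Jerome actually stores is an assumption here.
@dataclass
class Resource:
    id: str
    TY: str                                   # RIS reference type, e.g. "BOOK", "JOUR"
    TI: str                                   # title
    AU: list[str] = field(default_factory=list)
    PY: str = ""
    AB: str = ""
    KW: list[str] = field(default_factory=list)
    extra: dict[str, Any] = field(default_factory=dict)  # type-specific extensions

def search_text(r: Resource) -> str:
    """Build searchable text from the shared fields only: no per-type code."""
    return " ".join([r.TI, " ".join(r.AU), r.AB, " ".join(r.KW)]).lower()

def search(table: list[Resource], query: str) -> list[str]:
    """Return ids of resources (of any type) whose shared text contains every query word."""
    words = query.lower().split()
    return [r.id for r in table if all(w in search_text(r) for w in words)]


table = [
    Resource("b1", "BOOK", "Software Design", ["Bloggs, Joe"], "2009",
             extra={"shelfmark": "005.1 BLO"}),
    Resource("j1", "JOUR", "Design patterns in practice", ["Smith, Ann"], "2010",
             AB="A study of software design patterns.", extra={"issue": "3"}),
]
print(search(table, "software design"))  # ['b1', 'j1']
```

A book and a journal article come back from the same query even though they carry different `extra` data.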

To make this even easier to interface with from the admin side, we’ll be strapping some APIs (hooray!) on to this which support adding, modifying and removing resources programmatically. This means that potentially anybody with a resource collection they want to expose through Jerome can do so; they just need to make sure their collection is registered, which prevents people from flooding Jerome with nonsense that isn’t ‘approved’ as a resource. Projects like DIVERSE can now not only pull Jerome resource data into their interface, but also push data into our discovery tool and harness Jerome’s recommendation tools. Which brings me neatly on to the next point.
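The registration rule can be sketched as a small in-memory store in which every add, modify or remove call names the collection it comes from, and unregistered collections are refused. This is a hypothetical illustration only: `ResourceStore`, its method names, and the extra rule that a collection may only change its own resources are assumptions, not Jerome’s actual API.

```python
# Hypothetical sketch: the post doesn't describe Jerome's real admin API, so the
# class, method names and ownership rule here are all illustrative.
class ResourceStore:
    """In-memory stand-in for the shared resource table, gated by registration."""

    def __init__(self, registered: set[str]):
        self.registered = registered             # collections allowed to write
        self.resources: dict[str, dict] = {}     # resource id -> record

    def _check(self, collection: str) -> None:
        if collection not in self.registered:
            raise PermissionError(f"collection {collection!r} is not registered")

    def _check_owner(self, collection: str, rid: str) -> None:
        self._check(collection)
        if self.resources[rid]["collection"] != collection:
            raise PermissionError("collections may only change their own resources")

    def add(self, collection: str, rid: str, record: dict) -> None:
        self._check(collection)
        self.resources[rid] = {**record, "collection": collection}

    def modify(self, collection: str, rid: str, changes: dict) -> None:
        self._check_owner(collection, rid)
        self.resources[rid].update(changes)

    def remove(self, collection: str, rid: str) -> None:
        self._check_owner(collection, rid)
        del self.resources[rid]


store = ResourceStore(registered={"diverse"})
store.add("diverse", "d1", {"TY": "GEN", "TI": "Teaching with video"})
store.modify("diverse", "d1", {"TI": "Teaching with video (revised)"})
try:
    store.add("unregistered", "x1", {"TY": "GEN", "TI": "spam"})
except PermissionError as err:
    print(err)  # collection 'unregistered' is not registered
```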

Recommendation is something we want to get absolutely right in Jerome. The amount of information out there is simply staggering. Jerome already handles nearly 300,000 individual items, and we want to expand that much further by using data from more sources such as journal tables of contents. Finding what you’re actually after in all this can be like looking for the proverbial needle in a haystack, and straight search can only find so much. To explore a subject further we need some form of recommendation and ‘similar item’ engine, and we’re approaching it from several angles.

At a basic level Jerome runs term extraction on any available textual content to gather a set of terms which describe the content, very similar to what you’ll know as tags. These are generated automatically from titles, synopses, abstracts and any available full text. We can then use the intersection of terms across multiple works to find and rank similar items based on how many of these terms are shared. This gives us a very simple “items like this” set of results for any item, with the advantage that it’ll work across all our collections. In other words, we can find useful journal articles based on a book, or suggest a paper in the repository which is on a similar subject to an article you’re looking for.
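The shared-terms idea can be sketched in a few lines. The term extractor below is a deliberately naive stand-in (lower-cased words of four or more letters, minus a few stopwords) because the post doesn’t say which extractor Jerome uses; `extract_terms`, `similar_items` and the sample items are hypothetical. Ranking is simply the size of the term intersection, which is why it works across collections: a book and a journal article are compared on the same footing.

```python
import re
from collections import Counter

# Assumption: a naive term extractor standing in for Jerome's real one.
STOPWORDS = {"this", "that", "with", "from", "into", "their", "which", "about"}

def extract_terms(*texts: str) -> set[str]:
    """Turn titles, synopses, abstracts or full text into a set of tag-like terms."""
    words = re.findall(r"[a-z]+", " ".join(texts).lower())
    return {w for w in words if len(w) >= 4 and w not in STOPWORDS}

def similar_items(item_id: str, terms: dict[str, set[str]],
                  limit: int = 5) -> list[tuple[str, int]]:
    """Rank every other item by the size of its term intersection with item_id."""
    mine = terms[item_id]
    shared = Counter({other: len(mine & theirs)
                      for other, theirs in terms.items() if other != item_id})
    return [(other, n) for other, n in shared.most_common(limit) if n > 0]


terms = {
    "book:1": extract_terms("Software Design",
                            "Principles of modular software design and testing"),
    "article:7": extract_terms("Testing modular software",
                               "An empirical study of software testing"),
    "repo:3": extract_terms("18th Century Needlework", "Stitching techniques"),
}
print(similar_items("book:1", terms))  # [('article:7', 3)]
```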

We then also have a second layer, very similar to Amazon’s “people who bought this also bought…”, where we look at the histories of users who used a specific resource to find other resources they have in common. These are added to the mix and the rankings are tweaked accordingly, giving the similar items a human twist. Results which initially seem similar but in fact have little in common at a content level are suppressed, while related results that don’t have enough extracted terms for Jerome to infer the relationship (for example, books which only have a title and for which we can’t get a summary) are pushed up to where a user will find them more easily.
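A minimal sketch of this second layer, assuming usage histories are available as a set of resource ids per user. `co_used` counts how many users used each resource alongside the one being viewed, and `blend` merges those counts with the content-similarity scores from the first layer. The additive formula and `weight=0.5` are illustrative assumptions, not Jerome’s actual tuning, and all names and ids are hypothetical.

```python
from collections import Counter

def co_used(item_id: str, histories: dict[str, set[str]]) -> Counter:
    """For each other resource, count the users who used it alongside item_id."""
    counts: Counter = Counter()
    for used in histories.values():
        if item_id in used:
            counts.update(used - {item_id})
    return counts

def blend(content: dict[str, float], usage: Counter, weight: float = 0.5) -> list[str]:
    """Merge content-similarity scores with co-usage counts into one ranking.
    Items found only through usage (e.g. title-only books) can still appear, and
    items that look similar but are never used together fall behind those that are.
    The additive formula and weight are illustrative assumptions."""
    candidates = set(content) | set(usage)
    score = {c: content.get(c, 0.0) + weight * usage[c] for c in candidates}
    return sorted(candidates, key=lambda c: (-score[c], c))


histories = {
    "user:a": {"book:1", "article:7", "book:9"},
    "user:b": {"book:1", "book:9"},
    "user:c": {"book:1", "repo:3"},
}
content = {"article:7": 3.0, "repo:5": 3.0}   # e.g. shared-term counts from layer one
usage = co_used("book:1", histories)
print(blend(content, usage))  # ['article:7', 'repo:5', 'book:9', 'repo:3']
```

Here `book:9` has no content score at all (think of a title-only record) but still surfaces because two users borrowed it with `book:1`.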

Thirdly, there’s the “people on your course also used” element, which fine-tunes the recommendations further using data we have on which course you’re studying or which department you’re in. This is very similar to the “used this also used” recommendation, but operates at a higher level. We analyse the borrowing patterns of an entire department or course to extract both titles and semantic terms which prove popular, and then boost these titles and terms in any recommendation result set. Because in most cases we use this only as a ‘booster’, recommendation sets aren’t populated with every book ever borrowed, yet the results are still more relevant.
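The ‘booster’ behaviour can be sketched as a re-ordering step: build a course profile of popular titles and terms from everyone’s loans, then scale the scores of items already in a recommendation set. Nothing new is added, which is what keeps the set from filling up with every book the course ever borrowed. `course_profile`, `boost`, the multiplicative formula and `factor=0.1` are hypothetical choices for illustration.

```python
from collections import Counter

def course_profile(course_loans: list[set[str]],
                   item_terms: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Popular titles and terms across every loan history in a course or department."""
    titles: Counter = Counter()
    terms: Counter = Counter()
    for loans in course_loans:
        titles.update(loans)
        for item in loans:
            terms.update(item_terms.get(item, set()))
    return titles, terms

def boost(ranked: list[tuple[str, float]], titles: Counter, terms: Counter,
          item_terms: dict[str, set[str]], factor: float = 0.1) -> list[str]:
    """Re-order an existing recommendation set; only items already present are
    boosted, and no new items are added. The formula is an assumption."""
    def boosted(entry: tuple[str, float]) -> float:
        item, score = entry
        popularity = titles[item] + sum(terms[t] for t in item_terms.get(item, ()))
        return score * (1 + factor * popularity)
    return [item for item, _ in sorted(ranked, key=boosted, reverse=True)]


item_terms = {"a": {"software", "testing"}, "b": {"needlework"}, "c": {"software"}}
titles, terms = course_profile([{"a", "c"}, {"c"}], item_terms)  # a computing course
ranked = [("b", 1.0), ("a", 0.9), ("c", 0.5)]
print(boost(ranked, titles, terms, item_terms))  # ['a', 'b', 'c']
```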

So, that’s how we recommend items. APIs for this will abound, allowing external resource providers to register ‘uses’ of a resource with us for the purposes of recommendation. We’re not done yet, though: recommendation has another use!

As we have historical usage data for both individuals and courses, we can throw this into the mix for searching by using semantic terms to actively move results up or down (but never remove them) based on the tags which both the current user and similar users have actually found useful in the past. This means that (as an example) a computing student searching for the author name “J Bloggs” would have “Software Design by Joe Bloggs” boosted above “18th Century Needlework by Jessie Bloggs”, despite there being nothing else in the search term to make this distinction. As a final bit of epic coolness, Jerome will sport a “Recommended for You” section where we use all the recommendation systems at our disposal to find items which other similar users have found useful, as well as which share themes with items borrowed by the individual user.
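Personalised search can be sketched as a re-ranking that only ever changes the order of results. `personalise` scores each result by how strongly its terms overlap the terms a user (and similar users) found useful, normalised by the total weight of that user profile. The function name, the profile shape and `weight=0.2` are assumptions for illustration. With the “J Bloggs” example, both results match the query equally, so the user’s history decides the order and neither is dropped.

```python
from collections import Counter

def personalise(results: list[tuple[str, float]], item_terms: dict[str, set[str]],
                user_terms: Counter, weight: float = 0.2) -> list[str]:
    """Move search results up or down by how well their terms match the terms the
    user (and similar users) found useful. Every result is kept; only order changes."""
    total = sum(user_terms.values()) or 1
    def adjusted(entry: tuple[str, float]) -> float:
        item, relevance = entry
        affinity = sum(user_terms[t] for t in item_terms.get(item, ())) / total
        return relevance * (1 + weight * affinity)
    return [item for item, _ in sorted(results, key=adjusted, reverse=True)]


results = [("needlework-jessie", 1.0), ("software-joe", 1.0)]  # both match "J Bloggs"
item_terms = {"needlework-jessie": {"needlework", "embroidery", "history"},
              "software-joe": {"software", "design"}}
computing_student = Counter({"software": 5, "programming": 3, "design": 2})
print(personalise(results, item_terms, computing_student))
# ['software-joe', 'needlework-jessie']
```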

  1. Strictly speaking, Mongo calls them collections, but I’ll stick with ‘tables’ for clarity.

Making Lists

Posted on March 30th, 2011 by Nick Jackson

One thing every academic needs to be able to do is list resources. Whether it’s a lecturer putting together a reading list, a student gathering references, or a researcher arranging material to look at, the basic functionality is the same. That’s why one of the things we’re introducing with Jerome is our new Lists tool, the beta of which will be out soon.

Epic lists for all!

Fundamentally, it’s exactly what it sounds like: a list of resources which a person has put together for a purpose, but we’re taking the concept and giving it a bit more of that Jerome sparkle. To start with, Lists work with all our resources, be they catalogue items, journals, articles or repository content. You can add or remove list items from anywhere within the Jerome interface, and create multiple lists to look after content for different purposes.
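As a rough model of what a list holds, here is a minimal sketch, assuming a list is just an ordered set of resource ids and that ids carry a type prefix so catalogue items, articles and repository records can sit side by side. `ResourceList` and the `"catalogue:…"`-style ids are hypothetical, not Jerome’s actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical model: a list stores resource ids of any type, so one list can mix
# catalogue items, journals, articles and repository records.
@dataclass
class ResourceList:
    name: str
    owner: str
    items: list[str] = field(default_factory=list)   # ordered resource ids

    def add(self, resource_id: str) -> None:
        if resource_id not in self.items:            # ignore duplicates
            self.items.append(resource_id)

    def remove(self, resource_id: str) -> None:
        if resource_id in self.items:
            self.items.remove(resource_id)


# One user, several lists for different purposes.
lists = {
    "dissertation refs": ResourceList("dissertation refs", "student42"),
    "to read": ResourceList("to read", "student42"),
}
lists["dissertation refs"].add("catalogue:b1")
lists["dissertation refs"].add("article:7")
lists["dissertation refs"].add("repository:3")
lists["dissertation refs"].remove("article:7")
lists["to read"].add("article:7")
print(lists["dissertation refs"].items)  # ['catalogue:b1', 'repository:3']
```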

On top of this, once you’ve got a few items added we’ll tap into our borrowing history and ‘similar content’ detection to suggest other resources which you may find useful. Yes, that’s a reference list which suggests other things to go have a look at, or a reading list which can point out other helpful books we may only just have acquired. The suggestions will also be hooked up to our smart suggestions weighting, adjusting the things we recommend based on the type of content you’ve preferred in the past.
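One way list suggestions could work is sketched below, under stated assumptions: `similar` maps each item to the candidates produced by the borrowing-history and ‘similar content’ layers (however they are computed), and `type_pref` stands in for the smart suggestions weighting as a per-content-type multiplier learned from past preferences. `suggest_for_list` and all inputs are hypothetical. Candidates suggested by several list items score higher, items already on the list are skipped, and the type weighting can lift a preferred kind of resource above the rest.

```python
from collections import Counter

def suggest_for_list(list_items: list[str], similar: dict[str, list[str]],
                     item_type: dict[str, str], type_pref: dict[str, float],
                     limit: int = 5) -> list[str]:
    """Pool similar-item results for every item on the list, skip anything already
    on it, and scale each candidate by the user's preference for its content type."""
    votes: Counter = Counter()
    for item in list_items:
        for candidate in similar.get(item, []):
            if candidate not in list_items:
                votes[candidate] += 1
    scored = {c: n * type_pref.get(item_type.get(c, ""), 1.0) for c, n in votes.items()}
    return sorted(scored, key=lambda c: (-scored[c], c))[:limit]


list_items = ["b1", "a7"]
similar = {"b1": ["b2", "a5"], "a7": ["a5", "a9", "b1"]}
item_type = {"b2": "BOOK", "a5": "JOUR", "a9": "JOUR"}
type_pref = {"BOOK": 3.0, "JOUR": 1.0}        # this user has favoured books
print(suggest_for_list(list_items, similar, item_type, type_pref))  # ['b2', 'a5', 'a9']
```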

So you can add things from our collections, and we’ll help bulk out your lists for you. We could stop there, but we’re not done yet. You can even tap straight into the power of Jerome’s content augmentation and add anything to your list, even if we don’t know about it. Tell us what you know about an item and we’ll go away and try to find out as much as we can for you. Got the title and author of a great book, but not enough to cite it? Tell us what you know and we’ll do our best to fill out the rest.
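The “tell us what you know” step can be illustrated with a deliberately simple sketch: look for a known record that agrees with every field the user supplied (ignoring case and extra spaces), then fill in the fields the user left out without overwriting anything they typed. Jerome’s real augmentation presumably queries outside sources and matches more loosely; `augment`, `normalise` and the in-memory `known` list are assumptions for illustration.

```python
def normalise(s: str) -> str:
    """Case- and whitespace-insensitive form used for matching."""
    return " ".join(s.lower().split())

def augment(partial: dict[str, str], known: list[dict[str, str]]) -> dict[str, str]:
    """Find a known record agreeing with every field the user supplied, then fill in
    the fields the user left out. The user's own values are never overwritten."""
    for record in known:
        if all(normalise(record.get(k, "")) == normalise(v) for k, v in partial.items()):
            return {**record, **partial}
    return dict(partial)  # nothing matched: keep exactly what the user told us


known = [{"TY": "BOOK", "TI": "Software Design", "AU": "Bloggs, Joe", "PY": "2009"}]
print(augment({"TI": "software design", "AU": "Bloggs, Joe"}, known))
```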

Once you’ve got a list we’re going all out to make it as useful as possible. You can keep it private or choose to share it with the world. Export your lists in a variety of standard formats compatible with citation managers and browser favourites, or even embed a widget into your webpage or blog (or course content on Blackboard).
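The post doesn’t name the export formats, but RIS is one widely supported by citation managers, so here is a minimal sketch of exporting list entries as RIS. Each line is a two-letter tag, two spaces, a hyphen and a space, then the value; `TY` opens a record, repeatable tags such as `AU` get one line per value, and `ER` closes the record. `to_ris` and the record dictionaries are hypothetical, and only a handful of tags are handled.

```python
def to_ris(records: list[dict]) -> str:
    """Serialise list entries as RIS text for import into a citation manager."""
    lines: list[str] = []
    for rec in records:
        lines.append(f"TY  - {rec.get('TY', 'GEN')}")   # TY must open each record
        for tag in ("TI", "AU", "PY", "AB"):
            values = rec.get(tag, [])
            if isinstance(values, str):                 # allow single or repeated values
                values = [values]
            lines.extend(f"{tag}  - {v}" for v in values)
        lines.append("ER  - ")                          # end of record
    return "\n".join(lines) + "\n"


ris = to_ris([{"TY": "BOOK", "TI": "Software Design",
               "AU": ["Bloggs, Joe"], "PY": "2009"}])
print(ris)
```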

Lists are coming soon to signed-in users, with anonymous ‘short-term’ list creation coming a bit later. As always, we’d love to hear if you have any ideas for particular features you want to see.