Archive for the ‘Experience’ Category

If I may make a suggestion?

Posted on April 26th, 2011 by Nick Jackson

One of the cool things that Jerome wants to do as part of its work on the library access portal (give our current build a whirl) is provide relevant, accurate suggestions of books which you may find useful. This is a whopping huge piece of statistics which is causing me to dust off my A Level statistics, whip out some algorithms and start thinking about ways of actually doing it in a reliable way. This blog post is a summary of some of my thoughts on the issue, and I’d really appreciate it if people could weigh in with any suggestions or experience before I delve into the depths of coding recommendations.

Here goes. First of all, we want to use as many different sources as possible from which to derive similarity suggestions. At the moment we’ve got a few suggestions:

  • People who borrowed this book also borrowed… (See how Huddersfield do it). This provides direct, human-based connections between items, weighted both by how common the combination of books is and by how unique the combination is¹.
  • Catalogued Subject Headings. Using the mad cataloguing skills of our ninja cataloguers to determine which books share a subject, as well as boosting these subject headings with data from OpenLibrary. The weight of a subject heading in the overall similarity ranking is inversely proportional to the number of times it’s used. This gives stronger recommendations to books which are within a smaller field of interest. We can also use subject headings to suggest similar journals.
  • Extracted Semantic Tags. As we pull OpenLibrary summaries for books and abstracts from our repository we’ll be slamming them through OpenCalais to extract semantic information on what the item is actually about. These then go into the item’s tags and (using a similar algorithm to subject headings) are weighted to find works about the same type of things.
  • Manual Groupings. Jerome has in-built support for ‘lists’ of items, intended to provide for things such as reading lists, collections of citations for a specific paper and so-on. We can assume that items in a list together are related, giving us a potentially huge set of manually curated similarities. To prevent too much positive reinforcement, again we’ll be weighting a link in inverse proportion to its popularity.
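To make the inverse-popularity weighting running through those sources a bit more concrete, here’s a minimal Python sketch. All the names and data are hypothetical illustrations, not Jerome’s actual code: two items are scored by their shared tags (or subject headings), with each shared tag weighted in inverse proportion to how often it’s used overall.

```python
from collections import defaultdict

def tag_similarity(item_tags, tag_counts, item_a, item_b):
    """Score two items by shared tags, weighting each shared tag
    in inverse proportion to how often it is used overall."""
    shared = item_tags[item_a] & item_tags[item_b]
    # A tag used on only a few items says more than a ubiquitous one.
    return sum(1.0 / tag_counts[t] for t in shared)

# Hypothetical data: tags per item.
item_tags = {
    "book_a": {"fiction", "cryptography"},
    "book_b": {"fiction", "cryptography"},
    "book_c": {"fiction"},
    "book_d": {"fiction"},
}

# Overall usage count for each tag across the whole collection.
tag_counts = defaultdict(int)
for tags in item_tags.values():
    for t in tags:
        tag_counts[t] += 1
```

Here `fiction` appears on four items and `cryptography` on two, so the rarer shared tag contributes twice as much to the similarity score — exactly the “stronger recommendations within a smaller field of interest” behaviour described above.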

Read the rest of this entry »

  1. Not mutually exclusive. One is how many times book A is borrowed by people who also borrowed book B; the other is how often book B is only borrowed by people who borrowed book A.
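For the curious, the two borrowing statistics in that footnote can be sketched like this (toy data and hypothetical names — Jerome’s real implementation will differ):

```python
def co_borrow_stats(histories, book_a, book_b):
    """Two complementary co-borrowing measures: how many borrowers
    took out both A and B together, and what fraction of B's
    borrowers also borrowed A (B's 'exclusivity' to A's readers)."""
    borrowers_a = {u for u, books in histories.items() if book_a in books}
    borrowers_b = {u for u, books in histories.items() if book_b in books}
    together = len(borrowers_a & borrowers_b)
    exclusivity = together / len(borrowers_b) if borrowers_b else 0.0
    return together, exclusivity

# Hypothetical borrowing histories: borrower -> set of books borrowed.
histories = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A"},
    "u4": {"B", "C"},
}
```

With this data, A and B are co-borrowed twice, and two of B’s three borrowers also borrowed A — two different signals, as the footnote says, and neither implies the other.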

Making Lists

Posted on March 30th, 2011 by Nick Jackson

One of the things that every academic needs the ability to do is list resources. Whether it’s a lecturer putting together a reading list, a student gathering things for references, or a researcher arranging material to look at later, the basic functionality is still the same. Which is why one of the things we’re introducing with Jerome is our new Lists tool, the beta of which will be out soon.

Epic lists for all!

Fundamentally, it’s exactly what it sounds like. It’s a list of resources which a person has put together for a purpose, but we’re taking the concept and giving it a bit more of that Jerome sparkle. To start with, Lists work with all our resources be they catalogue items, journals, articles or repository content. You can add or remove list items from anywhere within the Jerome interface, and create multiple lists to look after content for different purposes.

On top of this, once you’ve got a few items added we’ll tap into our borrowing history and ‘similar content’ detection to suggest other resources which you may find useful. Yes, that’s a reference list which suggests other things to go have a look at, or a reading list which can point out other helpful books we may only just have acquired. The suggestions will also be hooked up to our smart suggestions weighting, adjusting the things we recommend based on the type of content you’ve preferred in the past.

So you can add things from our collections, and we’ll help bulk out your lists for you. We could stop there, but we’re not done yet. You can even tap straight into the power of Jerome’s content augmentation and add anything to your list, even if we don’t know about it. Tell us what you know about an item and we’ll go away and try to find out as much as we can for you. Got the title and author of a great book, but not enough to cite it? Tell us what you know and we’ll do our best to fill out the rest.

Once you’ve got a list we’re going all out to make it as useful as possible. You can keep it private or choose to share it with the world. Export your lists in a variety of standard formats compatible with citation managers and browser favourites, or even embed a widget into your webpage or blog (or course content on Blackboard).
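To give a flavour of the citation-manager side of that export, here’s a rough sketch of serialising list items as RIS, a plain-text format that most reference managers (EndNote, Zotero, Mendeley) can import. The field mapping and function name here are illustrative assumptions, not Jerome’s actual exporter:

```python
def list_to_ris(items):
    """Serialise a list of items as RIS. Each record starts with a
    TY (type) tag and ends with an ER tag; every line is
    'TAG  - value'."""
    lines = []
    for item in items:
        lines.append("TY  - BOOK")
        for author in item.get("authors", []):
            lines.append(f"AU  - {author}")
        lines.append(f"TI  - {item['title']}")
        if "year" in item:
            lines.append(f"PY  - {item['year']}")
        lines.append("ER  - ")
    return "\n".join(lines)

# A hypothetical single-item list.
ris = list_to_ris([
    {"title": "A Great Book", "authors": ["Jackson, Nick"], "year": 2011},
])
```

A real exporter would map catalogue record types to the right RIS `TY` values (JOUR for articles, and so on) rather than assuming everything is a book.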

Lists are coming soon to signed in users, with anonymous ‘short-term’ list creation coming a bit after. As always, we’d love to hear if you have any ideas for particular features you want to see.

See our vest! See our vest!

Posted on March 28th, 2011 by Paul Stainthorp

With fanfare, whooping and much ringing of bells: the first pass at Jerome’s browser search interface is now public and living at a ‘real’ URL.

Take a look for yourself, at:

[Jerome screenshot]

Why not…

  • Try a few searches?
  • Marvel at the combination of catalogue records, repository data, and journals?
  • Mix it up a bit?
  • Leave us some feedback?

But please remember: ‘experimental’ is the word. Don’t expect everything to work just yet… have fun!

Mixing It Up

Posted on March 27th, 2011 by Nick Jackson

This blog post explains some of the behind-the-scenes magic of one of our cool new features in Jerome, which is trying to make the experience of searching all our collections both easier and more intuitive.

Everybody knows how search works. You put in the words you’re looking for, it goes away and finds some results, and then you’re shown them. If you’re feeling clever you might throw in some advanced operators such as Boolean logic, or perhaps exact phrase searches. A few places may even offer you power searches, letting you choose which fields you’re searching in.

For Jerome we’re taking this concept and mixing it up a little. Literally in our case, with our inventively named Mixing Desk tool. This is basically a set of sliders relating to our collections and fields, which you can tweak to tell us how much importance you assign to the respective fields. If, for example, you want us to pay more attention to our institutional repository, throw in the odd journal if it’s a really good match, discount titles (because you’re searching for a common word as an author’s last name) and totally remove books from the results then you can do. Generally (unless you completely disable a collection or field) doing this won’t alter the results you receive, but it will change the order in which they’re presented to you.

Even better is the fact that through some JavaScript wizardry (and our search API) you can see the changes to your results instantly, with no need to refresh or reload the page. Pull a slider up and watch your results rearrange themselves, or turn off a collection and gasp in amazement as its items vanish before your very eyes. It’s quite cool.

“But wait!”, I hear you cry. “How can this unholy voodoo be performed on search results? Are they not precomputed for every keyword combination by ninja librarians?”. Unfortunately, no they aren’t. Instead our ninja search engine – Sphinx – does some magic using precomputed hit indexes and a bit of mathematical trickery called weighting.

Weighting is where we assign a relevance – or ‘weight’ – to every index which we search (that’s the collections bit), and to each field within those indexes. There’s a hugely complex algorithm which deals with the basic score of each search query within each field, based on facts such as the number of term matches, the length of the field and so on. Once we have this basic score it is then multiplied by the field weighting to give us our total score for that field. Add the field scores together to get the total score for that item, and then multiply by the index weighting to get a final item score. Sort results by this score to get the most relevant first.
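That paragraph boils down to a couple of lines of arithmetic. Here’s a minimal sketch of the scoring described above — the base field scores and slider values are made-up numbers (in reality the base scores come out of Sphinx’s matching algorithm):

```python
def item_score(field_scores, field_weights, index_weight):
    """Total item score: each field's base score is multiplied by its
    field weight, the products are summed, and the sum is scaled by
    the weight of the index (collection) the item came from."""
    total = sum(score * field_weights.get(field, 1)
                for field, score in field_scores.items())
    return total * index_weight

# Hypothetical base scores from the search engine for one catalogue item.
field_scores = {"title": 3.0, "author": 1.0}

# Slider positions: discount titles (common word as a surname), boost
# authors, and run the catalogue index at 0.8.
score = item_score(field_scores, {"title": 0.5, "author": 4.0}, 0.8)
```

Results are then sorted by this final score, most relevant first — which is why (unless you zero something out entirely) moving a slider reorders the results rather than changing which ones you get.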

One thing we’ve got planned for the future is tentatively named the Equaliser (in keeping with the Mixing Desk theme). Much like a graphic EQ on a mixing desk indicates which frequencies are loudest, the Jerome Equaliser will show you how the resultant weightings break down, giving you a simplified peek into exactly how we’ve arrived at the results set that we did. From there you can see which words really helped to drive your search and adjust accordingly if you want to, perhaps by getting rid of terms which didn’t help you or by tweaking the weight for those that did.

The Mixing Desk and Equaliser address the relevancy issue of library catalogue searching in an easily understood way, by presenting the underlying mechanism of searching such that it can be adjusted to provide more relevant results for the individual doing the searching at the point of use, rather than limiting them to predefined scopes. Search logging and result tracking also allows us to track the preferred weighting for individual users and user groups, and see which weights provide the highest value results. This gives us an opportunity to adjust default weightings on a per-group basis, matching how searches behave to better reflect the user’s intentions even before they choose to customise them.

It’s starting to look useful…

Posted on November 26th, 2010 by Nick Jackson

It is with great pleasure and a little bit of excitement that I would like to bring you up to date on the latest developments from the land of Jerome.

First of all, our looping catalogue import system is now up and running properly. This system literally starts at the first record in our catalogue and slowly but steadily checks and imports every single one, at the rate of 45 a minute. The entire import of our current catalogue completes in a little under 5 days, and as soon as it’s done the system starts the process again. This means that although Jerome isn’t showing you the ‘live’ catalogue, it’s never more than a few days out of date, and most of what changes in the catalogue is just housekeeping and fixing ‘wrong’ records. New stock is automatically detected and added, so the act of getting data into Jerome is now fully automated. We’ve already completed one whole round, and we’re currently around 10,000 records into the second one.
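For anyone checking our arithmetic, the quoted rate and loop time hang together — assuming the catalogue holds roughly 300,000 records, a figure implied by the bib numbers mentioned below rather than an exact count:

```python
# Back-of-envelope check on the looping import figures quoted above.
records = 300_000          # assumed catalogue size (not an exact count)
per_day = 45 * 60 * 24     # 45 records/minute = 64,800 records per day
days = records / per_day   # ~4.6 days: "a little under 5 days"
```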

As part of the import process we now automatically grab free book covers from LibraryThing and cache a local copy. Although this is by no means complete and doesn’t offer a cover for everything it is a start, and shows just how useful the rest of the world can be in filling out our information. Covers (where available) exist for bib numbers over 272,000(ish) and under 10,000(ish), but as part of the looping import these will appear over the next week for items in the middle. We’re also looking at other cover providers such as OpenLibrary or Amazon to help boost the quality and quantity of covers, but due to restrictive licensing we’re having to tread carefully. Book covers will be used more liberally in some future features, including things such as a ‘looks like’ cover finder – using the power of perceptual hashes – to help find that book you can’t remember the name of but you can see the cover in your mind.
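To give a flavour of how a perceptual-hash ‘looks like’ finder works, here’s a toy average-hash sketch in pure Python. Real implementations hash an 8×8 (or larger) greyscale thumbnail of the actual cover image; the 2×2 ‘thumbnails’ here are just illustrative numbers:

```python
def average_hash(pixels):
    """Minimal perceptual 'average hash': for a small greyscale
    thumbnail, set a bit for every pixel brighter than the mean.
    Visually similar images produce similar bit patterns."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits; lower means more similar images."""
    return sum(a != b for a, b in zip(h1, h2))

# Toy 2x2 greyscale 'thumbnails' (0 = black, 255 = white).
cover = [[200, 40], [190, 50]]
similar = [[210, 35], [185, 60]]    # same light/dark layout
different = [[10, 220], [20, 230]]  # inverted layout
```

Because the hash captures the light/dark layout rather than exact pixel values, a slightly different scan of the same cover lands at Hamming distance 0 while a structurally different cover is maximally far away — which is what lets you search by a half-remembered cover.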

Read the rest of this entry »