This blog post explains some of the behind-the-scenes magic of one of our cool new features in Jerome, which is trying to make the experience of searching all our collections both easier and more intuitive.
Everybody knows how search works. You put in the words you’re looking for, it goes away and finds some results, and then you’re shown them. If you’re feeling clever you might throw in some advanced operators such as Boolean logic, or perhaps exact phrase searches. A few places may even offer you power searches, letting you choose which fields you’re searching in.
For Jerome we’re taking this concept and mixing it up a little. Literally in our case, with our inventively named Mixing Desk tool. This is basically a set of sliders, one for each of our collections and fields, which you can tweak to tell us how much importance you assign to each. Suppose, for example, that you want us to pay more attention to our institutional repository, throw in the odd journal if it’s a really good match, discount titles (because you’re searching for a common word as an author’s last name) and remove books from the results entirely: you can do all of that. Generally (unless you completely disable a collection or field) doing this won’t alter the results you receive, but it will change the order in which they’re presented to you.
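To make the "same results, different order" point concrete, here is a minimal sketch in Python. It is not Jerome's actual code: the function name `apply_mixing_desk`, the tuple layout of a result and the convention that a slider value of 0 means "disabled" are all assumptions made for illustration, and it only covers collection sliders, not field sliders.

```python
def apply_mixing_desk(results, collection_sliders):
    """Reorder results by slider-weighted score; drop collections set to 0.

    results: list of (item_id, collection, base_score) tuples (hypothetical shape).
    collection_sliders: collection name -> slider value, where 0 means "disabled".
    Collections with no slider entry keep a neutral weight of 1.
    """
    weighted = []
    for item_id, collection, base_score in results:
        weight = collection_sliders.get(collection, 1)
        if weight == 0:
            continue  # a fully lowered slider removes the collection entirely
        weighted.append((base_score * weight, item_id))
    # Highest weighted score first; sorting on the score alone keeps ties in input order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in weighted]


results = [
    ("book-1", "books", 5),
    ("repo-1", "repository", 4),
    ("journal-1", "journals", 3),
]
boosted = apply_mixing_desk(results, {"repository": 3})
no_books = apply_mixing_desk(results, {"repository": 3, "books": 0})
```

With these made-up numbers, `boosted` contains the same three items as the untouched search but with the repository item moved to the front, while `no_books` loses the book altogether.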
“But wait!” I hear you cry. “How can this unholy voodoo be performed on search results? Are they not precomputed for every keyword combination by ninja librarians?” Unfortunately, no, they aren’t. Instead our ninja search engine – Sphinx – does some magic using precomputed hit indexes and a bit of mathematical trickery called weighting.
Weighting is where we assign a relevance – or ‘weight’ – to every index which we search (that’s the collections bit), and to each field within those indexes. A hugely complex algorithm works out a basic score for the search query within each field, based on factors such as the number of term matches, the length of the field and so on. This basic score is then multiplied by the field weighting to give the total score for that field. Add the field scores together to get the total score for that item, then multiply by the index weighting to get a final item score. Sort the results by this score, highest first, to get the most relevant at the top.
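The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the multiply, add, multiply, sort recipe: the `Hit` class, the function names and the default weight of 1.0 are assumptions, and the basic per-field scores are taken as given rather than computed the way the search engine would compute them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    item_id: str
    index: str                      # the collection this item was found in
    field_scores: Dict[str, float]  # basic score per field, as supplied by the engine


def item_score(hit: Hit, field_weights: Dict[str, float],
               index_weights: Dict[str, float]) -> float:
    """Sum of (basic field score x field weight), multiplied by the index weight."""
    field_total = sum(
        score * field_weights.get(field, 1.0)  # unlisted fields get a neutral weight
        for field, score in hit.field_scores.items()
    )
    return field_total * index_weights.get(hit.index, 1.0)


def rank(hits: List[Hit], field_weights: Dict[str, float],
         index_weights: Dict[str, float]) -> List[Hit]:
    """Return hits ordered by final item score, most relevant first."""
    return sorted(
        hits,
        key=lambda h: item_score(h, field_weights, index_weights),
        reverse=True,
    )


hits = [
    Hit("a", "books", {"title": 2.0, "author": 1.0}),
    Hit("b", "repository", {"title": 1.0, "author": 1.0}),
]
field_weights = {"title": 1.0, "author": 3.0}
index_weights = {"books": 1.0, "repository": 2.0}
ranked = rank(hits, field_weights, index_weights)
```

Here item `a` scores (2 × 1 + 1 × 3) × 1 = 5 and item `b` scores (1 × 1 + 1 × 3) × 2 = 8, so `b` is ranked first even though `a` has the better basic title score.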
One thing we’ve got planned for the future is tentatively named the Equaliser (in keeping with the Mixing Desk theme). Much as a graphic EQ on a mixing desk indicates which frequencies are loudest, the Jerome Equaliser will show you how the resultant weightings break down, giving you a simplified peek into exactly how we arrived at the result set we did. From there you can see which words really helped to drive your search and adjust accordingly if you want to, perhaps by getting rid of terms which didn’t help you or by tweaking the weight for those that did.
The Mixing Desk and Equaliser address the relevancy issue of library catalogue searching in an easily understood way. They present the underlying mechanism of searching so that the person doing the searching can adjust it at the point of use to get more relevant results, rather than being limited to predefined scopes. Search logging and result tracking also allow us to track the preferred weightings of individual users and user groups, and to see which weights provide the highest value results. This gives us an opportunity to adjust default weightings on a per-group basis, so that searches better reflect users’ intentions even before they choose to customise them.
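One simple way to turn logged preferences into per-group defaults would be to average the weights each group's users chose. The Python sketch below does just that. The shape of a log entry and the name `default_weights_by_group` are hypothetical, and plain averaging is a stand-in technique: it ignores the result tracking mentioned above, which a real implementation would presumably use to favour weights that led to high value results.

```python
from collections import defaultdict


def default_weights_by_group(log_entries):
    """Average the logged weight settings for each user group.

    log_entries: iterable of (group, weights) pairs, where weights maps a
    collection or field name to the weight the user chose (hypothetical shape).
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for group, weights in log_entries:
        for name, weight in weights.items():
            totals[group][name] += weight
            counts[group][name] += 1
    # Mean weight per name, computed only over searches that set that name.
    return {
        group: {name: totals[group][name] / counts[group][name] for name in names}
        for group, names in totals.items()
    }


log = [
    ("undergraduate", {"books": 4.0, "journals": 2.0}),
    ("undergraduate", {"books": 2.0}),
    ("staff", {"journals": 5.0}),
]
defaults = default_weights_by_group(log)
```

With this made-up log, undergraduates would start with a books weight of 3.0 and a journals weight of 2.0, while staff would start with a journals weight of 5.0.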