Here are the slides from my presentation at Chips and Mash in Huddersfield.
If you’d like a bit more background on MongoDB and Sphinx, the two systems at the core of Jerome, read on.
MongoDB
MongoDB is an open-source, high-speed NoSQL document database. Unlike a traditional RDBMS, it doesn’t force every record into the same fixed schema, so each document can carry its own set of fields. It scales very well horizontally, and the newer versions include support for both replica sets (improving resilience should one server fail) and sharding (improving performance in large databases by spreading the data across several servers).
Mongo is already in production use in loads of places for various tasks, including Foursquare, bit.ly, The New York Times website, GitHub, CollegeHumor and SugarCRM amongst many others.
We chose to use Mongo for Jerome for many reasons, but one of the key ones was that it stores documents in a JSON-like format (BSON, a binary encoding of JSON), making both storage and retrieval very simple. You can take a look at the use cases given on the Mongo website if you’re interested in how it might fit into your organisation, or whether it’s the right solution to a particular problem.
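To give a feel for what "varying format" and JSON storage mean in practice, here is a minimal sketch in Python. It deliberately uses only the standard `json` module rather than a real MongoDB driver, so it runs without a server. The two records, their field names and the `find` helper are all made up for illustration and are not Jerome's actual schema or Mongo's API.

```python
import json

# Two hypothetical catalogue records with different shapes. In a document
# database, both can live side by side in the same collection.
records = [
    {"_id": 1, "type": "book", "title": "An Example Book",
     "authors": ["A. Author"], "isbn": "0000000000"},
    {"_id": 2, "type": "journal", "title": "An Example Journal",
     "issn": "0000-0000", "holdings": {"from": 1998}},
]


def find(collection, criteria):
    """Return documents whose top-level fields equal every value in criteria.

    A toy stand-in for a query-by-example lookup; illustrative only.
    """
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value
               for key, value in criteria.items())
    ]


# Storage and retrieval here are a plain JSON round trip.
stored = [json.dumps(doc) for doc in records]
loaded = [json.loads(text) for text in stored]

books = find(loaded, {"type": "book"})
print(books[0]["title"])
```

The point is that neither record had to be squeezed into a shared table layout, and the query only mentions the fields it cares about.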
Sphinx
Sphinx is an open-source, high-speed full-text search engine. It can be queried in various ways, including an SQL dialect (SphinxQL), and can index information held either in an SQL database (MySQL and PostgreSQL natively, others via ODBC) or supplied as XML. It supports distributed indexing for speed and resilience, and can hit 500 queries a second on indexes of a million records.
Sphinx also supports a wide variety of indexing methods, including stemming, metaphones (sound-alike searches) and synonym processing using custom dictionaries. It’s also UTF-8 throughout, providing support for any language out of the box (essential when we’re dealing with titles in multiple languages).
The search syntax in Sphinx was one of the major reasons we chose it. As well as supporting traditional Boolean logic, it supports many other features such as quorum-based searches (match at least N of the given words), field start/end markers, strict ordering of terms, substrings and more. You can also use fields or index attributes to group or sort search results, allowing very accurate searches to be performed very quickly.
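As a rough illustration of that syntax, the sketch below builds a few query strings in Python. The operators themselves (`|`, `"..."/N`, `<<`, `@field`, `^` and `$`) come from Sphinx's extended query syntax, where the "at least N of these words" operator is called a quorum. The helper functions and example terms are hypothetical, not part of any Sphinx client library, and they do no escaping of special characters.

```python
def boolean_or(*terms):
    """Match documents containing any of the terms: a | b."""
    return " | ".join(terms)


def quorum(words, threshold):
    """Match documents containing at least `threshold` of the words."""
    return '"{}"/{}'.format(" ".join(words), threshold)


def strict_order(*terms):
    """Match only when the terms appear in this order: a << b << c."""
    return " << ".join(terms)


def in_field(field, expression):
    """Limit an expression to a single indexed field: @title ..."""
    return "@{} {}".format(field, expression)


def whole_field(phrase):
    """Anchor the first word to the field start and the last to its end."""
    return "^{}$".format(phrase)


# A title search that tolerates one missing word out of four.
query = in_field("title", quorum(["history", "of", "english", "railways"], 3))
print(query)  # @title "history of english railways"/3
```

Strings like these would be passed as the full-text part of a search with extended matching enabled; how that is wired up depends on the client you use.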