Posts Tagged ‘Sphinx’

The Slides

Posted on July 30th, 2010 by Nick Jackson

Here are the slides from my presentation at Chips and Mash in Huddersfield.

If you’d like a bit more reading on MongoDB and Sphinx, the two systems at the core of Jerome, read on.


Engage Ludicrous Speed!

Posted on July 23rd, 2010 by Nick Jackson

One of our key aims for Jerome is for the whole thing to be fast. Not “the average search should complete in under a second” fast, but “your application should be fine to hit us with 50 queries a second” fast.
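To put “50 queries a second” in perspective, here’s the back-of-the-envelope arithmetic for how long each query can take on average. This is a minimal sketch: the function name and the idea of counting parallel workers are illustrative assumptions (a simple Little’s-law-style estimate), not a description of how Jerome is actually deployed.

```python
def per_query_budget_ms(target_qps: float, workers: int = 1) -> float:
    """Average time (ms) each query may take if `workers` queries run in
    parallel and the system must sustain `target_qps` queries per second.
    Hypothetical helper, for illustration only."""
    if target_qps <= 0 or workers <= 0:
        raise ValueError("target_qps and workers must be positive")
    # Each worker handles target_qps / workers queries per second,
    # so it has 1000 / (target_qps / workers) milliseconds per query.
    return 1000.0 * workers / target_qps


# A single worker serving 50 queries a second has 20 ms per query on average.
print(per_query_budget_ms(50))             # 20.0
# Four queries in flight at once relax that to 80 ms each.
print(per_query_budget_ms(50, workers=4))  # 80.0
```

In other words, “under a second” is nowhere near good enough: a single request path has roughly 20 ms to do everything.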

This requirement was one of the key factors in our decision to use MongoDB as our backend database, and to provide search using Sphinx. We’ll have another blog post fairly soon with more detail on how we’re using Mongo and Sphinx to store, search and retrieve data, but for now I’d like to share some preliminary numbers on how close we are to our speed goal.
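A common pattern when pairing Sphinx with a separate datastore is to let the search index return only the IDs of matching documents, in rank order, and then fetch the full records from the primary database. Below is a minimal sketch of that search-then-fetch idea. It assumes two in-memory dictionaries (`search_index` and `record_store`, both hypothetical) standing in for Sphinx and a MongoDB collection, and it isn’t necessarily how Jerome does it.

```python
from typing import Dict, List

# Hypothetical stand-ins: in a real deployment the index would be Sphinx
# (returning matching document IDs) and the store a MongoDB collection.
search_index: Dict[str, List[int]] = {
    "mongodb": [2, 1],  # IDs already sorted by relevance
    "sphinx": [3],
}
record_store: Dict[int, dict] = {
    1: {"_id": 1, "title": "Example record one"},
    2: {"_id": 2, "title": "Example record two"},
    3: {"_id": 3, "title": "Example record three"},
}


def search(term: str, limit: int = 10) -> List[dict]:
    # Step 1: ask the search index for matching IDs, in rank order.
    ids = search_index.get(term.lower(), [])[:limit]
    # Step 2: fetch the full records from the primary store by ID.
    found = {i: record_store[i] for i in ids if i in record_store}
    # Step 3: re-apply the index's ranking, because a batch fetch
    # (e.g. a MongoDB $in query) doesn't preserve the order of the IDs.
    return [found[i] for i in ids if i in found]


print([r["_id"] for r in search("MongoDB")])  # [2, 1]
```

Keeping the index lean (IDs plus whatever it needs for ranking) and leaving the full records in the database means each system does the job it’s good at.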

First of all, getting data in. This is a pain in the backside because the MARC-21 specification is so complex, and we need to perform several repetitive checks on the data to make sure we’re importing it correctly. Even so, on the import side of things we’re in the region of 150 MARC records a second, including parsing, filtering, mapping fields and finally getting the data into the database. We use the File_MARC PEAR library to parse the MARC data into a set of arrays, then some custom PHP to extract information such as title, author and publisher into a more readily understood format. This information extraction isn’t complete yet, so there’ll probably be a bit of a slowdown as we add more translation rules; equally, none of it has been optimised for speed yet.
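The field-mapping step can be sketched roughly as follows. This is a minimal illustration, not File_MARC’s API and not our actual PHP: it assumes a hypothetical simplified layout where a parsed record maps each MARC tag to a list of fields, and each field maps a subfield code to its values. The tag/subfield pairs are standard MARC 21 (245 $a title proper, 100 $a personal-name main entry, 260 $b publisher); the punctuation clean-up is one example of the repetitive checks mentioned above.

```python
from typing import Dict, List, Optional

# Hypothetical simplified layout for one parsed record:
# tag -> list of fields, each field mapping subfield code -> list of values.
MarcRecord = Dict[str, List[Dict[str, List[str]]]]

# MARC 21 tag/subfield pairs for a few common fields.
FIELD_MAP = {
    "title": ("245", "a"),      # Title statement: title proper
    "author": ("100", "a"),     # Main entry: personal name
    "publisher": ("260", "b"),  # Publication: name of publisher
}

# Trailing ISBD punctuation often left on MARC values ("Title /", "Press,").
TRAILING_PUNCTUATION = " /:;,"


def first_subfield(record: MarcRecord, tag: str, code: str) -> Optional[str]:
    """Return the first value of subfield `code` in field `tag`, if any."""
    for field in record.get(tag, []):
        values = field.get(code)
        if values:
            return values[0]
    return None


def clean(value: str) -> str:
    # Strip trailing punctuation and spaces used as MARC field separators.
    return value.rstrip(TRAILING_PUNCTUATION)


def map_record(record: MarcRecord) -> Dict[str, str]:
    """Translate a parsed MARC record into flat, readable fields."""
    result: Dict[str, str] = {}
    for name, (tag, code) in FIELD_MAP.items():
        value = first_subfield(record, tag, code)
        if value is not None:
            result[name] = clean(value)
    return result


example: MarcRecord = {
    "245": [{"a": ["An example title /"], "c": ["by A. Author."]}],
    "100": [{"a": ["Author, A.,"]}],
    "260": [{"a": ["London :"], "b": ["Example Press,"], "c": ["2010."]}],
}
print(map_record(example))
# {'title': 'An example title', 'author': 'Author, A.', 'publisher': 'Example Press'}
```

Each extra translation rule adds another entry and another check per record, which is why throughput is likely to dip as the mapping grows. A fuller version would also need fallbacks, for example field 264 for the publisher in newer RDA records.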
