Book Review: Relevant Search
20 Oct 2016
Relevancy, the notion that some results are better than others, is one of the key factors that distinguish search engines from most other databases. Additionally, it is a task that can sometimes seem like magic and is difficult to get right. Applications like Google have set the bar for how a search engine is expected to work: the relevant results should all be in the top positions. As the saying goes, if you want to make sure a secret stays a secret, put it on page 3 of a Google search result page.
Doug Turnbull and John Berryman have written a book about all aspects of relevancy. You will learn how the inverted index works, and about different kinds of queries and the way they influence the score of the result documents. You will see how you can use boosts or special queries to influence the result ordering, and learn about different ways to help users find the things they are looking for.
For the most part the book uses one coherent example, the search for movies. This is very well suited as it is a mixture of structured and unstructured data. All the examples in the book use Elasticsearch, but there is also an appendix that shows how to do similar things with Solr.
When I started the book I thought I had a basic introduction to search engines in my hands. But I was wrong: both authors obviously have lots of experience with search relevancy tuning (no wonder they are connected to the development of tools like Splainer and Quepid). The book differs from many other books on search technologies in that it doesn't describe all the features of a certain search engine but shows how to use them to build business applications.
Even though I work intensively with search engines myself, I learned a lot while reading the book. Some of the tactics are things that are widely done when building applications based on search engines, but the authors manage to name them explicitly and build a structured approach. Finally, besides being very informative, the book is an easy read that contains lots of jokes. If you are doing something with search engines, you are well advised to read it.