Learning Lucene

I am currently working with a team that is starting a new project based on Lucene. Most of the time I would argue for using either Solr or Elasticsearch instead of plain Lucene, but in this case it was a conscious decision. In this post I am compiling some sources for learning Lucene. I hope you will find them helpful, or that you can point out sources I missed.

Project documentation

The first choice is of course the excellent project documentation. It contains the Javadoc for all the modules (core, analyzers-common and queryparser being the most important ones). The Javadoc also includes further documentation, for example an explanation of a simple demo app and helpful introductions to analysis and to querying and scoring. You might also be interested in the standard index file formats.

Besides the documentation that comes with the releases, there is also a lot of information in the project wiki, but you need to know what you are looking for. You can also join the mailing lists to learn what other users are doing.

When looking at analyzer components, the Solr Start website can be useful. Though it is dedicated to Solr, its list of analyzer components can help you choose analyzers for Lucene as well. It also contains a searchable version of the Javadocs.

Books

The classic book on the topic is Lucene in Action. In over 500 pages it explains all the underlying concepts in detail. Unfortunately some of the information is outdated and many of the code examples won't work anymore. The newer concepts are not covered either. Still, it's the recommended book for learning Lucene.

Another book I've read is Lucene 4 Cookbook, published by Packt. It contains more current examples but is not well suited for learning the basics. Additionally, it felt to me as if no editor had worked on this book: there are lots of repetitions, typos and broken sentences. (I make lots of grammar mistakes myself when blogging - but I expect more from a published book.)

You can also learn a lot about different aspects of Lucene by reading a book on one of the search servers based on it. I can recommend Elasticsearch in Action, Solr in Action and Elasticsearch – The Definitive Guide. (If you can read German, I of course invite you to read my book on Elasticsearch.)

Blogs, Conferences and Videos

There are countless blog posts on Lucene. A very good introduction is Lucene: The Good Parts by Andrew Montalenti. Some blogs publish regular pieces on Lucene; recommended ones are by Mike McCandless (who now mostly blogs on the elastic Blog), OpenSource Connections, Flax and Uwe Schindler. There is a lot of content about Lucene on the elastic Blog. If you want to hear about current development, I can recommend the "This week in Elasticsearch and Apache Lucene" series. There are also some interesting posts on the Lucidworks Blog, and I am sure there are lots of other blogs I forgot to mention here.

Lucene is a regular topic at two larger conferences: Lucene/Solr Revolution and Berlin Buzzwords. You can find lots of video recordings of past events on their websites.

Sources

Finally, the project is open source so you can learn a lot about it by reading the source code of either the library or the tests.

Another option is to look at applications that use it, such as Solr or Elasticsearch. Of course you need to find your way around the sources of the respective project, but sometimes this isn't too hard. One example for Elasticsearch: if you would like to learn how the common multi_match query is implemented in Lucene, you will easily find the class MultiMatchQuery, which creates the Lucene queries.

What did I miss?

I hope there is something useful for you in this post. I am sure I missed lots of great resources for learning Lucene. If you would like to add one, let me know in the comments or on Twitter.