Recent Posts

You can find a list of all posts in the archive

Running and Testing Solr with Gradle

A while ago I blogged on testing Solr with Maven on the synyx blog. In this post I will show you how to setup a similar project with Gradle that can start the Solr webapp and execute tests against your configuration.

Read more

Reading term values for fields from a Lucene Index

Sometimes when using Lucene you might want to retrieve all term values for a given field. Think of categories that you want to display as search links or in a filtering dropdown box. Indexing might look something like this:

Read more

Berlin Buzzwords 2012

Berlin Buzzwords is an annual conference on search, store and scale technology. I've heard good things about it before and finally got convinced to go there this year. The conference itself lasts for two days but there are additional events before and afterwards so if you like you can spend a whole week.

Read more

Content Extraction with Apache Tika

Sometimes you need access to the content of documents, be it that you want to analyze it, store the content in a database or index it for searching. Different formats like word documents, pdfs and html documents need different treatment. Apache Tika is a project that combines several open source projects for reading content from a multitude of file formats and makes the textual content as well as some metadata available using a uniform API. I will show two ways how to leverage the power of Tika for your projects.

Read more

Importing Atom feeds in Solr using the Data Import Handler

I am working on a search solution that makes some of the content I am producing available through one search interface. One of the content stores is the blog you are reading right now, which among other options makes the content available here using Atom.

Read more