Geo-Spatial Features in Solr 4.2

Last week I showed how you can use the classic spatial support in Solr. It uses the LatLonType to index locations that can then be used to query, filter or sort by distance. Starting with Solr 4.2 there is a new module available. It is based on the Lucene spatial module, which is more powerful but also has to be used differently. You can still use the old approach, but in this post I will show you how to use the new features to do the same operations we saw last week.

Indexing Locations

Again, we index talks that consist of a title and a location. For the new spatial support you need to add a different field type to schema.xml:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
distErrPct="0.025"
maxDistErr="0.000009"
units="degrees"/>

Unlike LatLonType, SpatialRecursivePrefixTreeFieldType doesn't split a location into subfields but indexes its own data structure in a single field. The maxDistErr attribute determines the precision of the indexed locations. Here it is 0.000009 degrees, which is close to one meter and should be enough for most location searches. distErrPct sets the default precision for non-point shapes, as a fraction between 0 (fully precise) and 0.5.
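
To check the "close to one meter" figure, here is a small sketch of the arithmetic. It assumes a spherical Earth with a mean radius of 6371 km (an approximation, not necessarily the exact constant Solr uses) and also shows that along a parallel a degree of longitude shrinks with the cosine of the latitude, so at Mannheim's latitude the same value covers only about 0.65 meters east-west.

```python
import math

# Mean Earth radius in kilometers; a spherical Earth is assumed here.
EARTH_RADIUS_KM = 6371.0


def degrees_to_meters(deg: float) -> float:
    """Arc length in meters of `deg` degrees along a great circle (e.g. a meridian)."""
    return math.radians(deg) * EARTH_RADIUS_KM * 1000.0


def longitude_degrees_to_meters(deg: float, lat: float) -> float:
    """Along a parallel, a degree of longitude shrinks with cos(latitude)."""
    return degrees_to_meters(deg) * math.cos(math.radians(lat))


# maxDistErr from the field type above, measured along a meridian
print(f"{degrees_to_meters(0.000009):.3f} m")
# the same value measured along the parallel through Mannheim (~49.49 degrees north)
print(f"{longitude_degrees_to_meters(0.000009, 49.49):.3f} m")
```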

To use the type in our documents we of course also need a field of that type:

<field name="location" type="location_rpt" indexed="true" stored="true"/>

Now we index some documents with three fields: the path (which serves as our id), the title of the talk and the location.

curl 'http://localhost:8082/solr/update/json?commit=true' -H 'Content-type:application/json' -d '
[
{"path" : "1", "title" : "Search Evolution", "location" : "49.487036,8.458001"},
{"path" : "2", "title" : "Suchen und Finden mit Lucene und Solr", "location" : "49.013787,8.419936"}
]'

Again, the first document is located in Mannheim and the second in Karlsruhe. When you look at the location field in the schema browser of the administration backend, you can see that the locations are indexed as a series of prefix terms, in an ngram-like fashion.
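
To illustrate where the ngram-like terms come from, here is a minimal geohash encoder. It assumes the field uses the geohash prefix tree (which I believe is the default for SpatialRecursivePrefixTreeFieldType); Lucene's actual indexing code differs in detail, so treat this as a sketch of the idea only. Each additional character narrows the cell, and nearby locations share their leading characters: Mannheim and Karlsruhe both start with "u0".

```python
# Standard geohash base32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a point as a geohash by repeatedly halving the lon/lat ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    even = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def prefix_terms(lat: float, lon: float, precision: int = 8) -> list:
    """All prefixes of the geohash, i.e. the cells containing the point, coarse to fine."""
    h = geohash(lat, lon, precision)
    return [h[:i] for i in range(1, len(h) + 1)]


# Both lists start with the shared prefixes "u" and "u0".
print("Mannheim ", prefix_terms(49.487036, 8.458001, 5))
print("Karlsruhe", prefix_terms(49.013787, 8.419936, 5))
```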

Sorting by Distance

A common use case is to sort the results by distance from a certain location. You can't use the Solr 3 syntax anymore; instead you use the geofilt query parser, which maps the distance to the score, and then sort on the score.

http://localhost:8082/solr/select?q={!geofilt%20score=distance%20sfield=location%20pt=49.487036,8.458001%20d=100}&sort=score%20asc
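
Hand-encoding these URLs is error-prone because of the spaces, braces and exclamation mark in the local params. The following sketch builds the same request with Python's standard urllib.parse. The function name distance_sort_url is hypothetical, the base URL is the one from this post, and the extra fl=*,score parameter is an addition here so you can see the score Solr derived from the distance.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL as used throughout this post; adjust host, port and core to your setup.
SOLR_SELECT = "http://localhost:8082/solr/select"


def distance_sort_url(lat: float, lon: float, max_km: float, field: str = "location") -> str:
    """Build a select URL that sorts matches by distance from (lat, lon).

    geofilt's d parameter (kilometers) also filters: documents farther away than
    max_km are not returned at all.
    """
    q = "{!geofilt score=distance sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, max_km)
    # fl=*,score additionally returns each document's score.
    params = {"q": q, "sort": "score asc", "fl": "*,score"}
    return SOLR_SELECT + "?" + urlencode(params)


url = distance_sort_url(49.487036, 8.458001, 100)
print(url)
# Decoding the query string shows the parameters as Solr will receive them.
print(parse_qs(urlparse(url).query))
```

urlencode encodes spaces as "+", which Solr decodes back to spaces just like "%20".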

As the name implies, the geofilt query parser is primarily meant for filtering. You have to pass in a distance (d, in kilometers), and that distance is also used to filter, so sorting this way can change which results are returned. In our example, passing a distance of 10 kilometers yields only one result, because Karlsruhe is roughly 50 kilometers away from Mannheim. Be aware of this and choose a distance that is large enough to cover all documents you want to see.
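
The effect of d can be reproduced outside Solr with the haversine formula. This sketch assumes a spherical Earth (radius 6371 km) and ignores the approximation Solr applies via distErrPct; the within function only mimics what geofilt combined with sort=score asc returns for our two talks.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical Earth model (assumption)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The two documents indexed above.
TALKS = {
    "Search Evolution": (49.487036, 8.458001),  # Mannheim
    "Suchen und Finden mit Lucene und Solr": (49.013787, 8.419936),  # Karlsruhe
}


def within(center: tuple, max_km: float) -> list:
    """Titles within max_km of center, nearest first (a rough model of geofilt plus sort)."""
    hits = [(haversine_km(*center, *loc), title) for title, loc in TALKS.items()]
    return [title for dist, title in sorted(hits) if dist <= max_km]


mannheim = TALKS["Search Evolution"]
print(round(haversine_km(*mannheim, *TALKS["Suchen und Finden mit Lucene und Solr"]), 1))
print(within(mannheim, 10))   # only the Mannheim talk
print(within(mannheim, 100))  # both talks, Mannheim first
```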

Filtering by Distance

We can use the same approach to restrict our results to talks in a given area. We can either use the geofilt query parser, which filters by radius, or the bbox query parser, which filters on the bounding box around that circle. As you can imagine, the query looks similar:

http://localhost:8082/solr/select?q=*:*&fq={!geofilt%20sfield=location%20pt=49.013787,8.419936%20d=10}

This returns all talks within 10 kilometers of Karlsruhe.
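
The difference between geofilt and bbox is that a box around the circle also matches points in its corners. The sketch below computes such a box with a simple spherical approximation that ignores the poles and the date line; it illustrates the idea and is not necessarily how Solr computes its box. The point near the north-east corner lies inside the 10 km box around Karlsruhe but more than 10 km away, so bbox would match it while geofilt would not.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical Earth model (assumption)
KM_PER_DEGREE = math.pi * EARTH_RADIUS_KM / 180.0  # ~111.2 km per degree of latitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, d_km):
    """Approximate box (min_lat, min_lon, max_lat, max_lon) around a circle of radius d_km.

    Simplified: ignores the poles and the date line.
    """
    dlat = d_km / KM_PER_DEGREE
    # A degree of longitude gets shorter towards the poles, so the box is wider in degrees.
    dlon = d_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon


def in_box(box, lat, lon):
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


karlsruhe = (49.013787, 8.419936)
box = bounding_box(*karlsruhe, 10)

# A point near the north-east corner of the box: inside the box, outside the circle.
corner = (karlsruhe[0] + 0.8 * (box[2] - karlsruhe[0]),
          karlsruhe[1] + 0.8 * (box[3] - karlsruhe[1]))
print(in_box(box, *corner), round(haversine_km(*karlsruhe, *corner), 1))
```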

Doing Fancy Stuff

Besides the features covered in this post, you can also do more advanced things. Solr 3 spatial doesn't support multivalued location fields; with Solr 4.2 they are possible. You can now also index lines or polygons (these need the additional JTS library), which can then be queried and intersected. In a presentation, Chris Hostetter uses this feature to determine overlapping time ranges, an interesting use case that you might not think of at first.
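
One way to map time ranges onto spatial data is to encode a range [start, end] as the 2D point (start, end); all ranges overlapping a query range then fall into a single rectangle, and a talk with several sessions simply becomes a multivalued field of points. This is a sketch of that general idea, not necessarily the exact approach from the presentation, and the Rect type, the time bounds and the talk slots are hypothetical. The test at the end cross-checks the rectangle against a direct overlap comparison.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle, standing in for a spatial query shape."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


# Hypothetical bounds of the time axis: minutes of a single day.
T_MIN, T_MAX = 0, 24 * 60


def range_to_point(start: int, end: int) -> tuple:
    """Encode the time range [start, end] as the 2D point (x=start, y=end)."""
    return (start, end)


def overlap_query(q_start: int, q_end: int) -> Rect:
    """Rectangle containing exactly the points of ranges that overlap [q_start, q_end].

    A range [s, e] overlaps the query iff s <= q_end and e >= q_start.
    """
    return Rect(min_x=T_MIN, min_y=q_start, max_x=q_end, max_y=T_MAX)


def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """Direct interval-overlap test, used to cross-check the spatial encoding."""
    return a_start <= b_end and a_end >= b_start


# Hypothetical talk slots (start, end) in minutes since midnight.
slots = {
    "talk A": (9 * 60, 10 * 60),
    "talk B": (10 * 60, 11 * 60 + 30),
    "talk C": (12 * 60, 13 * 60),
}
query = overlap_query(9 * 60 + 50, 10 * 60 + 10)
hits = [name for name, slot in slots.items() if query.contains(*range_to_point(*slot))]
print(hits)
```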