Geo-Spatial Features in Solr 4.2
17 Jan 2014
Last week I showed how you can use the classic spatial support in Solr. It uses the LatLonType to index locations that can then be used to query, filter or sort by distance. Starting with Solr 4.2 there is a new implementation available. It uses the Lucene Spatial module, which is more powerful but also needs to be used differently. You can still use the old approach, but in this post I will show you how to use the new features to do the same operations we saw last week.
Indexing Locations
Again we are indexing talks that contain a title and a location. For the new spatial support you need to add a different field type:
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
distErrPct="0.025"
maxDistErr="0.000009"
units="degrees"/>
Unlike LatLonType, SpatialRecursivePrefixTreeFieldType does not delegate to subfields but stores the data structure itself. The attribute maxDistErr determines the accuracy of the location; in this case it is 0.000009 degrees, which is close to one meter and should be enough for most location searches.
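To check the claim that 0.000009 degrees is close to one meter, here is a small Python sketch of the arithmetic. It assumes a spherical earth with a mean radius of about 6371 km; the constant and the function name are my own and not part of Solr.

```python
import math

# Assumption of this sketch: spherical earth with the mean radius in kilometers.
EARTH_RADIUS_KM = 6371.0087714


def degrees_to_meters(degrees):
    """Length in meters of an arc of `degrees` along a great circle."""
    return math.radians(degrees) * EARTH_RADIUS_KM * 1000


# The maxDistErr value from the field type definition: roughly one meter.
print(round(degrees_to_meters(0.000009), 2))
```

One full degree comes out at roughly 111 kilometers, which is a handy rule of thumb when choosing a value for maxDistErr.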
To use the type in our documents of course we also need to add it as a field:
<field name="location" type="location_rpt" indexed="true" stored="true"/>
Now we are indexing some documents with three fields: the path (which is our id), the title of the talk and the location.
curl 'http://localhost:8082/solr/update/json?commit=true' -H 'Content-type:application/json' -d '
[
{"path" : "1", "title" : "Search Evolution", "location" : "49.487036,8.458001"},
{"path" : "2", "title" : "Suchen und Finden mit Lucene und Solr", "location" : "49.013787,8.419936"}
]'
Again, the location of the first document is Mannheim, the second Karlsruhe. When looking at the schema browser in the administration backend you can see that the locations are encoded in an ngram-like fashion.
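To get a feeling for where the ngram-like terms come from, the following Python sketch encodes a point as a geohash and lists its prefixes. It assumes that the field type uses a geohash based prefix tree, which as far as I know is the default for this type; the function names are my own and the terms Solr actually writes to the index may differ in detail.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, length=9):
    """Encode a point as a geohash by repeatedly halving the lon/lat ranges."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def prefixes(term):
    """All prefixes of a term, from the coarsest cell to the finest."""
    return [term[:i] for i in range(1, len(term) + 1)]


# Mannheim, the location of the first document
print(prefixes(geohash(49.487036, 8.458001, 6)))
```

Each additional character narrows the cell down further, so a short prefix stands for a large area and a long one for a small area. Points that are close to each other often share a long prefix, which is what makes this structure useful for spatial lookups.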
Sorting by Distance
A common use case is to sort the results by distance from a certain location. You can't use the Solr 3 syntax anymore; instead you need to use the geofilt query parser, which maps the distance to the score that you then sort on.
http://localhost:8082/solr/select?q={!geofilt%20score=distance%20sfield=location%20pt=49.487036,8.458001%20d=100}&sort=score%20asc
As the name implies, the geofilt query parser was originally meant for filtering. You need to pass in the distance that is used for filtering, so sorting this way can also change which results are returned. For our example, passing in a distance of 10 kilometers will only yield one result. This is something to be aware of.
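To see why a distance of 10 kilometers drops one of the two documents, we can estimate how far apart the two cities are. This is a minimal Python sketch using the haversine formula on a spherical earth; it is not how Solr calculates distances internally, and the names are my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


MANNHEIM = (49.487036, 8.458001)
KARLSRUHE = (49.013787, 8.419936)

distance = haversine_km(*MANNHEIM, *KARLSRUHE)
print(round(distance))  # a bit more than 50 kilometers
```

With d=100 both talks are inside the circle around Mannheim, with d=10 only the talk in Mannheim itself is left.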
Filtering by Distance
We can use the same approach we saw above to filter our results to only match talks in a given area. We can either use the geofilt query parser (which filters by radius) or the bbox query parser (which filters on the bounding box around that circle). As you can imagine, the query looks similar:
http://localhost:8082/solr/select?q=*:*&fq={!geofilt%20score=distance%20sfield=location%20pt=49.013787,8.419936%20d=10}
This will return all talks within 10 kilometers of Karlsruhe.
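If you build these URLs in code it is easy to get the encoding wrong. The following Python sketch assembles the filter for either query parser and leaves the escaping to the standard library. The helper names are my own, and host, port and field name are the ones used in this post.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8082/solr/select"  # the instance used in this post


def spatial_filter(parser, field, lat, lon, distance_km):
    """Build the local params for a geofilt or bbox filter query."""
    if parser not in ("geofilt", "bbox"):
        raise ValueError("parser must be 'geofilt' or 'bbox'")
    return f"{{!{parser} sfield={field} pt={lat},{lon} d={distance_km}}}"


def select_url(fq, q="*:*"):
    """Build the select URL; urlencode escapes spaces and braces for us."""
    return SOLR_SELECT + "?" + urlencode({"q": q, "fq": fq})


# Talks in the bounding box of the 10 kilometer circle around Karlsruhe
print(select_url(spatial_filter("bbox", "location", 49.013787, 8.419936, 10)))
```

Switching between the circle and the box is then just a matter of passing "geofilt" or "bbox".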
Doing Fancy Stuff
Besides the features we have looked at in this post, you can also do more advanced things. With the Solr 3 spatial support you can't have multivalued location fields, which is possible with Solr 4.2. You can now also index lines or polygons that can then be queried and intersected. In this presentation Chris Hostetter uses this feature to determine overlapping time ranges, an interesting use case that you might not think of at first.