Reindexing Content in Elasticsearch with stream2es

Last week I wrote about reindexing content in Elasticsearch using a script that extracts the source field from the indexed content. You can use it for cases when your mapping changes or you need to adjust the index settings. After publishing the post Drew Raines mentioned that there is an easier way using the stream2es utility only. Time to have a look at it!

stream2es can be used to stream content from several inputs to Elasticsearch. In my last post I used it to stream a file containing the sources of documents to an Elasticsearch index. Besides that it can index data from Wikipedia or Twitter or from Elasticsearch directly, which we will look at now.

Again, we are indexing some documents:

curl -XPOST "http://localhost:9200/twitter/tweet/" -d'
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
curl -XPOST "http://localhost:9200/twitter/tweet/" -d'
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:14:14",
"message" : "Elasticsearch works!"
}'

Now, if we need to adjust the mapping we can just create a new index with the new mapping:

curl -XPOST "http://localhost:9200/twitter2" -d'
{
"mappings" : {
"tweet" : {
"properties" : {
"user" : { "type" : "string", "index" : "not_analyzed" }
}
}
}
}'

You can now use stream2es to transfer the documents from the old index to the new one:

stream2es es --source http://localhost:9200/twitter/ --target http://localhost:9200/twitter2/

This will make our documents available in the new index:

curl -XGET http://localhost:9200/twitter2/_count?pretty=true
{
"count" : 2,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}

You can now delete the old index. To keep your data available on the same old index name you can also create an alias that will point to your new index:

curl -XDELETE http://localhost:9200/twitter
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "add" : { "index" : "twitter2", "alias" : "twitter" } }
]
}'

Looking at the mapping you can see that the twitter index now points to our updated version:

curl -XGET http://localhost:9200/twitter/tweet/_mapping?pretty=true
{
"tweet" : {
"properties" : {
"bytes" : {
"type" : "long"
},
"message" : {
"type" : "string"
},
"offset" : {
"type" : "long"
},
"post_date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"user" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"index_options" : "docs"
}
}
}
}