Stringify Everything in Elasticsearch

A while ago I was working on a prototype for searching larger structured documents with Elasticsearch. We were only interested in making the text searchable, with the option to search either all of the text or some separate fields. Elasticsearch is of course a perfect solution for this, with the _all field and the possibility to search single or multiple fields.

The documents we had to make searchable were rather complex, consisting of hundreds of fields with different data types, special identifiers, and numeric and string values. We had everything exported as JSON documents, but unfortunately the data types were mixed and changed from document to document. Indexing two example documents might look like this:

POST /example/doc
{
    "my-id": 1,
    "my-tag": "one",
    "my-flag": true
}

POST /example/doc
{
    "my-id": "1b",
    "my-tag": "one b",
    "my-flag": "enabled"
}

What happens when these documents are indexed? Elasticsearch tries to guess the type of each field from the first value it sees and keeps that type in the mapping of the index. For the first document my-id clearly is a numeric value and my-flag a boolean. In the second document the same fields contain string values, which of course can't be indexed in a numeric or boolean field. So indexing will fail for the second document. What can be done?
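Before looking at a fix, the guessed types can be checked by asking for the mapping after the first document has been indexed. This is only a small sketch that assumes the example index from above; the exact layout of the response depends on the Elasticsearch version.

GET /example/_mapping

The response should list my-id with a numeric type and my-flag as a boolean, which is why the string values of the second document are rejected.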

Of course the best way would have been to create the JSON documents correctly in the first place, but this would have been rather complex because of the environment we were working in. As we were working on a prototype, the quicker solution was to create a mapping for Elasticsearch that treats the values as strings. But creating a dedicated mapping would have been too complex as well: remember, there were hundreds of fields that we would have had to check and configure. As we were OK with treating all the fields in the documents as string values, the solution was to add a dynamic template to our index that maps every field to string.

DELETE /example

PUT /example

PUT /example/doc/_mapping
{
   "doc": {
      "dynamic_templates": [
         {
            "all_strings": {
               "match": "*",
               "mapping": {
                  "type": "string",
                  "analyzer": "standard"
               }
            }
         }
      ]
   }
}
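Note that the requests above use the syntax of the Elasticsearch version this post was written for. Newer versions (5.0 and later) replaced the string type with text and keyword, and from 7.0 on the mapping is no longer nested under a type name such as doc. A rough equivalent for such a version might look like the following. Treat it as an untested sketch and check it against the documentation of the version you are running.

PUT /example
{
   "mappings": {
      "dynamic_templates": [
         {
            "all_strings": {
               "match": "*",
               "mapping": {
                  "type": "text",
                  "analyzer": "standard"
               }
            }
         }
      ]
   }
}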

Now, when indexing the documents again, Elasticsearch will consider the dynamic template all_strings. Because match is set to *, the template applies to every field. Each field is then automatically mapped as a string, so all documents can be indexed and searched afterwards.
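To check the result, the two example documents can be indexed again and searched. The following requests are a sketch that assumes the example index and the field names from above: the first one searches all of the text via the _all field, the second one only the my-id field.

POST /example/doc/_search
{
    "query": {
        "match": {
            "_all": "one"
        }
    }
}

POST /example/doc/_search
{
    "query": {
        "match": {
            "my-id": "1b"
        }
    }
}

With the dynamic template in place the first query should find both documents, and the second one only the document with the id 1b.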

About Florian Hopf

I am working as a freelance software developer and consultant in Karlsruhe, Germany and have written a German book on Elasticsearch. If you liked this post you can follow me on Twitter or subscribe to my feed to get notified of new posts. If you think I can help you and your company and you'd like to work with me please contact me directly.