44

I've been trying to use Elasticsearch to filter only those documents that contain an empty string in their body. So far I'm having no luck.

Before I go on, I should mention that I've already tried the many "solutions" spread around the Interwebz and StackOverflow.

So, below is the query that I'm trying to run, followed by its counterparts:

{
    "query": {
        "filtered":{
            "filter": {
                "bool": {
                    "must_not": [
                        {
                            "missing":{
                                "field":"_textContent"
                            }
                        }
                    ]
                }
            }
        }
    }
}

I've also tried the following:

 {
    "query": {
        "filtered":{
            "filter": {
                "bool": {
                    "must_not": [
                        {
                            "missing":{
                                "field":"_textContent",
                                "existence":true,
                                "null_value":true
                            }
                        }
                    ]
                }
            }
        }
    }
}

And the following:

   {
    "query": {
        "filtered":{
            "filter": {
                    "missing": {"field": "_textContent"}
            }
        }
    }
}

None of the above worked. I get an empty result set even though I know for sure that there are records that contain an empty string field.

If anyone can provide me with any help at all, I'll be very grateful.

Thanks!

  • For ES it's always advised to mention the version, since even minor versions differ a lot from one another. Commented Jul 5, 2019 at 8:54
  • The Lucene/KQL query yourfield.keyword:"" works. It comes from one of the answers below: stackoverflow.com/a/54046098/52074 Commented Oct 27, 2020 at 18:46

12 Answers

25

If you are using the default analyzer (standard), an empty string gives it nothing to analyze, so no term is indexed for it. You therefore need to index the field verbatim (not analyzed). Here is an example:

Add a mapping that indexes the field untokenized. If you also need a tokenized copy of the field indexed, you can use a Multi Field type.

PUT http://localhost:9200/test/_mapping/demo
{
  "demo": {
    "properties": {
      "_content": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}

Next, index a couple of documents.

POST http://localhost:9200/test/demo/1
{
  "_content": ""
}

POST http://localhost:9200/test/demo/2
{
  "_content": "some content"
}

Execute a search:

POST http://localhost:9200/test/demo/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "_content": ""
        }
      }
    }
  }
}

Returns the document with the empty string.

{
    took: 2,
    timed_out: false,
    _shards: {
        total: 5,
        successful: 5,
        failed: 0
    },
    hits: {
        total: 1,
        max_score: 0.30685282,
        hits: [
            {
                _index: test,
                _type: demo,
                _id: 1,
                _score: 0.30685282,
                _source: {
                    _content: ""
                }
            }
        ]
    }
}
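
To make the reasoning above concrete, here is a toy Python sketch of why the term query only matches when the field is indexed verbatim. It is not Elasticsearch's actual analyzer: `standard_like_tokens` and `not_analyzed_terms` are hypothetical helpers, and the regex is only a rough stand-in for word tokenization.

```python
import re
from typing import List


def standard_like_tokens(value: str) -> List[str]:
    """Rough stand-in for an analyzed field: lowercase word tokens only (assumption)."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def not_analyzed_terms(value: str) -> List[str]:
    """A verbatim (not_analyzed) field keeps the whole value as a single term."""
    return [value]


for text in ["", "some content"]:
    # The empty string yields no tokens when analyzed, but one empty term when kept verbatim.
    print(repr(text), standard_like_tokens(text), not_analyzed_terms(text))
```

With the analyzed variant there is no term at all for the empty string, so a term filter on "" has nothing to match. The verbatim variant keeps "" as a term, which is what the search above relies on.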

6 Comments

But I already have a lot of documents stored in Elasticsearch (around 50k). AFAIK, updating the mapping info requires the docs to be reindexed. Is that true, or will this mapping update work with my current docs?
If you update the mapping you will need to re-index. Have a look at the re-index plugin: github.com/karussell/elasticsearch-reindex
And also, this strategy requires me to store two copies of the field, one tokenized and the other as the original. This _textContent field is actually from a PDF file run through OCR, so it can get pretty big. Storing two copies may be a little too much, I think.
I guess I'm going with a client side solution, for now. Thanks though :)
What should I do if the field is mapped as keyword? Is that analyzed?
23

Found a solution here: https://github.com/elastic/elasticsearch/issues/7515. It works without reindexing.

PUT t/t/1
{
  "textContent": ""
}

PUT t/t/2
{
  "textContent": "foo"
}

GET t/t/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "textContent"
          }
        }
      ],
      "must_not": [
        {
          "wildcard": {
            "textContent": "*"
          }
        }
      ]
    }
  }
}
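
If you need the same query for several fields, it can be generated. This is only a sketch: `empty_string_query` is a hypothetical helper name, and it builds the request body without sending anything to a cluster.

```python
import json


def empty_string_query(field: str) -> dict:
    """Build the 'exists but matches no wildcard' body for one field name."""
    return {
        "query": {
            "bool": {
                # The field must be present in the document...
                "must": [{"exists": {"field": field}}],
                # ...but must not contain any term that "*" would match.
                "must_not": [{"wildcard": {field: "*"}}],
            }
        }
    }


print(json.dumps(empty_string_query("textContent"), indent=2))
```

The printed JSON can be pasted as the body of the GET t/t/_search request above.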

1 Comment

Works on ES v. 5.6
11

Even with the default analyzer you can do this kind of search: use a script filter, which is slower but can handle the empty string:

curl -XPOST 'http://localhost:9200/test/demo/_search' -d '
{
 "query": {
   "filtered": {
     "filter": {
       "script": {
         "script": "_source._content.length() == 0"
       }
     }
   }
 }
}'

It will return the document with an empty string as _content, without a special mapping.

As pointed out by @js_gandalf, this no longer works in ES 5.0 and later, where the filtered query was removed. Instead you should use query->bool->filter->script, as described in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
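
For anyone migrating old request bodies, the structural part of that change can be sketched in Python. `filtered_to_bool` is a hypothetical helper that assumes the filtered query sits directly under "query". It only moves the clauses into a bool query: the script string is passed through untouched and may itself need rewriting for the scripting language of your version.

```python
def filtered_to_bool(body: dict) -> dict:
    """Rewrite {"query": {"filtered": {...}}} into the equivalent bool form."""
    filtered = body["query"]["filtered"]
    bool_query = {}
    if "query" in filtered:
        # The scoring part of a filtered query becomes a must clause.
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        # The non-scoring part becomes a filter clause.
        bool_query["filter"] = filtered["filter"]
    # Keep any other top-level keys (size, from, ...) unchanged.
    rest = {key: value for key, value in body.items() if key != "query"}
    return {**rest, "query": {"bool": bool_query}}


legacy = {
    "query": {
        "filtered": {
            "filter": {"script": {"script": "_source._content.length() == 0"}}
        }
    }
}
print(filtered_to_bool(legacy))
```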

5 Comments

Sorry... but this does NOT work. {"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":4,"col":14}],"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":4,"col":14},"status":400} I'm using Elastic 5.2; I think that would affect what's going on.
Hello @js_gandalf, well, that's the problem with IT: sometimes after 2 years an API breaks :-) This was answered for version 0.90. I don't know what the usual practice is on SO; should I delete my answer? Anyway, thanks for noticing this.
@VrigileD It's okay. I've been stumped on this for a while, and it's killing me! So I have to reindex my whole database now for it to work? I'm using 5.2, the latest version of Elasticsearch.
I'm not using ES these days, but I think filtered has been deprecated and now you should use query->bool->filter->script, something like elastic.co/guide/en/elasticsearch/reference/current/…
I just ended up reindexing the whole database. Then it worked! Posting an answer for ES > 5.2.
8

For those of you using Elasticsearch 5.2 or above and still stuck: the easiest way is to reindex your data correctly with the keyword type. Then all the searches for empty values work. Like this:

"query": {
    "term": {"MY_FIELD_TO_SEARCH": ""}
}

Actually, once I reindexed my database and reran the query, it worked =)

The problem was that my field was of type text and NOT keyword. I changed the mapping to keyword and reindexed:

curl -X PUT https://username:[email protected]:9200/mycoolindex

curl -X PUT https://user:[email protected]:9200/mycoolindex/_mapping/mycooltype -d '{
  "properties": {
    "MY_FIELD_TO_SEARCH": {
      "type": "keyword"
    }
  }
}'

curl -X POST https://username:[email protected]:9200/_reindex -d '{
 "source": {
   "index": "oldindex"
 },
 "dest": {
    "index": "mycoolindex"
 }
}'

I hope this helps someone who was as stuck as I was finding those empty values.

1 Comment

This also works with NEST; you need to add Verbatim to the query.
6

Or, using Lucene query string syntax:

q=yourfield.keyword:""

See Elastic Search Reference https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-query-string-query.html#query-string-syntax
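
When the query goes into a URL, the colon and the two double quotes should be percent-encoded. A minimal Python sketch follows; `empty_keyword_search_url` is a hypothetical helper, and the host and index name are placeholders.

```python
from urllib.parse import urlencode


def empty_keyword_search_url(base_url: str, index: str, field: str) -> str:
    """Build a URI search for documents whose keyword field is exactly ""."""
    lucene_query = field + ':""'
    # urlencode percent-encodes the colon and the quotes.
    params = urlencode({"q": lucene_query})
    return base_url + "/" + index + "/_search?" + params


print(empty_keyword_search_url("http://localhost:9200", "my_index", "yourfield.keyword"))
# prints http://localhost:9200/my_index/_search?q=yourfield.keyword%3A%22%22
```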


3

Whether you can find an empty string in a field depends heavily on the field's mapping, in other words its index/analyzer setting.

If its index setting is not_analyzed, which means the token is just the empty string, you can simply use a term query to find it, as follows:

{"from": 0, "size": 100, "query":{"term": {"name":""}}}

Otherwise, if the index setting is analyzed, I believe most analyzers will treat the empty string as a null value, so you can use the missing filter to find the empty string.

{"filter": {"missing": {"existence": true, "field": "name", "null_value": true}}, "query": {"match_all": {}}}

Here is a gist script you can reference: https://gist.github.com/hxuanji/35b982b86b3601cb5571

BTW, I checked the commands you provided, and it seems you DON'T want the empty-string documents. All my commands above find exactly those documents, so just putting them into the must_not part of a bool query would be fine. My ES is 1.0.1.

For ES 1.3.0, the gist I provided currently cannot find the empty string. It seems this has been reported: https://github.com/elasticsearch/elasticsearch/issues/7348 . Let's wait and see how it goes.

Anyway, that issue also provides another command to find it:

{ "query": { "filtered": { "filter": { "not": { "filter": { "range": { "name": { } } } } } } } }

Here name is the field to search for the empty string. I've tested it on ES 1.3.2.

5 Comments

No, actually I want to find all documents that have an empty string in this particular field. I might be querying that wrong. BTW, the index is analyzed, as I do full text search on this field.
OK, if so, the second command would be fine (check the gist). BTW, if the field is used for full text search, I think the not_analyzed setting might not be useful for you.
I've tried the query on your gist but it doesn't work; it apparently doesn't treat empty string as null, which is weird. I think I'll implement this on the client side. I don't know if it makes sense, but it seems to me that ES is missing an "empty" filter... Anyway, thanks for the help!
Hi, I just ran a quick test on ES 1.3.2. The gist I provided does not work, as you said, although it worked on ES 1.0.1, which is what I currently use in my project. I'm not sure whether this is a bug or not. I will do more tests on it.
Hi @PauloVictor, I think you might want to know this: it is the bug reported at github.com/elasticsearch/elasticsearch/issues/7348, which was reported for ES 1.3.0. But the maintainers offer some other commands to get what you want; check my edit above.
3

I'm using Elasticsearch 5.3 and was having trouble with some of the above answers.

The following body worked for me.

 {
    "query": {
        "bool" : {
            "must" : {
                "script" : {
                    "script" : {
                        "inline": "doc['city'].empty",
                        "lang": "painless"
                     }
                }
            }
        }
    }
}

Note: you might need to enable fielddata for text fields; it is disabled by default. I would read this before doing so, though: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

To enable fielddata for a field, e.g. 'city' on index 'business' with type name 'record', you need:

PUT business/_mapping/record
{
    "properties": {
        "city": {
          "type": "text",
          "fielddata": true
        }
      }
}


3

If you don't want to or can't re-index there is another way. :-)

You can use the negation operator together with a wildcard (*), which matches any non-blank string:

GET /my_index/_search?q=!(fieldToLookFor:*)

1 Comment

But it returns only documents with this field as null. It's equal to {"query":{"bool":{"must_not":{"exists":{"field":"address"}}}}}
0

For nested fields use:

curl -XGET "http://localhost:9200/city/_search?pretty=true" -d '{
     "query" : {
         "nested" : {
             "path" : "country",
             "score_mode" : "avg",
             "query" : {
                 "bool": {
                    "must_not": {
                        "exists": {
                            "field": "country.name" 
                        }
                    }
                 }
             }
         }
     }
}'

NOTE: path and field together determine what is searched. Change them as required for your mapping.

For regular fields:

curl -XGET 'http://localhost:9200/city/_search?pretty=true' -d'{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "name"
                } 
            } 
        } 
    } 
}'

1 Comment

They're asking for a field that does exist but holds an empty string. must_not exists only works on empty arrays and null values, because they're not indexed technically speaking. Empty strings are.
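
A small client-side sketch of that distinction, in Python. `classify` is a hypothetical helper and the sample documents are made up. It treats null and empty arrays as "missing" (the cases must_not exists finds) and keeps the empty string as its own case.

```python
def classify(doc: dict, field: str) -> str:
    """Say whether a field is missing, an empty string, or has content."""
    if field not in doc:
        return "missing"
    value = doc[field]
    if value is None or value == []:
        # Null and empty arrays are not indexed, so exists does not see them.
        return "missing"
    if value == "":
        # The field exists but holds an empty string.
        return "empty string"
    return "has content"


docs = [{"name": ""}, {"name": None}, {"name": []}, {}, {"name": "Paris"}]
for doc in docs:
    print(doc, "->", classify(doc, "name"))
```

Only the first sample document is the case the question is about.
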
0

I didn't manage to search for empty strings in a text field. However it seems to work with a field of type keyword. So I suggest the following:

DELETE /test_idx

PUT /test_idx
{
  "mappings": {
    "testMapping": {
      "properties": {
        "tag": {"type": "text"},
        "content": {
          "type": "text",
          "fields": {
            "x": {"type": "keyword"}
          }
        }
      }
    }
  }
}

PUT /test_idx/testMapping/1
{
  "tag": "null"
}

PUT /test_idx/testMapping/2
{
  "tag": "empty",
  "content": ""
}

GET /test_idx/testMapping/_search
{
  "query": {
    "match": {"content.x": ""}
  }
}


0

You need to trigger the keyword indexer by adding .content to your field name. Depending on how the original index was set up, the following "just works" for me using AWS ElasticSearch v6.x.

GET /my_idx/_search?q=my_field.content:""


0

I was trying to find the empty fields (in indexes with dynamic mapping) and set them to a default value, and the request below worked for me.

Note: this is on Elasticsearch 7.x.

POST <index_name|pattern>/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": """
      if (ctx._source.<field_name> == "") {
        ctx._source.<field_name> = "0";
      } else {
        ctx.op = "noop";
      }
    """
  }
}
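
The decision the script makes can be tried out locally before running it against an index. This Python sketch only mirrors the if/else above on plain dictionaries. `apply_default` is a hypothetical helper, and its boolean return value is my own convention for "changed" versus "noop".

```python
def apply_default(source: dict, field: str, default: str = "0") -> bool:
    """Set the field to a default if it is exactly ""; report whether anything changed."""
    if source.get(field) == "":
        source[field] = default
        return True
    # Mirrors the ctx.op = "noop" branch: leave the document untouched.
    return False


samples = [{"status": ""}, {"status": "ok"}, {"other": 1}]
for source in samples:
    changed = apply_default(source, "status")
    print(source, "changed" if changed else "noop")
```

Documents where the field is absent fall into the noop branch here, just as they do not satisfy the == "" check in the script.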

Following one of the responses in this thread, I came up with the request below, which does the same thing:

POST index_pattern*/_update_by_query
{
  "script": {
    "source": "ctx._source.field_name='0'",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "field_name"
          }
        }
      ],
      "must_not": [
        {
          "wildcard": {
            "field_name": "*"
          }
        }
      ]
    }
  }  
}

I was also trying to find the documents in the index that don't have the field and add it with a value.

One of the responses in this thread helped me come up with the request below:

POST index_pattern*/_update_by_query
{
  "script": {
    "source": "ctx._source.field_name='0'",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "field_name"
          }
        }
      ]
    }
  }
}

Thanks to everyone who contributed to this thread, I was able to solve my problem.
