8

I created an index in Elasticsearch with the following settings. After inserting data into the index using the Bulk API, the docs.deleted count is continuously increasing. Does this mean the documents are being deleted automatically? If so, what did I do wrong?

PUT /inc_index/
{
  "mappings": {
    "store": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store" : true,
          "index_analyzer" : "fulltext_analyzer"
        },
        "description": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store" : true,
          "index_analyzer" : "fulltext_analyzer"
        },
        "category": {
          "type": "string"
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : 5,
      "number_of_replicas" : 1
    },
    "analysis": {
      "analyzer": {
        "fulltext_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "type_as_payload"
          ]
        }
      }
    }
  }
}

The output of "GET /_cat/indices?v" is shown below; "docs.deleted" is continuously increasing:

health status index     pri rep docs.count docs.deleted store.size pri.store.size
green  open   inc_index   5   1    2009877       584438      6.8gb          3.6gb
1
  • Was this update request sent in bulk, and did that cause the increase in deleted documents? I am facing a similar issue without issuing any update requests. Please let me know how you solved this, if you did. Thanks. Commented Aug 8, 2017 at 11:56

3 Answers

11

If your bulk operations also include updates to existing documents (an insert or update for a document whose ID already exists), then this is normal. In Elasticsearch, an update is a combination of a delete and an insert: https://www.elastic.co/guide/en/elasticsearch/guide/current/update-doc.html

The deleted documents you see there are only marked as deleted. When Lucene merges segments, the deleted documents are physically removed from disk.
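To make the delete+insert behaviour concrete, here is a minimal sketch in Python. It is a toy model only, not Elasticsearch's or Lucene's real implementation; the `ToyIndex` class and its methods are hypothetical names made up for illustration. It shows why re-indexing an existing ID raises docs.deleted while docs.count stays the same.

```python
class ToyIndex:
    """Toy model of an append-only index (hypothetical, for illustration only)."""

    def __init__(self):
        self.entries = []    # append-only list of [doc_id, source, is_live]
        self.live_pos = {}   # doc_id -> position of the live copy in entries

    def index(self, doc_id, source):
        # An existing copy is never overwritten in place: it is only
        # flagged as deleted, and the new version is appended.
        old = self.live_pos.get(doc_id)
        if old is not None:
            self.entries[old][2] = False
        self.live_pos[doc_id] = len(self.entries)
        self.entries.append([doc_id, source, True])

    def stats(self):
        live = sum(1 for entry in self.entries if entry[2])
        return {"docs.count": live, "docs.deleted": len(self.entries) - live}


idx = ToyIndex()
for i in range(3):
    idx.index(i, {"title": f"v1 of {i}"})
first = idx.stats()    # three fresh inserts, nothing marked as deleted

idx.index(0, {"title": "v2 of 0"})   # same ID again: old copy is flagged
idx.index(1, {"title": "v2 of 1"})
second = idx.stats()   # same live count, two copies marked as deleted

print(first)
print(second)
```

So if the bulk request contains an ID that is already in the index, the counter goes up even though no document has been lost.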

1 Comment

What if this is happening to a freshly created index containing no prior documents?
2

Elasticsearch indexes are composed of "segments". Since segments are write-once, a document that is deleted or updated is not actually removed; it is only marked as deleted, which increases the "docs.deleted" count.

More segments mean slower searches and more memory used. Elasticsearch solves this problem by merging segments in the background. Small segments are merged into bigger segments, which, in turn, are merged into even bigger segments. While merging, any document that is marked as deleted is not copied into the bigger segment. Once merging has finished, the old segments are deleted. That is why the "docs.deleted" value later decreases.
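The merge step described above can be sketched as follows. Again, this is a simplified model under my own assumptions, not Lucene's actual merge code: segments are plain lists of `(doc_id, source, is_live)` tuples, and the `merge` and `deleted_count` helpers are hypothetical names.

```python
def merge(segments):
    """Build one new segment from several, skipping docs flagged as deleted."""
    merged = []
    for segment in segments:
        for doc_id, source, is_live in segment:
            if is_live:  # deleted copies are simply not carried over
                merged.append((doc_id, source, True))
    return merged


def deleted_count(segments):
    """Count entries flagged as deleted across all given segments."""
    return sum(1 for seg in segments for (_, _, is_live) in seg if not is_live)


# Two small write-once segments; doc 1 was updated, doc 3 was deleted.
seg_a = [(1, "v1", False), (2, "v1", True)]
seg_b = [(1, "v2", True), (3, "v1", False)]

before = deleted_count([seg_a, seg_b])
merged = merge([seg_a, seg_b])        # the old segments can now be dropped
after = deleted_count([merged])

print(before, after, merged)
```

On a real cluster this happens automatically in the background. If I remember correctly, you can also ask for it explicitly with the force merge API and its only_expunge_deletes option (called _optimize in older versions), but check the documentation for your version before relying on that.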

2

This can happen if your machine is too slow

This applies when the machine cannot keep up with the (bulk) insertion, for example when the documents are fairly large or there are too many of them at once.

After slowing down the indexing process there was no document loss anymore. It is still strange that the documents that were not inserted were listed under "deleted", which suggests to me that they were in fact processed.

This happened to me when using Elasticdump, and I resolved it by setting the --limit option to a lower number.
