Elasticsearch delete By Query not completing deletes

Question

I need to delete a large number of documents in a 5.5 Elasticsearch cluster. I know the optimal way to do this is to rebuild the cluster without the intended documents, but that's not possible in our case. I run the following query that deletes documents from a subset of the indexes in the cluster:

GET myindex_1*/doc_type/_delete_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "typeCode": [
              "Filtered_Type"
            ]
          }
        }
      ],
      "must": [
        {
          "range": {
            "createdDateUTC": {
              "lt": "2017-10-28"
            }
          }
        }
      ]
    }
  }
}

It deletes documents for a couple of hours, but then just stops, and I have to kick it off again. Any ideas why the delete query stops running?

Just a note: I'm using Kibana to run the query, and the request times out on the client side even though I can see it continues deleting on the backend.

Isn't it because of a timeout? Could you try POST instead of GET?
– Mysterion, Commented Oct 31, 2019 at 17:54

Jim G. · Accepted Answer · 2019-10-31 19:57:52Z

1

From the Elasticsearch Delete By Query API documentation:

By default _delete_by_query uses scroll batches of 1000. You can change the batch size with the scroll_size URL parameter:

POST twitter/_delete_by_query?scroll_size=5000
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

You can find more information about batching and batch sizes here:

batches and requests_per_second in ElasticSearch Delete By Query API

And since you'll need to scroll through one to many batches to delete all of the documents found by your query, you can find more information about scrolling here:

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-scroll.html
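To make the batching parameters concrete, here is a minimal Python sketch that only builds the request path and body; it does not talk to a cluster. The helper name delete_by_query_path is hypothetical, and the values 5000 and 500 are illustrative rather than recommendations. It assumes the scroll_size and requests_per_second URL parameters behave as the Delete By Query documentation describes, and it reuses the index pattern and query from the question.

```python
import json
from urllib.parse import urlencode


def delete_by_query_path(index, doc_type=None, **params):
    """Build the request path for a _delete_by_query call (hypothetical helper).

    Keyword arguments become URL parameters, e.g. scroll_size or
    requests_per_second. Parameters set to None are left out so the
    server-side defaults apply.
    """
    parts = [index] + ([doc_type] if doc_type else []) + ["_delete_by_query"]
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return "/" + "/".join(parts) + ("?" + query if query else "")


# Query body copied from the question.
body = {
    "query": {
        "bool": {
            "filter": [{"terms": {"typeCode": ["Filtered_Type"]}}],
            "must": [{"range": {"createdDateUTC": {"lt": "2017-10-28"}}}],
        }
    }
}

# Illustrative values only: batches of 5000 documents, throttled to
# roughly 500 delete operations per second.
path = delete_by_query_path(
    "myindex_1*", "doc_type", scroll_size=5000, requests_per_second=500
)

# Print the request in the same console style used above.
print("POST " + path.lstrip("/"))
print(json.dumps(body, indent=2))
```

The printed request can be pasted into Kibana. Note that it is sent with POST, as the comment on the question suggests.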

edited Oct 31, 2019 at 19:57

answered Oct 31, 2019 at 18:13


2 Comments

CorribView Over a year ago

Would a default batch size of 1000 cause it to stop deleting documents eventually? And would increasing it to 5000 prevent this from happening?

Jim G. Over a year ago

I updated my answer. Simply increasing the batch size to 5000 isn't sufficient. You need to think in terms of scrolling through all the batches and deleting all of the documents in each batch.

Adam Sheehan · Accepted Answer · 2025-01-30 03:40:02Z

0

The Delete by Query API can halt if it runs into conflicting versions of a document. This can happen if a document was updated after the delete by query started but before it reached the document (Elastic documentation).

If you're running the deletion asynchronously, you can fetch the task details after it completes to see if there were any failures (Task API docs).

You can also specify the conflicts=proceed query parameter which will not halt the deletion if a conflict is detected. I'm not sure if that conflicting doc will still be deleted though.
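As a rough sketch of how those pieces could fit together, the Python below builds the path for an asynchronous delete with conflicts=proceed and wait_for_completion=false, then summarizes a task document of the kind GET _tasks/<task_id> returns. The helper names, the task id, and the sample numbers are all made up for illustration, and the field names (completed, task.status.deleted, task.status.version_conflicts, response.failures) are my assumption about the response shape based on the Task API docs, so check them against what your cluster actually returns.

```python
from urllib.parse import urlencode


def async_delete_path(index, doc_type):
    """Path for a delete by query that runs as a background task."""
    params = {
        # Count version conflicts instead of aborting on the first one.
        "conflicts": "proceed",
        # Return a task id immediately instead of holding the HTTP request open.
        "wait_for_completion": "false",
    }
    return "/%s/%s/_delete_by_query?%s" % (index, doc_type, urlencode(params))


def task_path(task_id):
    """Path for fetching the task's status from the Task API."""
    return "/_tasks/" + task_id


def summarize_task(task_doc):
    """Pick out the counters that explain whether, and why, a delete stopped."""
    status = task_doc.get("task", {}).get("status", {})
    response = task_doc.get("response", {})
    return {
        "completed": task_doc.get("completed", False),
        "deleted": status.get("deleted", 0),
        "version_conflicts": status.get("version_conflicts", 0),
        "failures": response.get("failures", []),
    }


# Hypothetical task document; the values are invented for illustration.
sample = {
    "completed": True,
    "task": {"status": {"total": 1200, "deleted": 1190, "version_conflicts": 10}},
    "response": {"failures": []},
}

print("POST " + async_delete_path("myindex_1*", "doc_type").lstrip("/"))
print("GET " + task_path("node_id:12345").lstrip("/"))  # hypothetical task id
print(summarize_task(sample))
```

Running the delete as a task also sidesteps the client-side timeout in Kibana mentioned in the question, since the initial request returns as soon as the task is created.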

