1

I want to bulk update the documents matching the criteria.
ES version : 5.1.1
Index: index_1234
Type: addresses
URL: POST http://localhost:9200/index_1234/addresses/_update_by_query
Payload:

{
    "id":1,
    "address":"temp address"
}

I am using following inline script to update the documents

{
  "script": {
     "inline": "if(ctx._source.containsKey(\"address\") && ctx._source.address == "temp address"){ctx._source.address='perm address'}"
  }
}

i.e If "address" field has value as "temp address" , I am replacing it with "perm address"

This script works perfectly and updates the matching documents only. However I have a doubt

Suppose there are total 10 documents out of which 5 have "address" field as "temp address" and 5 have "address" field as "perm address"

On executing above script it gives following o/p

{
    "took": 131,
    "timed_out": false,
    "total": 10,
    **"updated": 10**,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}

so although it updated only 5 documents (I have double checked), in final response it says "updated": 10. I was expecting "updated":5 , am I missing something ? is it updating anything else ? I can see "_version" getting updated for all the documents i.e. even for documents which don't have matching string.
Thanks in advance !

UPDATE:

Thanks a lot Mike for quick response :)
According to Mike field check in the update query should be updated as follows.

"query": {
  "exists": {
    "field": "address"
  }
}

However there is 1 more issue.
Initially I had kept the fields to minimum for simplicity of the question. Actually there are many fields in the payload and I want to update 3 of them based on some condition.

So , the payload is as follows

{
    "id":12,
    "address":"temp address",
    "email":"temp email",
    "phone":"temp phone",
    .
    .
    .
}

And I am using following script to update all 3 fields

{
  "script": {
     "inline": "if(ctx._source.containsKey(\"address\") && ctx._source.address == "temp address"){ctx._source.address='perm address'}if(ctx._source.containsKey(\"email\") && ctx._source.email == "temp email"){ctx._source.email='perm email'}if(ctx._source.containsKey(\"phone\") && ctx._source.phone == "temp phone"){ctx._source.phone='perm phone'}"
  }
}

can we update the Mike's solution for multiple fields ?? or Is there any alternative way we can do this ? Thanks again !

1 Answer 1

2

This is happening because even though you're not actually modifying all the documents you retrieve in the _update_by_query request, they are at least being counted as hits in your query.

Instead of doing your field check in the update query, add this exist query to your update request:

"query": {
  "exists": {
    "field": "address"
  }
}

Editing to respond to the updated question: If you want to update 3 fields, you're most likely going to have the easiest time just doing 3 simultaneous modifications via the single script. As shown above, the count isn't really reflective of the actual modifications being done to the documents. If the count is absolutely required, you could do this (and it would filter to any document that has any one of these 3.)

"query": {
  "bool": {
    "minimum_should_match": 1,
    "should": [
      {
        "term": {
          "address": {
            "value": "temp address"
          }
        }
      },
      {
        "term": {
          "email": {
            "value": "temp email"
          }
        }
      },
      {
      "term": {
        "phone": {
          "value": "temp phone"
        }
      }
    }
  ]
  }
  }
Sign up to request clarification or add additional context in comments.

3 Comments

Hi Mike , Thanks for very quick response :) , I have updated the question now , can u please have a look at "Update" section. ??
Thanks Mike ! But I believe the inline script will execute if and only if all 3 conditions are matched , there could be cases where in only 1 or 2 or all 3 conditions are matched .
@SSG It should match for any one of them. I wrapped the term queries in a should container at the top. Since the "minimum_should_match" is 1 (which is default, I configured it for demonstration), any document that is true to any one of those statements will be part of the result set. Then those results will be accessible to your update script.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.