
I am stuck on one of my tasks.

Overview:

  • There are some records in Elasticsearch that include information about candidates and their employment.
  • One field stores the statuses under which the candidate was submitted to jobs:
 {
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingClient", "jobId": "XYZ", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

I want to write an Elasticsearch query that fetches all records whose submittedJobs array contains "only" "PendingPM" statuses and no other statuses.

{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "submittedJobs",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "submittedJobs.status.keyword": "PendingPM"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

I tried this query, but it also returns records that contain "PendingPM" alongside other statuses; it seems to behave like contains() logic.
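The behaviour described above follows from how a nested query works: a parent document matches when at least one nested object matches, so the query acts as "any", while the goal is "all". The sketch below models only that boolean logic locally in Python; it is not Elasticsearch, the helper names are hypothetical, and lowercasing stands in for the mapping's lowercase normalizer.

```python
# Local model of the boolean logic only; this does not talk to Elasticsearch.


def nested_term_matches(doc: dict, status: str) -> bool:
    """Mimics the nested + term query: true if ANY nested object matches."""
    return any(job["status"].lower() == status.lower()
               for job in doc.get("submittedJobs", []))


def only_status(doc: dict, status: str) -> bool:
    """Desired behaviour: the array is non-empty and EVERY object matches."""
    jobs = doc.get("submittedJobs", [])
    return bool(jobs) and all(job["status"].lower() == status.lower() for job in jobs)


mixed = {"submittedJobs": [{"status": "PendingPM", "jobId": "ABC"},
                           {"status": "PendingClient", "jobId": "XYZ"}]}
pure = {"submittedJobs": [{"status": "PendingPM", "jobId": "ABC"},
                          {"status": "PendingPM", "jobId": "WXY"}]}

print(nested_term_matches(mixed, "PendingPM"), only_status(mixed, "PendingPM"))  # True False
print(nested_term_matches(pure, "PendingPM"), only_status(pure, "PendingPM"))    # True True
```

Requiring a non-empty array in only_status is a design choice of this sketch; whether empty arrays should count as "only PendingPM" depends on the use case.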

Here is the mapping:

"submittedJobs": {
    "type": "nested",
    "properties": {
        "statusId": {
            "type": "long"
        },
        "status": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256,
                    "normalizer": "lowercase_normalizer"
                }
            }
        },
        "jobId": {
            "type": "keyword"
        }
    }
}
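The mapping refers to a custom normalizer named lowercase_normalizer whose definition is not shown. A minimal sketch, assuming it only applies the built-in lowercase token filter, is below, written as a Python dict for the index settings. As far as I know, a keyword field's normalizer is also applied to term-level queries at search time, which is why the term "PendingPM" still matches the lowercased stored value; the Python function is only a local stand-in for that normalization.

```python
import json

# Assumption: lowercase_normalizer only lowercases; its real definition
# is not included in the question.
index_settings = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def lowercase_normalizer(value: str) -> str:
    """Local stand-in for the normalizer applied to submittedJobs.status.keyword."""
    return value.lower()


# The indexed keyword and the normalized query term end up identical.
stored = lowercase_normalizer("PendingPM")
queried = lowercase_normalizer("PendingPM")
print(json.dumps(index_settings, indent=2))
print(stored, stored == queried)  # pendingpm True
```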

For example, suppose there are two documents:

document #1:
{
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingClient", "jobId": "XYZ", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

document #2:
{
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

Only document #2 should be returned, as the entire array contains only "PendingPM" and no other statuses.

Document #1 should be filtered out since it includes mixed statuses.

Any help will be appreciated.

  • Can you please add the index mapping as well? What is the type of the submittedJobs field: object or nested? Commented Jul 21, 2022 at 7:47
  • @SagarPatel Added to the description. Commented Jul 21, 2022 at 7:52
  • You can use inner hits, as in the answer by ESCoder. Commented Jul 21, 2022 at 8:00

2 Answers


Try this:

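This returns only the documents in which every item of the array has an allowed status: the inner must_not lists the allowed statuses (here PendingPM and PendingClient), and the outer must_not drops every document that has at least one item with any other status.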

{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "submittedJobs",
            "query": {
              "bool": {
                "must_not": [
                  {
                    "match": {
                      "submittedJobs.status": {
                        "query": "PendingPM"
                      }
                    }
                  },
                  {
                    "match": {
                      "submittedJobs.status": {
                        "query": "PendingClient"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
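The double negation can be generated for any list of allowed statuses. The sketch below builds the same request body as a Python dict (the function names are hypothetical) and adds a local function that models what the query keeps. One consequence worth checking: a document with an empty or missing submittedJobs array has no nested objects, so the outer must_not cannot exclude it and it is returned too. If that is unwanted, a filter clause holding a nested exists query on submittedJobs.status can be added.

```python
import json


def only_statuses_query(allowed: list[str]) -> dict:
    """Builds the answer's double-negation query for any allowed statuses.

    Inner must_not: a nested object matches if its status is none of `allowed`.
    Outer must_not: drop every parent with at least one such nested object.
    """
    not_allowed = {
        "bool": {
            "must_not": [
                {"match": {"submittedJobs.status": {"query": s}}} for s in allowed
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"nested": {"path": "submittedJobs", "query": not_allowed}}
                ]
            }
        }
    }


def local_equivalent(doc: dict, allowed: list[str]) -> bool:
    """Local model of what the query keeps: no nested status outside `allowed`."""
    allowed_lc = {s.lower() for s in allowed}
    return not any(job["status"].lower() not in allowed_lc
                   for job in doc.get("submittedJobs", []))


print(json.dumps(only_statuses_query(["PendingPM", "PendingClient"]), indent=2))
```

The local model compares whole lowercased strings, which approximates the match query only because each status here is analyzed into a single token.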

3 Comments

Amazing, @rabbitbr, I am getting the relevant results. Thanks a lot. Just one more thing: what if I want to check multiple statuses, i.e. I want a result set whose arrays contain only "PendingPM", "PendingClient", etc.?
Great! In that case, just add a new clause to the inner must_not for each status. I updated the answer.
Hey @rabbitbr, thanks for the help, I appreciate it. One last thing: what if I want to add another condition to the above query? If the array contains any status other than "PendingPM" and "PendingClient", that entry must be at least 90 days old. Let's assume each array item has a statusDate field.
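One way to express the 90-day follow-up is to extend the same double negation: a nested object is "offending" when its status is not allowed and its statusDate is not at least 90 days old, and the outer must_not drops any parent with an offending object. The sketch below builds that body in Python, assuming statusDate is mapped as a date inside submittedJobs (the comment only posits the field); the function names are hypothetical and the local model is not Elasticsearch.

```python
import json
from datetime import datetime, timedelta


def allowed_or_old_query(allowed: list[str], days: int = 90) -> dict:
    """Keep a parent only if each nested job has an allowed status
    or a statusDate at least `days` days old (statusDate is assumed)."""
    offending_job = {
        "bool": {
            "must_not": [
                *({"match": {"submittedJobs.status": {"query": s}}} for s in allowed),
                # Old enough entries are not offending.
                {"range": {"submittedJobs.statusDate": {"lte": f"now-{days}d"}}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"nested": {"path": "submittedJobs", "query": offending_job}}
                ]
            }
        }
    }


def keeps_doc(doc: dict, allowed: list[str], now: datetime, days: int = 90) -> bool:
    """Local model: a job with a disallowed status and a recent or missing
    statusDate excludes the whole document."""
    allowed_lc = {s.lower() for s in allowed}
    cutoff = now - timedelta(days=days)

    def offending(job: dict) -> bool:
        if job["status"].lower() in allowed_lc:
            return False
        date = job.get("statusDate")
        return date is None or date > cutoff

    return not any(offending(job) for job in doc.get("submittedJobs", []))


print(json.dumps(allowed_or_old_query(["PendingPM", "PendingClient"]), indent=2))
```

Note that an entry with a disallowed status and no statusDate at all is treated as offending, because the range clause cannot match it.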

You can use inner_hits along with the nested query to get only the matching nested objects from each document.

Here is a working example:

Index Mapping:

{
    "mappings": {
        "properties": {
            "submittedJobs": {
                "type": "nested"
            }
        }
    }
}

Search Query:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "nested": {
                        "path": "submittedJobs",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "term": {
                                            "submittedJobs.status.keyword": "PendingPM"
                                        }
                                    }
                                ]
                            }
                        },
                        "inner_hits": {}
                    }
                }
            ]
        }
    }
}

Search Result would be:

"hits": [
            {
                "_index": "73062439",
                "_id": "1",
                "_score": 0.0,
                "_source": {
                    "submittedJobs": [
                        {
                            "status": "PendingPM",
                            "jobId": "ABC"
                        },
                        {
                            "status": "PendingClient",
                            "jobId": "XYZ"
                        },
                        {
                            "status": "PendingPM",
                            "jobId": "WXY"
                        }
                    ]
                },
                "inner_hits": {                         // note this
                    "submittedJobs": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "max_score": 0.4700036,
                            "hits": [
                                {
                                    "_index": "73062439",
                                    "_id": "1",
                                    "_nested": {
                                        "field": "submittedJobs",
                                        "offset": 0
                                    },
                                    "_score": 0.4700036,
                                    "_source": {
                                        "jobId": "ABC",
                                        "status": "PendingPM"
                                    }
                                },
                                {
                                    "_index": "73062439",
                                    "_id": "1",
                                    "_nested": {
                                        "field": "submittedJobs",
                                        "offset": 2
                                    },
                                    "_score": 0.4700036,
                                    "_source": {
                                        "jobId": "WXY",
                                        "status": "PendingPM"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
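The response above also shows why inner_hits alone does not filter out mixed documents: document 1 is still returned, with 2 matching objects out of 3. A client-side workaround is to compare the inner_hits total with the array length in _source, as sketched below in Python (the function name is hypothetical). This filters after the search, so hit counts and pagination still include the dropped documents, which is why a query-side approach such as the must_not answer fits this requirement better.

```python
def all_nested_matched(hit: dict, path: str = "submittedJobs") -> bool:
    """True when every element of the nested array appears in inner_hits."""
    total = hit["inner_hits"][path]["hits"]["total"]
    if total.get("relation") != "eq":
        raise ValueError("inner_hits total is not exact; cannot compare")
    return total["value"] == len(hit["_source"].get(path, []))


# Trimmed copy of document 1 from the search result above.
hit = {
    "_source": {"submittedJobs": [
        {"status": "PendingPM", "jobId": "ABC"},
        {"status": "PendingClient", "jobId": "XYZ"},
        {"status": "PendingPM", "jobId": "WXY"},
    ]},
    "inner_hits": {"submittedJobs": {"hits": {
        "total": {"value": 2, "relation": "eq"}, "hits": []}}},
}
print(all_nested_matched(hit))  # False
```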

5 Comments

Hey, thanks for the reply, @ESCoder. It's still returning mixed results. I added this to the query: "inner_hits": { "_source": false, "docvalue_fields": [ "submittedJobs.status.keyword" ] }
@MurtazaMultan I have added a working example to the answer above, using the same sample index data. Look inside inner_hits in the search response: it contains only the matching nested objects. Let me know if you are still facing the issue.
Oh, let me explore a little more.
Ah, now I get it. This is like selecting sub-fields from the result set. But I think I did not explain my problem well. I want to filter out all the records that have mixed statuses; I only want the records whose entire array contains only PendingPM. @ESCoder
I have added the example to the description for better understanding
