0

I have the following data in my Elasticsearch index:

{
  "title": "Hello from elastic",
  "name": "ABC",
  "j_id": "1",
  "date": '2021-03-02T12:29:31.356514'
},
{
  "title": "Hello from elastic",
  "name": "PQR",
  "j_id": "1",
  "date": '2021-03-02T12:29:31.356514'
},
{
  "title": "Hello from elastic",
  "name": "XYZ",
  "j_id": "2",
  "date": '2021-03-02T12:29:31.356514'
},
{
  "title": "Hello from elastic",
  "name": "MNO",
  "j_id": "3",
  "date": '2021-03-02T12:29:31.356514'
}

Now I want to get unique records on the basis of the id.

The expected output is:

{
    "1": [{
      "title": "Hello from elastic",
      "name": "ABC",
      "j_id": "1",
      "date": '2021-03-02T12:29:31.356514'
    },
    {
      "title": "Hello from elastic",
      "name": "PQR",
      "j_id": "1",
      "date": '2021-03-02T12:29:31.356514'
    }],
    "2": [{
      "title": "Hello from elastic",
      "name": "XYZ",
      "j_id": "2",
      "date": '2021-03-02T12:29:31.356514'
    }],
    "3": [{
      "title": "Hello from elastic",
      "name": "MNO",
      "j_id": "3",
      "date": '2021-03-02T12:29:31.356514'
    }]
  }

I tried an aggregate query but it's giving me only the counts. Also, I want to include latest record in my response.

  1. How can I get sorted, unique records from Elasticsearch grouped by the id?
  2. I want latest inserted data first

1 Answer 1

1

Assuming a minimal mapping covering the date and j_id fields:

PUT myindex
{
  "mappings": {
    "properties": {
      "j_id": {
        "type": "keyword"
      },
      "date": {
        "type": "date"
      }
    }
  }
}

you can leverage a terms aggregation whose sub-aggregation is an ordered top_hits aggregation:

POST myindex/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.sorted_hits.hits.hits._source
{
  "size": 0,
  "aggs": {
    "by_j_id": {
      "terms": {
        "field": "j_id",
        "size": 10,
        "order": {
          "max_date": "desc"
        }
      },
      "aggs": {
        "max_date": {
          "max": {
            "field": "date"
          }
        },
        "sorted_hits": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

The URL parameter filter_path reduces the response body to closely mimic your required format:

{
  "aggregations" : {
    "by_j_id" : {
      "buckets" : [
        {
          "key" : "1",        
          "sorted_hits" : {
            "hits" : {
              "hits" : [
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "ABC",
                    "j_id" : "1",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                },
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "PQR",
                    "j_id" : "1",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "2",
          "sorted_hits" : {
            "hits" : {
              "hits" : [
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "XYZ",
                    "j_id" : "2",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "3",
          "sorted_hits" : {
            "hits" : {
              "hits" : [
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "MNO",
                    "j_id" : "3",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
Sign up to request clarification or add additional context in comments.

3 Comments

Hello Joe, This solution working for me but I want latest inserted data first.
OK that wasn't mentioned in the question ;) I've updated my answer -- the trick is to add a max aggregation on the date field and sort the terms with it. Hope it helps.
Can I add pagination in above query?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.