
My Elasticsearch index holds almost 700,000 social media messages posted in 25 different groups. Each message is a JSON document that contains a chat.id key.

I need to build a query for my Python script that prints each distinct chat.id value only once.

Put simply, the script should output the groups in my database: if I participate in 25 groups, I expect 25 chat.id values to be printed.

Currently, I build the list by reading every social media message and extracting its chat.id value. But as the number of indexed posts grows, this gets slower, more time-consuming, and more CPU-intensive.
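For context, the current approach amounts to a full scan with client-side deduplication. A minimal sketch (the hardcoded `messages` list is a stand-in for whatever actually pulls the indexed documents):

```python
def unique_chat_ids(messages):
    """Collect each distinct chat.id by reading every message once."""
    seen = set()
    for msg in messages:
        seen.add(msg["chat"]["id"])
    return sorted(seen)

# Three messages from two groups, mimicking the document structure below:
messages = [
    {"chat": {"id": 1011449296138, "type": "supergroup", "title": "chatname"}},
    {"chat": {"id": 1011449296138, "type": "supergroup", "title": "chatname"}},
    {"chat": {"id": 2022000000001, "type": "group", "title": "othergroup"}},
]
print(unique_chat_ids(messages))  # → [1011449296138, 2022000000001]
```

This is O(total messages) per run, which is why it degrades as the index grows.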

I couldn't work out how to build a query that returns this result directly.

The structure of my docs is like this:

    {
      "_index": "indexname",
      "_type": "_doc",
      "_source": {
        "id": 372353,
        "audio": {},
        "author_signature": null,
        "caption": null,
        "channel_chat_created": null,
        "chat": {
          "id": 1011449296138,
          "type": "supergroup",
          "username": null,
          "first_name": null,
          "title": "chatname"
        }
      }
    }

So far, the query I used is this:

    query = {
      "aggs": {
        "chatids": {
          "terms": {
            "field": "chat.id"
          }
        }
      }
    }
  • Can you post the current structure of your index, as well as the query you have tried? Commented Jul 19, 2019 at 10:55
  • Possible duplicate of ElasticSearch - Return Unique Values Commented Jul 19, 2019 at 11:16

1 Answer


You can use a terms aggregation to get distinct values. For example:

    GET messages/_search
    {
      "size": 0,
      "aggs": {
        "group_ids": {
          "terms": { "field": "group_id", "size": 1000 }
        }
      }
    }
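With "size": 0 the response carries no hits at all; the distinct values come from the aggregation buckets. A sketch of reading them in Python (the `response` dict below is a stand-in for what `es.search(...)` would return, and `bucket_keys` is a hypothetical helper):

```python
def bucket_keys(response, agg_name):
    """Extract the distinct values from a terms-aggregation response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]

# Shape of a terms-aggregation response (abridged):
response = {
    "hits": {"total": {"value": 3}, "hits": []},  # empty because "size": 0
    "aggregations": {
        "group_ids": {
            "buckets": [
                {"key": 1011449296138, "doc_count": 2},
                {"key": 2022000000001, "doc_count": 1},
            ]
        }
    },
}
print(bucket_keys(response, "group_ids"))  # → [1011449296138, 2022000000001]
```

The key point is to read `response["aggregations"]`, not `response["hits"]["hits"]` — iterating the hits is what makes the script scan every document.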

2 Comments

Thank you. I edited the question to add my current query. I tried yours, but I still ended up iterating over all the documents to extract the distinct chat.id values. Furthermore, the query you suggest looks quite similar to mine, and it takes a really long time to iterate over the whole dataset.
No solution so far :(
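One possible way forward when the number of distinct keys can exceed a terms aggregation's `size`: a composite aggregation pages through all distinct keys. A minimal sketch of the pagination loop, assuming the `chat.id` field from the question (`search_fn` is a hypothetical stand-in for a call to `es.search`; the fake two-page "server" below only exists to demonstrate the loop without a cluster):

```python
def all_composite_keys(search_fn, page_size=100):
    """Page through a composite aggregation until a short (final) page arrives."""
    keys, after = [], None
    while True:
        body = {
            "size": 0,
            "aggs": {
                "chats": {
                    "composite": {
                        "size": page_size,
                        "sources": [{"chat_id": {"terms": {"field": "chat.id"}}}],
                    }
                }
            },
        }
        if after is not None:
            body["aggs"]["chats"]["composite"]["after"] = after
        buckets = search_fn(body)["aggregations"]["chats"]["buckets"]
        keys.extend(b["key"]["chat_id"] for b in buckets)
        if len(buckets) < page_size:  # a short page means we are done
            return keys
        after = buckets[-1]["key"]  # resume after the last key seen

# Fake two-page responses, standing in for a real cluster:
pages = [
    {"aggregations": {"chats": {"buckets": [
        {"key": {"chat_id": 1}}, {"key": {"chat_id": 2}}]}}},
    {"aggregations": {"chats": {"buckets": [{"key": {"chat_id": 3}}]}}},
]

def fake_search(body):
    composite = body["aggs"]["chats"]["composite"]
    return pages[1] if "after" in composite else pages[0]

print(all_composite_keys(fake_search, page_size=2))  # → [1, 2, 3]
```

Unlike raising the terms aggregation's `size`, this keeps each request small and still visits every distinct group exactly once.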
