My Elasticsearch index has almost 700,000 social media messages posted in 25 different groups. Each message is a JSON document and contains the chat.id key.
I need to build a query to use in my Python script that prints each distinct chat.id value only once.
To put it simply, my script should output the groups in my database. If I participate in 25 groups, I expect to see 25 chat.id printed.
Currently, I get the list by reading every social media message and extracting its chat.id value. As the number of indexed posts grows, this takes longer and uses more CPU.
I couldn't work out how to build a query that returns this result in a single request.
The structure of my docs is like this:
{
  "_index": "indexname",
  "_type": "_doc",
  "_source": {
    "id": 372353,
    "audio": {},
    "author_signature": null,
    "caption": null,
    "channel_chat_created": null,
    "chat": {
      "id": 1011449296138,
      "type": "supergroup",
      "username": null,
      "first_name": null,
      "title": "chatname"
So far, the query I used is this:
query = {
    "aggs": {
        "chatids": {
            "terms": {
                "field": "chat.id"
            }
        }
    }
}
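For reference, here is a minimal sketch of how the terms aggregation above could be tightened and how its response could be read. It rests on two assumptions that are not in my original query: a top-level "size": 0 to skip returning the matching documents, and an explicit "size" inside "terms", because a terms aggregation returns only 10 buckets by default, fewer than my 25 groups. The helper names build_chat_id_query and extract_chat_ids and the max_groups parameter are hypothetical.

```python
def build_chat_id_query(max_groups=1000):
    """Build a request body that returns only the distinct chat.id buckets.

    max_groups is a hypothetical upper bound on the number of groups; it must
    be at least as large as the real number of distinct chat.id values.
    """
    return {
        "size": 0,  # assumption: the hits themselves are not needed
        "aggs": {
            "chatids": {
                "terms": {
                    "field": "chat.id",
                    "size": max_groups,  # default is 10 buckets
                }
            }
        },
    }


def extract_chat_ids(response):
    """Return the chat.id values from the 'chatids' aggregation buckets."""
    buckets = response["aggregations"]["chatids"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Usage sketch (assumes an Elasticsearch client object named `es`; newer
# client versions may prefer keyword arguments over `body`):
# response = es.search(index="indexname", body=build_chat_id_query())
# for chat_id in extract_chat_ids(response):
#     print(chat_id)
```

Each bucket also carries a doc_count, so the same response could report how many messages each group has if that turns out to be useful.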