-1

My data objects contain 3 string fields: section, category and username. I want to do grouping by category and username and find out the top 20 "section" and their count in the grouping i did

basically the result i want is like the sql query: select section, count(*) from table group by category, username order by 2 desc limit 20

i tried doing aggregation on section, category and username. The result was the grouping i was looking for, but it didn't sort the "section" field by this grouping (e.g i need the first result to contain the section that appears in the most combinations of category and username and its count)

instead, it sorted the section just by its doc_count and inside of each section, it sorted "category" by its doc_count and inside of it, it sorted each "username" by its doc_count

This is what I tried:

GET some_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "some_filter_i_needed": "filter_value"
          }
        }
      ]
    }
  },
  "aggs": {
    "top_section": {
      "terms": {
        "field": "section.keyword",  
        "size": 20,  
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "top_category": {
          "terms": {
            "field": "category.keyword",
            "size": 1,
            "order": {
              "_count": "desc"
            }
          },
          "aggs": {
            "top_username": {
              "terms": {
                "field": "username.keyword",
                "size": 1 ,
                "order": {
                  "_count": "desc"
                }
              }
            }
          }
        }
      }
    }
  }
}

Is there some way to get the output i'm looking for?

Thanks

1 Answer 1

1

As far as I understand from your question, you need to first group by "category" and "username", and then find the top "section" within each group. The query you provided is doing the opposite, it's first grouping by "section", then "category", and then "username".

Please check the following example:

POST some_index/_bulk
{ "index" : { "_index" : "some_index", "_id" : "1" } }
{ "section" : "section1", "category" : "category1", "username" : "username1" }
{ "index" : { "_index" : "some_index", "_id" : "2" } }
{ "section" : "section2", "category" : "category1", "username" : "username1" }
{ "index" : { "_index" : "some_index", "_id" : "3" } }
{ "section" : "section1", "category" : "category2", "username" : "username2" }
{ "index" : { "_index" : "some_index", "_id" : "4" } }
{ "section" : "section2", "category" : "category2", "username" : "username2" }

GET some_index/_search
{
  "size": 0,
  "aggs": {
    "top_category_username": {
      "terms": {
        "script": {
          "source": "doc['category.keyword'].value + ' ' + doc['username.keyword'].value",
          "lang": "painless"
        },
        "size": 20,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "top_section": {
          "terms": {
            "field": "section.keyword",
            "size": 1,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}

enter image description here

update-1

GET some_index/_search
{
  "size": 0,
  "aggs": {
    "sections": {
      "terms": {
        "field": "section.keyword",
        "size": 10
      },
      "aggs": {
        "category_user_combinations": {
          "terms": {
            "script": {
              "source": "doc['category.keyword'].value + ' ' + doc['username.keyword'].value",
              "lang": "painless"
            },
            "size": 20,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}

enter image description here

Sign up to request clarification or add additional context in comments.

5 Comments

not exactly, what i want is that the first result will contain the section which had a combination of username and category which is the greatest count. for example if section1 had 10 occurrences of username A in category cat1. and section 2 had 7 occurrences of username B in category cat2 and 9 of username C in category cat2, then the first result should be section1 because one of its username and category combination had the greatest count among the other combinations in other sections, although the sum of the combination in section2 was more than the sum of combinations of section1
Thanks for clarifying. I updated the answer please check one more time. Also please check composite aggs, maybe it works too. elastic.co/guide/en/elasticsearch/reference/current/…
Thanks! I managed to improvise something using something you wrote. I just put the asset, username and section together in the script and made it the only aggregation
You're welcome happy to hear it worked.
You can accept as an answer if it's fixed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.