
I have data with the following mapping; an example document looks like this:

{
  "date": "yyyy-mm-dd",
  "action": "click",
  "userId": "not_analyzed id in this field",
  "user": {
    "name": "John",
    "age": "28",
    "email": "[email protected]",
    "country": "US"
  }
}

I will have millions of records like this, with duplicates, since these are user activity logs. I would like to group them by unique userId in a date histogram on the date field. Getting the unique count per date-histogram bucket is simple with a cardinality aggregation.
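A minimal sketch of that baseline query, assuming the field names above (the aggregation names `per_day` and `unique_users` are just illustrative):

```json
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "date", "interval": "day" },
      "aggs": {
        "unique_users": { "cardinality": { "field": "userId" } }
      }
    }
  }
}
```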

What I want is the end result per unique-user bucket: group the user fields and get counts based on their profiles, as follows. Say in January we have about 10,000 activity records but only 1,000 unique users; based on those users, I want the user field data so I can see the demographics. In other words, there are 10,000 records, and a cardinality on userId gives 1,000 distinct users. From those 1,000 distinct users I need results like the ones below. How do I consolidate the 10,000 records down to the 1,000 distinct users, and from those produce the answers below?

Expected end results:
{
  "2016-01-01": {
    "aggs": [
      { "age": 28, "count": 100 },
      { "age": 27, "count": 500 },
      { "country": "US", "count": 200 },
      { "country": "Canada", "count": 200 }
    ]
  },
  "2016-02-01": {
    "aggs": [
      { "age": 29, "count": 200 },
      { "age": 31, "count": 1000 },
      { "country": "Mexico", "count": 400 },
      { "country": "UK", "count": 400 }
    ]
  }
}

In conclusion, is there a general way to compute results like this using terms aggregations, or even pipeline aggregations?

Please help out.

  • Did my solution below solve your problem? Commented Aug 9, 2016 at 12:29

1 Answer


What you need is three different sub-aggregations under the main date_histogram aggregation. Your query will look similar to the one below.

The query combines terms aggregations for the age and country data with a cardinality aggregation for the count of unique users.

You can increase the size of each terms aggregation to get your desired number of buckets.

{
  "aggs": {
    "user_data_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "day", 
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "unique_users": {
          "cardinality": {
            "field": "userId"
          }
        },
        "age_data":{
          "terms": {
            "field": "user.age",
            "size": 10
          }
        },
        "country_data":{
          "terms": {
            "field": "user.country",
            "size": 10
          }
        }
      }
    }
  }
}
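Note that the counts produced by the terms aggregations above are over raw log records. If the counts should instead reflect distinct users per age or country bucket, one option is to nest a cardinality sub-aggregation under each terms aggregation. A sketch for the age branch, assuming the same field names (the name `unique_users_in_bucket` is illustrative):

```json
{
  "age_data": {
    "terms": { "field": "user.age", "size": 10 },
    "aggs": {
      "unique_users_in_bucket": {
        "cardinality": { "field": "userId" }
      }
    }
  }
}
```

With this, each age bucket's doc_count remains the raw event count, while unique_users_in_bucket gives the approximate number of distinct userIds in that bucket.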

5 Comments

Yes, this would get the counts, but user.age is counted over the total records, not over the number of distinct userIds.
@Nick So your userId and user are not related, and you need the age for each userId? Can you clarify your required output? Based on the example you provided, 1,000 are unique users, of which 100 are age 28 and 500 are age 27. If that is the requirement, then the query above will definitely work.
Say the total is 10,000 records with many duplicate userIds across them, so there are 1,000 distinct userIds. Based on those 1,000 distinct users, I need the age and country groupings as well. So the age grouping should be based only on the 1,000 users, even though the actual records number 10,000.
Sorry for the confusion. To clarify, I need the number of distinct users per age group and per country group. The cardinality figure is the total that the age groups (and likewise the country groups) should add up to. Say cardinality gives 1,000 unique users; then the age buckets should total 1,000, and so should the country buckets. Hope this clarifies.
The query above should work in your case; have you tried it?
