
I have data with the following mapping; an example document looks like this:

{
  "date": "yyyy-mm-dd",
  "action": "click",
  "userId": "not_analyzed id in this field",
  "user": {
    "name": "John",
    "age": "28",
    "email": "[email protected]",
    "country": "US"
  }
}

I will have millions of records like this, with duplicates, since these are user activity logs. I would like to group them by unique userId in a date histogram on the date field. Getting the unique count per date-histogram bucket is simple with a cardinality aggregation.
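A minimal sketch of that baseline query, assuming the field names above (the aggregation names `per_day` and `unique_users` are just illustrative):

```json
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "date", "interval": "day" },
      "aggs": {
        "unique_users": { "cardinality": { "field": "userId" } }
      }
    }
  }
}
```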

What I want is the end result per unique-user bucket: group the user fields and get counts based on their profiles, as follows. Say in January we have about 10,000 activity records but only 1,000 unique users; based on those users, I want the user field data so I can see the demographics. In other words, there are 10,000 records, and a cardinality on userId gives 1,000 distinct users. From those 1,000 distinct users I need results like the ones below. How do I consolidate the 10,000 records down to the 1,000 distinct users, and from those produce the answers below?

Expected end results:
{
  "2016-01-01": {
    "aggs": [
      { "age": 28, "count": 100 },
      { "age": 27, "count": 500 },
      { "country": "US", "count": 200 },
      { "country": "Canada", "count": 200 }
    ]
  },
  "2016-02-01": {
    "aggs": [
      { "age": 29, "count": 200 },
      { "age": 31, "count": 1000 },
      { "country": "Mexico", "count": 400 },
      { "country": "UK", "count": 400 }
    ]
  }
}

In conclusion, is there a general way to compute results like this using terms aggregations, or even pipeline aggregations?

Please help out.

  • Did my solution below solve your problem? Commented Aug 9, 2016 at 12:29

1 Answer


What you need is three different sub-aggregations under the main date_histogram aggregation. Your query will look similar to the one below.

The query combines terms aggregations for the age and country data with a cardinality aggregation for the count of unique users.

You can increase the size of each terms aggregation to get your desired number of buckets.

{
  "aggs": {
    "user_data_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "day", 
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "unique_users": {
          "cardinality": {
            "field": "userId"
          }
        },
        "age_data":{
          "terms": {
            "field": "user.age",
            "size": 10
          }
        },
        "country_data":{
          "terms": {
            "field": "user.country",
            "size": 10
          }
        }
      }
    }
  }
}
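Note that the counts produced by the terms aggregations above are over raw log records. If the counts should instead reflect distinct users per age or country bucket, one option is to nest a cardinality sub-aggregation under each terms aggregation. A sketch for the age branch, assuming the same field names (the name `unique_users_in_bucket` is illustrative):

```json
{
  "age_data": {
    "terms": { "field": "user.age", "size": 10 },
    "aggs": {
      "unique_users_in_bucket": {
        "cardinality": { "field": "userId" }
      }
    }
  }
}
```

With this, each age bucket's doc_count remains the raw event count, while unique_users_in_bucket gives the approximate number of distinct userIds in that bucket.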

5 Comments

Yes, this would get the counts, but user.age is counted over the total records, not over the number of distinct userIds.
@Nick So your userId and user are not related, and you need the age for each userId? Can you clarify your required output? Based on the example you provided, 1,000 are unique users, of which 100 are age 28 and 500 are age 27. If that is the requirement, then the query above will definitely work.
Say the total is 10,000 records with many duplicate userIds across them, so there are 1,000 distinct userIds. Based on those 1,000 distinct users, I need the age and country groupings as well. So the age grouping should be based only on the 1,000 users, even though the actual records number 10,000.
Sorry for the confusion. To clarify, I need the number of distinct users per age group and per country group. The cardinality figure is the total that the age groups (and likewise the country groups) should add up to. Say cardinality gives 1,000 unique users; then the age buckets should total 1,000, and so should the country buckets. Hope this clarifies.
The query above should work in your case; have you tried it?
