I have data with the following mapping and example document:
{
  date : 'yyyy-mm-dd',
  action : 'click',
  userId : 'not_analysed id in this field',
  user : {
    name : 'John',
    age : '28',
    email : '[email protected]',
    country : 'US'
  }
}
I will have millions of records like this, with many duplicates per user because they are user activity logs. I would like to group them by unique userId within a date histogram on the date field. Getting the unique user count per histogram bucket is simple with a cardinality aggregation.
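For reference, the unique-count part that already works looks roughly like the request body below, written as a Python dict. This is only a sketch: the aggregation names `per_month` and `unique_users` are arbitrary labels, and `interval` is the parameter name assumed for the Elasticsearch version in use (newer releases use `calendar_interval` instead). Note that cardinality returns an approximate count.

```python
import json

# Field names "date" and "userId" come from the mapping above.
# "per_month" and "unique_users" are placeholder aggregation names.
query = {
    "size": 0,  # only aggregation results are needed, not the hits
    "aggs": {
        "per_month": {
            # Assumption: "interval" is valid for this Elasticsearch version.
            "date_histogram": {"field": "date", "interval": "month"},
            "aggs": {
                # Approximate number of distinct userId values per bucket
                "unique_users": {"cardinality": {"field": "userId"}}
            },
        }
    },
}

print(json.dumps(query, indent=2))
```

This gives one number per month, but not the breakdown of those distinct users by age or country.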
What I want instead is to take the unique users in each bucket and count them by their profile fields (the user object). For example, say January has about 10,000 activity records but only 1,000 unique users; I want the demographic breakdown of those 1,000 users, not of the 10,000 records. In other words, how can I consolidate the 10,000 records into 1,000 distinct users per bucket, and then aggregate those distinct users into the result below?
Expected end results:
{
  '2016-01-01',
  aggs: {
    [{
      age: 28,
      count: 100
    }, {
      age: 27,
      count: 500
    }, {
      country: 'US',
      count: 200
    }, {
      country: 'Canada',
      count: 200
    }]
  },
  '2016-02-01',
  aggs: {
    [{
      age: 29,
      count: 200
    }, {
      age: 31,
      count: 1000
    }, {
      country: 'Mexico',
      count: 400
    }, {
      country: 'UK',
      count: 400
    }]
  }
}
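To make the intended semantics concrete, here is a small plain-Python sketch of the computation I am after. It runs client-side on a handful of made-up records and is not an Elasticsearch solution; the function name `demographics_by_month` and the sample data are hypothetical. Each user is counted once per month, no matter how many activity records they have.

```python
from collections import Counter, defaultdict


def demographics_by_month(records):
    """Count age and country over the distinct users seen in each month."""
    # month -> {userId -> user profile}; repeat records for a user are ignored
    users_by_month = defaultdict(dict)
    for rec in records:
        month = rec["date"][:7]  # 'yyyy-mm-dd' -> 'yyyy-mm'
        users_by_month[month].setdefault(rec["userId"], rec["user"])

    result = {}
    for month, users in sorted(users_by_month.items()):
        profiles = list(users.values())
        result[month] = {
            "age": dict(Counter(p["age"] for p in profiles)),
            "country": dict(Counter(p["country"] for p in profiles)),
        }
    return result


# Hypothetical sample: u1 clicks twice in January but is counted once.
sample = [
    {"date": "2016-01-03", "action": "click", "userId": "u1",
     "user": {"age": "28", "country": "US"}},
    {"date": "2016-01-09", "action": "click", "userId": "u1",
     "user": {"age": "28", "country": "US"}},
    {"date": "2016-01-11", "action": "click", "userId": "u2",
     "user": {"age": "27", "country": "Canada"}},
    {"date": "2016-02-02", "action": "click", "userId": "u1",
     "user": {"age": "28", "country": "US"}},
]

print(demographics_by_month(sample))
```

The question is how to get the same per-month, per-distinct-user counts out of Elasticsearch itself, without pulling millions of records to the client.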
In short, is there a general way to compute these results using terms aggregations, or even pipeline aggregations?
Any help would be appreciated.