I have docs that look like this in ES:
{
"ipAddress": "w",
...
},
{
"ipAddress": "x",
...
},
{
"ipAddress": "x",
...
},
{
"ipAddress": "x",
...
},
{
"ipAddress": "y",
...
},
{
"ipAddress": "y",
...
},
{
"ipAddress": "z",
...
},
...
I'm looking to understand the distribution of the number of events per IP address. In other words, for the data above, I want these results:
numOfEvents | numOfOccurrences
------------|------------------
1           | 2 (w and z)
2           | 1 (y)
3           | 1 (x)
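For reference, the table is a "count of counts": first count the events for each ipAddress, then count how many addresses share each event count. A minimal in-memory sketch in Python over the sample docs above (the function name is hypothetical, and this only works when everything fits in memory, which is exactly what the 2 TB case rules out):

```python
from collections import Counter


def events_per_ip_distribution(docs):
    """Map numOfEvents -> numOfOccurrences for an iterable of docs."""
    # First pass: number of events for each ipAddress.
    events_per_ip = Counter(doc["ipAddress"] for doc in docs)
    # Second pass: how many addresses have each event count.
    return dict(sorted(Counter(events_per_ip.values()).items()))


sample = [{"ipAddress": ip} for ip in ["w", "x", "x", "x", "y", "y", "z"]]
print(events_per_ip_distribution(sample))  # {1: 2, 2: 1, 3: 1}
```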
The cardinality of ipAddress is very large, so it's infeasible to run a terms aggregation by IP and build the histogram on the client side. I saw this question, which seems to indicate that this isn't possible. Is a transform the only option? We're on an older version of ES that we would need to upgrade to use transforms, and I imagine the transform approach would use up a lot of space (we currently have about 2 TB of data).
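One hedged alternative to a single huge terms aggregation is to page through every address with a composite aggregation and keep only the running histogram on the client, so client memory grows with the number of distinct event counts rather than with the number of addresses. This sketch assumes `ipAddress` is a keyword (or ip) field (with a text mapping it would be something like `ipAddress.keyword`), that your ES version supports the composite aggregation, and that `search` is a hypothetical callable you wire to your own client to POST the body to `_search` and return the parsed JSON:

```python
from collections import Counter
from typing import Any, Callable, Dict, Optional

# Hypothetical hook: sends a request body to the index's _search endpoint
# (however your client does that) and returns the parsed JSON response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_request(after_key: Optional[Dict[str, Any]] = None,
                      page_size: int = 1000) -> Dict[str, Any]:
    """Build one page of a composite aggregation with one bucket per ipAddress."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"ip": {"terms": {"field": "ipAddress"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket returned by the previous page.
        composite["after"] = after_key
    # size 0: only the aggregation is needed, not the hits.
    return {"size": 0, "aggs": {"by_ip": {"composite": composite}}}


def events_per_ip_histogram(search: SearchFn, page_size: int = 1000) -> Dict[int, int]:
    """Return {numOfEvents: numOfOccurrences}, keeping only the histogram in memory."""
    histogram = Counter()
    after_key = None
    while True:
        agg = search(composite_request(after_key, page_size))["aggregations"]["by_ip"]
        buckets = agg["buckets"]
        for bucket in buckets:
            # doc_count is the number of events for this address.
            histogram[bucket["doc_count"]] += 1
        after_key = agg.get("after_key")
        if not buckets or after_key is None:
            return dict(sorted(histogram.items()))
```

Every address bucket still crosses the wire once, so on 2 TB expect this to behave like a long-running batch job rather than an interactive query; the trade-off versus a transform is many round trips instead of a second, materialized index.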
IP range aggs: elastic.co/guide/en/elasticsearch/reference/current/…
"so it's infeasible to get a terms aggregation": can you clarify why? Also, what do you think about filtering the aggregation into buckets, for example numOfEvents-2023 and numOfEvents-2022?
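The per-year idea from the comment could look roughly like this: restrict each request to one calendar year with a range filter and page through the addresses for that year only. This is a sketch; the timestamp field name `@timestamp` and the function name are assumptions, not taken from the question. Note that each run yields the distribution for that year alone: because one address can appear in several years, the per-year histograms cannot simply be added together to get the overall distribution.

```python
from typing import Any, Dict, Optional


def yearly_composite_request(year: int,
                             after_key: Optional[Dict[str, Any]] = None,
                             page_size: int = 1000,
                             time_field: str = "@timestamp") -> Dict[str, Any]:
    """One page of ipAddress buckets, restricted to docs from a single calendar year."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"ip": {"terms": {"field": "ipAddress"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        # Filter context: restricts the documents without scoring them.
        # time_field is an assumed date field; adjust to your mapping.
        "query": {"bool": {"filter": [{"range": {time_field: {
            "gte": f"{year}-01-01",
            "lt": f"{year + 1}-01-01",
        }}}]}},
        "aggs": {"by_ip": {"composite": composite}},
    }
```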