I have docs that look like this in ES:

{
  "ipAddress": "w",
  ...
},
{
  "ipAddress": "x",
  ...
},
{
  "ipAddress": "x",
  ...
},
{
  "ipAddress": "x",
  ...
},
{
  "ipAddress": "y",
  ...
},
{
  "ipAddress": "y",
  ...
},
{
  "ipAddress": "z",
  ...
},
...

I'm looking to understand the distribution of the number of events per IP address. In other words, for the data above, I want these results:

numOfEvents | numOfOccurrences
------------+-----------------
     1      |        2          (2 times, w and z)
     2      |        1          (1 time, y)
     3      |        1          (1 time, x)
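
To make the target computation concrete, here is a minimal Python sketch of it on the sample data above. The `events` list and the function name are made up for illustration, and this is exactly the client-side approach that does not scale to our cardinality; it only pins down what the result should be.

```python
from collections import Counter

# Hypothetical sample mirroring the documents above (other fields omitted).
events = [
    {"ipAddress": "w"},
    {"ipAddress": "x"},
    {"ipAddress": "x"},
    {"ipAddress": "x"},
    {"ipAddress": "y"},
    {"ipAddress": "y"},
    {"ipAddress": "z"},
]


def events_per_ip_distribution(docs):
    """Map numOfEvents -> number of IPs that have exactly that many events."""
    # First level: IP address -> number of events for that IP.
    per_ip = Counter(doc["ipAddress"] for doc in docs)
    # Second level: event count -> number of IPs with that count.
    return dict(sorted(Counter(per_ip.values()).items()))


print(events_per_ip_distribution(events))  # {1: 2, 2: 1, 3: 1}
```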

The cardinality of ipAddress is very large, so it's infeasible to run a terms aggregation by IP and build the histogram on the client side. I saw this question, which seems to indicate that this isn't possible. Is a transform the only option? We're on an older version of ES and would need to upgrade to use transforms, and I would imagine the transform approach would use up a lot of space (we currently have about 2TB of data).

  • Hi Mike, what is the type of the ipAddress field? If the field type is IP, then rather than a terms aggregation you can use the IP range aggregation: elastic.co/guide/en/elasticsearch/reference/current/… Commented May 19, 2023 at 15:10
  • "so it's infeasible to get a terms aggregation": can you clarify why? Also, what do you think about a filter aggregation with buckets, for example numOfEvents-2023, numOfEvents-2022? Commented May 19, 2023 at 15:12
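
On the question of why a single terms aggregation is infeasible: one request cannot return a bucket for every IP at this cardinality. One possible workaround, assuming the cluster version supports the composite aggregation (an assumption; the question only says the version is older), is to page through the per-IP buckets and keep only the small count histogram in memory. The sketch below is not tested against a real cluster. `search` is a hypothetical callable that sends a request body to the index and returns the parsed response, and the source name `ip` and the page size are arbitrary choices.

```python
from collections import Counter


def composite_request(after_key=None, page_size=1000):
    """Build the request body for one page of per-IP buckets."""
    composite = {
        "size": page_size,
        "sources": [{"ip": {"terms": {"field": "ipAddress"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"by_ip": {"composite": composite}}}


def distribution(search, page_size=1000):
    """Page through per-IP buckets; `search` maps a request body to a response dict."""
    histogram = Counter()  # numOfEvents -> number of IPs with that many events
    after_key = None
    while True:
        response = search(composite_request(after_key, page_size))
        agg = response["aggregations"]["by_ip"]
        for bucket in agg["buckets"]:
            histogram[bucket["doc_count"]] += 1
        after_key = agg.get("after_key")
        # Stop when a page comes back empty or no continuation key is given.
        if not agg["buckets"] or after_key is None:
            break
    return dict(sorted(histogram.items()))
```

Only the histogram is held on the client, so memory stays bounded by the number of distinct event counts rather than the number of IPs; the cost is one round trip per page.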
