3

I have a set of documents in ElasticSearch 5.5 with two date fields: start_date and end_date. I want to aggregate them into date histogram buckets (ex: weekly) such that if the start_date < week X < end_date, then document would be in "week X" bucket. This means that a single document might be in multiple buckets.

Consider the following concrete example: I have a set of documents describing company employees, and for each employee you have hire date and (optionally) termination date. I want to build date histogram of number of active employees for trailing twelve months.

Sample doc content:

{
   "start_date": "2013-01-12T00:00:00.000Z",
   "end_date": "2016-12-08T00:00:00.000Z",
   "id": "123123123"
}

Is there a way to do this in ES?

5
  • always support your questions with proper json to show example to usecases. thx Commented Aug 2, 2017 at 9:43
  • Added one, although it doesn't feel like it's adding a lot of value Commented Aug 2, 2017 at 9:47
  • 1
    its not about value, its about when i read the post i should be able to visualize your problem better and obviously pushing a formatted json does't hurt much. Commented Aug 2, 2017 at 13:08
  • I doubt that one document can appear in multiple buckets of one aggregation. I can suggest checking out scripting for date aggregation, using pipeline aggregations and to denormalize data, e.g. precompute all the weeks between start_date and end_date and store them as a list in a field active_weeks. Commented Aug 3, 2017 at 10:19
  • @NikolayVasiliev I have found one way to do that (check my answer), but I am still not happy about this. Perhaps scripting is a better way forward Commented Aug 3, 2017 at 10:30

2 Answers 2

3

I have found one way to do this, using filter aggregations ( https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html). If I need, say, 12 trailing months report, then I would create 12 buckets, where each bucket defines filter conditions, such as:

"bool":{
 "must":[{
  "range":{
   "start_date":{
    "lte":"2016-01-01T00:00:00.000Z"
   }
  }
 },{
 {
  "range":{
   "end_date":{
    "gt":"2016-02-01T00:00:00.000Z"
   }
  }
 }]
}

However, I feel that it would be nice if there was an easier way to do this, since if I want say trailing 365 days, that means I have to create 365 bucket filters, which makes resultant query very large.

Sign up to request clarification or add additional context in comments.

Comments

-1

I know this question is quite old but as it's still open I am sharing my knowledge on this. Also this question does not clearly explains that what kind of output is expected but still I think this can be achieved using the "Date Histogram Aggregation" and "Bucket Script Aggregation".

Here are the documentation links for both of these aggregations.

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-datehistogram-aggregation.html

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-pipeline-bucket-script-aggregation.html

2 Comments

Please add summery of your links content to your answer
The links should be able to describe you everything. And if you want me to provide some sample query around it then I would need the expected result format.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.