2

I'm going to perform a count aggregation on an array field by filtering the values in a range. For example, I have following 3 documents and I want to find out value counts for purchase_date_list that are between 20210101 to now(). The expected result (purchase count between 20210101 - now()) will be that: customer_id: 1, purchase count is: 2 customer_id: 2, purchase count is: 0 customer_id: 3, purchase count is: 1

Can anybody please help with some ideas on how to compose a aggregation query for above request?

Many thanks!

{
  customer_id: 1,
  purchase_date_list: [
    20050101,
    20210304,
    20211121
  ]
},
{
  customer_id: 2,
  purchase_date_list: [
    20100301
  ]
},
{
  customer_id: 3,
  purchase_date_list: [
    20210701
  ]
}
2
  • looking forward for feedbacks! Commented Nov 29, 2021 at 13:35
  • Hi @jiajun, I believe you need to use elastic script language painless to perform this kind of action. Commented Dec 7, 2021 at 14:51

1 Answer 1

1

Following on my comment, it solved it using painless. (As I still am not sure how do deal with it using aggregation)

Here is the documentation that helped me solve this issue. [doc]

Set up

PUT /so_agg_test/

POST /so_agg_test/_doc
{
  "customer_id": 1,
  "purchase_date_list": [
    20050101,
    20210304,
    20211121
  ]
}

POST /so_agg_test/_doc
{
  "customer_id": 2,
  "purchase_date_list": [
    20100301
  ]
}

POST /so_agg_test/_doc
{
  "customer_id": 3,
  "purchase_date_list": [
    20210701
  ]
}

GET /so_agg_test/_search

Solution

This query will create a new field, named number_of_sales_interval in you hits.

GET /so_agg_test/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "number_of_sales_interval": {
      "script": {
        "lang": "painless",
        "params": {
          "lower_bound": 20210101
        }, 
        "source": """
        def dates = doc['purchase_date_list'];
        
        def number_of_sales_interval = 0;
        for(date in dates){
          if(date > params.lower_bound){
            number_of_sales_interval += 1;
          }
        }
        
        return number_of_sales_interval;
        """
      }
    }
  }
}

Results

You should have something along those lines.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "so_agg_test",
        "_type" : "_doc",
        "_id" : "UOVplX0B0iK523s0yaCu",
        "_score" : 1.0,
        "fields" : {
          "number_of_sales_interval" : [
            2
          ]
        }
      },
      {
        "_index" : "so_agg_test",
        "_type" : "_doc",
        "_id" : "UeVplX0B0iK523s01KC-",
        "_score" : 1.0,
        "fields" : {
          "number_of_sales_interval" : [
            0
          ]
        }
      },
      {
        "_index" : "so_agg_test",
        "_type" : "_doc",
        "_id" : "UuVplX0B0iK523s04KAT",
        "_score" : 1.0,
        "fields" : {
          "number_of_sales_interval" : [
            1
          ]
        }
      }
    ]
  }
}
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks a lot @Paulo! I created a similar query using runtime_mapping, would you know what's the difference between runtime_mapping and scrip_fields? Should I use script_fields or runtime_mapping?
As I understand runtime_mapping is the way to go now! discuss.elastic.co/t/scripted-fields-vs-runtime-fields/263962 I have not used them myself so I wouldn't know from real life use case.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.