0

My documents consist of a history of orders and their state, here a minimal example:

    {
      "orderNumber" : "xyz",
      "state" : "shipping",
      "day" : "2022-07-20",
      "timestamp" : "2022-07-20T15:06:44.290Z",
    }

the state can be strings like shipping, processing, redo,...

For every possible state, I need to count the number of orders that had this state at some point during a day, without counting a state twice for the same orderNumber that day (which can happen if there is a problem and it needs to start from the beginning that same day).

My aggregation looks like this:

GET order-history/_search
{
   "aggs": {
      "countDays": {
          "terms": {
              "field": "day",
              "order": {
                "_key": "desc"
              },
              "size": 20
          },
          "aggs": {
            "countStates": {
              "terms": {
                "field": "state.keyword",
                "size": 10
              }
              
            }
          }
      }
   }
   , "size": 1
}

However, this will count a state for a given orderNumber twice if it reappears that same day. How would I prevent it from counting a state twice for each orderNumber, if it is on the same day?

1 Answer 1

1

Tldr;

I don't think there is a flexible and simple solution.

But if you know in advance the number of state that exists. Maybe through another aggregation query, to get all type of state.

You could do the following

POST /_bulk
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"redo","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"processing","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"shipping","day":"2022-07-20"}


GET 73138766/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "day",
        "calendar_interval": "day"
      },
      "aggs": {
        "shipping": {
          "filter": { "term": { "state.keyword": "shipping" }
          },
          "aggs": {
            "orders": {
              "cardinality": {
                "field": "orderNumber.keyword"
              }
            }
          }
        },
        "processing": {
          "filter": { "term": { "state.keyword": "processing" }
          },
          "aggs": {
            "orders": {
              "cardinality": {
                "field": "orderNumber.keyword"
              }
            }
          }
        },
        "redo": {
          "filter": { "term": { "state.keyword": "redo" }
          },
          "aggs": {
            "orders": {
              "cardinality": {
                "field": "orderNumber.keyword"
              }
            }
          }
        }
      }
    }
  }
}

You will obtain the following results

{
  "aggregations": {
    "per_day": {
      "buckets": [
        {
          "key_as_string": "2022-07-20T00:00:00.000Z",
          "key": 1658275200000,
          "doc_count": 5,
          "shipping": {
            "doc_count": 3,
            "orders": {
              "value": 2
            }
          },
          "processing": {
            "doc_count": 1,
            "orders": {
              "value": 1
            }
          },
          "redo": {
            "doc_count": 1,
            "orders": {
              "value": 1
            }
          }
        }
      ]
    }
  }
}
Sign up to request clarification or add additional context in comments.

1 Comment

Nice! Indeed the number of states is fixed, so your solution would work!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.