
We have a series of status updates for projects, and the last update for a given project is the one we want to report on in several ways. For instance:

ProjectID  DateTime          EventDescription
001        2024-12-07 11:34  New
001        2024-12-07 11:36  Submitted
002        2024-12-07 11:40  New
003        2024-12-07 12:34  New
001        2024-12-07 14:02  Approved
002        2024-12-07 14:55  Submitted
004        2024-12-07 15:02  New
004        2024-12-07 15:44  Submitted
001        2024-12-07 16:03  Completed

In our actual data, there are, of course, thousands of projects and many more status updates.

THE GOAL: We need an aggregation that grabs the last status (the current status) for each project, and then summarizes those current statuses with a count of each within a datetime range.

For the data above, we want to get:

Status     Project Count
New        1
Submitted  2
Completed  1
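
To make the reduction precise (this is only an illustration of the logic we want, not how we intend to implement it), here is the same computation sketched in Python over the sample data:

```python
from collections import Counter

# Sample events from the table above: (project_id, timestamp, status).
events = [
    ("001", "2024-12-07 11:34", "New"),
    ("001", "2024-12-07 11:36", "Submitted"),
    ("002", "2024-12-07 11:40", "New"),
    ("003", "2024-12-07 12:34", "New"),
    ("001", "2024-12-07 14:02", "Approved"),
    ("002", "2024-12-07 14:55", "Submitted"),
    ("004", "2024-12-07 15:02", "New"),
    ("004", "2024-12-07 15:44", "Submitted"),
    ("001", "2024-12-07 16:03", "Completed"),
]

# Step 1: keep only the most recent event per project.
# (ISO-style timestamps compare correctly as strings.)
latest = {}
for project_id, ts, status in events:
    if project_id not in latest or ts > latest[project_id][0]:
        latest[project_id] = (ts, status)

# Step 2: count projects per current status.
counts = Counter(status for _, status in latest.values())
# counts == {"Submitted": 2, "New": 1, "Completed": 1}
```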

We are looking for a means to do this in a single query. We have several places where we need this. This is just one example, and using a transformation is not a viable option at this time.

In addition to simple counts, we next hope to figure out how to aggregate these status updates into bucket counts by day, to show a status graph across a series of days: how many each day are New, Submitted, and so on. But we would be thrilled just to get accurate status counts.

We believe this requires a pipeline aggregation, but have not been able to get it working.

Our working aggregation query to get the latest status for each project:

GET journaling*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": {
          "DATETIME": {
            "gte":"2024/11/01 00:00:00.000",
            "lte":"2024/11/30 23:59:59.000"
          }
        }},
        {
          "match": {
            "ACCOUNT": "12345"
          }
        }
      ]      
    }
  },
  "size": 0,
  "aggs": {
    "ProjectStatusSummary": {
      "terms": {
        "field": "PROJECTID"
      },
      "aggs": {
        "group": {
          "top_hits": {
            "size": "1",
            "_source": {
              "includes": [
                "DATETIME",
                "PROJECTID",
                "EVENTDESCRIPTION",
                "PROJECTSTART"
              ]
            },
            "sort": {
              "DATETIME": {
                "order": "desc"
              }
            }
          }
        }
      }
    }
  }
}
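
One single-query possibility we have not verified is a scripted_metric aggregation that tracks the latest event per project in script state and reduces to status counts. A sketch only, assuming PROJECTID and EVENTDESCRIPTION are keyword fields with doc values (scripted_metric runs Painless per document and can be slow at scale):

```json
GET journaling*/_search
{
  "size": 0,
  "aggs": {
    "status_counts": {
      "scripted_metric": {
        "init_script": "state.latest = [:]",
        "map_script": """
          def id = doc['PROJECTID'].value;
          def ts = doc['DATETIME'].value.toInstant().toEpochMilli();
          def cur = state.latest.get(id);
          if (cur == null || ts > cur.ts) {
            state.latest.put(id, ['ts': ts, 'status': doc['EVENTDESCRIPTION'].value]);
          }
        """,
        "combine_script": "return state.latest",
        "reduce_script": """
          def latest = [:];
          for (shardMap in states) {
            for (entry in shardMap.entrySet()) {
              def cur = latest.get(entry.getKey());
              if (cur == null || entry.getValue().ts > cur.ts) {
                latest.put(entry.getKey(), entry.getValue());
              }
            }
          }
          def counts = [:];
          for (v in latest.values()) {
            counts[v.status] = counts.containsKey(v.status) ? counts[v.status] + 1 : 1;
          }
          return counts;
        """
      }
    }
  }
}
```

The same bool filter as above would go in the query section; it is omitted here for brevity.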
  • I have learned a bit more. I explored two options. I tried to aggregate on this aggregation and received an error from Elastic because it does not allow sub-aggregations on a top_hits. I also tried collapsing as a different means to get the latest status, which works great, but I do not yet see a way to aggregate on the output of a collapse either. Commented Dec 18, 2024 at 19:13
  • Continuing to learn. I have tried a "max" aggregation for the DATETIME field, and it does return the correct records, but I have not yet been able to get the fields for the selected record, just the document key and the actual datetime value. This max aggregation could work if I could figure out how to get the other fields for the max document. Commented Dec 19, 2024 at 13:57
  • And I learned max aggregations, just like top_hits, cannot accept sub-aggregations. Commented Dec 19, 2024 at 14:10
  • I would think wanting to work on a set of most recent records for each order, customer or whatever would be very useful, but so far not possible. If you collapse to get most recent, then I do not see a way to aggregate on the collapse results. If you use a top_hits aggregation, then you cannot sub-aggregate on the results. If you use a max aggregation, then you cannot sub-aggregate on the results or see the other fields in the max record found. Commented Dec 19, 2024 at 14:23
  • I believe using named pipelines is my best and final option to achieve what I need, but I have not figured out a named pipeline aggregation yet that fetches records. All the examples I have seen are about doing math and ending up with a numeric result. Commented Dec 19, 2024 at 15:48
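
In the meantime, the top_hits response can be tallied client-side. A sketch, assuming the response shape of the ProjectStatusSummary aggregation above (truncated, hypothetical response data):

```python
from collections import Counter

# Hypothetical, truncated Elasticsearch response: one top_hits hit per
# project bucket, as produced by the ProjectStatusSummary aggregation.
response = {
    "aggregations": {
        "ProjectStatusSummary": {
            "buckets": [
                {"key": "001", "group": {"hits": {"hits": [
                    {"_source": {"PROJECTID": "001", "EVENTDESCRIPTION": "Completed"}}]}}},
                {"key": "002", "group": {"hits": {"hits": [
                    {"_source": {"PROJECTID": "002", "EVENTDESCRIPTION": "Submitted"}}]}}},
            ]
        }
    }
}

# Each bucket's single top hit is that project's latest event, so counting
# its EVENTDESCRIPTION across buckets yields the per-status project counts.
buckets = response["aggregations"]["ProjectStatusSummary"]["buckets"]
status_counts = Counter(
    b["group"]["hits"]["hits"][0]["_source"]["EVENTDESCRIPTION"] for b in buckets
)
```

This is the two-step fallback: let Elasticsearch pick the latest document per project, then count in the application.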
