I have the following dataframe. I would like to get the rows where the date is max for each pipeline_name
Here is the dataframe:
+----+-----------------+--------------------------------------+----------------------------------+
| | pipeline_name | runid | run_end_dt |
|----+-----------------+--------------------------------------+----------------------------------|
| 0 | test_pipeline | test_pipeline_run_101 | 2021-03-10 20:01:26.704265+00:00 |
| 1 | test_pipeline | test_pipeline_run_102 | 2021-03-13 20:08:31.929038+00:00 |
| 2 | test_pipeline2 | test_pipeline2_run_101 | 2021-03-10 20:13:53.083525+00:00 |
| 3 | test_pipeline2 | test_pipeline2_run_102 | 2021-03-12 20:14:51.757058+00:00 |
| 4 | test_pipeline2 | test_pipeline2_run_103 | 2021-03-13 20:17:00.285573+00:00 |
Here is the result I want to achieve:
+----+-----------------+--------------------------------------+----------------------------------+
| | pipeline_name | runid | run_end_dt |
|----+-----------------+--------------------------------------+----------------------------------|
| 0 | test_pipeline | test_pipeline_run_102 | 2021-03-13 20:08:31.929038+00:00 |
| 1 | test_pipeline2 | test_pipeline2_run_103 | 2021-03-13 20:17:00.285573+00:00 |
In the expected result, we have only the runid against each pipeline_name with the max run_end_dt
Thanks