I have the following table:
event_name | score | date | flag |
event_1 | 123 | 12APR2018 | 0 |
event_1 | 34 | 05JUN2019 | 0 |
event_1 | 198 | 08APR2020 | 0 |
event_2 | 3 | 14SEP2019 | 0 |
event_2 | 34 | 22DEC2019 | 1 |
event_2 | 90 | 17FEB2020 | 0 |
event_3 | 772 | 19MAR2021 | 1 |
And I want to obtain:
event_name | sum_score | date_flag_1 |
event_1 | 355 | |
event_2 | 127 | 22DEC2019 |
event_3 | 772 | 19MAR2021 |
where sum_score is the sum of the score column for the corresponding event, and date_flag_1 is the first date on which flag = 1 for that event. If flag = 0 for all rows of an event, date_flag_1 should be missing.
I suppose the code should look something like this:
df_agg = df.groupby('event_name').agg({'score': 'sum', ['date', 'flag']: my_custom_function})
df_agg.columns = ['event_name', 'sum_score', 'date_flag_1']
However, I am not sure how I should implement my_custom_function, which would be a custom aggregation function that takes two columns, whereas ordinary aggregation functions take only one. Please help.
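For reference, here is a minimal sketch of one way the desired output could be produced without a two-column aggregation function at all. It assumes the dates are strings in the format shown above (parsed with English month abbreviations) and that "first date" means the earliest date. The helper column name flagged_date is hypothetical and introduced only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "event_name": ["event_1"] * 3 + ["event_2"] * 3 + ["event_3"],
    "score": [123, 34, 198, 3, 34, 90, 772],
    "date": ["12APR2018", "05JUN2019", "08APR2020",
             "14SEP2019", "22DEC2019", "17FEB2020", "19MAR2021"],
    "flag": [0, 0, 0, 0, 1, 0, 1],
})

# Assumption: dates are strings like "12APR2018"; parse them so that
# "first" can be taken as the earliest date.
df["date"] = pd.to_datetime(df["date"], format="%d%b%Y")

# Hypothetical helper column: keep the date only where flag == 1,
# leaving NaT (missing) everywhere else.
tmp = df.assign(flagged_date=df["date"].where(df["flag"] == 1))

# Named aggregation: each output column is built from a single input column.
df_agg = tmp.groupby("event_name", as_index=False).agg(
    sum_score=("score", "sum"),
    date_flag_1=("flagged_date", "min"),  # min skips NaT; all-NaT groups stay NaT
)

print(df_agg)
```

The idea is to fold the flag condition into a helper column first, so that each aggregation only needs one column. If a function that really sees several columns at once is required, groupby('event_name').apply passes each group's sub-DataFrame to the function, which is more flexible but usually slower.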