0

I have a DataFrame with two columns that will be used in group by( GroupBy1 and GroupBy2 ), with dozens of columns that will be measures using agg() ( MesA:Max, MesB:Min, MesC:sum.. ) and other columns that aren't for measures, but for informations like datetime of last row in the groupby, 'GroupName1', 'GroupName2', GroupId of another database, and others.

TicketsDBFrame
GroupBy1 GroupBy2 GroupName1 GroupName2 MesA MesB MesC MesD LastTicketTime       GroupId1 GroupId2
1        1        First      First      2    3    1    6    2021-04-05 01:00:00  4        99
1        1        First      First      4    1    3    2    2021-04-05 02:00:00  4        99
1        1        First      First      2    5    2    1    2021-04-05 03:00:00  4        99
1        2        First      Second     2    5    2    1    2021-04-05 01:30:00  4        75
1        2        First      Second     1    4    7    3    2021-04-05 02:30:00  4        75
2        2        Second     Second     4    2    1    8    2021-04-05 02:00:00  2        75
2        2        Second     Second     1    6    3    1    2021-04-05 04:00:00  2        75

The desired output:

GroupBy1 GroupBy2 GroupName1 GroupName2 MesA MesB MesC MesD LastTicketTime       GroupId1 GroupId2
1        1        First      First      4    1    6    9    2021-04-05 03:00:00  4        99
1        2        First      Second     2    4    4    2    2021-04-05 02:30:00  4        75
2        2        Second     Second     4    2    4    9    2021-04-05 04:00:00  2        75

I already know how to create this desired Frame using derived DataFrames, using 'loc' and 'idxmax' to get LastTicketTime in a frame, other derived frame to 'Ids and Names' and another DataFrame to call agg() for the measure columns, after that I do a merge in the frames

groupInfoFrame:Little Derived Frame created associating GroupBy1 - GroupName1 - GroupId1

lastTicketFrame: Frame with only LastTicketTime

lastTicketFrame=TicketsDBFrame[['GroupBy1','GroupBy2','LastTicketTime' ]]
lastTicketFrame=lastTicketFrame.loc[lastTicketFrame.groupby(['GroupBy1'],['GroupBy2]).LastTicketTime.idxmax() ]

measuresFrame: Only measures

measuresFrame = TicketsDBFrame.groupby(['GroupBy1'],['GroupBy2]).agg( mesA:.....MesD )

After all I do a merge in the measureFrame and lastTicketFrame using the GroupBy1 and GroupBy2 as keys

Is it possible to have all these information in only one agg() or transform() or other function call? Without derived frames and merges

0

1 Answer 1

1

You can do it in a single agg (by sorting values by LastTicketTime and taking last in agg):

(df
    .sort_values('LastTicketTime')
    .groupby(['GroupBy1', 'GroupBy2'], as_index=False)
    .agg({
        'GroupName1': 'last',
        'GroupName2': 'last',
        'MesA': 'max',
        'MesB': 'min',
        'MesC': 'sum',
        'MesD': 'sum',
        'LastTicketTime': 'last',
        'GroupId1': 'last',
        'GroupId2': 'last'
    }))

Output:

   GroupBy1  GroupBy2 GroupName1 GroupName2  MesA  MesB  MesC  MesD  \
0         1         1      First      First     4     1     6     9   
1         1         2      First     Second     2     4     9     4   
2         2         2     Second     Second     4     2     4     9   

        LastTicketTime  GroupId1  GroupId2  
0  2021-04-05 03:00:00         4        99  
1  2021-04-05 02:30:00         4        75  
2  2021-04-05 04:00:00         2        75

P.S. If I'm not mistaken, there seems to be a problem in your expected output for GroupName1 = First, GroupName2 = Second: MesC and MesD are sums, and should be 9 and 4, respectively (instead of 4 and 2).

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you so much, perl. This is much easier compared to what I was doing

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.