Return rows in pandas based on values in multiple columns

Question

I need some help with pandas. I'm working with this data and trying to calculate changes over time per region. Basically, I want to find the oldest quantity and the newest quantity for each area. I have code that gives me the year of the most recent and oldest records, but I need the whole row so I can work with the 'quantity' column. Any suggestions? Here is what I have:

df.groupby(['Country or Area'])['Year'].max()

Thanks in advance!

Chris · Accepted Answer · 2020-02-15 01:46:51Z

1

df = df.sort_values(by=['Country or Area','Year'])
df.groupby('Country or Area').agg(['first','last']).stack()
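As a rough illustration of the sort-then-aggregate approach, here is a minimal sketch on made-up data. The sample values are hypothetical; only the column names 'Country or Area', 'Year' and 'Quantity' come from the question and the other answers.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1999, 2005, 2010, 2003],
    "Quantity": [10.0, 7.0, 15.0, 40.0, 25.0],
})

# Sort so that 'first' is the oldest row and 'last' the newest within each group.
df_sorted = df.sort_values(by=["Country or Area", "Year"])

# One row per area, with a (column, first/last) MultiIndex on the columns.
wide = df_sorted.groupby("Country or Area").agg(["first", "last"])

# stack() moves the first/last level from the columns into the row index.
long = wide.stack()

# Change in quantity between the oldest and newest record of each area.
change = wide[("Quantity", "last")] - wide[("Quantity", "first")]
print(change)
```

One caveat: 'first' and 'last' pick the first and last non-null value of each column separately, so if the data has missing values, the Year and Quantity reported for a group may come from different rows.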

answered Feb 15, 2020 at 1:46


RootTwo · Accepted Answer · 2020-02-15 02:17:08Z

1

Use idxmin() and idxmax(). Something like:

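grp = df.groupby('Country or Area')

for name, group in grp:
    print(name)

    # Index labels of the rows holding the earliest and latest year in this group
    minidx = group['Year'].idxmin()
    maxidx = group['Year'].idxmax()

    # .loc looks each value up explicitly by row label and column name
    print(f"min: {group.loc[minidx, 'Year']} {group.loc[minidx, 'Quantity']}")
    print(f"max: {group.loc[maxidx, 'Year']} {group.loc[maxidx, 'Quantity']}")
    print()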

edited Feb 15, 2020 at 2:17

answered Feb 15, 2020 at 2:03


Kenan · Accepted Answer · 2020-02-15 01:51:54Z

0

You can get the oldest and newest rows with idxmin and idxmax:

df.loc[df.groupby(['Country or Area'])['Year'].idxmin()]
df.loc[df.groupby(['Country or Area'])['Year'].idxmax()]
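To make the row lookup concrete, here is a small sketch that pulls the full oldest and newest row per area and then compares their quantities. The data is invented for illustration, and the sketch assumes the DataFrame index is unique, since idxmin and idxmax return index labels.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1999, 2005, 2010, 2003],
    "Quantity": [10.0, 7.0, 15.0, 40.0, 25.0],
})

years = df.groupby("Country or Area")["Year"]

# idxmin/idxmax return the index label of the min/max Year per group;
# df.loc then pulls the complete rows for those labels.
oldest = df.loc[years.idxmin()].set_index("Country or Area")
newest = df.loc[years.idxmax()].set_index("Country or Area")

# Both frames are indexed by area, so the subtraction aligns per area.
change = newest["Quantity"] - oldest["Quantity"]
print(change)
```

Unlike aggregating with 'first' and 'last', this keeps each row intact, so the Quantity always belongs to the same record as the Year.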

answered Feb 15, 2020 at 1:51


Hely Andrés Palencia · Accepted Answer · 2020-02-15 02:20:25Z

0

You need to use the aggregation functions of groupby().

You can pass a function, a list of functions, or a dict that maps the columns you need to aggregate to the functions to apply.

In your case, Chris's solution is the better way to do it.

Sort the DataFrame by the column you want to compare, then group it and use .agg() to get the result you need.

The stack() method then moves the inner column level created by .agg() into the row index.
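The dict form mentioned above might look like the following sketch. The sample data is hypothetical, and the choice of functions per column is only one possibility, not necessarily what the answer had in mind.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1999, 2005, 2010, 2003],
    "Quantity": [10.0, 7.0, 15.0, 40.0, 25.0],
})

# Sort by year within each area so 'first'/'last' mean oldest/newest.
df_sorted = df.sort_values(["Country or Area", "Year"])

# A dict maps each column to the list of functions to apply to it.
summary = df_sorted.groupby("Country or Area").agg(
    {"Year": ["min", "max"], "Quantity": ["first", "last"]}
)
print(summary)
```

The result has one row per area and a two-level column index, so a single value is addressed with a tuple such as `summary.loc["A", ("Quantity", "last")]`.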

answered Feb 15, 2020 at 2:20
