
I need some help with pandas. I'm working on this data and trying to calculate changes over time per region. Basically, I'm trying to find the oldest quantity and the newest quantity for each area in question. I have code that gives me the year of the most recent and oldest data records, but I need to get the whole row so I can work on the 'quantity' column. Any input? Here is what I have:

df.groupby(['Country or Area'])['Year'].max()

Thanks in advance!

4 Answers

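df = df.sort_values(by=['Country or Area', 'Year'])  # oldest year first within each area
df.groupby('Country or Area').agg(['first', 'last']).stack()  # 'first' = oldest, 'last' = newest (first/last non-null value per column)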
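A minimal, self-contained sketch of the sort + first/last approach above, using made-up sample data (the 'Quantity' column name is assumed, as in the other answers). It also computes the change per area from the aggregated frame:

```python
import pandas as pd

# Hypothetical sample data; column names assumed from the question and answers.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1990, 2010, 1995, 2005],
    "Quantity": [20.0, 10.0, 35.0, 7.0, 4.0],
})

# Sort so that, within each area, rows run from oldest to newest year.
df = df.sort_values(by=["Country or Area", "Year"])

# 'first'/'last' take the first/last non-null value of each column per group,
# i.e. the oldest and newest record once the frame is sorted by Year.
wide = df.groupby("Country or Area").agg(["first", "last"])

# stack() moves the inner column level ('first'/'last') into the row index,
# giving one row per (area, first/last) pair.
long = wide.stack()

# Change over time per area: newest Quantity minus oldest Quantity.
change = wide[("Quantity", "last")] - wide[("Quantity", "first")]
print(long)
print(change)
```

Note that 'first' and 'last' skip NaN independently per column, so if 'Quantity' can be missing, the values in one output row may come from different source rows.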

Use idxmin() and idxmax(). Something like:

grp = df.groupby('Country or Area')  # string key, so 'name' below is a scalar, not a 1-tuple

for name, group in grp:
    print(name)

    # Index labels of the oldest and newest rows in this area
    minidx = group['Year'].idxmin()
    maxidx = group['Year'].idxmax()

    print(f"min: {group.loc[minidx, 'Year']} {group.loc[minidx, 'Quantity']}")
    print(f"max: {group.loc[maxidx, 'Year']} {group.loc[maxidx, 'Quantity']}")
    print()
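If you want to keep the results instead of printing them, a variant of the loop above can collect one summary row per area. This is a sketch on hypothetical sample data; the output column names (oldest_year, newest_year, change) are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; column names assumed from the question and answers.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1990, 2010, 1995, 2005],
    "Quantity": [20.0, 10.0, 35.0, 7.0, 4.0],
})

rows = []
for name, group in df.groupby("Country or Area"):
    # Index labels of the oldest and newest rows within this area.
    minidx = group["Year"].idxmin()
    maxidx = group["Year"].idxmax()
    rows.append({
        "Country or Area": name,
        "oldest_year": group.loc[minidx, "Year"],
        "newest_year": group.loc[maxidx, "Year"],
        # Change over time: newest Quantity minus oldest Quantity.
        "change": group.loc[maxidx, "Quantity"] - group.loc[minidx, "Quantity"],
    })

summary = pd.DataFrame(rows).set_index("Country or Area")
print(summary)
```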


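You can get the full rows for the oldest and newest years with idxmin and idxmax, then select them with loc:

oldest = df.loc[df.groupby(['Country or Area'])['Year'].idxmin()]
newest = df.loc[df.groupby(['Country or Area'])['Year'].idxmax()]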
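A short sketch of how the selected rows can then be lined up to compute the change per area, using hypothetical sample data. It assumes the DataFrame index is unique, since idxmin/idxmax return index labels that df.loc looks up:

```python
import pandas as pd

# Hypothetical sample data with a unique default RangeIndex.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1990, 2010, 1995, 2005],
    "Quantity": [20.0, 10.0, 35.0, 7.0, 4.0],
})

years = df.groupby("Country or Area")["Year"]

# idxmin/idxmax give the index label of the oldest/newest row per area;
# df.loc then pulls the whole rows, not just the Year values.
oldest = df.loc[years.idxmin()].set_index("Country or Area")
newest = df.loc[years.idxmax()].set_index("Country or Area")

# Both frames are indexed by area, so subtraction aligns area by area.
change = newest["Quantity"] - oldest["Quantity"]
print(change)
```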


You need to use the aggregation functions of groupby().

You can pass a list of functions, or a dict that maps each column you need to aggregate to its functions.

In your case, Crish's solution is the better way to do it: sort the dataframe by the value to check (Year), then group and get the result you need with .agg().

The stack() method then moves the inner column level ('first'/'last') into the row index, so each area gets one row for its oldest record and one for its newest.
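A sketch of the dict form mentioned above, on hypothetical sample data: each column gets its own list of functions. Here Year uses min/max while Quantity uses first/last, which match the oldest/newest values because the frame is sorted by Year and groupby keeps the row order within each group:

```python
import pandas as pd

# Hypothetical sample data; column names assumed from the question and answers.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1990, 2010, 1995, 2005],
    "Quantity": [20.0, 10.0, 35.0, 7.0, 4.0],
})

out = (
    df.sort_values("Year")  # oldest rows first; groupby preserves this order
    .groupby("Country or Area")
    .agg({"Year": ["min", "max"], "Quantity": ["first", "last"]})
)

# Columns are a MultiIndex: (column, function).
change = out[("Quantity", "last")] - out[("Quantity", "first")]
print(out)
print(change)
```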
