0

I'm trying to calculate the difference between each row but I have an issue that I don't know how to solve. Below is my sample code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Place': ['Hanoi','Hanoi','Hanoi','Hanoi','Hanoi','Hochiminh','Hochiminh','Hochiminh','Hochiminh','Hochiminh'],
    'Date': ['2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05','2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05'],
    'Number': [0,2,4,6,8,12,17,20,26,28]})

And it's the Output:

Place   Date    Number
0   Hanoi   2022-04-01  0
1   Hanoi   2022-04-02  2
2   Hanoi   2022-04-03  4
3   Hanoi   2022-04-04  6
4   Hanoi   2022-04-05  8
5   Hochiminh   2022-04-01  12
6   Hochiminh   2022-04-02  17
7   Hochiminh   2022-04-03  20
8   Hochiminh   2022-04-04  26
9   Hochiminh   2022-04-05  28

And then I used diff to calculate different number:

df['diff'] = df['Number'].diff()

and it's Output:

    Place   Date    Number  diff
0   Hanoi   2022-04-01  0   NaN
1   Hanoi   2022-04-02  2   2.0
2   Hanoi   2022-04-03  4   2.0
3   Hanoi   2022-04-04  6   2.0
4   Hanoi   2022-04-05  8   2.0
5   Hochiminh   2022-04-01  12  4.0
6   Hochiminh   2022-04-02  17  5.0
7   Hochiminh   2022-04-03  20  3.0
8   Hochiminh   2022-04-04  26  6.0
9   Hochiminh   2022-04-05  28  2.0

As you see row 5 of column diff is calculating by subtract row 5 of column number and row 4 of column number but it's not what I want. I want row 5 will be reset to NaN or 0 because I want to calculate different number for each place.

This is my expected Output

    Place   Date    Number  diff
0   Hanoi   2022-04-01  0   NaN
1   Hanoi   2022-04-02  2   2.0
2   Hanoi   2022-04-03  4   2.0
3   Hanoi   2022-04-04  6   2.0
4   Hanoi   2022-04-05  8   2.0
5   Hochiminh   2022-04-01  12  0.0
6   Hochiminh   2022-04-02  17  5.0
7   Hochiminh   2022-04-03  20  3.0
8   Hochiminh   2022-04-04  26  6.0
9   Hochiminh   2022-04-05  28  2.0
2
  • You don't want row 4 to be subtracted from row 5? What exactly is the desired output here? Commented Apr 26, 2022 at 3:37
  • @IshanShishodiya: I've updated my question for expected Output. Thank you. Commented Apr 26, 2022 at 3:40

1 Answer 1

3
df['diff'] = df.groupby('Place')['Number'].diff().fillna(0)

df

       Place        Date  Number  diff
0      Hanoi  2022-04-01       0   0.0
1      Hanoi  2022-04-02       2   2.0
2      Hanoi  2022-04-03       4   2.0
3      Hanoi  2022-04-04       6   2.0
4      Hanoi  2022-04-05       8   2.0
5  Hochiminh  2022-04-01      12   0.0
6  Hochiminh  2022-04-02      17   5.0
7  Hochiminh  2022-04-03      20   3.0
8  Hochiminh  2022-04-04      26   6.0
9  Hochiminh  2022-04-05      28   2.0
Sign up to request clarification or add additional context in comments.

1 Comment

@hoatran if all is okay, consider accepting the solution by clicking on the checkmark against the answer

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.