Pandas .diff() on specific rows

Question

I have seen many similar questions but none of them solves my problem.

I have a very large dataset in which I want to compute the difference from the previous row for only a few selected rows. In the following example, I would like to get diff() on pVal based on the value in calc, as shown:

     pVal        calc        pDiff
1    .17         False       NaN
2    .31         False       NaN
3    .46         False       NaN
4    .39         True       -.07
5    .26         False       NaN
6    .6          True       .34

Note: pDiff gets NaN by default

One could simply calculate the difference for all rows and then replace pDiff with NaN wherever calc is False. But as stated earlier, the dataset is very large and very few rows have calc set to True, so that adds a lot of overhead.

I have tried the following:

df['pDiff'] = df[df['calc']==True]['pVal'].diff()

But it gives incorrect results, because it computes the difference between consecutive rows with calc==True. In our example, the difference for row 6 is computed between rows 6 and 4 (0.6 - 0.39 = 0.21) instead of the expected 0.34 between rows 6 and 5. The difference for row 4 remains NaN because it is the first row with calc==True.
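The behaviour described above can be reproduced with a small frame built from the example table. This is only a sketch: the 1-based index and the variable name filtered_diff are assumptions for illustration.

```python
import pandas as pd

# Example data from the table above; the 1-based index is an assumption.
df = pd.DataFrame(
    {
        "pVal": [0.17, 0.31, 0.46, 0.39, 0.26, 0.60],
        "calc": [False, False, False, True, False, True],
    },
    index=range(1, 7),
)

# Filtering first removes the neighbouring rows, so diff() compares
# consecutive calc == True rows (row 6 against row 4) instead of
# each row against the one directly above it.
filtered_diff = df[df["calc"]]["pVal"].diff()
print(filtered_diff)
```

Only rows 4 and 6 survive the filter, so row 5 is never available as the "previous" row for row 6.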

I have the option to iterate through all the rows but that is too slow for me.

I need a solution that calculates and changes values for only those rows where calc contains True.

Valdi_Bo · Accepted Answer · 2020-06-19 18:28:15Z

2

Run:

import numpy as np
df['pDiff'] = np.where(df.calc, df.pVal.diff(), np.nan)

df.pVal.diff() is the source of data and np.where acts as a filter: df.calc is the condition, and np.nan is the "other" value used wherever the condition is False.
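As a minimal sketch of this approach on the question's example data (the frame construction and the 1-based index are assumptions), the full-column diff is computed once and only the calc == True positions are kept:

```python
import numpy as np
import pandas as pd

# Example data from the question; the 1-based index is an assumption.
df = pd.DataFrame(
    {
        "pVal": [0.17, 0.31, 0.46, 0.39, 0.26, 0.60],
        "calc": [False, False, False, True, False, True],
    },
    index=range(1, 7),
)

# diff() is evaluated for every row; np.where then keeps the result
# only where calc is True and writes NaN everywhere else.
df["pDiff"] = np.where(df["calc"], df["pVal"].diff(), np.nan)
print(df)
```

diff() still runs over all rows here, but it is a single vectorized subtraction, which is typically far cheaper than a Python-level loop over the rows.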


David Erickson · Accepted Answer · 2020-06-19 18:16:02Z

1

np.where + shift are great together for previous or next row comparison based on conditions :)

df['pDiff'] = np.where((df['calc'] == True), df['pVal'] - df['pVal'].shift(), np.nan)
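The same pattern extends to next-row comparisons by shifting in the other direction. A sketch on the question's data follows; the nDiff column and the prev_val / next_val names are hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd

# Example data from the question; the 1-based index is an assumption.
df = pd.DataFrame(
    {
        "pVal": [0.17, 0.31, 0.46, 0.39, 0.26, 0.60],
        "calc": [False, False, False, True, False, True],
    },
    index=range(1, 7),
)

# shift() aligns each row with the previous value, shift(-1) with the next.
prev_val = df["pVal"].shift()
next_val = df["pVal"].shift(-1)

# Difference to the previous row, kept only where calc is True.
df["pDiff"] = np.where(df["calc"], df["pVal"] - prev_val, np.nan)
# Hypothetical extra column: difference to the following row instead.
df["nDiff"] = np.where(df["calc"], df["pVal"] - next_val, np.nan)
print(df)
```

The last row has no following row, so its nDiff stays NaN even though calc is True there.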


Ch3steR · Accepted Answer · 2020-06-19 18:31:18Z

0

Try,

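# Keep each calc == True row plus the row directly before it.
before_true = df.calc.shift(-1, fill_value=False)
df1 = df[df.calc | before_true]
# Diff within the subset, then write back only the calc == True rows.
df['pDiff'] = df1.pVal.diff()[df1.calc]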

answered Jun 19, 2020 at 18:15 by Igor Rivin; edited Jun 19, 2020 at 18:31 by Ch3steR
