I want to diff a column, but with the lag taken from another column of my dataframe that gives the step for each row. For example:

col step_diff
0 3
13 3
28 3
45 3
45 3
45 1
50 1

and the output should be:

col step_diff col_diff
0 3 NaN
13 3 NaN
28 3 NaN
45 3 45
45 3 32
45 1 0
50 1 5
  • The data are a time series, and I want to difference with step 3 or step 1 depending on the day. Commented Apr 8, 2021 at 12:29

2 Answers

You can try this:

df['col_diff'] = df['col'] - df.reindex(df.index - df['step_diff'])['col'].to_numpy()

Output:

   col  step_diff  col_diff
0    0          3       NaN
1   13          3       NaN
2   28          3       NaN
3   45          3      45.0
4   45          3      32.0
5   45          1       0.0
6   50          1       5.0

Details:

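Subtracting 'step_diff' from the index gives, for each row, the label of the row it should look back to. reindex then fetches 'col' at those labels, returning NaN for labels that do not exist, and to_numpy() drops the index so that the subtraction from the current 'col' is positional rather than aligned by label. With an integer index this assumes the default RangeIndex, where labels and positions coincide.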
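To make the intermediate steps visible, here is a minimal sketch that splits the one-liner into parts. It rebuilds the example frame from the question and assumes the default RangeIndex; the names `lookup` and `shifted` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]})

# Label of the row each row should look back to (illustrative name).
lookup = (df.index - df["step_diff"]).to_numpy()   # -3, -2, -1, 0, 1, 4, 5

# Fetch 'col' at those labels; labels that do not exist (-3, -2, -1) give NaN.
shifted = df["col"].reindex(lookup).to_numpy()

# `shifted` is a plain NumPy array, so this subtraction is positional.
df["col_diff"] = df["col"] - shifted
print(df)
```

The first three rows have no row three steps back, which is why they come out as NaN and why the result column is float.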

Using time series:

import pandas as pd

# Daily DatetimeIndex. The freq= argument to Timestamp was removed in
# pandas 2.0, so the index is built with date_range instead.
idx = pd.date_range("2021-01-10", periods=7, freq="D")

df = pd.DataFrame(
    {"col": [0, 13, 28, 45, 45, 45, 50], "step_diff": [3, 3, 3, 3, 3, 1, 1]},
    index=idx,
)

Input df:

            col  step_diff
2021-01-10    0          3
2021-01-11   13          3
2021-01-12   28          3
2021-01-13   45          3
2021-01-14   45          3
2021-01-15   45          1
2021-01-16   50          1

Calculate col_diff:

df["col_diff"] = (
    df["col"]
    - df.reindex(df.index - pd.to_timedelta(df["step_diff"], unit="d"))["col"]
    .to_numpy()
)


df

Output:

            col  step_diff  col_diff
2021-01-10    0          3       NaN
2021-01-11   13          3       NaN
2021-01-12   28          3       NaN
2021-01-13   45          3      45.0
2021-01-14   45          3      32.0
2021-01-15   45          1       0.0
2021-01-16   50          1       5.0
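One consequence of looking up by date rather than by position: if a day is missing from the index, a row whose lookback date falls on the missing day gets NaN instead of silently using a neighboring row. A minimal sketch, assuming a hypothetical version of the frame with 2021-01-14 dropped (the names `gappy` and `lookback` are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-10", periods=7, freq="D")
df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]}, index=idx)

# Hypothetical gap: remove 2021-01-14 from the series.
gappy = df.drop(pd.Timestamp("2021-01-14"))

# Date each remaining row looks back to.
lookback = gappy.index - pd.to_timedelta(gappy["step_diff"].to_numpy(), unit="D")

# Dates absent from the index (before the start, or the dropped day) yield NaN.
gappy["col_diff"] = gappy["col"] - gappy["col"].reindex(lookback).to_numpy()
print(gappy)
```

Here the row for 2021-01-15 has step 1 and looks back to the dropped 2021-01-14, so its col_diff is NaN. Whether that is the desired behavior depends on how gaps should be treated in your data.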

1 Comment

Great. This one is faster and neat. I was not aware of .reindex.

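Use a list comprehension over row positions:

import numpy as np

# For each row, subtract the value `step` rows earlier; NaN if no such row exists.
df['col_diff'] = [df.col.iloc[pos] - df.col.iloc[pos - step]
                  if pos - step >= 0 else np.nan
                  for pos, step in enumerate(df.step_diff)]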

Output

>>> df
   col  step_diff  col_diff
0    0          3       NaN
1   13          3       NaN
2   28          3       NaN
3   45          3      45.0
4   45          3      32.0
5   45          1       0.0
6   50          1       5.0
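For larger frames, the same positional logic can be written without a Python-level loop. The following is a sketch of an alternative using NumPy index arithmetic; it is not part of the answer above, it assumes every step is non-negative, and the names `pos`, `src`, and `valid` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]})

col = df["col"].to_numpy(dtype=float)
pos = np.arange(len(df))                  # row positions 0..n-1
src = pos - df["step_diff"].to_numpy()    # position each row looks back to
valid = src >= 0                          # rows that actually have a lookback row

# Clip invalid positions to 0 so the indexing is safe, then mask them as NaN.
df["col_diff"] = np.where(valid, col - col[np.clip(src, 0, None)], np.nan)
print(df)
```

Because this works on positions rather than labels, it does not depend on the index being a RangeIndex or a gap-free date range.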

1 Comment

Thanks!! it is useful for me
