pandas - Access value in a previous row in a Dataframe

Question

I am trying to add a new column to my dataframe that depends on values that may or may not exist in previous rows. My dataframe looks like this:

index  id  timestamp  sequence_index value  prev_seq_index
0      10  1          0              5      0
1      10  1          1              1      2
2      10  1          2              2      0
3      10  2          0              9      0
4      10  2          1              10     1
5      10  2          2              3      1
6      11  2          0              42     1
7      11  2          1              13     0

Note: there is no relation between index and sequence_index; index is just a counter.

What I want to do is add a column prev_value that holds the value of the most recent earlier row with the same id and sequence_index == prev_seq_index. If no such previous row exists, a default value should be used; for the purpose of this question I will use -1.

index  id  timestamp  sequence_index value  prev_seq_index  prev_value
0      10  1          0              5      0               -1
1      10  1          1              1      2               -1
2      10  1          2              2      0               -1
3      10  2          0              9      0               5  # value from df[index == 0]
4      10  2          1              10     1               1  # value from df[index == 1]
5      10  2          2              3      1               1  # value from df[index == 1]
6      11  2          0              42     1               -1
7      11  2          1              13     0               -1

My current solution is brute force and very slow. Is there a faster way?

import numpy as np

prev_values = np.zeros(len(df))
for i, (index, row) in enumerate(df.iterrows()):
    # filter for previous rows with the same id and desired sequence index
    tmp_df = df[(df.id == row.id) & (df.timestamp < row.timestamp)
                & (df.sequence_index == row.prev_seq_index)]
    if len(tmp_df) > 0:
        # get the scalar value from the most recent row
        prev_value = tmp_df.loc[tmp_df.timestamp.idxmax(), 'value']
    else:
        prev_value = -1
    prev_values[i] = prev_value

df['prev_value'] = prev_values

Off the top of my head, I cannot think of a faster algorithm. But you can try using itertuples instead of iterrows for a pretty decent speed boost!
– zerecees, Commented Sep 2, 2020 at 4:14
For row label 5, shouldn't the prev_seq_index be 2? Or did I misread?
– anky, Commented Sep 2, 2020 at 4:25
prev_seq_index indicates which previous sequence matches the current one, according to info not displayed here; it does not have to match the same index.
– sa3dl, Commented Sep 2, 2020 at 19:03
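
A rough sketch of the itertuples suggestion from the comments, using the sample data from the question. The loop logic is the same as in the question, so the per-row filter is still the dominant cost; only the row iteration itself gets cheaper. The variable name candidates is mine, not from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":             [10, 10, 10, 10, 10, 10, 11, 11],
    "timestamp":      [1, 1, 1, 2, 2, 2, 2, 2],
    "sequence_index": [0, 1, 2, 0, 1, 2, 0, 1],
    "value":          [5, 1, 2, 9, 10, 3, 42, 13],
    "prev_seq_index": [0, 2, 0, 0, 1, 1, 1, 0],
})

# Start from the default of -1 and overwrite only where a match exists.
prev_values = np.full(len(df), -1)

# itertuples yields lightweight namedtuples instead of building a Series per row.
for i, row in enumerate(df.itertuples(index=False)):
    candidates = df[(df["id"] == row.id)
                    & (df["timestamp"] < row.timestamp)
                    & (df["sequence_index"] == row.prev_seq_index)]
    if len(candidates) > 0:
        # Take the value of the row with the latest timestamp
        # (assumes the index labels are unique).
        prev_values[i] = candidates.loc[candidates["timestamp"].idxmax(), "value"]

df["prev_value"] = prev_values
```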

user1788158 · Accepted Answer · 2020-09-02 04:37:31Z

I would suggest tackling this via a left join. First, however, you'll need to make sure that your data doesn't have duplicates: create a dataframe of the most recent timestamps and grab the corresponding values.

agg = df.groupby(['sequence_index'], as_index=False).agg({'timestamp': 'max'})

agg = pd.merge(agg, df[['timestamp', 'sequence_index', 'value']], how='inner', on=['timestamp', 'sequence_index'])

agg.rename(columns={'value': 'prev_value'}, inplace=True)
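
If several rows share the same latest timestamp for a key, the inner merge above can still return more than one row per key. One possible way to guarantee a single "most recent" row is to sort and drop duplicates. This is only a sketch on the question's sample data: keying on id as well as sequence_index is my reading of the question, not something stated in this answer, and the name latest is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":             [10, 10, 10, 10, 10, 10, 11, 11],
    "timestamp":      [1, 1, 1, 2, 2, 2, 2, 2],
    "sequence_index": [0, 1, 2, 0, 1, 2, 0, 1],
    "value":          [5, 1, 2, 9, 10, 3, 42, 13],
    "prev_seq_index": [0, 2, 0, 0, 1, 1, 1, 0],
})

# Sort by time, then keep only the last (most recent) row per (id, sequence_index).
# A stable sort keeps the original order among rows with equal timestamps.
latest = (
    df.sort_values("timestamp", kind="stable")
      .drop_duplicates(subset=["id", "sequence_index"], keep="last")
      [["id", "sequence_index", "timestamp", "value"]]
      .rename(columns={"value": "prev_value"})
)

print(latest)
```

Because each key appears at most once in latest, a later left join on it cannot multiply rows.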

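Now you can join the data back on itself:

lookup = agg[['sequence_index', 'prev_value']].rename(columns={'sequence_index': 'prev_seq_index'})  # avoid clashing column names
df = pd.merge(df, lookup, how='left', on='prev_seq_index')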

Now you can deal with the NaN values:

df['prev_value'] = df['prev_value'].fillna(-1)
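
The question also asks for a match on the same id and on a strictly earlier timestamp, which the join steps above do not enforce on their own. One way that might cover both conditions is pd.merge_asof, which is a different technique from the plain left join described here. The following is a sketch on the question's sample data, not a tested drop-in solution; the helper column _row and the names left, right and matched are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":             [10, 10, 10, 10, 10, 10, 11, 11],
    "timestamp":      [1, 1, 1, 2, 2, 2, 2, 2],
    "sequence_index": [0, 1, 2, 0, 1, 2, 0, 1],
    "value":          [5, 1, 2, 9, 10, 3, 42, 13],
    "prev_seq_index": [0, 2, 0, 0, 1, 1, 1, 0],
})

# Left side: every row, tagged with its original position so the order
# can be restored after sorting (merge_asof needs the "on" key sorted).
left = df.assign(_row=np.arange(len(df))).sort_values("timestamp", kind="stable")

# Right side: candidate "previous" rows, renamed so that their
# sequence_index lines up with the left side's prev_seq_index.
right = (
    df[["id", "timestamp", "sequence_index", "value"]]
    .rename(columns={"sequence_index": "prev_seq_index", "value": "prev_value"})
    .sort_values("timestamp", kind="stable")
)

# For each left row, take the latest right row with the same id and
# prev_seq_index whose timestamp is strictly earlier.
matched = pd.merge_asof(
    left,
    right,
    on="timestamp",
    by=["id", "prev_seq_index"],
    direction="backward",
    allow_exact_matches=False,
)

# Restore the original row order and fill the default for unmatched rows.
matched = matched.sort_values("_row")
df["prev_value"] = matched["prev_value"].fillna(-1).astype(int).to_numpy()

print(df)
```

On the sample data this gives the prev_value column shown in the question. Setting allow_exact_matches=False is what excludes rows with the same timestamp as the current row.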
