I am trying to add a new column to my dataframe that depends on values that may or may not exist in previous rows. My dataframe looks like this:
index id timestamp sequence_index value prev_seq_index
0 10 1 0 5 0
1 10 1 1 1 2
2 10 1 2 2 0
3 10 2 0 9 0
4 10 2 1 10 1
5 10 2 2 3 1
6 11 2 0 42 1
7 11 2 1 13 0
Note: there is no relation between index and sequence_index, index is just a counter.
What I want to do is add a column prev_value, that finds the value of the most recent row with the same id and sequence_index == prev_seq_index, if no such previous row exist, use default value, for the purpose of this question I will use default value of -1
index id timestamp sequence_index value prev_seq_index prev_value
0 10 1 0 5 0 -1
1 10 1 1 1 2 -1
2 10 1 2 2 0 -1
3 10 2 0 9 0 5 # value from df[index == 0]
4 10 2 1 10 1 1 # value from df[index == 1]
5 10 2 2 3 1 1 # value from df[index == 1]
6 11 2 0 42 1 -1
7 11 2 1 13 0 -1
My current solution is a brute force which is very slow, and I was wondering if there was a faster way:
prev_values = np.zeros(len(df))
i = 0
for index, row in df.iterrows():
# filter for previous rows with the same id and desired sequence index
tmp_df = df[(df.id == row.id) & (df.timestamp < row.timestamp) \
& (df.sequence_index == row.prev_seq_index)]
if (len(tmp_df) > 0):
# get value from the most recent row
prev_value = tmp_df[tmp_df.index == tmp_df.timestamp.idxmax()].value
else:
prev_value = -1
prev_values[i] = prev_value
i += 1
df['prev_value'] = prev_values
5, shouldnt theprev_seq_indexbe 2? or did i misead?prev_seq_indexindicates which previous sequence matches the current according to info not displayed here, it does not have to match the same index.