I have a data frame that looks like the following (the last column shows the result I want to get to):
timestamp                 first_actual  first_required  location  first_initial_pass  first_final
2019-05-03T06:00:00.000Z  3.125         0.000           10B       1.0                 1.0
2019-05-03T18:00:00.000Z  2.975         0.000           10B       1.0                 1.0
2019-05-04T06:00:00.000Z  2.825         0.000           10B       **0.5**             **1.0**
2019-05-04T18:00:00.000Z  2.675         0.000           10B       0.0                 0.0
2019-05-05T06:00:00.000Z  2.525         0.000           10B       **0.5**             **0.0**
It's sorted by location and timestamp. The column 'first_initial_pass' takes one of three possible values (0, 0.5, 1) based on some rules using columns 'first_actual' and 'first_required'. I am trying to generate a new column (shown here as first_final) that copies the value from column 'first_initial_pass', except where that value is 0.5.
In instances where the value of first_initial_pass is 0.5, that value needs to change to either 0 or 1 in column 'first_final'. It should change to 1 iff the values in both of the two rows above the current row have a value of 1, otherwise it should change to 0 (changes I want to see are noted with asterisks in the data frame).
I am trying to use the shift function to specify these conditions as follows:
data_sorted.loc[( (data_sorted[data_sorted['first_initial_pass'] == 0.5]) &
(data_sorted['first_initial_pass'].shift(1) == 1) &
(data_sorted['first_initial_pass'].shift(2) == 1) ), 'first_final'] = 1
However, I get the following error: "TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]", so then I tried leaving the boolean piece out, like this:
data_sorted.loc[(
(data_sorted['first_initial_pass'].shift(1) == 1) &
(data_sorted['first_initial_pass'].shift(2) == 1) ), 'first_final'] = 1
However, then the rows do not change the way I need them to: the change should apply only to rows that have 0.5 as the value in the first_initial_pass column.
I would appreciate insight into what corrections I can make.
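For reference, one possible correction, offered as a sketch rather than a definitive fix. In the first attempt, `data_sorted[data_sorted['first_initial_pass'] == 0.5]` is a filtered data frame rather than a boolean mask, which is likely what triggers the TypeError when it is combined with `&`. The sketch below uses the bare comparison instead. The small frame is rebuilt from the sample above, the variable names `is_half` and `prev_two_pass` are my own, and it assumes the look-back uses first_initial_pass and does not need to stop at location boundaries (with several locations, shifting within `groupby('location')` would probably be needed).

```python
import pandas as pd

# Sample rebuilt from the table in the question (only the columns used here).
data_sorted = pd.DataFrame({
    "location": ["10B"] * 5,
    "first_initial_pass": [1.0, 1.0, 0.5, 0.0, 0.5],
})

initial = data_sorted["first_initial_pass"]

# Boolean Series: True where the row holds 0.5.
is_half = initial == 0.5
# True where both of the two rows above hold 1 (NaN from shift compares as False).
prev_two_pass = (initial.shift(1) == 1) & (initial.shift(2) == 1)

# Copy the initial values, then overwrite only the 0.5 rows.
data_sorted["first_final"] = initial
data_sorted.loc[is_half & prev_two_pass, "first_final"] = 1.0
data_sorted.loc[is_half & ~prev_two_pass, "first_final"] = 0.0

print(data_sorted["first_final"].tolist())  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

Computing both masks before writing to first_final keeps the two assignments from influencing each other.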