How to use the pandas shift function along with specific column conditions

Question

I have a data frame that looks like the following (the last column shows the result I want to get to):

timestamp                 first_actual  first_required  location  first_initial_pass  first_final
2019-05-03T06:00:00.000Z  3.125         0.000           10B       1.0                 1.0
2019-05-03T18:00:00.000Z  2.975         0.000           10B       1.0                 1.0
2019-05-04T06:00:00.000Z  2.825         0.000           10B       **0.5               1.0**
2019-05-04T18:00:00.000Z  2.675         0.000           10B       0.0                 0.0
2019-05-05T06:00:00.000Z  2.525         0.000           10B       **0.5               0.0**

It's sorted by location and timestamp. The column 'first_initial_pass' takes one of three values (0, 0.5 or 1), derived by some rules from the columns 'first_actual' and 'first_required'. I am trying to generate a new column (shown here as first_final) that copies over the value from 'first_initial_pass', except where that value is 0.5.

Where first_initial_pass is 0.5, the value in 'first_final' must become either 0 or 1. It should become 1 if and only if both of the two rows above the current row have a value of 1; otherwise it should become 0 (the changes I want to see are marked with asterisks in the data frame).
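To make the rule concrete, here is a minimal plain-Python sketch that applies it row by row. The function name `resolve_half_passes` is hypothetical. It assumes the values arrive in the sorted order described above, that "the two rows above" refers to their first_initial_pass values (as in the attempted code below), and that a 0.5 in either of the first two rows becomes 0 because it has fewer than two rows above it.

```python
def resolve_half_passes(values):
    """Apply the first_final rule to a list of first_initial_pass values (hypothetical helper)."""
    result = []
    for i, v in enumerate(values):
        if v != 0.5:
            # 0 and 1 are copied over unchanged
            result.append(v)
        elif i >= 2 and values[i - 1] == 1 and values[i - 2] == 1:
            # 0.5 becomes 1 only if both rows above are 1
            result.append(1.0)
        else:
            # Otherwise 0.5 becomes 0
            # (assumption: this includes 0.5 in the first two rows)
            result.append(0.0)
    return result


print(resolve_half_passes([1.0, 1.0, 0.5, 0.0, 0.5]))
# [1.0, 1.0, 1.0, 0.0, 0.0]
```

The loop is slow on large frames, but it is a useful reference to check a vectorised version against.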

I tried to use the shift function to express these conditions as follows:

data_sorted.loc[((data_sorted[data_sorted['first_initial_pass'] == 0.5]) &
                 (data_sorted['first_initial_pass'].shift(1) == 1) &
                 (data_sorted['first_initial_pass'].shift(2) == 1)), 'first_final'] = 1

However, I get the following error: "TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]". So I tried leaving the boolean piece out, like this:

data_sorted.loc[((data_sorted['first_initial_pass'].shift(1) == 1) &
                 (data_sorted['first_initial_pass'].shift(2) == 1)), 'first_final'] = 1

However, then the rows do not change the way I need them to (that is, the change is not restricted to the rows that have 0.5 in the first_initial_pass column).
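In the first attempt, `data_sorted[data_sorted['first_initial_pass'] == 0.5]` is a filtered DataFrame, not a boolean mask, so combining it with the two shifted comparisons via `&` does not give the per-row mask that `.loc` expects. The comparison `data_sorted['first_initial_pass'] == 0.5` on its own is already that boolean Series. A minimal sketch of the `.loc` approach with that change (not taken from the thread) follows; it assumes first_final should start as a copy of first_initial_pass and uses only the first_initial_pass values from the table above.

```python
import pandas as pd

# Sample values typed in from the question's table (other columns omitted)
data_sorted = pd.DataFrame({"first_initial_pass": [1.0, 1.0, 0.5, 0.0, 0.5]})
col = data_sorted["first_initial_pass"]

# Boolean Series with one entry per row (not a filtered DataFrame)
is_half = col == 0.5
two_above_are_one = (col.shift(1) == 1) & (col.shift(2) == 1)

# Assumption: first_final starts as a copy of first_initial_pass
data_sorted["first_final"] = col.copy()

# Overwrite only the 0.5 rows, with either 1 or 0
data_sorted.loc[is_half & two_above_are_one, "first_final"] = 1.0
data_sorted.loc[is_half & ~two_above_are_one, "first_final"] = 0.0

print(data_sorted["first_final"].tolist())
# [1.0, 1.0, 1.0, 0.0, 0.0]
```

Using two complementary masks means every 0.5 row is explicitly set to either 1 or 0, while the 0 and 1 rows keep their copied values.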

I would appreciate any insight into what corrections I can make.

Check out my answer below and let me know if it works for you. – davidbilla, Commented Mar 10, 2020 at 14:00

davidbilla · Accepted Answer · 2020-03-10 05:05:43Z


I guess you could use np.where() and set first_final to 0 or 1, with df.shift() inside the np.where() condition.

np.where takes the condition as its first argument, the value to use where the condition is true as its second, and the value to use where it is false as its third. Something like this:

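import numpy as np

# Keep 0 and 1 as-is; a 0.5 becomes 1 only if the two previous rows are both 1, else 0
df['first_final'] = np.where(df['first_initial_pass'] != 0.5, df['first_initial_pass'],
                             np.where((df['first_initial_pass'].shift(1) == 1.0) &
                                      (df['first_initial_pass'].shift(2) == 1.0),
                                      1, 0))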

Output:

                  timestamp  first_actual  ...  first_initial_pass first_final
0  2019-05-03T06:00:00.000Z         3.125  ...                 1.0         1.0
1  2019-05-03T18:00:00.000Z         2.975  ...                 1.0         1.0
2  2019-05-04T06:00:00.000Z         2.825  ...                 0.5         1.0
3  2019-05-04T18:00:00.000Z         2.675  ...                 0.0         0.0
4  2019-05-05T06:00:00.000Z         2.525  ...                 0.5         0.0

Note that you have to be careful with the first two rows: if either of them has a value of 0.5, it will become 0, because df.shift() has no earlier rows to use there. It fills those positions with NaN, and NaN == 1.0 evaluates to False.
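The same caution applies wherever one location ends and the next begins, since the question says the frame is sorted by location: a plain shift() compares a location's first rows with the previous location's last rows. If the rule is meant to apply within each location (the question does not say), one option is to shift within groups using groupby('location'). A minimal sketch under that assumption; the second location `10C` and its values are hypothetical sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: location 10C's rows directly follow two 1.0 rows from 10B
df = pd.DataFrame({
    "location": ["10B", "10B", "10C", "10C"],
    "first_initial_pass": [1.0, 1.0, 0.5, 1.0],
})
col = df["first_initial_pass"]

# Plain shift: the 10C row at index 2 "sees" the two 10B rows above it
plain = np.where(col != 0.5, col,
                 np.where((col.shift(1) == 1) & (col.shift(2) == 1), 1, 0))

# Grouped shift: each location's first rows get NaN, and NaN == 1 is False
grouped = df.groupby("location")["first_initial_pass"]
prev1, prev2 = grouped.shift(1), grouped.shift(2)
per_location = np.where(col != 0.5, col,
                        np.where((prev1 == 1) & (prev2 == 1), 1, 0))

print(plain.tolist())         # [1.0, 1.0, 1.0, 1.0]
print(per_location.tolist())  # [1.0, 1.0, 0.0, 1.0]
```

The only difference between the two versions is where the shifted values come from; the np.where structure from the answer is unchanged.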

edited Mar 10, 2020 at 5:05

answered Mar 10, 2020 at 4:44

davidbilla


Anna Y. Over a year ago

Very helpful! Appreciate your solution @davidbilla
