I have a dataframe as shown below. Based on a few conditions, I need to retrieve a value for a new column.

    Wifi_User1      Wifi_User2      Wifi_User3      Thermostat   Act_User1   Act_User2  Act_User3
    -58             -48             -60             18              0               1           0
    -60             -56             -75             18              0               1           1
    -45             -60             -45             18              0               1           1
    -67             -45             -60             18              1               0           1
    -40             -65             -65             18              1               0           1
    -55             -78             -74             18              1               0           0
    -55             -45             -65             18              1               0           0
    -67             -45             -44             18              0               0           0
    -65             -68             -70             18              0               0           0
    -70             -70             -65             24              0               0           0
    -72             -56             -45             24              0               1           0
    -75             -45             -60             24              0               1           0
    -77             -48             -65             24              0               0           0

The conditions are as follows:

if (Wifi_User1==Wifi_User2) or (Wifi_User2==Wifi_User3)
  or (Wifi_User3==Wifi_User1) or (Wifi_User1==Wifi_User2==Wifi_User3) 
   and when the thermostat value is changing

then

scan Act_User1, Act_User2, Act_User3 columns for the first instance of 1 
before the thermostat value changes. 

If it's Act_User1, return 1 
else if it's Act_User2, return 2
else return 3

For example, in the above dataset, at the 10th row Wifi_User1 == Wifi_User2 and the thermostat value changes from 18 to 24.

For this condition, I scan Act_User1, Act_User2 and Act_User3 and see that the first instance of 1 occurs in Act_User1, so I need to return the value 1 in the new column for this particular row.

Please help me with how to go about this, as I'm new to Python and still exploring it.

1 Answer

To answer the first part of your question, here's how you would transcribe your if statement:

wifi_user_equality = (df.Wifi_User1 == df.Wifi_User2) | \
                 (df.Wifi_User2 == df.Wifi_User3) | \
                 (df.Wifi_User3 == df.Wifi_User1)
thermostat_change = df.Thermostat != df.Thermostat.shift(1)

Then to return all rows where you have both true:

df[wifi_user_equality & thermostat_change]

         Wifi_User1  Wifi_User2  Wifi_User3  Thermostat  Act_User1  Act_User2   Act_User3 
9           -70         -70         -65          24          0        0.0          0.0  

Or if you only want the index of these:

df.index[(wifi_user_equality & thermostat_change)]
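
To make the two masks easy to try out, here is a minimal self-contained sketch that rebuilds the question's table and applies them. One detail is an assumption on my part rather than something from the answer: the comparison against `shift(1)` is always True for the very first row (its previous value is NaN), so this sketch additionally requires a previous row to exist.

```python
import pandas as pd

# Data copied from the question (13 rows, default integer index 0..12).
df = pd.DataFrame({
    "Wifi_User1": [-58, -60, -45, -67, -40, -55, -55, -67, -65, -70, -72, -75, -77],
    "Wifi_User2": [-48, -56, -60, -45, -65, -78, -45, -45, -68, -70, -56, -45, -48],
    "Wifi_User3": [-60, -75, -45, -60, -65, -74, -65, -44, -70, -65, -45, -60, -65],
    "Thermostat": [18] * 9 + [24] * 4,
    "Act_User1":  [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Act_User2":  [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    "Act_User3":  [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})

# True where at least two of the three Wifi readings are equal.
wifi_user_equality = (
    (df["Wifi_User1"] == df["Wifi_User2"])
    | (df["Wifi_User2"] == df["Wifi_User3"])
    | (df["Wifi_User3"] == df["Wifi_User1"])
)

# True where the thermostat differs from the previous row.
# Assumption: the first row has no predecessor, so it is not counted as a change.
previous = df["Thermostat"].shift(1)
thermostat_change = previous.notna() & (df["Thermostat"] != previous)

matches = df.index[wifi_user_equality & thermostat_change]
print(list(matches))
```

On the question's data only the row with index 9 satisfies both conditions, which agrees with the output shown above.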

For the second part of your question, it's trickier, but here's a solution:

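import numpy as np

# We add the first index element too, so that the first match has a
# starting point for its look-back window
zero = df.index == df.index[0]

# Get the list of index where the condition is satisfied, in reverse order
idx = list(df.index[(wifi_user_equality & thermostat_change) | zero][::-1])

for i, index in enumerate(idx):
    if index > 0:
        # I use a try/except block in case it cannot find an occurrence of 1
        # (all previous act users are 0).
        # Might not be needed in your specific application
        try:
            # Rows from the previous match (or the first row) up to the row
            # just before the current match
            x = df.loc[idx[i+1]:(index-1), ['Act_User1','Act_User2','Act_User3']]
            # Column number (1, 2 or 3) of the last 1 found in that window,
            # i.e. the first 1 met when going up from the thermostat change
            col_of_first_1 = np.where(x==1)[1][-1] + 1
        except IndexError:
            col_of_first_1 = 'Not Found'
        # Assign to a new column
        df.loc[index, 'Last_Act_User'] = col_of_first_1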

In action:

I've modified your data in order to have a more complex case:

Wifi_User1      Wifi_User2      Wifi_User3      Thermostat   Act_User1   Act_User2  Act_User3
-70             -70             -65             24              0               0           0
-77             -48             -65             24              0               0           0
-58             -48             -48             18              0               1           0
-60             -56             -75             18              0               1           1
-45             -60             -45             18              0               1           1
-67             -45             -60             18              1               0           1
-40             -65             -65             18              1               0           1
-55             -78             -74             18              1               0           0
-55             -45             -65             18              1               0           0
-67             -45             -44             18              0               0           0
-65             -68             -70             18              0               0           0
-70             -70             -65             24              0               0           0
-72             -56             -45             24              0               1           0
-75             -45             -60             24              0               1           0
-77             -48             -65             24              0               0           0

Will give df:

    Wifi_User1  Wifi_User2  Wifi_User3  Thermostat  Act_User1  Act_User2  \
0          -70         -70         -65          24          0          0   
1          -77         -48         -65          24          0          0   
2          -58         -48         -48          18          0          1   
3          -60         -56         -75          18          0          1   
4          -45         -60         -45          18          0          1   
5          -67         -45         -60          18          1          0   
6          -40         -65         -65          18          1          0   
7          -55         -78         -74          18          1          0   
8          -55         -45         -65          18          1          0   
9          -67         -45         -44          18          0          0   
10         -65         -68         -70          18          0          0   
11         -70         -70         -65          24          0          0   
12         -72         -56         -45          24          0          1   
13         -75         -45         -60          24          0          1   
14         -77         -48         -65          24          0          0   

    Act_User3 Last_Act_User  
0           0           NaN  
1           0           NaN  
2           0     Not Found  
3           1           NaN  
4           1           NaN  
5           1           NaN  
6           1           NaN  
7           0           NaN  
8           0           NaN  
9           0           NaN  
10          0           NaN  
11          0             1  
12          0           NaN  
13          0           NaN  
14          0           NaN  
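
For anyone who wants to reproduce this table, below is a self-contained sketch of the same look-back idea wrapped in a function. The names `last_active_user`, `ACT_COLS` and `trigger` are hypothetical, introduced only for this illustration. Two choices here are assumptions rather than part of the answer: the first row is never treated as a thermostat change, and if the closest row contains several 1s, the lowest-numbered user wins (following the "if Act_User1 return 1, else if Act_User2 return 2" order in the question).

```python
import numpy as np
import pandas as pd

ACT_COLS = ["Act_User1", "Act_User2", "Act_User3"]  # hypothetical helper constant


def last_active_user(df, trigger):
    """For each row where `trigger` is True, scan the rows since the previous
    trigger (or the start of the frame) and return the 1-based number of the
    Act_User column holding the 1 closest to the trigger row."""
    out = pd.Series(np.nan, index=df.index, dtype=object)
    acts = df[ACT_COLS].to_numpy()
    start = 0
    for pos in np.flatnonzero(trigger.to_numpy()):
        if pos == 0:
            continue  # nothing above the first row to scan
        rows, cols = np.nonzero(acts[start:pos] == 1)
        if rows.size:
            # Keep only the 1s in the row nearest to the trigger, then take
            # the lowest-numbered user among them (assumed tie-break).
            out.iloc[pos] = int(cols[rows == rows.max()].min()) + 1
        else:
            out.iloc[pos] = "Not Found"
        start = pos
    return out


# Modified data from the answer (15 rows, index 0..14).
df = pd.DataFrame({
    "Wifi_User1": [-70, -77, -58, -60, -45, -67, -40, -55, -55, -67, -65, -70, -72, -75, -77],
    "Wifi_User2": [-70, -48, -48, -56, -60, -45, -65, -78, -45, -45, -68, -70, -56, -45, -48],
    "Wifi_User3": [-65, -65, -48, -75, -45, -60, -65, -74, -65, -44, -70, -65, -45, -60, -65],
    "Thermostat": [24] * 2 + [18] * 9 + [24] * 4,
    "Act_User1":  [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Act_User2":  [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    "Act_User3":  [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})

wifi_eq = (
    (df["Wifi_User1"] == df["Wifi_User2"])
    | (df["Wifi_User2"] == df["Wifi_User3"])
    | (df["Wifi_User3"] == df["Wifi_User1"])
)
previous = df["Thermostat"].shift(1)
trigger = wifi_eq & previous.notna() & (df["Thermostat"] != previous)

df["Last_Act_User"] = last_active_user(df, trigger)
print(df.loc[trigger, "Last_Act_User"])
```

The function works on row positions (`iloc`) rather than index labels, so it should not depend on the index being the default 0..n-1 range.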

11 Comments

Thank you @Julien Marrec. Based on this condition, I need to find the first instance of 1 in the Act_User1, Act_User2 and Act_User3 columns and return the value 1, 2 or 3 in a new column.
Yeah, I'm trying to find a way to do that the pandas way (vectorized) without resorting to a loop. We agree that you are looking for the first 1 in reverse order (going 'up' the table before the thermostat change), right?
Yes, that's right. So in my example it would be the 6th row, for the column Act_User1, where the first instance of 1 occurs going upwards.
I added a solution
AttributeError: ("'numpy.int64' object has no attribute 'shift'", u'occurred at index 0')
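
The thread does not show the code that raised this error, so the following is only a guess at the cause. An error of this shape usually appears when `.shift` is called inside a row-wise `df.apply(..., axis=1)`: there each row's `Thermostat` is a single number, not a column, so it has no `shift` method. A minimal sketch on a small made-up frame:

```python
import pandas as pd

# Tiny made-up frame, just to show the two behaviours side by side.
df = pd.DataFrame({"Thermostat": [18, 18, 24, 24]})

# Row-wise apply passes one row at a time; row.Thermostat is a scalar,
# so calling .shift on it raises AttributeError.
message = None
try:
    df.apply(lambda row: row.Thermostat != row.Thermostat.shift(1), axis=1)
except AttributeError as err:
    message = str(err)
print(message)

# Calling shift on the whole column works, as in the answer.
changed = df["Thermostat"] != df["Thermostat"].shift(1)
print(changed.tolist())
```

If that is what happened, applying the comparison to the whole column (as in the answer) rather than row by row should avoid it.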