
I have the following dataframe:

Index   education   marital-status  occupation         gender    target
0       bachelors   never-married   adm-clerical       male      0
1       bachelors   spouse          exec-managerial    male      0
2       hs-grad     divorced        handlers-cleaners  male      0
3       11th        spouse          handlers-cleaners  male      0
4       bachelors   spouse          prof-specialty     female    0
5       masters     spouse          exec-managerial    female    0
6       other       other           other-service      female    0
7       hs-grad     spouse          exec-managerial    male      1
8       masters     never-married   prof-specialty     female    1
9       bachelors   spouse          exec-managerial    male      1

Can someone explain to me why the following doesn't work - I feel like it should from what I've read and what I've seen applied.

def new_features(education, gender, target):

  if [((education == 'bachelors') & (gender == 'male') & (target == 1))]:
      result = 'educated_male_convert'
  elif [((education == 'bachelors') & (gender == 'female') & (target == 1))]:
      result = 'educated_female_convert'
  else:
      result = 'educated_not_determined'
  return result

df['new_col'] = df.apply(lambda row: new_features(row['education'], row['gender'], row['target']), axis=1)

It just returns: educated_male_convert

I followed numerous tutorials and read other threads and applied the same code to my own dataset - not sure what I'm missing.

Any help would be appreciated

  • Is the function just an example? There is a better way to do such operations using numpy and pandas, without a loop. Commented Aug 25, 2019 at 5:01
  • Well, I'd like to understand why the above doesn't work [and how you could make it work], but I'd be equally interested in achieving the same result using a better method. Commented Aug 25, 2019 at 5:02
  • Can you print out row before the last line? Commented Aug 25, 2019 at 5:04
  • Try to determine the row. I guess there is an iteration-related key error. Just use a basic loop to check the value. Commented Aug 25, 2019 at 5:04
  • That helped. However, when it runs, it only returns educated_male_convert, which it should only do for row 9; everything else should be educated_not_determined. Commented Aug 25, 2019 at 5:10

3 Answers


The problem is that you put the if conditions in square brackets. Instead of testing a boolean, as in if False: ..., the code tests a one-element list, as in if [False]: .... Any non-empty list is truthy no matter what it contains, so [False] evaluates to True and the first branch is always taken. That is why every row gets educated_male_convert.
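To make the point above concrete, here is a minimal sketch of the truthiness check and of the function with the brackets removed. It keeps the asker's function name and labels, and it swaps & for and, which is an assumption about style rather than a required change: each value passed in from apply(..., axis=1) is a scalar, so plain and is enough.

```python
# A non-empty list is truthy regardless of what it contains.
print(bool([False]))  # True
print(bool([]))       # False


def new_features(education, gender, target):
    # Conditions are tested directly, with no list wrapper around them.
    if education == 'bachelors' and gender == 'male' and target == 1:
        return 'educated_male_convert'
    elif education == 'bachelors' and gender == 'female' and target == 1:
        return 'educated_female_convert'
    return 'educated_not_determined'


print(new_features('bachelors', 'male', 1))    # educated_male_convert
print(new_features('bachelors', 'female', 1))  # educated_female_convert
print(new_features('hs-grad', 'male', 1))      # educated_not_determined
```

The original df.apply(lambda row: new_features(row['education'], row['gender'], row['target']), axis=1) call should work unchanged with this version.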


5 Comments

This worked. I know what happened - previously, I had an or (|) element present which I believe requires square brackets, but I didn't remove them when I removed this element. Very helpful. Thanks!
As a side remark, you could use np.where to compute the new feature columns in a vectorized manner, which is much more efficient than using df.apply. See this for a basic example. In your case a nested np.where call is needed. np.select is another alternative choice for such tasks.
yes and this
I did try this actually, but couldn't get it to work: np.where((df['education'] == 'bachelors') & (df['gender'] == 'male') & (df['target'] == 1, 'educated_male_convert', (np.where(df['education'] == 'bachelors') & (df['gender'] == 'female') & (df['target'] == 1), 'educated_female_convert', 'educated_not_determined'))). What did I miss? Got the following error ValueError: setting an array element with a sequence.
The parentheses do not seem to be right. The second np.where is applied only to df['education'] == 'bachelors'. It would be better to store the outcomes of those conditions as temporary variables rather than putting everything in a big expression like this.
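A sketch of what the last comment suggests, with each condition stored in a temporary variable before the nested np.where call. The variable names (is_bachelor, is_convert, male_convert, female_convert) are made up for illustration, and the small dataframe only reproduces the three columns from the question that the conditions use.

```python
import numpy as np
import pandas as pd

# Only the three relevant columns, copied from the question's table.
df = pd.DataFrame({
    'education': ['bachelors', 'bachelors', 'hs-grad', '11th', 'bachelors',
                  'masters', 'other', 'hs-grad', 'masters', 'bachelors'],
    'gender': ['male', 'male', 'male', 'male', 'female',
               'female', 'female', 'male', 'female', 'male'],
    'target': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
})

# Name each condition first so every np.where call clearly gets
# exactly three arguments: condition, value if true, value if false.
is_bachelor = df['education'] == 'bachelors'
is_convert = df['target'] == 1
male_convert = is_bachelor & (df['gender'] == 'male') & is_convert
female_convert = is_bachelor & (df['gender'] == 'female') & is_convert

df['new_col'] = np.where(
    male_convert, 'educated_male_convert',
    np.where(female_convert, 'educated_female_convert',
             'educated_not_determined'))
print(df)
```

With the conditions named, it is easier to see where each np.where call opens and closes, which is where the one-line version went wrong.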

Here is another way to do it:

df['new_col'] = df.apply(lambda row: 'educated_male_convert' if row['education'] == 'bachelors' and row['gender'] == 'male' and row['target'] == 1
                      else ('educated_female_convert' if row['education'] == 'bachelors' and row['gender'] == 'female' and row['target'] == 1 
                      else ('educated_not_determined')), axis=1)
df


Here is a np.select solution:

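import numpy as np

# Boolean masks, one per condition
c1 = df.education == 'bachelors'
c2 = df.gender == 'male'
c3 = df.target.astype(bool)
# ~c2 means "not male", which is the same as female in this data
df['new_col'] = np.select([c1 & c2 & c3, c1 & ~c2 & c3],
                          ['educated_male_convert', 'educated_female_convert'],
                          'educated_not_determined')
print(df)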

       education marital-status         occupation  gender  target  \
Index                                                                
0      bachelors  never-married       adm-clerical    male       0   
1      bachelors         spouse    exec-managerial    male       0   
2        hs-grad       divorced  handlers-cleaners    male       0   
3           11th         spouse  handlers-cleaners    male       0   
4      bachelors         spouse     prof-specialty  female       0   
5        masters         spouse    exec-managerial  female       0   
6          other          other      other-service  female       0   
7        hs-grad         spouse    exec-managerial    male       1   
8        masters  never-married     prof-specialty  female       1   
9      bachelors         spouse    exec-managerial    male       1   

                       new_col  
Index                           
0      educated_not_determined  
1      educated_not_determined  
2      educated_not_determined  
3      educated_not_determined  
4      educated_not_determined  
5      educated_not_determined  
6      educated_not_determined  
7      educated_not_determined  
8      educated_not_determined  
9        educated_male_convert  
