Python User Defined Function using IF statements not working

Question

I have the following dataframe

Index   education   marital-status  occupation         gender    target
0       bachelors   never-married   adm-clerical       male      0
1       bachelors   spouse          exec-managerial    male      0
2       hs-grad     divorced        handlers-cleaners  male      0
3       11th        spouse          handlers-cleaners  male      0
4       bachelors   spouse          prof-specialty     female    0
5       masters     spouse          exec-managerial    female    0
6       other       other           other-service      female    0
7       hs-grad     spouse          exec-managerial    male      1
8       masters     never-married   prof-specialty     female    1
9       bachelors   spouse          exec-managerial    male      1

Can someone explain to me why the following doesn't work - I feel like it should from what I've read and what I've seen applied.

def new_features(education, gender, target):

  if [((education == 'bachelors') & (gender == 'male') & (target == 1))]:
      result = 'educated_male_convert'
  elif [((education == 'bachelors') & (gender == 'female') & (target == 1))]:
      result = 'educated_female_convert'
  else:
      result = 'educated_not_determined'
  return result

df['new_col'] = df.apply(lambda row: new_features(row['education'], row['gender'], row['target']), axis=1)

It just returns: educated_male_convert

I followed numerous tutorials and read other threads and applied the same code to my own dataset - not sure what I'm missing.

Any help would be appreciated

Is the function just an example? There is a better way to do such operations with numpy and pandas, without using a loop.
– anky, Commented Aug 25, 2019 at 5:01
Well, I'd like to understand why the above doesn't work [and how you could make it work], but I'd be equally interested in achieving the same result using a better method.
– AdrianC, Commented Aug 25, 2019 at 5:02
Try to determine the row. I guess there is an iteration-related key error. Just use a basic loop to check the value.
– Ashraful, Commented Aug 25, 2019 at 5:04
That helped. However, when it runs, it only returns educated_male_convert, which it should only do for row 9; everything else should be educated_not_determined.
– AdrianC, Commented Aug 25, 2019 at 5:10

GZ0 · Accepted Answer · 2019-08-25 05:23:19Z

4

The problem is that you put the if conditions in square brackets. Instead of testing an expression such as if False: ..., the code is actually testing if [False]: .... Any non-empty list is truthy, so [False] evaluates to True and the first branch is always taken, which is why every row gets educated_male_convert.
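As a minimal sketch of the point above: the first two lines show that a list containing False is still truthy, and the function below is the question's new_features with the square brackets removed. Since df.apply with axis=1 passes plain scalars for each row, `and` is used here instead of `&`; that swap is a choice made for this illustration, not something the answer prescribes. The three-row frame is a small subset of the question's data (rows 0, 8 and 9).

```python
import pandas as pd

# A non-empty list is truthy regardless of what it contains.
print(bool([False]))  # the list has one element, so it is truthy
print(bool([]))       # only an empty list is falsy


def new_features(education, gender, target):
    # Each argument is a single scalar value for one row,
    # so plain `and` works and no square brackets are needed.
    if education == 'bachelors' and gender == 'male' and target == 1:
        return 'educated_male_convert'
    elif education == 'bachelors' and gender == 'female' and target == 1:
        return 'educated_female_convert'
    return 'educated_not_determined'


# Rows 0, 8 and 9 from the question's dataframe.
df = pd.DataFrame({
    'education': ['bachelors', 'masters', 'bachelors'],
    'gender': ['male', 'female', 'male'],
    'target': [0, 1, 1],
})

df['new_col'] = df.apply(
    lambda row: new_features(row['education'], row['gender'], row['target']),
    axis=1,
)
print(df)
```

With the brackets gone, only the last row (row 9 in the question) should be labelled educated_male_convert.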

answered Aug 25, 2019 at 5:23

GZ0


5 Comments

AdrianC Over a year ago

This worked. I know what happened - previously, I had an or (|) element present which I believe requires square brackets, but I didn't remove them when I removed this element. Very helpful. Thanks!

GZ0 Over a year ago

As a side remark, you could use np.where to compute the new feature columns in a vectorized manner, which is much more efficient than using df.apply. See this for a basic example. In your case a nested np.where call is needed. np.select is another alternative choice for such tasks.

anky Over a year ago

yes and this

AdrianC Over a year ago

I did try this actually, but couldn't get it to work:

np.where((df['education'] == 'bachelors') & (df['gender'] == 'male') & (df['target'] == 1, 'educated_male_convert', (np.where(df['education'] == 'bachelors') & (df['gender'] == 'female') & (df['target'] == 1), 'educated_female_convert', 'educated_not_determined')))

What did I miss? I got the following error: ValueError: setting an array element with a sequence.

GZ0 Over a year ago

The parentheses do not seem to be right. The second np.where is applied only to df['education'] == 'bachelors'. It would be better to store the outcomes of those conditions as temporary variables rather than putting everything in a big expression like this.
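One possible way to follow that suggestion is sketched below: each condition is stored in a named boolean Series first, and the nested np.where call then takes exactly three arguments at each level. The variable names (is_bachelor, male_convert and so on) and the four sample rows are made up for illustration so that all three outcomes appear; they are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows chosen to exercise every branch.
df = pd.DataFrame({
    'education': ['bachelors', 'bachelors', 'masters', 'bachelors'],
    'gender': ['male', 'female', 'female', 'male'],
    'target': [0, 1, 1, 1],
})

# Store each condition as a temporary boolean Series.
is_bachelor = df['education'] == 'bachelors'
is_convert = df['target'] == 1
male_convert = is_bachelor & (df['gender'] == 'male') & is_convert
female_convert = is_bachelor & (df['gender'] == 'female') & is_convert

# np.where(condition, value_if_true, value_if_false); the inner call
# supplies the value_if_false of the outer call.
df['new_col'] = np.where(
    male_convert,
    'educated_male_convert',
    np.where(female_convert, 'educated_female_convert', 'educated_not_determined'),
)
print(df)
```

Keeping the masks separate makes it harder to misplace a parenthesis, because every np.where call visibly receives a condition and two values.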

O.O · Accepted Answer · 2019-08-25 05:51:31Z

1

Here is another way to do it, using conditional expressions inside the lambda:

df['new_col'] = df.apply(lambda row: 'educated_male_convert' if row['education'] == 'bachelors' and row['gender'] == 'male' and row['target'] == 1
                      else ('educated_female_convert' if row['education'] == 'bachelors' and row['gender'] == 'female' and row['target'] == 1 
                      else ('educated_not_determined')), axis=1)
df

answered Aug 25, 2019 at 5:51

O.O


anky · Accepted Answer · 2019-08-25 06:24:25Z

Here is a np.select solution:

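import numpy as np
# One boolean mask per condition
c1 = df.education == 'bachelors'
c2 = df.gender == 'male'
c3 = df.target.astype(bool)
# ~c2 means "not male"; this matches female only if gender has just these two values
df['new_col'] = np.select(
    [c1 & c2 & c3, c1 & ~c2 & c3],
    ['educated_male_convert', 'educated_female_convert'],
    'educated_not_determined',
)
print(df)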

       education marital-status         occupation  gender  target  \
Index                                                                
0      bachelors  never-married       adm-clerical    male       0   
1      bachelors         spouse    exec-managerial    male       0   
2        hs-grad       divorced  handlers-cleaners    male       0   
3           11th         spouse  handlers-cleaners    male       0   
4      bachelors         spouse     prof-specialty  female       0   
5        masters         spouse    exec-managerial  female       0   
6          other          other      other-service  female       0   
7        hs-grad         spouse    exec-managerial    male       1   
8        masters  never-married     prof-specialty  female       1   
9      bachelors         spouse    exec-managerial    male       1   

                       new_col  
Index                           
0      educated_not_determined  
1      educated_not_determined  
2      educated_not_determined  
3      educated_not_determined  
4      educated_not_determined  
5      educated_not_determined  
6      educated_not_determined  
7      educated_not_determined  
8      educated_not_determined  
9        educated_male_convert
