9

How should I convert NaN values into a categorical value based on a condition? I am getting an error when I try to convert the NaN values.

category           gender     sub-category    title
health&beauty      NaN        makeup          lipbalm
health&beauty      women      makeup          lipstick
NaN                NaN        NaN             lipgloss

My DataFrame is shown above. The function I wrote to convert the NaN values in gender to a categorical value is:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

When I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Complete dataset: https://github.com/lakshmipriya04/py-sample

4
  • What do you think title.str is? Commented Jan 1, 2018 at 18:23
  • Isn't isnull a function that takes an array? See the docs. I'm not sure gender.isnull==True makes sense. Commented Jan 1, 2018 at 18:23
  • title is a pandas Series. I am looking for Lip in that column. Commented Jan 1, 2018 at 18:25
  • @LPR btw, if you like the other answers, you can upvote :-) Commented Jan 1, 2018 at 19:37

3 Answers

12

Some things to note here:

  1. If you're only using two columns, calling apply over four columns is wasteful.
  2. Calling apply at all is inefficient here: it is slow, uses a lot of memory, and offers no vectorisation benefits.
  3. Inside apply you're dealing with scalars, so you can't use the .str accessor as you would on a pd.Series. A plain Python str has no contains method either; the idiomatic substring test is "lip" in title.
  4. gender.isnull fails for the same reason: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) instead.
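
For comparison, here is a rough sketch of what a scalar-safe, row-wise version could look like. The small frame is rebuilt from the sample rows in the question, and looking values up by column name rather than by position is my assumption about what was intended. The vectorised options that follow are still preferable.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt from the question (illustrative only)
df = pd.DataFrame({
    'category': ['health&beauty', 'health&beauty', np.nan],
    'gender': [np.nan, 'women', np.nan],
    'sub-category': ['makeup', 'makeup', np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss'],
})

def impute_gender(row):
    # Inside apply(axis=1) each value is a plain scalar, not a Series
    gender = row['gender']
    title = row['title']
    # pd.isnull works on scalars; `in` performs the substring test
    if pd.isnull(gender) and isinstance(title, str) and 'lip' in title.lower():
        return 'women'
    # Return the existing value so non-matching rows are not replaced by None
    return gender

df['gender'] = df[['gender', 'title']].apply(impute_gender, axis=1)
```

Note that the function returns the original gender when the condition does not match; without that, every non-matching row would come back as None.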

Option 1
np.where

import numpy as np
# True where gender is missing and the title contains 'lip'
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only faster than apply, but simpler as well. If you're worried about case sensitivity, you can make the contains check case insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
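
As an alternative sketch, str.contains also accepts case=False, which avoids importing re, and na=False, which treats missing titles as non-matches so the mask stays strictly boolean. The frame below is made up for illustration: the first title is capitalised and the second is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the question's dataset
df = pd.DataFrame({
    'gender': [np.nan, np.nan, 'men'],
    'title': ['Lipbalm', np.nan, 'lip scrub'],
})

# case=False ignores case; na=False maps missing titles to False
m = df['gender'].isnull() & df['title'].str.contains('lip', case=False, na=False)
df['gender'] = np.where(m, 'women', df['gender'])
```

Only the first row is filled here: the second has no title to match against, and the third already has a gender.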

Option 2
Another alternative is pd.Series.mask or pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or,

df['gender'] = df.gender.where(~m, 'women')

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

mask replaces the values where m is True with the new value. where does the opposite: it keeps the values where its condition is True and replaces the rest, which is why it takes ~m.
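
A small sketch of that equivalence, using a made-up Series and mask rather than the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration
gender = pd.Series([np.nan, 'women', np.nan, 'men'])
m = pd.Series([True, False, False, False])

# mask: replace where the condition is True
via_mask = gender.mask(m, 'women')
# where: keep where the condition is True, so pass the inverted mask
via_where = gender.where(~m, 'women')
```

Both calls fill only the first entry; the NaN at position 2 is left alone because the mask is False there.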

7

Or simply use loc as an option 3 to @COLDSPEED's answer

cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'


        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

2 Comments

Thank you for the answer. When should I use the apply function? And why do I get the attribute error?
@LPR who are you speaking to? I've addressed your problems in my answer. As for when to use apply, the answer is: when you can't use anything else.

3

If we are dealing with NaN values, fillna can be one of the methods :-)

df['gender'] = df.gender.fillna(df.title.str.contains('lip').replace(True, 'women'))
df
Out[63]: 
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
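
One caveat, as far as I can tell: when a row has a missing gender and a title that does not contain 'lip', the expression above fills the gap with False rather than leaving it as NaN. A possible variant, assuming that behaviour is unwanted, maps only True to 'women' so that non-matching rows stay missing. The frame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no gender and no 'lip' in the title
df = pd.DataFrame({
    'gender': [np.nan, 'men', np.nan],
    'title': ['lipbalm', 'lip scrub', 'shampoo'],
})

# map sends True to 'women'; False has no key in the dict and becomes NaN
fill_values = df['title'].str.contains('lip', na=False).map({True: 'women'})
df['gender'] = df['gender'].fillna(fill_values)
```

fillna aligns on the index, so existing values such as 'men' are untouched and the 'shampoo' row remains NaN.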

3 Comments

I see you went out of the box to get this one. Nice.
@cᴏʟᴅsᴘᴇᴇᴅ aha, It is hard to think outside the box : -)
@Wen, Happy New Year. I answered and went back to enjoying the last day of holiday so missed the message :)
