Python: pandas apply vs. map

Question

I am struggling to understand exactly how df.apply() works.

My problem is as follows: I have a DataFrame df and want to search several columns for certain strings. For each row where the string is found in any of those columns, I want to add a "label" in a new column.

I am able to solve the problem with map and applymap (see below).

However, I would expect the better solution to be apply, since it applies a function to an entire column.

Question: Is this not possible using `apply`? Where is my mistake?

Here are my solutions for using map and applymap.

df = pd.DataFrame([list("ABCDZ"),list("EAGHY"), list("IJKLA")], columns = ["h1","h2","h3","h4", "h5"])

Solution using `map`

def setlabel_func(column):
    return df[column].str.contains("A")

mask = sum(map(setlabel_func, ["h1","h5"]))
# .ix was removed in pandas 1.0; >= 1 also covers a match in both columns
df.loc[mask >= 1, "New Column"] = "Label"

Solution using `applymap`

mask = df[["h1","h5"]].applymap(lambda el: True if re.match("A",el) else False).T.any()
df.loc[mask, "New Column"] = "Label"

For apply, I don't know how to pass the two columns into the function, or maybe I don't understand the mechanics at all ;-)

def setlabel_func(column):
    return df[column].str.contains("A")

df.apply(setlabel_func(["h1","h5"]),axis = 1)

The above gives me this error:

'DataFrame' object has no attribute 'str'

Any advice? Please note that the search function in my real application is more complex and requires a regex, which is why I use .str.contains in the first place.

Hi John, thanks for your response. My expected output is what the map and applymap solutions return. Sorry, I don't know how to paste my output here. How do you do this?
– FredMaster, commented Feb 11, 2017 at 12:00

jezrael · Accepted Answer · 2017-02-11 12:21:45Z

7

Another solution is to use DataFrame.any to check for at least one True per row:

print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')))
      h1     h5
0   True  False
1  False  False
2  False   True

print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1))
0     True
1    False
2     True
dtype: bool

df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A')).any(axis=1),
                     'Label', '')

print (df)
  h1 h2 h3 h4 h5    new
0  A  B  C  D  Z  Label
1  E  A  G  H  Y       
2  I  J  K  L  A  Label

mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1)
df.loc[mask, 'New'] = 'Label'
print (df)
  h1 h2 h3 h4 h5    New
0  A  B  C  D  Z  Label
1  E  A  G  H  Y    NaN
2  I  J  K  L  A  Label
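Since the question mentions that the real search needs a regex, here is a minimal sketch of the same mask built from a pattern. The pattern `[AZ]` is a hypothetical stand-in chosen only for illustration, and `na=False` is an assumption about how missing values should be treated (as non-matches).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

# Hypothetical pattern for illustration only: match "A" or "Z".
pattern = r"[AZ]"

# apply hands each selected column to the lambda as a Series,
# so the .str accessor is available inside it.
# na=False (an assumption) counts missing values as non-matches.
mask = df[["h1", "h5"]].apply(
    lambda col: col.str.contains(pattern, regex=True, na=False)
).any(axis=1)

# Label the matching rows, leaving an empty string elsewhere.
df["new"] = np.where(mask, "Label", "")
```

Rows 0 and 2 get the label here, because h1 or h5 contains "A" or "Z" in those rows.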


1 Comment

FredMaster Over a year ago

Thanks for your swift reply. np.where is new to me. This will improve all my previous code significantly :-)

piRSquared · Accepted Answer · 2017-02-11 12:04:49Z

5

By default, pd.DataFrame.apply iterates over each column, passing the column as a pd.Series to the function being applied. In your case, the function you're trying to apply doesn't lend itself to being used in apply: it expects column labels and looks them up in df, rather than working on the Series it is handed.

Do this instead to get your idea to work:

mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A').any(), axis=1)
df.loc[mask, 'New Column'] = 'Label'

  h1 h2 h3 h4 h5 New Column
0  A  B  C  D  Z      Label
1  E  A  G  H  Y        NaN
2  I  J  K  L  A      Label
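To make the mechanics concrete, the following sketch records what apply actually hands to the function in each mode. The helper name `describe` is hypothetical and exists only for this illustration. It also shows why the original attempt failed: selecting a list of columns gives a DataFrame, which has no `.str` accessor.

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

# Selecting a list of columns yields a DataFrame, which has no .str accessor.
selected = df[["h1", "h5"]]
has_str = hasattr(selected, "str")

# Hypothetical helper: report the type and index labels of the object received.
def describe(obj):
    return f"{type(obj).__name__}:{','.join(map(str, obj.index))}"

# axis=0 (default): one call per column, each a Series indexed by row label.
by_column = selected.apply(describe)

# axis=1: one call per row, each a Series indexed by column label.
by_row = selected.apply(describe, axis=1)
```

In both modes the function receives a Series, so `.str` works inside it. Only the orientation changes: a whole column with axis=0, a single row with axis=1.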


1 Comment

FredMaster Over a year ago

Great. Works just fine. Thanks for your swift reply.

MaxU - stand with Ukraine · Accepted Answer · 2017-02-11 12:01:53Z

3

IIUC you can do it this way:

In [23]: df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A'))
                                             .sum(axis=1) > 0,
                              'Label', '')

In [24]: df
Out[24]:
  h1 h2 h3 h4 h5    new
0  A  B  C  D  Z  Label
1  E  A  G  H  Y
2  I  J  K  L  A  Label


2 Comments

FredMaster Over a year ago

Thanks for your swift reply. np.where is new to me. This will improve all my previous code significantly :-)

MaxU - stand with Ukraine Over a year ago

@FredMaster, glad i could help :-)

Burgertron · Accepted Answer · 2020-03-04 11:04:18Z

Others have given good alternative methods. Here is a way to use apply row-wise (axis=1) to get a new column indicating the presence of "A" in a group of columns.

When the function is passed a row, you can join the strings into one big string and then use a string membership test ("in"), as shown below. Here I am combining all columns, but you can easily do it with just h1 and h5.

df = pd.DataFrame([list("ABCDZ"),list("EAGHY"), list("IJKLA")], columns = ["h1","h2","h3","h4", "h5"])

def dothat(row):
    sep = ""
    return "A" in sep.join(row['h1':'h5'])
df['NewColumn'] = df.apply(dothat,axis=1)

This just squashes each row into one string (e.g. ABCDZ) and looks for "A". It is not that efficient, though: if you only want to stop at the first hit, combining all the columns could be a waste of time. You could easily change the function to look column by column and return True as soon as it finds a hit.
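One way to sketch that column-by-column variant, assuming only h1 and h5 should be searched. The function name `has_a` and its `columns` parameter are hypothetical, not part of the code above.

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

def has_a(row, columns=("h1", "h5")):
    # any() over a generator stops at the first column containing "A",
    # so the remaining columns are never inspected.
    return any("A" in row[col] for col in columns)

# axis=1 passes each row to has_a as a Series indexed by column name.
df["NewColumn"] = df.apply(has_a, axis=1)
```

Row 1 comes out False here even though h2 holds "A", because only h1 and h5 are checked.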
