I am struggling to understand exactly how df.apply() works.
My problem is as follows: I have a dataframe df and want to search several columns for certain strings. For each row where the string is found in any of those columns, I want to add a "label" in a new column.
I am able to solve the problem with map and applymap (see below).
However, I would expect apply to be the better solution, as it applies a function to an entire column.
Question: Is this not possible using apply? Where is my mistake?
Here are my solutions for using map and applymap.
import re
import pandas as pd
df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")], columns=["h1", "h2", "h3", "h4", "h5"])
Solution using map
def setlabel_func(column):
    return df[column].str.contains("A")
mask = sum(map(setlabel_func, ["h1","h5"]))
df.loc[mask==1,"New Column"] = "Label"
Solution using applymap
mask = df[["h1","h5"]].applymap(lambda el: True if re.match("A",el) else False).T.any()
df.loc[mask == True, "New Column"] = "Label"
For apply, I don't know how to pass the two columns into the function, or maybe I don't understand the mechanics at all ;-)
def setlabel_func(column):
    return df[column].str.contains("A")
df.apply(setlabel_func(["h1","h5"]),axis = 1)
The above gives me this error:
'DataFrame' object has no attribute 'str'
Any advice? Please note that the search function in my real application is more complex and requires a regex, which is why I use .str.contains in the first place.
map and applymap return the expected labels. Sorry, I don't know how to paste my output here. How do you do this?
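For reference, here is a minimal sketch of how an apply-based version might look. It assumes the function object itself is passed to apply (rather than the result of calling it), so that each selected column arrives as a Series and .str is available. The helper name contains_a is hypothetical, and the simple "A" pattern stands in for the more complex regex of the real application.

```python
import pandas as pd

df = pd.DataFrame(
    [list("ABCDZ"), list("EAGHY"), list("IJKLA")],
    columns=["h1", "h2", "h3", "h4", "h5"],
)

def contains_a(col):
    # With the default axis=0, apply hands over one column at a time
    # as a Series, so the .str accessor works here.
    return col.str.contains("A", regex=True)

# Restrict to the columns of interest, apply column-wise, then
# reduce across columns: True where at least one column matched.
mask = df[["h1", "h5"]].apply(contains_a).any(axis=1)

# Rows that did not match are left as NaN in the new column.
df.loc[mask, "New Column"] = "Label"
print(df)
```

If this reading is right, the error in the attempt above comes from setlabel_func(["h1","h5"]) being evaluated before apply runs: df[["h1","h5"]] is a DataFrame, and a DataFrame has no .str accessor.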