Let's say I have this toy dataset:
import pandas as pd
df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6)
})
and I create two simple custom functions (one for the string column animal, one for the numeric column num) that I will later pass to apply. The functions are:
def fn_num(x):
    if x['num'] >= 5:
        return 1
    elif x['num'] <= 1:
        return 0
    else:
        return -1

def fn_animal(x):
    if x['animal'].isin(['cow', 'hippo']):
        return 1
    elif x['animal'].str.contains('ee'):
        return 0
    else:
        return -1
where the argument x should be a pandas DataFrame such as the object df.
I then use them with apply (I know this is not the most efficient code, but I prefer to leave it this way for the sake of clarity):
df.apply(fn_num, axis=1)
0 0
1 0
2 -1
3 -1
4 -1
5 1
dtype: int64
df.apply(fn_animal, axis=1)
AttributeError: ("'str' object has no attribute 'isin'", 'occurred at index 0')
The function fn_num applied to the numeric column works fine, whereas the function fn_animal applied to the string column gives back an error. However, if I write the code outside the custom function, I get no errors with the attribute isin:
df['animal'].isin(['cow', 'hippo'])
0 False
1 True
2 False
3 False
4 False
5 True
Name: animal, dtype: bool
df['animal'].str.contains('ee')
0 False
1 False
2 False
3 True
4 False
5 False
Name: animal, dtype: bool
My desired output would be:
df.apply(fn_animal, axis=1)
0 -1
1 1
2 -1
3 0
4 -1
5 1
dtype: int64
I spent quite some time on this issue, and I'm sure I'm missing something very silly, but I couldn't figure it out. What can I do to make the function fn_animal work inside the apply?
x['animal'] gives you a native str object... you probably want to change it to: if x['animal'] in ['cow', 'hippo']?
Another way, without .apply (and probably a preferred way of doing so), is to use:
df['animal'].replace([r'^(hippo|cow)$', 'ee', '.*'], [1, 0, -1], regex=True)
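To make the point above concrete, here is a minimal sketch of fn_animal rewritten with plain Python string operations, since with axis=1 each x is a single row and x['animal'] is an ordinary str. The second half is one possible vectorized variant using numpy.select; it is not from the original post, and the names row_wise, conditions and vectorized are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6),
})

def fn_animal(x):
    # With axis=1, x is one row (a Series) and x['animal'] is a plain str,
    # so use Python's `in` operator instead of Series methods.
    if x['animal'] in ['cow', 'hippo']:
        return 1
    elif 'ee' in x['animal']:
        return 0
    else:
        return -1

row_wise = df.apply(fn_animal, axis=1)

# Vectorized variant (an assumption, not the poster's method):
# df['animal'] is a whole Series here, so .isin and .str.contains are available.
conditions = [
    df['animal'].isin(['cow', 'hippo']),
    df['animal'].str.contains('ee'),
]
vectorized = pd.Series(np.select(conditions, [1, 0], default=-1), index=df.index)

print(row_wise.tolist())
print(vectorized.tolist())
```

Both approaches should give the desired output from the question. The row-wise version stays closest to the original code, while the vectorized one avoids calling a Python function per row.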