I'm learning Python and want to use the apply function. Reading the manual, I found that if I have a simple DataFrame like this:
import pandas as pd
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
A B
0 4 9
1 4 9
2 4 9
and then I use something like this:
df.apply(lambda x: x.sum(), axis=0)
the output works because, according to the documentation, x receives each column in turn and the sum is applied to each one, so the result is correctly this:
A 12
B 27
dtype: int64
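To see what the lambda actually receives in the DataFrame case, here is a minimal sketch that logs each argument before summing it. The helper name inspect_column and the list seen are illustrative only. Some older pandas versions may call the function more than once on the first column, so the log could contain a repeated entry.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Log of (column name, type name, length) for every call apply makes.
seen = []

def inspect_column(col):
    # With axis=0, col is one whole column, passed as a pandas Series.
    seen.append((col.name, type(col).__name__, len(col)))
    return col.sum()

result = df.apply(inspect_column, axis=0)
print(seen)    # entries for columns 'A' and 'B', each a Series of length 3
print(result)  # A -> 12, B -> 27
```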
When I instead run something like:
df['A'].apply(lambda x: x.sum())
the result is an error: AttributeError: 'int' object has no attribute 'sum'
My question is: why does apply work on a DataFrame column by column, but not on a single column? In the end the logic should be the same; x should just receive one column instead of two.
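The difference lies in what Series.apply passes to the function: it calls it once per element of the Series, so x is a single number, and a plain number has no sum method. A minimal sketch that logs the argument types (inspect_value and seen_types are hypothetical names used only for illustration):

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

seen_types = []

def inspect_value(x):
    # Series.apply calls this once per element, so x is a scalar, not a column.
    seen_types.append(type(x).__name__)
    return x * 10

out = df['A'].apply(inspect_value)
print(seen_types)    # scalar type names only, no 'Series'
print(out.tolist())  # [40, 40, 40]
```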
I know that for this simple example I should use other functions like df.agg or even df['A'].sum(), but the point of the question is to understand the logic of apply.
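For comparison, here are a few ways to get the same sums, including one that keeps apply but passes a one-column DataFrame (df[['A']]) so that x is again a whole column. Series.pipe is used here as one way to hand the entire Series to a function in a single call; these are illustrations, not the only options.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Built-in reductions: usually the right tool.
col_sum = df['A'].sum()      # 12
all_sums = df.agg('sum')     # A -> 12, B -> 27

# Double brackets give a one-column DataFrame, so apply passes the column as a Series.
via_apply = df[['A']].apply(lambda x: x.sum())  # A -> 12

# pipe calls the function once with the whole Series.
via_pipe = df['A'].pipe(lambda s: s.sum())      # 12
```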
With apply on a Series, the argument passed to the function is each element of that Series (see the documentation for Series.apply). Here, df.sum() and df['A'].sum() should be used. Always check whether you can use a vectorized alternative, and only fall back to apply if there is really no other way (apply is slow).
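Series.apply remains useful when the function really is element-wise, but many element-wise operations also have a vectorized form. A small sketch comparing the two on column A; both give the same values, and the vectorized version avoids a Python-level function call per element:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Element-wise via apply: the lambda runs once per value in column A.
doubled_apply = df['A'].apply(lambda x: x * 2)

# Vectorized equivalent: a single operation on the whole column.
doubled_vec = df['A'] * 2

print((doubled_apply == doubled_vec).all())  # True
```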