I'm learning Python and want to use the "apply" function. Reading the manual, I found that if I have a simple dataframe like this:

import pandas as pd
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

   A  B
0  4  9
1  4  9
2  4  9

and then I use something like this:

df.apply(lambda x: x.sum(), axis=0)

the output is as expected because, according to the documentation, x receives each column in turn and the sum is applied to it, so the result is correctly this:

A    12
B    27
dtype: int64

When instead I issue something like:

df['A'].apply(lambda x: x.sum())

the result is: AttributeError: 'int' object has no attribute 'sum'

My question is: why does this work on a dataframe column by column, but not on a single column? In the end the logic should be the same: x should receive one column as input instead of two.

I know that for this simple example I should use other functions like df.agg or even df['A'].sum() but the question is to understand the logic of apply.

  • When you use apply on a Series, the argument is each individual element of that Series. Commented May 1, 2022 at 8:44
  • I missed that somewhere in the manual ... :-( thanks ! Commented May 1, 2022 at 8:48
  • @AndreaGrianti note that you should always avoid using apply; here df.sum() and df['A'].sum() should be used. Always check whether you can use a vectorized alternative and only fall back to apply if there is really no other way (apply is slow). Commented May 1, 2022 at 9:13
  • Yes, I read about that, but considering that I was used to working with for loops, which are even slower, apply in general is a step ahead anyway as far as I can see. But thanks for your suggestion. Commented May 1, 2022 at 9:21
  • @Andrea actually, Python loops (for loops or list comprehensions) are usually faster than apply ;) Commented May 1, 2022 at 10:35

1 Answer

If you look at a specific column of a pandas.DataFrame object, you are working with a pandas.Series with (in your case) integers as values, and apply on a Series calls your function on each of those values one at a time. Integers don't have a sum() method. (Run type(df['A']) to see that you are working with a Series and not a DataFrame anymore when selecting a single column.)

The confusing part is that when you call apply on an actual pandas.DataFrame object, each column is passed to your function as a pandas.Series object, and a Series does have a sum() method.
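To make the difference visible, here is a minimal sketch that records the type of whatever apply hands to the function in each case. The variable names frame_arg_types and series_arg_types are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# DataFrame.apply with axis=0: the function is called once per column,
# and x is the whole column.
frame_arg_types = df.apply(lambda x: type(x).__name__, axis=0)

# Series.apply: the function is called once per element,
# and x is a single value from the column.
series_arg_types = df['A'].apply(lambda x: type(x).__name__)

print(frame_arg_types)   # one entry per column
print(series_arg_types)  # one entry per element
```

The first result has one entry per column and the second has one entry per row, which is why x.sum() only makes sense in the first case.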

So there are two ways to fix your problem

  1. Work with a pandas.DataFrame and not with a pandas.Series: df[['A']]. The additional brackets force pandas to return a pandas.DataFrame object (verify with type(df[['A']])), so you can use the lambda function just as you did before.
  2. Use a function rather than a method inside the lambda: df['A'].apply(lambda x: np.sum(x)) (assuming that you have imported numpy as np). This avoids the error, but because x is still a single integer, np.sum(x) simply returns that integer, so you get the column back unchanged rather than its sum.

If you want to keep using apply, I would recommend the first option, since it is the one that actually returns the column total.
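As a rough check of the two options, the sketch below runs both on the dataframe from the question. The names col_total and elementwise are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Option 1: double brackets keep a DataFrame, so x is the whole column 'A'.
col_total = df[['A']].apply(lambda x: x.sum())

# Option 2: x is a single integer, and np.sum of a scalar is that scalar,
# so each element comes back unchanged.
elementwise = df['A'].apply(lambda x: np.sum(x))

print(col_total)    # a Series with one entry, labelled 'A'
print(elementwise)  # a Series with the same length as the column
```

Option 1 yields the total of column A, while option 2 runs without an error but only reproduces the original values.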

However, this is only relevant if you want to apply a certain function to every column or element of a pandas.DataFrame or pandas.Series. In your specific case, there is no need to take the detour that you are currently using. Just use df.sum(axis=0); the approach with apply overcomplicates things. The reason the apply version works is that every column of a pandas.DataFrame is a pandas.Series, which has a sum method. But a pandas.DataFrame has one too, so you can use it right away.
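For comparison, the direct calls look like this (a small sketch using the same dataframe; totals and a_total are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Sum every column of the DataFrame at once, no apply or lambda needed.
totals = df.sum(axis=0)

# Sum a single column: Series.sum() reduces the whole column to one number.
a_total = df['A'].sum()

print(totals)
print(a_total)
```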

The only case where you would actually need apply for a sum like this is if you had arrays in every cell of the pandas.DataFrame.
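A sketch of that situation, assuming a hypothetical dataframe called nested whose cells hold NumPy arrays (this data is invented for illustration and is not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: each cell of column 'A' is itself a NumPy array.
nested = pd.DataFrame({'A': [np.array([1, 2]), np.array([3, 4])]})

# Here each x passed by Series.apply is an array, which does have .sum(),
# so the lambda from the question works element by element.
per_cell = nested['A'].apply(lambda x: x.sum())

print(per_cell)
```

In this case the result holds one sum per cell rather than one sum for the whole column.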

2 Comments

I tried your second example with df['A'].apply(lambda x: np.sum(x)) but it just returns the column and not the sum. In any case the first example works and it's interesting.
@AndreaGrianti sorry, I focused too much on explaining your "bug". Actually, you don't need the apply() method and a lambda function. Just use the sum(axis=0) method of a pandas.DataFrame =P
