Return multiple objects from an apply function in Pandas

Question

I'm practicing using apply with Pandas dataframes.

So I have cooked up a simple dataframe with dates and values:

import numpy as np
import pandas as pd
dates = pd.date_range('2013', periods=10)
values = list(np.arange(1, 11, 1))
DF = pd.DataFrame({'date': dates, 'value': values})

I have a second dataframe, which is made up of 3 rows of the original dataframe:

DFa = DF.iloc[[1,2,4]]

So I'd like to take the second dataframe, DFa, get the date from each row (using apply), and then find and sum up the values in the original dataframe whose dates came earlier:

def foo(DFa, DF=DF):
    cutoff_date = DFa['date']
    ans=DF[DF['date'] < cutoff_date]

DFa.apply(foo, axis=1)

Things work fine. My question is, since I've created 3 ans, how do I access these values?

Obviously I'm new to apply and I'm eager to get away from loops. I just don't understand how to return values from apply.

I don't think apply is the best option for this. If I understand correctly, why not DFa[DF.index].sum()?
– pbreach, Commented Jun 11, 2015 at 0:56
I agree, it's a pretty lousy example. My main problem is trying to return from the apply. I would really like to see how I could return 3 different dataframes and sum them up elsewhere (but I didn't state that properly in the question).
– tumultous_rooster, Commented Jun 11, 2015 at 1:56
That's okay. It's possible that groupby might be a better alternative to look into. You can specify groups for the 3 subsets, then simply use the sum method on the resulting groupby object.
– pbreach, Commented Jun 11, 2015 at 2:30
@MattO'Brien: The performance of DF.apply(func, axis=1) is comparable to calling func in a loop. apply is useful when you want to align the output into a single DataFrame. If you need to return 3 disparate DataFrames, go ahead and loop over DF.iterrows(). For better performance you'll have to think of a better way to calculate the result (such as doing a sorted cumsum for the toy example above) or perhaps use Cython.
– unutbu, Commented Jun 11, 2015 at 11:42
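
The "sorted cumsum" idea from the last comment could look roughly like the sketch below. It is one possible reading of that suggestion, not code from the commenter: the names running, counts and result are made up for illustration, and it assumes the toy data from the question.

```python
import numpy as np
import pandas as pd

# Toy data from the question.
DF = pd.DataFrame({'date': pd.date_range('2013', periods=10),
                   'value': np.arange(1, 11)})
DFa = DF.iloc[[1, 2, 4]]

# Sort once by date, then build a running total with a leading 0,
# so running[k] is the sum of the k earliest values.
sorted_df = DF.sort_values('date')
running = np.concatenate(([0], sorted_df['value'].cumsum().to_numpy()))

# For each cutoff date, count how many dates are strictly earlier.
# side='left' excludes rows whose date equals the cutoff.
counts = np.searchsorted(sorted_df['date'].to_numpy(),
                         DFa['date'].to_numpy(), side='left')

# Look up the running total for each cutoff; no Python-level loop needed.
result = pd.Series(running[counts], index=DFa.index)
print(result)
```

This gives one sum per row of DFa without calling a Python function per row, which is where the speedup over apply would come from on larger data.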

aensm · Accepted Answer · 2015-06-10 21:33:07Z

1

Your function needs to return a value. E.g.,

def foo(df1, df2):
    cutoff_date = df1.date
    ans = df2[df2.date < cutoff_date].value.sum()
    return ans


DFa.apply(lambda x: foo(x, DF), axis=1)

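Also, note that what apply gives back depends on what your function returns. With axis=1, a function that returns a scalar produces a Series with one entry per row of DFa. Your current function has no return statement, so every row yields None. If it returned the filtered DataFrame instead, you would be asking apply to hold a whole DataFrame for each row of DFa (a "DataFrame of DataFrames"), which is not what apply is designed for.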
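
A small sketch of the difference a return statement makes, assuming the toy data from the question. The function names no_return and scalar_return are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy data from the question.
DF = pd.DataFrame({'date': pd.date_range('2013', periods=10),
                   'value': np.arange(1, 11)})
DFa = DF.iloc[[1, 2, 4]]

def no_return(row, df=DF):
    # Same shape as the question's function: computes ans, returns nothing.
    ans = df[df['date'] < row['date']]

def scalar_return(row, df=DF):
    # Returns one number per row, so apply can collect the results.
    return df.loc[df['date'] < row['date'], 'value'].sum()

nothing = DFa.apply(no_return, axis=1)     # every entry is None
totals = DFa.apply(scalar_return, axis=1)  # one sum per row of DFa

print(nothing)
print(totals)
```

The first call runs without error, which is probably why "things work fine" in the question, but the computed values are discarded.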

answered Jun 10, 2015 at 21:33

aensm


tumultous_rooster Over a year ago

So hypothetically, what if I actually wanted to return a dataframe of dataframes? DF_of_DFs = DFa.apply(lambda x: foo(x, DF), axis=1) doesn't seem to be appropriate...
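
If the goal really is one DataFrame per row of DFa, a plain loop over iterrows() that collects the pieces in a dict is one option, in line with the comment under the question. This is only a sketch on the question's toy data; frames and sums are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy data from the question.
DF = pd.DataFrame({'date': pd.date_range('2013', periods=10),
                   'value': np.arange(1, 11)})
DFa = DF.iloc[[1, 2, 4]]

# One filtered DataFrame per row of DFa, keyed by that row's index label.
frames = {idx: DF[DF['date'] < row['date']] for idx, row in DFa.iterrows()}

# The pieces can then be summed (or otherwise processed) elsewhere.
sums = {idx: frame['value'].sum() for idx, frame in frames.items()}

print(frames[4])
print(sums)
```

A dict keeps each sub-frame accessible by label (frames[1], frames[2], frames[4]) without trying to nest DataFrames inside another DataFrame.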

Ami Tavory · Accepted Answer · 2015-06-10 21:54:47Z

1

There's a bit of a mixup in the way you're using apply. With axis=1, foo will be applied to each row (see the docs), so its first argument is a single row (a Series), and yet your code implies (by the parameter name) that its first parameter is a DataFrame.

Additionally, you state that you want to sum up the values in the original DataFrame for the rows whose date is earlier than the row's date. So foo needs to do this, and return the result.

So the code needs to look something like this:

def foo(row, DF=DF):
    cutoff_date = row['date']
    return DF[DF['date'] < cutoff_date].value.sum()

Once you make these changes, foo returns a scalar, so apply will return a Series:

>>> DFa.apply(foo, axis=1)
1     1
2     3
4    10
dtype: int64
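
If you want more than one value back per row, one commonly used pattern is to have the function return a Series; apply with axis=1 then assembles the results into a DataFrame with one column per key. The sketch below assumes the question's toy data, and summarize, total and count are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

# Toy data from the question.
DF = pd.DataFrame({'date': pd.date_range('2013', periods=10),
                   'value': np.arange(1, 11)})
DFa = DF.iloc[[1, 2, 4]]

def summarize(row, df=DF):
    # Rows of the original frame that are strictly earlier than this row.
    earlier = df[df['date'] < row['date']]
    # Returning a Series gives one output column per label.
    return pd.Series({'total': earlier['value'].sum(),
                      'count': len(earlier)})

summary = DFa.apply(summarize, axis=1)
print(summary)
```

Each row of summary lines up with the matching row of DFa, so the result can be joined back onto DFa if needed.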

edited Jun 10, 2015 at 21:54

answered Jun 10, 2015 at 21:45

Ami Tavory
