Operate on columns in pandas groupby

Question

Assume I have a dataframe df which has 4 columns col = ["id","date","basket","gender"] and a function

def is_valid_date(df):
    idx = some_scalar_function(df["basket"])  # returns an index
    date = df["date"].values[idx]
    return date > some_date

I have always understood groupby as "creating a new dataframe" for each group during the split step of "split-apply-combine" (loosely speaking). So if I want to apply is_valid_date to each group of id, I would assume I could do

df.groupby("id").agg(get_first_date)

but it throws KeyError: 'basket' at the line idx = some_scalar_function(df["basket"])

How does df.groupby("id").apply(get_first_date) work for you?

– jezrael, Commented Jun 30, 2020 at 8:06
Wow... Of course!!!

– CutePoison, Commented Jun 30, 2020 at 8:09

jezrael · Accepted Answer · 2020-06-30 08:11:35Z

2

GroupBy.agg works on each column separately: your function receives one column of a group (a Series) at a time, so it cannot select other columns with df["basket"] or df["date"].
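To see this for yourself, here is a minimal sketch that records what agg hands to a custom function. The sample data and the helper record_input are made up for illustration; they are not from the question.

```python
import pandas as pd

# Made-up sample data, only to illustrate the behaviour.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "basket": [3, 5, 2],
})

seen_types = []

def record_input(x):
    # Hypothetical helper: remember the type of object agg passes in,
    # then return a scalar so the aggregation can complete.
    seen_types.append(type(x).__name__)
    return x.iloc[0]

df.groupby("id").agg(record_input)
print(set(seen_types))
```

Every call should report a Series, never a DataFrame, which is why df["basket"] inside the function is looked up as an index label and fails with a KeyError.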

The solution is to use GroupBy.apply, which passes each group to your custom function as a DataFrame:

df.groupby("id").apply(get_first_date)
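For a runnable picture, here is a small sketch under stated assumptions. The sample data, the cutoff some_date, and the body of some_scalar_function (here: position of the largest basket) are placeholders I made up, since the question does not define them. get_first_date reuses the body of the question's is_valid_date. Selecting the needed columns before apply is optional; I do it here only to keep the grouping column out of the frame passed to the function.

```python
import pandas as pd

# Made-up sample data with the four columns from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-06-01", "2019-03-01", "2019-09-01"]
    ),
    "basket": [3, 5, 7, 2],
    "gender": ["f", "f", "m", "m"],
})

some_date = pd.Timestamp("2020-01-15")  # assumed cutoff, not from the question

def some_scalar_function(basket):
    # Assumed stand-in: position of the largest basket within the group.
    return basket.to_numpy().argmax()

def get_first_date(group):
    # group is a DataFrame here, so several columns can be used together.
    idx = some_scalar_function(group["basket"])
    date = group["date"].iloc[idx]
    return date > some_date

result = df.groupby("id")[["date", "basket"]].apply(get_first_date)
print(result)
```

The result is a Series indexed by id with one boolean per group.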


2 Comments

CutePoison Over a year ago

Might be one of the most stupid mistakes I've ever made....

jezrael Over a year ago

@CutePoison - I don't think so, it is not a mistake. These functions are similar and sometimes behave the same.
