Assume I have a DataFrame df with four columns, col = ["id","date","basket","gender"], and a function:

def is_valid_date(df):
    idx = some_scalar_function(df["basket"])  # returns an index
    date = df["date"].values[idx]
    return date > some_date

I have always understood groupby, loosely speaking, as creating a new DataFrame for each group during the "split" step of "split-apply-combine". So if I want to apply is_valid_date to each group of id, I would assume I could do

df.groupby("id").agg(get_first_date)

but it throws KeyError: 'basket' on the line idx = some_scalar_function(df["basket"]).

  • Does df.groupby("id").apply(get_first_date) work? Commented Jun 30, 2020 at 8:06
  • Wow... Of course!!! Commented Jun 30, 2020 at 8:09

1 Answer

GroupBy.agg works on each column separately, so the function receives one column at a time and cannot select columns with df["basket"] or df["date"].
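One way to check this is to record what the function actually receives. The sketch below uses a small made-up frame with the question's four columns and a hypothetical helper, record_type; neither comes from the original post.

```python
import pandas as pd

# Toy data with the question's four columns (all values are made up).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2020-02-01", "2020-04-01"]
    ),
    "basket": [10, 30, 50, 20],
    "gender": ["f", "f", "m", "m"],
})

seen = []

def record_type(obj):
    # Hypothetical helper: remember the type and name of whatever
    # pandas passes in, then return a scalar so agg can finish.
    seen.append((type(obj).__name__, obj.name))
    return len(obj)

df.groupby("id").agg(record_type)
print(seen)
```

Every recorded object should be a Series holding one column of one group. Inside such a call, df["basket"] is a label lookup on that Series' index rather than a column selection, which would explain the KeyError: 'basket'.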

The solution is to use GroupBy.apply with your custom function, which receives each group as a DataFrame:

df.groupby("id").apply(get_first_date)
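As a minimal runnable sketch of this, the question's is_valid_date (called get_first_date in the groupby calls) can be filled in with stand-ins. Everything not given in the question is an assumption here: the toy data, the cutoff some_date, and a some_scalar_function that returns the position of the largest basket value. The explicit column selection before apply is also an addition, made so the grouping column is not passed to the function.

```python
import numpy as np
import pandas as pd

# Toy data with the question's four columns (all values are made up).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2020-02-01", "2020-04-01"]
    ),
    "basket": [10, 30, 50, 20],
    "gender": ["f", "f", "m", "m"],
})

# Assumed cutoff; the question does not say what some_date is.
some_date = pd.Timestamp("2020-02-15")

def some_scalar_function(basket):
    # Stand-in for the question's unspecified function:
    # position of the largest basket value within the group.
    return int(np.argmax(basket.to_numpy()))

def is_valid_date(group):
    # With apply, `group` is a DataFrame, so column selection works.
    idx = some_scalar_function(group["basket"])
    date = group["date"].iloc[idx]
    return date > some_date

# Select the needed columns so the grouping column is left out.
result = df.groupby("id")[["date", "basket"]].apply(is_valid_date)
print(result)
```

Since the function returns one scalar per group, the result is a Series indexed by id with one boolean per group.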

2 Comments

Might be one of the most stupid mistakes I've ever made....
@CutePoison - I don't think so; it is not a mistake. These functions are similar and sometimes behave the same way.
