I am trying to group a DataFrame by certain columns and then, for each group, pass one column's values as a list to a custom function or lambda to get a single aggregated result.

Here's a df:

orgid       appid   p   type    version
-------------------------------------------------
24e78b      4ef36d  1   None    3.3.7
24e78b      4ef36d  2   None    3.4.1
24e78b      4ef36d  1   None    3.3.7-beta-1
24e78b      4ef36d  1   None    3.4.0-mvn.1
24e78b      4ef36d  2   None    3.4.0-beta.5
24e78b      4ef36d  1   None    3.4.0-beta.1
24e78b      4ef36d  1   None    3.4.0
24e78b      4ef36d  1   None    3.3.5

I have an expression that takes a list of version strings and returns the highest version as a string:

>>> import semver
>>> versions = ['3.4.0-mvn.1', '3.4.0-beta.1', '3.4.0', '3.3.7-beta-1', '3.3.7', '3.3.5', '3.4.0-beta-1']
>>> str(max(map(semver.VersionInfo.parse, versions)))
'3.4.0'

Now I want to group the DataFrame, pass each group's version column to this expression as a list, and get back a single version string per group.

I tried:

>>> g = df.groupby(['orgid', 'appid', 'p', 'type'])
>>> g['version'].apply(lambda x: str(max(map(semver.VersionInfo.parse, x.tolist()))))
Series([], Name: version, dtype: float64)

I get an empty Series.

Expected output:

orgid       appid   p   type    version
24e78b      4ef36d  1   None    3.4.0
24e78b      4ef36d  2   None    3.4.1

I also consulted the post "Pandas group by multiple custom aggregate function on multiple columns", but could not get it to work.

  • Can you provide a reproducible df? Commented Sep 12, 2022 at 20:33

3 Answers

Try:

import semver

df["version"] = df["version"].apply(semver.VersionInfo.parse)
out = df.groupby(["orgid", "appid", "p", "type"], as_index=False).max()

print(out)

Prints:

    orgid   appid  p  type version
0  24e78b  4ef36d  1  None   3.4.0
1  24e78b  4ef36d  2  None   3.4.1

1 Comment

This still gives me an empty DataFrame. :(

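This happens because the type column, which is one of the groupby keys, contains only None values. By default, groupby drops every row whose key contains a missing value, so no groups are left.

Replace the missing values before calling df.groupby(...):

df = df.fillna('None')

After that, the grouping should work.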
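
To make the effect of the missing keys visible, here is a minimal sketch on a cut-down copy of the question's data. The four-row frame and the variable names are assumptions for illustration only. It compares the default grouping with two ways of keeping the rows: filling the missing keys, or passing dropna=False, a groupby parameter available in pandas 1.1 and later.

```python
import pandas as pd

# Cut-down, hypothetical copy of the question's data: "type" is entirely missing.
df = pd.DataFrame({
    "orgid": ["24e78b"] * 4,
    "appid": ["4ef36d"] * 4,
    "p": [1, 2, 1, 2],
    "type": [None] * 4,
    "version": ["3.3.7", "3.4.1", "3.4.0", "3.4.0-beta.5"],
})
keys = ["orgid", "appid", "p", "type"]

# Default behaviour: rows with a missing value in any key are dropped,
# so no groups remain.
default_sizes = df.groupby(keys).size()

# Option 1: replace the missing keys with a placeholder string first.
filled_sizes = df.fillna({"type": "None"}).groupby(keys).size()

# Option 2: keep missing keys as their own group (pandas >= 1.1).
kept_sizes = df.groupby(keys, dropna=False).size()

print(len(default_sizes))
print(filled_sizes.tolist())
print(kept_sizes.tolist())
```

Filling only the type column, rather than the whole frame, avoids turning missing values in other columns into the string 'None'. Whether that matters depends on the rest of your data.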

Comments

out = (df.groupby(['orgid', 'appid', 'p', 'type'], as_index=False)['version']
         .agg(lambda x: max(semver.VersionInfo.parse(v) for v in x)))
print(out)

# Output:

    orgid   appid  p  type version
0  24e78b  4ef36d  1  None   3.4.0
1  24e78b  4ef36d  2  None   3.4.1
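
If the type key holds only missing values, this aggregation also produces no groups unless the missing keys are kept. The sketch below combines the same idea with dropna=False (pandas 1.1 and later) and uses max with a key so that the original version string is returned. Note that semver_key is a hypothetical, dependency-free stand-in that approximates SemVer precedence so the snippet runs without extra packages; with semver installed, semver.VersionInfo.parse could be passed as the key instead.

```python
import pandas as pd


def semver_key(version):
    """Hypothetical sort key approximating SemVer precedence.

    Build metadata (anything after '+') is ignored.
    """
    core, _, pre = version.partition("+")[0].partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not pre:
        # A release ranks above any of its own pre-releases.
        return (major, minor, patch, 1, ())
    # Numeric identifiers compare as numbers and rank below alphanumeric ones.
    ids = tuple(
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in pre.split(".")
    )
    return (major, minor, patch, 0, ids)


# The data from the question, with "type" entirely missing.
df = pd.DataFrame({
    "orgid": ["24e78b"] * 8,
    "appid": ["4ef36d"] * 8,
    "p": [1, 2, 1, 1, 2, 1, 1, 1],
    "type": [None] * 8,
    "version": ["3.3.7", "3.4.1", "3.3.7-beta-1", "3.4.0-mvn.1",
                "3.4.0-beta.5", "3.4.0-beta.1", "3.4.0", "3.3.5"],
})
keys = ["orgid", "appid", "p", "type"]

out = (
    df.groupby(keys, dropna=False)["version"]
      .agg(lambda s: max(s, key=semver_key))  # highest version string per group
      .reset_index()
)
print(out)
```

Using a key leaves the version column as plain strings, so the frame itself does not need to be converted to parsed version objects first.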

1 Comment

But this still gives me an empty DataFrame.
