I prefer a syntax close to R's tidyverse, which combines flexibility with readability. In pandas, named aggregation gets you close to that style, and you can apply custom functions like this:
import pandas as pd

# Some input data
df = pd.DataFrame({
    'col1': [0, 1, 0, 1, 0],
    'col2': [10, 20, 30, 40, 50],
    'col3': [100, 200, 300, 400, 500],
})
# Tidyverse-like aggregations: new_column_name=(source_column, function)
(
    df
    .groupby('col1')
    .agg(
        percent_col2_above_30=('col2', lambda x: sum(x > 30) / len(x)),
        col3_max_divided_by_min=('col3', lambda x: max(x) / min(x)),
    )
)
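If the lambdas grow longer, named aggregation should also accept regular functions, which makes each metric easier to read and test on its own. Here is a minimal sketch on the same data; the helper names share_above_30 and max_over_min are made up for illustration and are not part of pandas:

```python
import pandas as pd


def share_above_30(s: pd.Series) -> float:
    """Fraction of values in the group that are strictly greater than 30."""
    return (s > 30).mean()


def max_over_min(s: pd.Series) -> float:
    """Ratio of the largest to the smallest value in the group."""
    return s.max() / s.min()


# Same input data as above
df = pd.DataFrame({
    'col1': [0, 1, 0, 1, 0],
    'col2': [10, 20, 30, 40, 50],
    'col3': [100, 200, 300, 400, 500],
})

# Each keyword becomes an output column: name=(source_column, function)
result = df.groupby('col1').agg(
    percent_col2_above_30=('col2', share_above_30),
    col3_max_divided_by_min=('col3', max_over_min),
)
print(result)
```

The result is indexed by col1, with one row per group and one column per keyword argument.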
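Sometimes a single aggregation needs more than one column at once, which named aggregation cannot do because each function only sees one column. In that case you can use apply and return a pd.Series for each group: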
# Example input data
df = pd.DataFrame({
    'col1': ["A", "A", "A", "B", "B", "B"],
    'col2': ["group_1", "group_2", "group_3", "group_1", "group_2", "group_3"],
    'col3': [10, 20, 80, 40, 30, 30],
})
# Perform grouped aggregations with multi-column calculations.
# x is the sub-DataFrame for one value of col1; each key of the
# returned Series becomes a column in the result.
df_result = (
    df
    .groupby("col1")
    .apply(
        lambda x: pd.Series({
            "group_1_prop": x.loc[x["col2"] == "group_1", "col3"].sum() / x["col3"].sum(),
            "group_2_prop": x.loc[x["col2"] == "group_2", "col3"].sum() / x["col3"].sum(),
            "group_3_prop": x.loc[x["col2"] == "group_3", "col3"].sum() / x["col3"].sum(),
        })
    )
    .reset_index()
)
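For this particular calculation (the share of each col2 category within each col1 group), the same numbers can probably be reached without apply by pivoting first and then dividing each row by its total. This is a different technique from the one above, shown only as a sketch; it assumes every category should become its own column, and the names sums, props, and df_alt are just illustrative:

```python
import pandas as pd

# Same example input data as above
df = pd.DataFrame({
    'col1': ["A", "A", "A", "B", "B", "B"],
    'col2': ["group_1", "group_2", "group_3", "group_1", "group_2", "group_3"],
    'col3': [10, 20, 80, 40, 30, 30],
})

# Sum col3 for every (col1, col2) pair: one row per col1, one column per col2 value
sums = df.pivot_table(
    index="col1", columns="col2", values="col3", aggfunc="sum", fill_value=0
)

# Divide each row by its row total to turn sums into proportions
props = sums.div(sums.sum(axis=1), axis=0).add_suffix("_prop")

# Drop the leftover "col2" label on the column axis, then make col1 a regular column
props.columns.name = None
df_alt = props.reset_index()
print(df_alt)
```

This scales to any number of categories without writing one line per category, while the apply version stays more flexible when each output column needs its own custom logic.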