I need to aggregate several columns, grouped by one column. I have the following code, which works but only for one column, and I am struggling to extend it to several columns.

import pandas as pd

# Sample DataFrame
data = {
    'Group': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Qty': [100, 202, 403, 754, 855, 1256]
}
df = pd.DataFrame(data)
print(df)
result = df.groupby('Group')['Value'].apply(lambda x: pd.Series([', '.join(map(str, x))])).reset_index()
print(result)

This produces a table with the column "Group" (the groupby key) and one column for "Value", but I also need a column with the aggregated output for the variable Qty. My actual dataset has 12 variables that I need to aggregate this way. Any suggestions?

Thank you in advance and Happy 2024!!

  • Have you heard of pandas' built-in df.groupby(...).agg(...) function? Read its docs here: DataFrameGroupBy.agg Commented Dec 28, 2023 at 16:43
  • df.astype(str).groupby('Group', as_index=False).agg(', '.join) Commented Dec 28, 2023 at 17:52
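The one-liner from the second comment can be tried against the question's sample data. This is a minimal sketch, assuming the DataFrame from the question; the variable name `joined` is only illustrative. Casting to str first is what lets `', '.join` consume the values directly.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Qty': [100, 202, 403, 754, 855, 1256],
})

# Cast every column to str, then join the values of each
# non-group column within each group. as_index=False keeps
# 'Group' as a regular column instead of the index.
joined = df.astype(str).groupby('Group', as_index=False).agg(', '.join)
print(joined)
```

Note that this converts every column to strings, so it fits only when all non-group columns should be concatenated.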

1 Answer

To do this for many columns without listing each one by hand, build the aggregation dictionary from every column except "Group". This scales well when you have a lot of variables:

aggregated_data = df.groupby('Group').agg({col: concatenate_with_comma for col in df.columns if col != 'Group'})

The concatenate_with_comma helper used above must be defined before that line runs. Here it is:

def concatenate_with_comma(series):
    return ', '.join(map(str, series))

You can also pass "sum" or other aggregation functions instead, if that is what you eventually need. The result of the code above looks like this:

         Value             Qty
Group                         
A      1, 2, 5   100, 202, 855
B      3, 4, 6  403, 754, 1256
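For reference, here is the approach as one runnable script, with the helper defined before it is used. It is a sketch built on the question's sample data; the names `value_cols`, `aggregated` and `mixed` are illustrative, and the last part shows how "sum" could be mixed in for a single column.

```python
import pandas as pd


def concatenate_with_comma(series):
    """Join all values of a group into one comma-separated string."""
    return ', '.join(map(str, series))


# Sample data copied from the question
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Qty': [100, 202, 403, 754, 855, 1256],
})

# Every column except the grouping key gets the same aggregation
value_cols = [col for col in df.columns if col != 'Group']
aggregated = df.groupby('Group').agg(
    {col: concatenate_with_comma for col in value_cols}
)
print(aggregated)

# Different aggregations per column: concatenate 'Value', sum 'Qty'
mixed = df.groupby('Group').agg(
    {'Value': concatenate_with_comma, 'Qty': 'sum'}
)
print(mixed)
```

Because the dictionary is built from df.columns, the same code should carry over to 12 variables without changes, as long as the grouping column is the only one excluded.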