Aggregate Pandas DataFrame based on condition that uses multiple columns?

Question

import pandas as pd

data = {
    "K": ["A", "A", "B", "B", "B"],
    "LABEL": ["X12", "X12", "X21", "L31", "L31"],
    "VALUE": [1, 3, 1, 2, 5.0]
}

df = pd.DataFrame.from_dict(data)

output = """
   K LABEL  VALUE
0  A   X12    1.0
1  A   X12    3.0
2  B   X21    1.0
3  B   L31    2.0
4  B   L31    5.0
"""

Transformation steps

For each group (grouped by K and LABEL), compute FINAL_VALUE as defined below.

LABEL is one of two types, X___ or L___:

# if LABEL is X___ then FINAL_VALUE = sum(VALUE)
# if LABEL is L___ then FINAL_VALUE = count(VALUE)
# else FINAL_VALUE = 0
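The three rules can be expressed as an ordinary Python function applied to each (K, LABEL) group. This is a minimal sketch for illustration only: `final_value` is a hypothetical helper, the grouping is done by hand with `collections.defaultdict` rather than pandas, and it assumes the labels X12, X21 and L31 from the sample and that no VALUE is missing.

```python
from collections import defaultdict


def final_value(label, values):
    # Hypothetical helper: applies the FINAL_VALUE rule to one (K, LABEL) group.
    # Assumes `values` has no missing entries (pandas' sum/count would skip NaN).
    if label.startswith("X"):
        return sum(values)
    if label.startswith("L"):
        return len(values)
    return 0


data = {
    "K": ["A", "A", "B", "B", "B"],
    "LABEL": ["X12", "X12", "X21", "L31", "L31"],
    "VALUE": [1, 3, 1, 2, 5.0],
}

# Collect the VALUE entries of each (K, LABEL) pair, in order of first appearance
groups = defaultdict(list)
for k, label, value in zip(data["K"], data["LABEL"], data["VALUE"]):
    groups[(k, label)].append(value)

result = {key: final_value(key[1], vals) for key, vals in groups.items()}
print(result)  # {('A', 'X12'): 4, ('B', 'X21'): 1, ('B', 'L31'): 2}
```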

Result of transformation

expected_output = """
K  LABEL  FINAL_VALUE
A    X12            4
B    X21            1
B    L31            2
"""

How can I achieve this using pandas?

EDIT1: Partially working

In [17]: df.groupby(["K", "LABEL"]).agg({"VALUE": {"VALUE_SUM": "sum", "VALUE_COUNT": "count"}})
Out[17]: 
              VALUE          
        VALUE_COUNT VALUE_SUM
K LABEL                      
A X12             2       4.0
B L31             2       7.0
  X21             1       1.0
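The nested-dict form of `agg` used above (renaming inside the dict) was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises `SpecificationError`. On a recent pandas, named aggregation (available since 0.25) is one way to get the same two statistics. A sketch, assuming pandas >= 0.25; note that the result has flat column names rather than the two-level `VALUE` header shown above:

```python
import pandas as pd

df = pd.DataFrame({
    "K": ["A", "A", "B", "B", "B"],
    "LABEL": ["X12", "X12", "X21", "L31", "L31"],
    "VALUE": [1, 3, 1, 2, 5.0],
})

# Named aggregation: each keyword is an output column, each value is
# a (input column, aggregation function) pair.
df2 = df.groupby(["K", "LABEL"]).agg(
    VALUE_SUM=("VALUE", "sum"),
    VALUE_COUNT=("VALUE", "count"),
)
print(df2)
```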

EDIT2: Using reset_index() to fill up the dataframe

In [18]: df2 = df.groupby(["K", "LABEL"]).agg({"VALUE": {"VALUE_SUM": "sum", "VALUE_COUNT": "count"}})

In [21]: df2.reset_index()
Out[21]: 
   K LABEL       VALUE          
           VALUE_COUNT VALUE_SUM
0  A   X12           2       4.0
1  B   L31           2       7.0
2  B   X21           1       1.0
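The output shows that `reset_index()` moves K and LABEL back into ordinary columns but keeps the two-level column header, which is why the lambda in EDIT3 has to index `x["VALUE"]["VALUE_SUM"]`. One way to get flat column names is to join the header levels. A sketch, using the list form of `agg` (which still works in current pandas); the `VALUE_sum` and `VALUE_count` names are simply what the join produces:

```python
import pandas as pd

df = pd.DataFrame({
    "K": ["A", "A", "B", "B", "B"],
    "LABEL": ["X12", "X12", "X21", "L31", "L31"],
    "VALUE": [1, 3, 1, 2, 5.0],
})

# A list of functions per column gives a two-level header: (VALUE, sum), (VALUE, count)
df2 = df.groupby(["K", "LABEL"]).agg({"VALUE": ["sum", "count"]})

# Join the two levels into single names, then move K and LABEL back into columns
df2.columns = ["_".join(col) for col in df2.columns]
df3 = df2.reset_index()
print(df3)
```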

EDIT3: Final solution using df.apply()

In [59]: df3 = df2.reset_index()

In [60]: df3["FINAL_VALUE"] = df3.apply(lambda x: x["VALUE"]["VALUE_SUM"] if x["LABEL"].str.startswith("X").any() else x["VALUE"]["VALUE_COUNT"] , axis=1)

In [61]: df3[["K", "LABEL", "FINAL_VALUE"]]
Out[61]: 
   K LABEL FINAL_VALUE
0  A   X12         4.0
1  B   L31         2.0
2  B   X21         1.0
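`df.apply(..., axis=1)` calls the lambda once per row. A vectorized alternative, not taken from the question, is to build the conditions on the whole LABEL column with the `.str` accessor and let `numpy.select` pick the value; its `default` also covers the "else 0" rule. A sketch, assuming pandas >= 0.25 for named aggregation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "K": ["A", "A", "B", "B", "B"],
    "LABEL": ["X12", "X12", "X21", "L31", "L31"],
    "VALUE": [1, 3, 1, 2, 5.0],
})

agg = df.groupby(["K", "LABEL"]).agg(
    VALUE_SUM=("VALUE", "sum"),
    VALUE_COUNT=("VALUE", "count"),
).reset_index()

# Conditions are evaluated on the whole LABEL column at once
conditions = [
    agg["LABEL"].str.startswith("X"),
    agg["LABEL"].str.startswith("L"),
]
choices = [agg["VALUE_SUM"], agg["VALUE_COUNT"]]

# np.select takes, per row, the choice of the first true condition, else the default
agg["FINAL_VALUE"] = np.select(conditions, choices, default=0)
result = agg[["K", "LABEL", "FINAL_VALUE"]]
print(result)
```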

@vlad.rad Not yet :-) I am almost there. I need to get the exact columns. – tuxdna, commented Sep 8, 2016 at 14:53

Nickil Maveli · Accepted Answer · 2016-09-08 15:27:46Z

1

You could use DataFrameGroupBy.agg as you did before, then write a generic function that applies the conditions with str.startswith and returns the required frame:

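def compute_multiple_condition(row):
    # row['LABEL'] is a plain str here, so str.startswith works without the .str accessor
    if row['LABEL'].startswith('X'):
        return row['sum']
    elif row['LABEL'].startswith('L'):
        return row['count']
    else:
        return 0

# agg(['sum', 'count']) yields columns named 'sum' and 'count'
# (dict-based renaming inside agg is not supported in pandas >= 1.0)
df = df.groupby(['K', 'LABEL'])['VALUE'].agg(['sum', 'count']).reset_index()
df['FINAL_VALUE'] = df.apply(compute_multiple_condition, axis=1).astype(int)
df = df[['K', 'LABEL', 'FINAL_VALUE']]
df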

   K LABEL  FINAL_VALUE
0  A   X12            4
1  B   L31            2
2  B   X21            1

edited Sep 8, 2016 at 15:27

answered Sep 8, 2016 at 15:02

Nickil Maveli



3 Comments

tuxdna Over a year ago

For me startswith gave the error: AttributeError: ("'Series' object has no attribute 'startswith'", u'occurred at index 0'). But you gave the right approach.

Nickil Maveli Over a year ago

You need to use it with the str accessor, as in series.str.startswith(). In my function, startswith is called on individual strings (row['LABEL']), not on the entire Series, so the accessor isn't required.

Nickil Maveli Over a year ago

Also, check the dtypes: the LABEL column should be of dtype object for this to work.
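The error in the first comment comes from calling `startswith` on a whole Series instead of on a single string. The sketch below contrasts the two and shows that the `.str` accessor refuses non-string data, which is what the dtype remark is about; the Series contents are illustrative:

```python
import pandas as pd

labels = pd.Series(["X12", "L31", "X21"])

# One element is a plain Python str, so str.startswith works directly
print(labels.iloc[0].startswith("X"))        # True

# A whole Series needs the .str accessor, which returns a boolean Series
print(labels.str.startswith("X").tolist())   # [True, False, True]

# labels.startswith("X") would raise:
# AttributeError: 'Series' object has no attribute 'startswith'

# The .str accessor only accepts string-like data, so a numeric column fails
numbers = pd.Series([12, 31])
try:
    numbers.str.startswith("1")
except AttributeError as err:
    print("AttributeError:", err)
```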

PhilChang · Accepted Answer · 2016-09-08 15:42:16Z

0

You can try a method chain on the DataFrame:

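result = (df.groupby(['K', 'LABEL'])
            # X labels get the sum of VALUE; any other label gets the count of VALUE
            .apply(lambda frame: frame.VALUE.sum()
                                if frame.LABEL.iloc[0].startswith("X") else frame.VALUE.count())
            # apply returns an unnamed Series; to_frame(name) names the column directly
            .to_frame('FINAL_VALUE')
            .reset_index()
         )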
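Both answers choose the aggregation per group after grouping. A different technique, which neither answer uses, is to compute each row's contribution first (its VALUE for X labels, 1 for a non-missing VALUE under an L label, 0 otherwise) and then take a plain grouped sum, which avoids calling Python code per group. A sketch; reusing FINAL_VALUE as the contribution column name is an arbitrary choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "K": ["A", "A", "B", "B", "B"],
    "LABEL": ["X12", "X12", "X21", "L31", "L31"],
    "VALUE": [1, 3, 1, 2, 5.0],
})

is_x = df["LABEL"].str.startswith("X")
is_l = df["LABEL"].str.startswith("L")

# Per-row contribution: X rows contribute VALUE, L rows contribute 1 if VALUE
# is present, all other rows contribute 0
contrib = np.select(
    [is_x, is_l],
    [df["VALUE"], df["VALUE"].notna().astype(int)],
    default=0,
)

# A grouped sum of the contributions yields sum(VALUE) for X groups
# and count(VALUE) for L groups
result = (
    df.assign(FINAL_VALUE=contrib)
      .groupby(["K", "LABEL"], as_index=False)["FINAL_VALUE"]
      .sum()
)
print(result)
```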

answered Sep 8, 2016 at 15:42

PhilChang

