python pandas dataframe aggregate rows

Question

I have a dataframe like so.

   id   K  V
0   1  k1  3
1   1  k2  4
2   1  k2  5
3   1  k1  5
4   2  k1  2
5   2  k1  3
6   2  k2  3

I also have a set of conditions, such as k1 > 1 and k2 < 4.

I want to evaluate the conditions and create a new dataframe containing one row per id and one column per condition.

   id  k1_condition  k2_condition
0   1  True          False
1   2  True          True
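
For anyone who wants to run the answers below, the sample frame can be rebuilt roughly as follows. This is only a sketch of the table shown above; the variable name `df` is an assumption chosen to match what the answers use.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})
print(df)
```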

Have you tried apply? pandas.pydata.org/docs/reference/api/…
– Chris, Commented Jul 2, 2022 at 15:33

Rodalm · Accepted Answer · 2022-07-02 17:01:15Z

2

Try the following:

# function applied to column 'V' of each (id, ki) group
def conditions(g):
    cond_dict = {
        'k1': lambda k1: k1 > 1,
        'k2': lambda k2: k2 < 4
    }
    _ , k = g.name  # g.name = group key = (id, ki)
    return cond_dict[k](g).all()

out = (
    df.groupby(['id', 'K'])['V']
      .apply(conditions)
      .unstack('K')               # turn k1 and k2 into columns
      .add_suffix('_cond')        # add suffix to column names: ki --> ki_cond
      .rename_axis(columns=None)  # remove the column axis label (K)
      .reset_index()              # make id a column, not the index
)

Output:

>>> out

   id  k1_cond  k2_cond
0   1     True    False
1   2     True     True

You can easily add more conditions to the cond_dict if the column K contains other values besides k1 and k2.
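
As a rough illustration of that extension, the sketch below adds a third key. The `k3` rows and the `k3` rule are made up for the example and are not part of the question's data; the rest follows the approach above.

```python
import pandas as pd

# Sample data from the question plus two hypothetical 'k3' rows.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 1, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2", "k3", "k3"],
    "V": [3, 4, 5, 5, 2, 3, 3, 0, 7],
})

# One entry per value of K; the 'k3' rule is invented for illustration.
cond_dict = {
    "k1": lambda v: v > 1,
    "k2": lambda v: v < 4,
    "k3": lambda v: v == 0,
}

def conditions(g):
    _, k = g.name  # group key is (id, K)
    return cond_dict[k](g).all()

out = (
    df.groupby(["id", "K"])["V"]
      .apply(conditions)
      .unstack("K")
      .add_suffix("_cond")
      .rename_axis(columns=None)
      .reset_index()
)
print(out)
```

A value of K with no entry in cond_dict would raise a KeyError here, so every key present in the data needs a rule.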

answered Jul 2, 2022 at 17:01

Rodalm


Naveed · Accepted Answer · 2022-07-02 15:56:30Z

1

Here is one way to do it:

df2 = df.pivot_table(index='id', columns='K', values='V', aggfunc=['min', 'max']).reset_index()
# flatten the MultiIndex columns: ('min', 'k1') -> 'min_k1', ('id', '') -> 'id_'
df2.columns = ['_'.join(col) for col in df2.columns]
df2['k1_condition'] = df2['min_k1'] > 1
df2['k2_condition'] = df2['max_k2'] < 4
df2 = df2.drop(columns=['min_k1', 'min_k2', 'max_k1', 'max_k2'])
df2

Or, keeping the MultiIndex columns:

df2 = df.pivot_table(index='id', columns='K', values='V', aggfunc=['min', 'max']).reset_index()
df2['k1_condition'] = df2['min']['k1'] > 1
df2['k2_condition'] = df2['max']['k2'] < 4
df2 = df2.drop(columns=['min', 'max'], level=0)
df2

Output of the first variant (the id column ends up as id_ because of the join):

    id_     k1_condition    k2_condition
0   1       True            False
1   2       True            True
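
The min/max approach relies on the fact that every value in a group exceeds 1 exactly when the group's minimum does, and every value is below 4 exactly when the maximum is. Here is a small check of that equivalence on the question's data, using a plain groupby instead of pivot_table. It is only a sketch, and the variable names are assumptions.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

k1 = df.loc[df["K"] == "k1"].groupby("id")["V"]
k2 = df.loc[df["K"] == "k2"].groupby("id")["V"]

# all(V > 1) per id gives the same answer as min(V) > 1
k1_all = k1.apply(lambda v: (v > 1).all())
k1_min = k1.min() > 1

# all(V < 4) per id gives the same answer as max(V) < 4
k2_all = k2.apply(lambda v: (v < 4).all())
k2_max = k2.max() < 4

print((k1_all == k1_min).all(), (k2_all == k2_max).all())
```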

edited Jul 2, 2022 at 15:56

answered Jul 2, 2022 at 15:48

Naveed


Chris · Accepted Answer · 2022-07-02 16:05:55Z

1

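DataFrame.apply should work:

# the comparisons need parentheses because & binds more tightly than == and >
df["k1_condition"] = df.apply(lambda x: (x["K"] == "k1") & (x["V"] > 1), axis=1)
df["k2_condition"] = df.apply(lambda x: (x["K"] == "k2") & (x["V"] < 4), axis=1)
# any(): True if at least one row of the id satisfies the condition
df2 = df[["id", "k1_condition", "k2_condition"]].groupby("id").any()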
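
Note that any() reports whether at least one row of an id satisfies a condition. The expected output in the question also fits a stricter reading where every k1 value must exceed 1 and every k2 value must be below 4. A possible vectorized sketch of that "all rows" reading, without the row-wise apply, is below. Rows that belong to another key are treated as passing so they do not affect the result; the name `checks` is an assumption.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

# A row passes a key's check if it belongs to another key or satisfies the rule.
checks = pd.DataFrame({
    "id": df["id"],
    "k1_condition": (df["K"] != "k1") | (df["V"] > 1),
    "k2_condition": (df["K"] != "k2") | (df["V"] < 4),
})

# all(): every row of the id must pass
df2 = checks.groupby("id").all().reset_index()
print(df2)
```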

edited Jul 2, 2022 at 16:05

answered Jul 2, 2022 at 15:37

Chris


2 Comments

l a s Over a year ago

This will work, but I also have to aggregate so that I have one row per id.

Chris Over a year ago

@l a s: That should do the trick.

sitting_duck · Accepted Answer · 2022-07-03 18:14:31Z

1

You could use pivot_table with a conditions function.

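import pandas as pd

def conditions(x):
    # x holds the V values of one (id, K) group; look up its K via the first row label
    k = df.at[x.index[0], 'K']

    if k == 'k1':
        return (x > 1).all()

    # any other key (here only k2)
    return (x < 4).all()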

pd.pivot_table(df, index='id', columns='K', aggfunc=conditions) \
    .droplevel(level=0, axis=1).add_suffix('_condition') \
    .rename_axis(None, axis=1).reset_index()

Result

   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True

edited Jul 3, 2022 at 18:14

answered Jul 2, 2022 at 16:21

sitting_duck
