
I have a dataframe like so.

   id   K  V
0   1  k1  3
1   1  k2  4
2   1  k2  5
3   1  k1  5
4   2  k1  2
5   2  k1  3
6   2  k2  3

I also have a set of conditions, such as k1 > 1 and k2 < 4.

I want to evaluate the conditions and create a new dataframe with one row per id and one column per condition:

   id  k1_condition  k2_condition
0   1  True          False
1   2  True          True
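If you want to run the answers below yourself, this is a minimal sketch that rebuilds the sample dataframe shown above. The answers assume the frame is bound to the name df.

```python
import pandas as pd

# Sample data from the question, one dict entry per column
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2],
    'K':  ['k1', 'k2', 'k2', 'k1', 'k1', 'k1', 'k2'],
    'V':  [3, 4, 5, 5, 2, 3, 3],
})
```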

4 Answers


Try the following

# function applied to column 'V' of each (id, ki) group
def conditions(g):
    # one predicate per value of K
    cond_dict = {
        'k1': lambda k1: k1 > 1,
        'k2': lambda k2: k2 < 4
    }
    _, k = g.name  # g.name = group key = (id, ki)
    return cond_dict[k](g).all()  # True only if every V in the group passes

out = (
    df.groupby(['id', 'K'])['V']
      .apply(conditions)
      .unstack('K')               # turn k1 and k2 into columns
      .add_suffix('_cond')        # add suffix to column names: ki --> ki_cond
      .rename_axis(columns=None)  # remove the column axis label (K)
      .reset_index()              # make id a column, not the index
)

Output:

>>> out

   id  k1_cond  k2_cond
0   1     True    False
1   2     True     True

You can easily add more conditions to the cond_dict if the column K contains other values besides k1 and k2.
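Following up on adding more keys, here is a sketch that passes cond_dict in as a parameter instead of hard-coding it inside the function. The k3 key, its >= 5 predicate, the extra data row and the name conditions_table are all made up for illustration.

```python
import pandas as pd

# Question data plus one hypothetical k3 row for id 2
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2],
    'K':  ['k1', 'k2', 'k2', 'k1', 'k1', 'k1', 'k2', 'k3'],
    'V':  [3, 4, 5, 5, 2, 3, 3, 7],
})

# One predicate per key; 'k3' is hypothetical
cond_dict = {
    'k1': lambda v: v > 1,
    'k2': lambda v: v < 4,
    'k3': lambda v: v >= 5,
}

def conditions_table(df, cond_dict, suffix='_cond'):
    """One row per id; a cell is True when every V for that key passes."""
    def check(g):
        _, k = g.name  # group key is the (id, K) tuple
        return cond_dict[k](g).all()

    return (
        df.groupby(['id', 'K'])['V']
          .apply(check)
          .unstack('K')
          .add_suffix(suffix)
          .rename_axis(columns=None)
          .reset_index()
    )

out = conditions_table(df, cond_dict)
```

An id that has no rows for some key (id 1 has no k3 here) gets NaN in that column rather than True or False, so fill it if you need a purely boolean column. A key that appears in K but not in cond_dict raises a KeyError.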


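Here is one way to do it. "Every k1 value is > 1" is the same as "the smallest k1 value is > 1", and "every k2 value is < 4" is the same as "the largest k2 value is < 4", so it is enough to pivot the min and max of V: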

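df2 = df.pivot_table(index='id', columns='K', values='V', aggfunc=['min', 'max']).reset_index()
# flatten ('min', 'k1') -> 'min_k1'; rstrip('_') turns ('id', '') into 'id' rather than 'id_'
df2.columns = ['_'.join(col).rstrip('_') for col in df2.columns]
df2['k1_condition'] = df2['min_k1'] > 1
df2['k2_condition'] = df2['max_k2'] < 4
df2 = df2.drop(columns=['min_k1', 'min_k2', 'max_k1', 'max_k2'])
df2

Or, keeping the two-level column index and flattening it at the end:

df2 = df.pivot_table(index='id', columns='K', values='V', aggfunc=['min', 'max']).reset_index()
# df2['min'] is a sub-frame with columns k1 and k2
df2['k1_condition'] = df2['min']['k1'] > 1
df2['k2_condition'] = df2['max']['k2'] < 4
df2.drop(columns=['min', 'max'], level=0, inplace=True)
# columns are still tuples like ('id', ''); keep only the first level
df2.columns = df2.columns.get_level_values(0)
df2

   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True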
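The same min/max idea can also be written with groupby and unstack instead of pivot_table. The (agg, K) column tuples can then be addressed directly, so no flattening step is needed. This is a minimal sketch that rebuilds the sample data so it runs on its own; the names stats and out are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2],
    'K':  ['k1', 'k2', 'k2', 'k1', 'k1', 'k1', 'k2'],
    'V':  [3, 4, 5, 5, 2, 3, 3],
})

# min and max of V per (id, K), then spread K into the columns:
# columns become ('min', 'k1'), ('min', 'k2'), ('max', 'k1'), ('max', 'k2')
stats = df.groupby(['id', 'K'])['V'].agg(['min', 'max']).unstack('K')

out = pd.DataFrame({
    'id': stats.index.to_numpy(),
    # every k1 value > 1  <=>  smallest k1 value > 1
    'k1_condition': stats[('min', 'k1')].to_numpy() > 1,
    # every k2 value < 4  <=>  largest k2 value < 4
    'k2_condition': stats[('max', 'k2')].to_numpy() < 4,
})
```

If an id had no rows for a key, its min or max would be NaN and the comparison would come out False.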


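DataFrame.apply should work:

# parenthesize each comparison: & binds more tightly than == and >
df["k1_condition"] = df.apply(lambda x: (x["K"] == "k1") & (x["V"] > 1), axis=1)
df["k2_condition"] = df.apply(lambda x: (x["K"] == "k2") & (x["V"] < 4), axis=1)
# one row per id: True if at least one row of that id satisfies the condition
df2 = df[["id", "k1_condition", "k2_condition"]].groupby("id").any().reset_index()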

2 Comments

This will work, but I also have to aggregate so that I have one row per id.
@I a s: That should do the trick
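The row-wise apply can also be replaced by vectorized comparisons, which avoids one Python call per row. The sketch below builds the same flags as this answer and aggregates them with .any(). It also shows an .all() variant, because "at least one matching row passes" and "every matching row passes" differ in general, even though both give the expected output on this sample data. The names is_k1, is_k2, flags and strict are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2],
    'K':  ['k1', 'k2', 'k2', 'k1', 'k1', 'k1', 'k2'],
    'V':  [3, 4, 5, 5, 2, 3, 3],
})

is_k1 = df['K'] == 'k1'
is_k2 = df['K'] == 'k2'

# Row-level flags, same meaning as the apply-based lambdas
flags = df.assign(
    k1_condition=is_k1 & (df['V'] > 1),
    k2_condition=is_k2 & (df['V'] < 4),
)
# "any": at least one row of the id with that key passes
df2 = flags.groupby('id')[['k1_condition', 'k2_condition']].any().reset_index()

# "all": every row of the id with that key passes; rows of other keys
# count as passing, so an id with no rows for a key comes out True
strict = (
    df.assign(
        k1_condition=~is_k1 | (df['V'] > 1),
        k2_condition=~is_k2 | (df['V'] < 4),
    )
    .groupby('id')[['k1_condition', 'k2_condition']]
    .all()
    .reset_index()
)
```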

You could use pivot_table with a conditions function.

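import pandas as pd

def conditions(x):
    # x holds the V values of one (id, K) cell and keeps df's row labels,
    # so the first label tells us which key this cell belongs to
    k = df.at[x.index[0], 'K']

    if k == 'k1':
        return (x > 1).all()

    return (x < 4).all()

result = (
    pd.pivot_table(df, index='id', columns='K', aggfunc=conditions)
      .droplevel(level=0, axis=1)  # drop the 'V' level from the columns
      .add_suffix('_condition')
      .rename_axis(None, axis=1)   # remove the 'K' column-axis label
      .reset_index()
)
result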

Result

   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True
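The conditions function above reads df from the enclosing scope to find out which key it is aggregating, so it only works on that one global frame. One way around this is to evaluate each row against its own key's predicate first, then let pivot_table reduce the flags with aggfunc='all'. This is a sketch under that assumption; the checks dict and the passed column are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2],
    'K':  ['k1', 'k2', 'k2', 'k1', 'k1', 'k1', 'k2'],
    'V':  [3, 4, 5, 5, 2, 3, 3],
})

# Hypothetical mapping from key to a predicate on V
checks = {'k1': lambda v: v > 1, 'k2': lambda v: v < 4}

# Evaluate each row against the predicate for its own key
passed = pd.Series(False, index=df.index)
for key, pred in checks.items():
    mask = df['K'] == key
    passed.loc[mask] = pred(df.loc[mask, 'V'])

out = (
    df.assign(passed=passed)
      .pivot_table(index='id', columns='K', values='passed', aggfunc='all')
      .add_suffix('_condition')
      .rename_axis(None, axis=1)
      .reset_index()
)
```

Rows whose K has no entry in checks stay False, so every key you expect to see needs a predicate.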
