I am using the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'class': 'a a aa aa b b'.split(),
                   'item': [5, 5, 7, 7, 7, 6],
                   'last_PO_code': ['103', '103', '103', '104', '103', '104'],
                   'qty': [5, 4, 7, 6, 7, 6]
                   })

I need to apply the following rules to each (class, item) group of this dataframe:

  1. True if every last_PO_code in the group equals 103.
  2. True if last_PO_code contains both 103 and 104, and the sum of qty for 103 is greater than the sum of qty for 104.
  3. True if last_PO_code contains 103, 104, 105 and 106, the sum of qty for 104 equals the sum of qty for 103, and the sum of qty for 105 equals the sum of qty for 106.

I have written lambda functions, but I can't use them with transform:

regle1 = lambda x: True if x['last_PO_code'].all() == "103" else False
regle2 = lambda x: x.loc[x['last_PO_code'].eq('103'), 'qty'].sum() \
                   > x.loc[x['last_PO_code'].eq('104'), 'qty'].sum()
regle3 = lambda x: x.loc[x['last_PO_code'].eq('105'), 'qty'].sum() \
                   == x.loc[x['last_PO_code'].eq('106'), 'qty'].sum()

df['regle1'] = df['class'].map(df.groupby(['class','item']).apply(regle1))
df['regle2'] = df['class'].map(df.groupby(['class','item']).apply(regle2))
df['regle3'] = df['class'].map(df.groupby(['class','item']).apply(regle3))
mask1 = df['regle2'] == True 
mask2 = df['regle3'] == True 
mask = mask1 & mask2
df['regle3'] = np.where(mask,True,False)


I would like to rewrite these as functions like the one below, so that I can use transform instead of apply.

I succeeded with rule 1, but I can't manage it for the other rules:

def regle1(x):
    return (x == '103').all()


df['regle1'] = df.groupby(['class', 'item']).last_PO_code.transform(regle1)

1 Answer

Do you mean something like this?

# Rule 1: every code in the group is 103
regle1 = lambda x: True if x['last_PO_code'].eq('103').all() else False
# Rule 2: both 103 and 104 are present, and qty(103) > qty(104)
regle2 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x.loc[x['last_PO_code'].eq('103'), 'qty'].sum() \
        > x.loc[x['last_PO_code'].eq('104'), 'qty'].sum() \
    else False
# Rule 3: 103, 104, 105 and 106 are all present,
# qty(103) == qty(104) and qty(105) == qty(106)
regle3 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and x.loc[x['last_PO_code'].eq('103'), 'qty'].sum() \
        == x.loc[x['last_PO_code'].eq('104'), 'qty'].sum() \
    and x.loc[x['last_PO_code'].eq('105'), 'qty'].sum() \
        == x.loc[x['last_PO_code'].eq('106'), 'qty'].sum() \
    else False

Then apply them to each group:

df2 = df.groupby(['class', 'item']).apply(
    lambda x: pd.Series({'regle1': regle1(x),
                         'regle2': regle2(x),
                         'regle3': regle3(x)}))

For this input:

df = pd.DataFrame({'class': 'a a aa aa b b c c c c'.split(),
                    'item': [5,5,7,7,7,6,9,9,9,9],
                   'last_PO_code': ['103','103','103','104','103','104','103','104','105','106'],
                   'qty': [5,4,7,6,7,6,1,1,2,2]
                   })

It seems to work fine:

                regle1  regle2  regle3
class   item            
a       5       True    False   False
aa      7       False   True    False
b       6       False   False   False
        7       True    False   False
c       9       False   False   True

EDIT: You can add the calculated columns back to the original dataframe, for example with DataFrame.merge():

df.merge(df2.reset_index(), on = ['class','item'])

#   class   item    last_PO_code    qty regle1  regle2  regle3
#0  a       5       103             5   True    False   False
#1  a       5       103             4   True    False   False
#2  aa      7       103             7   False   True    False
#3  aa      7       104             6   False   True    False
#4  b       7       103             7   True    False   False
#5  b       6       104             6   False   False   False
#6  c       9       103             1   False   False   True
#7  c       9       104             1   False   False   True
#8  c       9       105             2   False   False   True
#9  c       9       106             2   False   False   True
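
Since the question asks for transform rather than apply, here is one possible sketch that avoids apply and merge altogether. It is not the only way to do this: the helper names group_qty and group_has are made up for illustration, and the idea is simply to compute per-group totals and presence flags with transform, then combine them as ordinary boolean columns.

```python
import pandas as pd

df = pd.DataFrame({
    'class': 'a a aa aa b b c c c c'.split(),
    'item': [5, 5, 7, 7, 7, 6, 9, 9, 9, 9],
    'last_PO_code': ['103', '103', '103', '104', '103',
                     '104', '103', '104', '105', '106'],
    'qty': [5, 4, 7, 6, 7, 6, 1, 1, 2, 2],
})

# Group keys as Series, so derived Series can be grouped the same way
keys = [df['class'], df['item']]


def group_qty(code):
    """Hypothetical helper: per-group sum of qty for one code, one value per row."""
    masked = df['qty'].where(df['last_PO_code'].eq(code), 0)
    return masked.groupby(keys).transform('sum')


def group_has(code):
    """Hypothetical helper: True on every row of a group that contains the code."""
    return df['last_PO_code'].eq(code).groupby(keys).transform('any')


codes = ('103', '104', '105', '106')
q = {c: group_qty(c) for c in codes}
has = {c: group_has(c) for c in codes}

# Rule 1: every code in the group is 103
df['regle1'] = df['last_PO_code'].eq('103').groupby(keys).transform('all')
# Rule 2: 103 and 104 both present, and qty(103) > qty(104)
df['regle2'] = has['103'] & has['104'] & (q['103'] > q['104'])
# Rule 3: all four codes present, qty(103) == qty(104), qty(105) == qty(106)
df['regle3'] = (has['103'] & has['104'] & has['105'] & has['106']
                & (q['103'] == q['104']) & (q['105'] == q['106']))
```

Because every intermediate Series keeps the original row index, the result already has one value per row and no merge step is needed.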

2 Comments

Thanks for your answer, but the resulting dataframe is not what I need: it does not contain all the rows.
OK, but once you have it, it is easy to add the results to the starting dataset (either via merge or map). I thought the biggest problem was the lambda functions.
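
As a rough illustration of that last comment, the per-group results can be broadcast back to every original row with DataFrame.join on the two key columns, which plays the role of map when the lookup key is a (class, item) pair. This is only a sketch under the rules as stated in the question; the function name rules and the variable out are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'class': 'a a aa aa b b c c c c'.split(),
    'item': [5, 5, 7, 7, 7, 6, 9, 9, 9, 9],
    'last_PO_code': ['103', '103', '103', '104', '103',
                     '104', '103', '104', '105', '106'],
    'qty': [5, 4, 7, 6, 7, 6, 1, 1, 2, 2],
})


def rules(g):
    """Hypothetical helper: evaluate the three rules for one (class, item) group."""
    qty = g.groupby('last_PO_code')['qty'].sum()  # total qty per code in this group

    def total(code):
        return qty.get(code, 0)

    def has(*wanted):
        return all(code in qty.index for code in wanted)

    return pd.Series({
        'regle1': bool(g['last_PO_code'].eq('103').all()),
        'regle2': bool(has('103', '104') and total('103') > total('104')),
        'regle3': bool(has('103', '104', '105', '106')
                       and total('103') == total('104')
                       and total('105') == total('106')),
    })


# One row per (class, item) group, indexed by those two keys
df2 = df.groupby(['class', 'item'])[['last_PO_code', 'qty']].apply(rules)

# Left join on the key columns: keeps every original row and its order
out = df.join(df2, on=['class', 'item'])
```

The join is a left join by default, so out has the same number of rows as df, with the three rule columns repeated within each group.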
