1

I need to find a way of replacing a substring in a pandas dataframe at row level based on condition. (df = pd.DataFrame({'Name':['Meat 1.7 Kg','Chicken 1.9 Kg','Ground Beef 1.0 Kg','Turkey 1.2 kg','Wagyu 400 g'],'Weight':[10,8,2,6,4],'Mult':[4.0,5.2,5.6,5.9,4.9]}))

    Name               Weight Mult
0   Meat 1.7 Kg        10     4.0
1   Chicken 1.9 Kg      8     5.2
2   Ground Beef 1.0 Kg  2     5.6
3   Turkey 1.2 kg       6     5.9
4   Wagyu 400 g         4     4.9

I need to replace the 'Name' column number with the 'Mult' column number. The problem is that I need to directly do so when it has 'Kg' or 'kg' at the end, but when it has a 'g' at the end I need to multiply by 1000 and then replace it.

Desired output:

    Name               Weight Mult
0   Meat 4.0 Kg        10     4.0
1   Chicken 5.2 Kg      8     5.2
2   Ground Beef 5.6 Kg  2     5.6
3   Turkey 5.9 kg       6     5.9
4   Wagyu 4900 g        4     4.9

Maybe it's better to just transform the 'Mult' value prior replacement and then use the same condition to all rows.

Currently I'm trying this: [re.sub(r'(\d+\.\d+)\s+[Kk]g$|(\d+\s+)g$', b, a) for a, b in zip(df['refid'], df['Mult'].astype(str))], but I don't know how to introduce the condition.

Thanks.

1 Answer 1

1

You can use

df['id'] = df.apply(lambda x: re.sub(r'(?i)(\d+(?:\.\d+)?)(\s*(k)?g)$', lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) + z.group(2), x['id']), axis=1)

Output of df:

                     id  Weight  Mult
0           Meat 4.0 Kg      10   4.0
1        Chicken 5.2 Kg       8   5.2
2  Grounded Beef 5.6 Kg       2   5.6
3         Turkey 5.9 kg       6   5.9
4          Wagyu 4900 g       4   4.9

Details:

  • (?i) - case insensitive matching on
  • (\d+(?:\.\d+)?) - Group 1: one or more digits and then an optional sequence of a . and one or more digits
  • (\s*(k)?g) - Group 2: zero or more whitespaces, an optional Group 3 capturing a k and then a g
  • $ - end of string.

The lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) + z.group(2) replacements means the match is replaced with the value from the Mult column as is if Group 3 matched + the Group 2 value, or, if Group 3 did not match, the number is multiplied by 1000, turned to integer, converted to a string and returned with Group 2 value right after.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.