Replace pandas dataframe substrings using regex based on condition at row level

Question

I need to find a way of replacing a substring in a pandas dataframe at row level based on condition. (df = pd.DataFrame({'Name':['Meat 1.7 Kg','Chicken 1.9 Kg','Ground Beef 1.0 Kg','Turkey 1.2 kg','Wagyu 400 g'],'Weight':[10,8,2,6,4],'Mult':[4.0,5.2,5.6,5.9,4.9]}))

    Name               Weight Mult
0   Meat 1.7 Kg        10     4.0
1   Chicken 1.9 Kg      8     5.2
2   Ground Beef 1.0 Kg  2     5.6
3   Turkey 1.2 kg       6     5.9
4   Wagyu 400 g         4     4.9

I need to replace the 'Name' column number with the 'Mult' column number. The problem is that I need to directly do so when it has 'Kg' or 'kg' at the end, but when it has a 'g' at the end I need to multiply by 1000 and then replace it.

Desired output:

    Name               Weight Mult
0   Meat 4.0 Kg        10     4.0
1   Chicken 5.2 Kg      8     5.2
2   Ground Beef 5.6 Kg  2     5.6
3   Turkey 5.9 kg       6     5.9
4   Wagyu 4900 g        4     4.9

Maybe it's better to just transform the 'Mult' value prior replacement and then use the same condition to all rows.

Currently I'm trying this: [re.sub(r'(\d+\.\d+)\s+[Kk]g$|(\d+\s+)g$', b, a) for a, b in zip(df['refid'], df['Mult'].astype(str))], but I don't know how to introduce the condition.

Thanks.

Wiktor Stribiżew · Accepted Answer · 2022-02-26 23:03:04Z

You can use

df['id'] = df.apply(lambda x: re.sub(r'(?i)(\d+(?:\.\d+)?)(\s*(k)?g)$', lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) + z.group(2), x['id']), axis=1)

Output of df:

                     id  Weight  Mult
0           Meat 4.0 Kg      10   4.0
1        Chicken 5.2 Kg       8   5.2
2  Grounded Beef 5.6 Kg       2   5.6
3         Turkey 5.9 kg       6   5.9
4          Wagyu 4900 g       4   4.9

Details:

(?i) - case insensitive matching on
(\d+(?:\.\d+)?) - Group 1: one or more digits and then an optional sequence of a . and one or more digits
(\s*(k)?g) - Group 2: zero or more whitespaces, an optional Group 3 capturing a k and then a g
$ - end of string.

The lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) + z.group(2) replacements means the match is replaced with the value from the Mult column as is if Group 3 matched + the Group 2 value, or, if Group 3 did not match, the number is multiplied by 1000, turned to integer, converted to a string and returned with Group 2 value right after.

Collectives™ on Stack Overflow

Replace pandas dataframe substrings using regex based on condition at row level

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related