
I have a list of units that I want to search for in a pandas DataFrame. I then want to convert each unit to its correctly named (short) form and multiply its value by the matching constant factor from the lists below. Here is the example DataFrame:

>> df
product                     info
product___A     3.5 m mini-jack
product___B     3.5 kg mini-jack
product___C     3.5mm mini-jack
product___D     3.5 millimeter mini-jack
product___E     43 centimeter mini-jack

Here is my implementation:

import re
import pandas as pd
units_origianal = ['Kilogram', 'millimeter', 'pounds', 'ounce', 'centimeter', 'kilometers']
units = ['kg', 'mm', 'lbs', 'oz', 'cm', 'm']
factor = [0.543, 654.53, 53.64, 0.744, 43.8, 98.123]
def norm_units(x):
    for i in range(len(units)):
        if (r'\d+\s' + units_origianal[i] in x or re.search(r'\d+' + units_origianal[i], str(x))):
            quantity = re.findall(r"\d+\.\d+", str(x))[0]
            resulting_quantity = float(quantity) * factor[i]
            return x.replace(quantity, str(resulting_quantity)).replace(units_origianal[i], units[i])

df = df.apply(norm_units)


>> df
    # Expected resulting Dataframe
product                     info
product___A      343.4305 m mini-jack
product___B      1.9005 kg mini-jack
product___C      2290.855 mm mini-jack
product___D      2290.855 mm mini-jack
product___E      1883.4 cm mini-jack

This is the resulting DataFrame I got after running the code:

  product info
0  None  None
1  None  None
2  None  None
3  None  None
4  None  None
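For comparison, here is a minimal sketch of the same loop-based idea with its problems fixed. In the original, DataFrame.apply hands norm_units whole columns rather than individual strings, the `'\d+\s' + unit in x` test is a plain substring check rather than a regex, and when no unit matches the function falls through and implicitly returns None. The sketch assumes, as the expected output implies, that both the long names (units_origianal) and the short names (units) should be recognised, and it rounds results to 4 decimal places, which is an assumption made only to keep float noise out of the text.

```python
import re
import pandas as pd

# Lists from the question, with the missing comma after 'lbs' restored
units_origianal = ['Kilogram', 'millimeter', 'pounds', 'ounce', 'centimeter', 'kilometers']
units = ['kg', 'mm', 'lbs', 'oz', 'cm', 'm']
factor = [0.543, 654.53, 53.64, 0.744, 43.8, 98.123]

def norm_units(x):
    """Convert the first '<number> <unit>' found in one cell; otherwise return x unchanged."""
    if not isinstance(x, str):
        return x  # numbers, NaN, None: nothing to convert
    for long_name, short_name, f in zip(units_origianal, units, factor):
        # number with optional decimals, optional spaces, then the long OR short unit name;
        # \b stops the short name 'm' from matching the start of 'mini'
        pattern = r'(\d+(?:\.\d+)?)\s*(?:%s|%s)\b' % (re.escape(long_name), re.escape(short_name))
        m = re.search(pattern, x)
        if m:
            value = round(float(m.group(1)) * f, 4)  # rounding is an assumption
            # replace only the matched span, e.g. '3.5 millimeter' -> '2290.855 mm'
            return x[:m.start()] + f'{value} {short_name}' + x[m.end():]
    return x  # without this, the function would implicitly return None

df = pd.DataFrame({
    'product': ['product___A', 'product___B', 'product___C', 'product___D', 'product___E'],
    'info': ['3.5 m mini-jack', '3.5 kg mini-jack', '3.5mm mini-jack',
             '3.5 millimeter mini-jack', '43 centimeter mini-jack'],
})
# Series.map calls norm_units once per cell; DataFrame.apply would pass whole columns
df['info'] = df['info'].map(norm_units)
print(df)
```

Because the long and short names are checked in the order of the lists, 'mm' is handled by the millimeter entry before the kilometers entry (short name 'm') is ever tried.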

Thanks in advance for your help.

1 Answer


You may want to use Series.str.replace with named regular-expression groups and a function as the replacement:

>> factors = {'Kilogram': 0.543, 'kg': 0.543, 
              'millimeter': 654.54, 'mm': 654.54,
              'pounds': 53.64, 'lbs': 53.64,
              'ounce': 0.744, 'oz': 0.744,
              'centimeter': 43.8, 'cm': 43.8, 
              'kilometers': 98.123, 'm': 98.123}
>> pat = r"(?P<val>\d+(?:\.\d+)?)\s*(?P<unit>(%s))" % '|'.join(factors)
>> def repl(p):
>>     val, unit = float(p.group('val')), p.group('unit')
>>     return str(factors[unit] * val) + ' ' + unit
>> # regex=True is required for a callable repl in pandas >= 2.0
>> df['info'] = df['info'].str.replace(pat, repl, regex=True)
>> df

    product      info
0   product___A  343.4305 m mini-jack
1   product___B  1.9005 kg mini-jack
2   product___C  2290.89 mm mini-jack
3   product___D  2290.89 millimeter mini-jack
4   product___E  1883.3999999999999 centimeter mini-jack
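The output above keeps the long names (millimeter, centimeter), while the question's expected output uses the short forms. A sketch extending the same str.replace approach, assuming a hypothetical canonical dict that maps each long name to its short form and using the question's factor 654.53 for millimeters. It passes regex=True explicitly because pandas 2.0 changed the default to regex=False, and a callable repl is only accepted with regex=True. Rounding to 4 decimal places is an assumption to avoid values like 1883.3999999999999 in the text.

```python
import re
import pandas as pd

factors = {'Kilogram': 0.543, 'kg': 0.543,
           'millimeter': 654.53, 'mm': 654.53,
           'pounds': 53.64, 'lbs': 53.64,
           'ounce': 0.744, 'oz': 0.744,
           'centimeter': 43.8, 'cm': 43.8,
           'kilometers': 98.123, 'm': 98.123}
# Hypothetical helper: long unit name -> short name wanted in the output
canonical = {'Kilogram': 'kg', 'millimeter': 'mm', 'pounds': 'lbs',
             'ounce': 'oz', 'centimeter': 'cm', 'kilometers': 'm'}

# Longer names first so 'mm' is tried before 'm'; \b stops 'm' matching inside 'mini'
names = sorted(factors, key=len, reverse=True)
pat = r'(?P<val>\d+(?:\.\d+)?)\s*(?P<unit>%s)\b' % '|'.join(map(re.escape, names))

def repl(match):
    val, unit = float(match.group('val')), match.group('unit')
    # round() keeps float noise such as 1883.3999999999999 out of the text
    return f"{round(factors[unit] * val, 4)} {canonical.get(unit, unit)}"

df = pd.DataFrame({
    'product': ['product___A', 'product___B', 'product___C', 'product___D', 'product___E'],
    'info': ['3.5 m mini-jack', '3.5 kg mini-jack', '3.5mm mini-jack',
             '3.5 millimeter mini-jack', '43 centimeter mini-jack'],
})
# A callable repl needs regex=True; pandas >= 2.0 defaults to regex=False
df['info'] = df['info'].str.replace(pat, repl, regex=True)
print(df)
```

The number pattern `\d+(?:\.\d+)?` also accepts values with more than one decimal digit (e.g. 3.25), which the shorter `\d+\.?\d?` would split incorrectly.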

4 Comments

Can I apply this to the whole DataFrame? I think it only works on one column. My dataset is messy, so I don't know in advance which columns contain values structured like the info column. Also, I don't know why the same code isn't working for me.
@muazfaiz To help you with this, please add to the question an example input and desired output that you are stuck with; with the ones you've already provided, it works fine.
What version of pandas do you have? Support for a callable repl parameter was added in version 0.20.0. The version I use is 0.20.1.
Thanks, it works now. But what should I do to apply this method to the whole DataFrame?
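For the follow-up about the whole DataFrame: since it isn't known in advance which columns hold strings like info, one option is to run the substitution cell by cell on every non-numeric column and leave anything that isn't a string alone. Series.map is used rather than the .str accessor because .str methods return NaN for non-string cells in a mixed column. This is a sketch under those assumptions; convert_cell, convert_frame and the canonical long-to-short map are hypothetical names, and the 4-decimal rounding is an assumption.

```python
import re
import pandas as pd

factors = {'Kilogram': 0.543, 'kg': 0.543,
           'millimeter': 654.53, 'mm': 654.53,
           'pounds': 53.64, 'lbs': 53.64,
           'ounce': 0.744, 'oz': 0.744,
           'centimeter': 43.8, 'cm': 43.8,
           'kilometers': 98.123, 'm': 98.123}
# Assumed mapping from long names to the short names used in the output
canonical = {'Kilogram': 'kg', 'millimeter': 'mm', 'pounds': 'lbs',
             'ounce': 'oz', 'centimeter': 'cm', 'kilometers': 'm'}

names = sorted(factors, key=len, reverse=True)  # try longer names first
pattern = re.compile(r'(?P<val>\d+(?:\.\d+)?)\s*(?P<unit>%s)\b'
                     % '|'.join(map(re.escape, names)))

def repl(match):
    val, unit = float(match.group('val')), match.group('unit')
    return f"{round(factors[unit] * val, 4)} {canonical.get(unit, unit)}"

def convert_cell(x):
    # Only strings are rewritten; numbers, NaN and None pass through unchanged
    return pattern.sub(repl, x) if isinstance(x, str) else x

def convert_frame(frame):
    """Return a copy of frame with units converted in every non-numeric column."""
    out = frame.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(convert_cell)
    return out

df = pd.DataFrame({
    'product': ['product___A', 'product___E'],
    'info': ['3.5 m mini-jack', '43 centimeter mini-jack'],
    'notes': ['weighs 3.5 kg', None],
    'price': [9.99, 4.5],
})
print(convert_frame(df))
```

Numeric columns are skipped entirely, so a column such as price is never turned into strings; the original DataFrame is left unmodified because the function works on a copy.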
