I have a dataset with a column like this:


0        534K
1      482.2K
2      491.3K
3      247.6K
4      815.3K
        ...  
144    403.4K
145      1.3M
146      3.1M
147    311.5K
148      3.7M

I'm trying to clean the column values by writing a function like this (to remove the 'K's and '.' as a test first):


def fix_column(val):
    i = 0
    if val[i][-1] == 'K' and len(val[i]) == 4:
        val.replace("K","")
        val = val + "000"
    elif val[i][-1] == 'K'and len(val[i]) == 5:
        val.replace("K","").replace(".","")
        val = val + "00"
    i += 1
    
    return val

But when I try to apply the function on the column like df.col = df.col.apply(fix_column) nothing happens and the column values remain unchanged. If I do something like df.col = df.col.apply(lambda x:x.replace("K","").replace(".","")) I do get close to a desirable output like so:

0       534
1      4822
2      4913
3      2476
4      8153
       ... 
144    4034
145     13M
146     31M
147    3115
148     37M

I may be missing something basic but would appreciate any pointers in the right direction. Thanks.

1 Answer

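Your first mistake is to try to loop at all: apply already calls the function once per value, so val is a single string such as '534K'. Your second mistake is that the indexing is wrong: i is a local variable whose value is always 0, so val[i][-1] is the first character of the string and never 'K', and neither branch runs. Note also that str.replace returns a new string rather than modifying val in place, so its result must be assigned. Here's a way to go: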

Separate the number and the decimal suffix (if any):

number_strings = df.col.str.extract(r'([\d.]+)([MK]?)')

Convert the number strings into proper numbers:

numbers = number_strings[0].astype(float)

Multiply the numbers, where necessary:

numbers[number_strings[1] == 'K'] *= 1_000
numbers[number_strings[1] == 'M'] *= 1_000_000

Paste the results back into the dataframe:

df.col = numbers
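
The steps above can be combined into one short script. This is only a sketch under a few assumptions: the sample values are copied from the question, the MULTIPLIERS lookup table is a hypothetical helper that is not part of the original answer, and the final rounding to integers is an optional choice to hide floating-point noise.

```python
import pandas as pd

# Sample values copied from the question; the column name "col" follows the question.
df = pd.DataFrame({"col": ["534K", "482.2K", "1.3M", "311.5K", "3.7M"]})

# Hypothetical lookup table: suffix -> multiplier ("" means no suffix).
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "": 1}

# Split each string into its numeric part (column 0) and its suffix (column 1).
parts = df["col"].str.extract(r"([\d.]+)([MK]?)")

# Convert the numeric part to float and scale it by the multiplier for its suffix.
values = parts[0].astype(float) * parts[1].map(MULTIPLIERS)

# Rounding guards against floating-point noise before converting to integers.
df["col"] = values.round().astype("int64")
print(df["col"].tolist())
```

Mapping the suffix to a multiplier replaces the two masked assignments with a single multiplication, which may be easier to extend if other suffixes (for example 'B') show up later.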

3 Comments

I was just writing an answer with the same method, so I'm glad you did that. It can perhaps be put more succinctly: df.col.apply(lambda x: str(int(float(x[:-1]) * (1000 if x[-1] == 'K' else 1000000 if x[-1] == 'M' else 1))))
@Nick One should stay away from lambda if a non-lambda solution is possible.
Probably true. Depends on circumstance... in this case it would probably be faster because it's not using a regex extract. Anyway, I upvoted...
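
For readers who prefer the per-value approach discussed in these comments, a named function is easier to read and test than a one-line lambda. The sketch below is an illustration, not code from the thread: fix_value and MULTIPLIERS are hypothetical names, and it assumes every value is a plain number optionally followed by 'K' or 'M'. It uses Decimal rather than float so that values like '482.2K' convert exactly.

```python
from decimal import Decimal

# Hypothetical lookup table: suffix -> multiplier.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def fix_value(val: str) -> int:
    """Convert a string such as '482.2K' or '1.3M' to an integer."""
    val = val.strip()
    suffix = val[-1:]
    if suffix in MULTIPLIERS:
        # Strings are immutable, so keep the sliced result in a new name.
        number, factor = val[:-1], MULTIPLIERS[suffix]
    else:
        # No recognised suffix: treat the whole string as the number.
        number, factor = val, 1
    # Decimal arithmetic avoids binary floating-point rounding surprises.
    return int(Decimal(number) * factor)


print([fix_value(v) for v in ["534K", "482.2K", "1.3M", "3.7M"]])
```

It would be applied the same way as in the question, with df.col = df.col.apply(fix_value).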
