Replacing dataframe values after removing/replacing character in rows using Pandas

Question

I have a dataframe df_in like so:

import pandas as pd
import numpy as np
dic_in = {'A':['aa','bb','cc','dd','ee','ff','gg','uu','xx','yy','zz'],
       'B':['200','200','AA200','AA040',np.nan,'500',np.nan,'0700','900','UKK','200'],
       'C':['UNN','400',np.nan,'AA080','AA800','B',np.nan,'400',np.nan,'500','UKK']}

My goal is to investigate column B and C in such a way that:

If one of the items contains the following character 'AA', then the number such part of the string must be removed leaving only the numeric part. (AA123 ---> 123). If a zeros are present before the first non null element, they must be removed (AA001234 ---> 1234).
if the quantity is not a number then it must be set to 0.0 (NaN ---> 0.0, UNN ----> 0.0, UKK ---> 0.0 and so on).
if an item has leading zeros before, then they must be deleted (070--->700, 00007000--->7000)
If an item has been modified and is non-zero then it must be multiplied by 100.

The final result should look like this:

   # BEFORE #                     # AFTER #
     A      B      C               A      B      C
0   aa    200    UNN          0   aa    200    0.0
1   bb    200    400          1   bb    200    400
2   cc  AA200    NaN          2   cc  20000    0.0
3   dd  AA040  AA080          3   dd   4000   8000
4   ee    NaN  AA800          4   ee    0.0  80000
5   ff    500      B          5   ff    500    0.0
6   gg    NaN    NaN          6   gg    0.0    0.0
7   uu   0700    400          7   uu    700    400
8   xx    900    NaN          8   xx    900    0.0
9   yy    UKK    500          9   yy    0.0    500
10  zz    200    UKK          10  zz    200    0.0

Do you know a smart and efficient way to achieve such goal?

Notice: all the numbers are in reality string and they should remain as so.

jezrael · Accepted Answer · 2016-12-05 14:59:20Z

You can use to_numeric for replace not numeric to NaN.

Then extract numbers from strings, remove 0 from left by lstrip and add 00.

Last combine_first with fillna and assign to columns:

b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')

b1 = df_in.B.str.extract('(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract('(\d+)', expand=False).str.lstrip('0') + '00'

df_in.B = b.combine_first(b1).fillna(0)
df_in.C = c.combine_first(c1).fillna(0)
print (df_in)
     A      B      C
0   aa    200      0
1   bb    200    400
2   cc  20000      0
3   dd   4000   8000
4   ee      0  80000
5   ff    500      0
6   gg      0      0
7   uu    700    400
8   xx    900      0
9   yy      0    500
10  zz    200      0

A bit modified solution last fillna by string 0.0 convert all values to strings (avoid some strings and some numeric values):

b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')

b1 = df_in.B.str.extract('(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract('(\d+)', expand=False).str.lstrip('0') + '00'

df_in.B = b.combine_first(b1)
df_in.C = c.combine_first(c1)

df_in = df_in.fillna('0.0').astype(str)
print (df_in)
     A      B      C
0   aa  200.0    0.0
1   bb  200.0  400.0
2   cc  20000    0.0
3   dd   4000   8000
4   ee    0.0  80000
5   ff  500.0    0.0
6   gg    0.0    0.0
7   uu  700.0  400.0
8   xx  900.0    0.0
9   yy    0.0  500.0
10  zz  200.0    0.0

bunji · Accepted Answer · 2016-12-05 15:05:30Z

Assuming that all the values in your dataframe are strings (including the NaNs, otherwise you can convert them to an appropriate string with fillna), you can use the following converter function with applymap on the two columns you want to convert.

df = pd.DataFrame(dic_in, dtype=str).fillna('NAN')

converter = lambda x: str(int(x.replace('AA', ''))*100) if 'AA' in x else str(int(x)) if x.isdigit() else '0.0'

df[['B','C']] = df[['B','C']].applymap(converter)

contents of df:

     A      B      C
0   aa    200    0.0
1   bb    200    400
2   cc  20000    0.0
3   dd   4000   8000
4   ee    0.0  80000
5   ff    500    0.0
6   gg    0.0    0.0
7   uu    700    400
8   xx    900    0.0
9   yy    0.0    500
10  zz    200    0.0

Collectives™ on Stack Overflow

Replacing dataframe values after removing/replacing character in rows using Pandas

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related