I have a dataframe df_in like so:
import pandas as pd
import numpy as np
dic_in = {'A':['aa','bb','cc','dd','ee','ff','gg','uu','xx','yy','zz'],
          'B':['200','200','AA200','AA040',np.nan,'500',np.nan,'0700','900','UKK','200'],
          'C':['UNN','400',np.nan,'AA080','AA800','B',np.nan,'400',np.nan,'500','UKK']}
df_in = pd.DataFrame(dic_in)
My goal is to process columns B and C so that:
- If an item contains the characters 'AA', that part of the string must be removed, leaving only the numeric part (AA123 ---> 123). If zeros are present before the first non-zero digit, they must be removed as well (AA001234 ---> 1234).
- If the quantity is not a number, it must be set to 0.0 (NaN ---> 0.0, UNN ---> 0.0, UKK ---> 0.0 and so on).
- If an item has leading zeros, they must be deleted (0700 ---> 700, 00007000 ---> 7000).
- If an item has been modified and is non-zero, it must be multiplied by 100.
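To make the rules concrete, here is a minimal per-cell sketch in plain Python. The name `clean_item` and its parameters are hypothetical. It rests on two assumptions taken from the expected table rather than from the wording of the rules: 'AA' only ever appears as a prefix, and only items that had 'AA' removed are multiplied by 100 (so 0700 becomes 700, not 70000).

```python
def clean_item(value, prefix="AA", factor=100):
    """Apply the rules to a single cell and return the result as a string."""
    # NaN is a float, so any non-string input counts as "not a number".
    if not isinstance(value, str):
        return "0.0"
    # Assumption: 'AA' only occurs at the start of the item.
    had_prefix = value.startswith(prefix)
    digits = value[len(prefix):] if had_prefix else value
    # Anything that is not purely ASCII digits (UNN, UKK, B, '') becomes 0.0.
    if not (digits.isascii() and digits.isdigit()):
        return "0.0"
    number = int(digits)  # int() discards leading zeros: '0700' -> 700
    # Assumption: only items that carried the prefix are multiplied.
    if had_prefix:
        number *= factor
    return str(number)
```

With the frame above, something like `df_in['B'] = df_in['B'].map(clean_item)` (and likewise for C) would apply it column by column while keeping the values as strings.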
The final result should look like this:
     # BEFORE #                # AFTER #
     A      B      C               A      B      C
 0  aa    200    UNN       0  aa    200    0.0
 1  bb    200    400       1  bb    200    400
 2  cc  AA200    NaN       2  cc  20000    0.0
 3  dd  AA040  AA080       3  dd   4000   8000
 4  ee    NaN  AA800       4  ee    0.0  80000
 5  ff    500      B       5  ff    500    0.0
 6  gg    NaN    NaN       6  gg    0.0    0.0
 7  uu   0700    400       7  uu    700    400
 8  xx    900    NaN       8  xx    900    0.0
 9  yy    UKK    500       9  yy    0.0    500
10  zz    200    UKK      10  zz    200    0.0
Do you know a smart and efficient way to achieve this goal?
Note: all the numbers are actually strings, and they should remain strings.
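Since efficiency matters and the values must stay strings, one possible column-wise sketch uses the pandas `.str` accessor instead of a Python-level loop. The helper name `clean_column` is hypothetical, and the sketch assumes that 'AA' only appears as a prefix and that only 'AA' items are multiplied by 100, which is what the expected table suggests.

```python
import numpy as np
import pandas as pd

def clean_column(col, prefix="AA", factor=100):
    """Return a cleaned copy of a string column; every result is a string."""
    s = col.fillna("").astype(str)
    # Assumption: 'AA' only occurs at the start of an item.
    had_prefix = s.str.startswith(prefix)
    # Strip the prefix only in the rows where it is present.
    digits = s.where(~had_prefix, s.str[len(prefix):])
    is_number = digits.str.fullmatch(r"[0-9]+")
    # Temporarily treat non-numeric cells as 0 so the column can be converted.
    values = pd.to_numeric(digits.where(is_number, "0")).astype("int64")
    # Assumption: only items that carried the prefix are multiplied.
    values = values.where(~had_prefix, values * factor)
    # Back to strings; non-numeric cells get the literal '0.0'.
    return values.astype(str).where(is_number, "0.0")

demo = pd.Series(["200", "AA040", np.nan, "0700", "UKK"])
print(clean_column(demo).tolist())
```

Applied to the question's data, this would be called once per column, for example `df_in['B'] = clean_column(df_in['B'])`.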