How to remove the characters from multiple columns in pandas data frame?

Question

I would like to remove the characters from the columns in a pandas data frame. I have around 10 columns, and each contains such characters. Please see the sample column below. The column type is string, and I would like to remove the characters and convert the column to float.

10.2\I
10.1\Y
NAN
12.5\T
13.3\T
9.4\J
NAN
12.2\N
NAN
11.9\U
NAN
12.4\O
NAN
8.3\U
13.5\B
NAN
13.1\V
11.0\Q
11.0\X
8.200000000000001\U
NAN
13.1\T
8.1\O
9.4\N

I would like to remove the '\' and all the letters, then convert the values to float. I don't want to change the NAN values.

I used df['column name'] = df.str[:4]. It cleans some of the cells but not all of them. I am also unable to convert the column to float, as I get an error.

df['column name'] = df.str[:4]

df['column name'].astype(float)

0     10.2
1     10.1
2      NaN
3     12.5
4     13.3
5     9.4\
6     8.3\
22    8.1\
27    9.4\
28     NaN
29    10.6
30    10.8
31     NaN
32    7.3\
33    9.8\
34     NaN
35    12.4
36    8.1\

It still does not convert the other cells.

I get this error when I try to convert the column to float:

ValueError: could not convert string to float: '10.2\I'

Hi! It is frankly quite difficult to parse your post. Please format it a bit more clearly, using some of the markdown/code tags available, and then I'm sure people will be happy to help.
– moonGoose, Commented Mar 31, 2019 at 2:59

AlexK · Accepted Answer · 2019-03-31 07:48:35Z

Two reasons I can see why your code is not working:

1. Using [:4] will not work for all values in your example, since the number of digits before the decimal point (and apparently after it) varies.
2. In the df['column name'] = df.str[:4] assignment, the right side of the equals sign needs the same column identifier, i.e. df['column name'].str[:4].
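To illustrate the first point, here is a minimal sketch showing what a fixed-width slice returns when the numeric part varies in length. The series `s` is a made-up sample built from values in the question, not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample; raw strings keep the backslash literal.
s = pd.Series([r"10.2\I", r"9.4\J", "NAN", r"8.200000000000001\U"])

# A fixed-width slice keeps the first four characters, whatever they are.
sliced = s.str[:4]
print(sliced.tolist())
# ['10.2', '9.4\\', 'NAN', '8.20']
```

Only the first value comes out clean: the second keeps its backslash, which is why astype(float) then fails, and the long value is cut to 8.20.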

Here is a solution with a sample dataframe I prepared with two abbreviated columns like in your example. It uses [:-2] to truncate each value from the right side and then replaces remaining N's with the original NAN's before converting to float.

import pandas as pd

# raw strings keep the backslash literal and avoid invalid-escape warnings
col = pd.Series([r"10.2\I", r"10.1\Y", 'NAN', r'12.5\T'])
col2 = pd.Series([r"11.0\Q", r"11.0\X", 'NAN', r'8.200000000000001\U'])

df = pd.concat([col,col2],axis=1)
df.rename(columns={0:'col1',1:'col2'},inplace=True)
df

    col1     col2
0   10.2\I   11.0\Q
1   10.1\Y   11.0\X
2   NAN      NAN
3   12.5\T   8.200000000000001\U

#apply the conversion to all columns in the dataframe
for col in df:
    df[col] = df[col].str[:-2].replace('N','NAN').astype(float)

df
    col1    col2
0   10.2    11.0
1   10.1    11.0
2   NaN     NaN
3   12.5    8.2
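
The [:-2] slice relies on every suffix being exactly a backslash plus one letter. As a possible alternative, and not the approach used in the answer above, the sketch below removes the backslash and everything after it with a regular expression, then converts with pd.to_numeric. It assumes every unwanted suffix starts with a backslash; the helper name strip_suffix_to_float and the sample frame are hypothetical.

```python
import pandas as pd

# Hypothetical sample frame modelled on the question's data.
df = pd.DataFrame({
    "col1": [r"10.2\I", r"9.4\J", "NAN", r"12.5\T"],
    "col2": [r"11.0\Q", r"11.0\X", "NAN", r"8.200000000000001\U"],
})

def strip_suffix_to_float(column: pd.Series) -> pd.Series:
    # Drop the backslash and everything that follows it.
    cleaned = column.str.replace(r"\\.*$", "", regex=True)
    # errors="coerce" maps anything that still is not numeric to NaN.
    return pd.to_numeric(cleaned, errors="coerce")

# Apply the helper to every column of the frame.
result = df.apply(strip_suffix_to_float)
print(result.dtypes)
```

Because errors="coerce" silently turns any unparseable value into NaN, it may be worth comparing result.isna().sum() with the number of NAN entries expected in each column.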
