Error when converting string numbers to float in read_csv

Question

While reading a CSV file (kidney_disease.csv from https://www.kaggle.com/mansoordaku/ckdisease/data), pandas assigns the columns pcv, wc and rc the dtype object, although they should be float. Specifying the dtypes explicitly leads to an error:

data = pd.read_csv(file, usecols=["pcv", "wc", "rc"], 
                   dtype={"pcv": np.float64, "wc": np.float64, "rc": np.float64})

ValueError: could not convert string to float: '\t?'

Can anyone explain why this happens? All values in these columns are either strings that correspond to numbers or nan. Also, is there a way for pandas to "guess" the dtype based on the first 100 rows or something like that?

Thanks a lot!

scign · Accepted Answer · 2020-02-03 14:08:18Z

2

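The source data file is not clean: some cells hold non-numeric text (such as the '\t?' in the error message), so pandas cannot parse those columns as float while reading. Read the file first and then convert the columns to float.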

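import pandas as pd

# Read everything as-is first; unclean columns come back with dtype object.
df = pd.read_csv('kidney_disease.csv')
cols = ['pcv', 'wc', 'rc']
# Work on a copy so the assignments below do not trigger SettingWithCopyWarning.
df = df[cols].copy()
for col in cols:
    # errors='coerce' turns every value that cannot be parsed into NaN.
    df[col] = pd.to_numeric(df[col], downcast='float', errors='coerce')
print(df.dtypes)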

Output

pcv    float32
wc     float32
rc     float32
dtype: object

This will result in nan values where strings could not be converted. You should examine your dataset to see what other cleaning may be required.
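To examine the dataset as suggested, one option is to list the raw values that fail conversion before coercing them. The following is a minimal sketch, not part of the original answer: the inline sample is a made-up stand-in for kidney_disease.csv, and find_unparseable is a hypothetical helper name, not a pandas function.

```python
import io

import pandas as pd

# Made-up stand-in for kidney_disease.csv (assumption: the real file has more
# rows and columns, and its stray values may differ).
sample = io.StringIO(
    "id,pcv,wc,rc\n"
    "0,44,7800,5.2\n"
    "1,\t?,6700,\t?\n"
    "2,\t43,\t8400,3.9\n"
    "3,,,\n"
)
cols = ["pcv", "wc", "rc"]
# dtype=str keeps every cell as raw text; empty cells still become NaN.
df = pd.read_csv(sample, usecols=cols, dtype=str)


def find_unparseable(series):
    """Return the distinct non-missing raw values that are not numeric."""
    # Strip surrounding whitespace (such as '\t') before attempting conversion.
    converted = pd.to_numeric(series.str.strip(), errors="coerce")
    bad = series[converted.isna() & series.notna()]
    return sorted(bad.unique())


for col in cols:
    print(col, find_unparseable(df[col]))
```

For this sample, the helper reports '\t?' for pcv and rc and nothing for wc. Running the same check on the real columns shows which placeholders would be turned into nan by errors='coerce'.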

edited Feb 3, 2020 at 14:08

answered Feb 3, 2020 at 13:53


1 Comment

nhaus Over a year ago

This works, thank you very much. The problem was that, for some reason, one row contained a ? instead of a nan.
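If '\t?' really is the only non-numeric token in these columns, another option is to declare it as a missing-value marker through the na_values parameter of read_csv, so the columns are parsed as numbers while reading. A minimal sketch on made-up data (the sample below is an assumption, not the actual file):

```python
import io

import pandas as pd

# Made-up stand-in for kidney_disease.csv (assumption).
sample = io.StringIO(
    "id,pcv,wc,rc\n"
    "0,44,7800,5.2\n"
    "1,\t?,6700,\t?\n"
    "2,43,8400,3.9\n"
)
# na_values entries are compared with the raw cell text, so the tab is included.
data = pd.read_csv(sample, usecols=["pcv", "wc", "rc"], na_values=["\t?"])
print(data.dtypes)
```

Any other stray token would still leave the affected column as object, so checking the dtypes afterwards remains worthwhile.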

Sam Chats · Accepted Answer · 2020-02-03 13:59:25Z

0

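You can try a custom conversion function. Pass it through converters on its own rather than together with dtype: when both are given for a column, pandas uses only the converter. Returning NaN on failure also covers placeholders such as '?' and empty cells, which float() would otherwise reject:

import numpy as np
import pandas as pd

def str_to_float(x):
    # Converters receive the raw cell text; strip whitespace such as '\t'.
    try:
        return float(x.strip())
    except ValueError:
        return np.nan  # non-numeric placeholder such as '?' or an empty cell

cols = ["pcv", "wc", "rc"]
data = pd.read_csv(file, usecols=cols, converters={c: str_to_float for c in cols})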

answered Feb 3, 2020 at 13:59


1 Comment

Sam Chats Over a year ago

@nhaus Try adding a print(x) before the return and re-run the code. What output does it give?
