While reading a CSV file (kidney_disease.csv from https://www.kaggle.com/mansoordaku/ckdisease/data), pandas assigns the columns pcv, wc and rc the dtype object, although they should be float. Specifying the dtypes explicitly leads to an error:

data = pd.read_csv(file, usecols=["pcv", "wc", "rc"], 
                   dtype={"pcv": np.float64, "wc": np.float64, "rc": np.float64})

ValueError: could not convert string to float: '\t?'

Can anyone explain why this happens? As far as I can tell, all values in these columns are either strings that correspond to numbers or nan. And is there a way for pandas to "guess" the dtype based on the first 100 rows or something like that?

Thanks a lot!

2 Answers

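The source data file is not clean: as the error message shows, at least one cell contains the string '\t?' (a tab followed by a question mark), which cannot be parsed as a float. You should read in the file first and then convert the columns to float.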

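import pandas as pd

df = pd.read_csv('kidney_disease.csv')
cols = ['pcv', 'wc', 'rc']
# Take an explicit copy so the assignments below do not trigger SettingWithCopyWarning
df = df[cols].copy()
for col in cols:
    # errors='coerce' turns unparseable strings into NaN;
    # downcast='float' picks the smallest float dtype that fits the data
    df[col] = pd.to_numeric(df[col], downcast='float', errors='coerce')
print(df.dtypes)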

Output

pcv    float32
wc     float32
rc     float32
dtype: object

This will result in nan values where strings could not be converted. You should examine your dataset to see what other cleaning may be required.
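
One way to examine the data is to list the raw strings that would be coerced to nan. The following is only a minimal sketch: it uses a small made-up in-memory sample instead of the real kidney_disease.csv, and the name bad_values is just illustrative.

```python
import io

import pandas as pd

# Hypothetical sample standing in for kidney_disease.csv (values are made up)
sample = io.StringIO("pcv,wc,rc\n44,7800,5.2\n\t?,6200,\t?\n,9600,4.9\n")
cols = ["pcv", "wc", "rc"]

# Read everything as strings so nothing is converted or dropped yet
raw = pd.read_csv(sample, usecols=cols, dtype=str)

bad_values = {}
for col in cols:
    converted = pd.to_numeric(raw[col], errors="coerce")
    # Present in the file, but not parseable as a number
    mask = raw[col].notna() & converted.isna()
    bad_values[col] = sorted(raw.loc[mask, col].unique())

print(bad_values)
```

Empty cells are already read as nan and are therefore not reported; only strings that fail the conversion show up in bad_values.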

1 Comment

This works, thank you very much. The problem was that, for some reason, one row contained a ? instead of a nan...
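
If the stray marker is known in advance, another option might be to tell read_csv to treat it as missing via na_values, so that the explicit dtype can be kept. This is a sketch under assumptions: it runs on a made-up in-memory sample, and it assumes the only offending token is a question mark, optionally preceded by a tab. Any other malformed value would still raise the ValueError.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for kidney_disease.csv (values are made up)
sample = io.StringIO("pcv,wc,rc\n44,7800,5.2\n\t?,6200,\t?\n,9600,4.9\n")
cols = ["pcv", "wc", "rc"]

data = pd.read_csv(
    sample,
    usecols=cols,
    # Extra strings to treat as missing, in addition to pandas' defaults
    na_values=["?", "\t?"],
    dtype={col: np.float64 for col in cols},
)
print(data.dtypes)
```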

You can try a custom conversion function:

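import numpy as np
import pandas as pd

def str_to_float(x):
    # Strip stray whitespace such as tabs; map unparseable values like '?' or '' to NaN
    try:
        return float(x.strip())
    except ValueError:
        return np.nan

# When both are given for a column, pandas uses only the converter, so dtype is omitted
data = pd.read_csv(file, usecols=["pcv", "wc", "rc"],
                   converters={"pcv": str_to_float, "wc": str_to_float, "rc": str_to_float})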

1 Comment

@nhaus Try adding a print(x) before the return and re-run the code. What output does it give?
