8

I have a dataframe in pandas that I'm reading in from a CSV.

One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23

My trouble is that as I read in the csv, pandas views these data as an object dtype, not the float32 that it should be. I guess because it thinks the scientific notation entries are strings.

I've tried to convert the dtype using df['speed'].astype(float) after it's been read in, and I've tried to specify the dtype as it's being read in using df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). The latter throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...

So far neither of these methods has worked. Am I missing an incredibly easy fix?

This question seems to suggest I can specify known values that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.

EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS

7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
5
  • could you show some data from your dataframe? Commented Dec 1, 2015 at 6:18
  • I can't reproduce that problem. Reading values in scientific notation seems to work fine. Can you provide a small sample dataset demonstrating the problem? Are you sure there isn't some other value in the data that is causing the error? Commented Dec 1, 2015 at 6:18
  • @BrenBarn, @Anton Protopopov, do you think it's the Infinity causing this? Commented Dec 1, 2015 at 16:54
  • By "tried to convert the dtype", do you mean you simply typed df['speed'].astype(float)? Because df['speed'] = df['speed'].astype(float) should have worked. Commented Dec 1, 2015 at 16:59
  • inf will work, but not Infinity. There is a bug report asking for support for Infinity, but it's not handled yet. Commented Dec 1, 2015 at 18:27

3 Answers

2

It's hard to say without seeing your data, but it seems the problem is that your rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0, you could use the following:

df1 = df.apply(pd.to_numeric, errors='coerce')

Then you could drop the rows with NA values using dropna, or fill them with zeros using fillna.
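A minimal sketch of that approach on a single column, assuming the column arrived as strings. The values here are hypothetical stand-ins rather than the asker's data; only pd.to_numeric, dropna and fillna come from the answer above.

```python
import pandas as pd

# Hypothetical values standing in for a column that was read as object dtype.
speed = pd.Series(["0", "0.14", "9.49242000872744e-05", "n/a", "not a number"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN
# instead of raising, so the result is a float column.
numeric = pd.to_numeric(speed, errors="coerce")

# Either discard the rows that failed to parse...
dropped = numeric.dropna()

# ...or keep them and substitute zero.
filled = numeric.fillna(0)

print(numeric.dtype)
print(len(dropped), len(filled))
```

Coercing silently hides bad values, so it may be worth checking numeric.isna() against the original column before dropping or filling anything.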

2

I realised it was the Infinity value in my data causing the issue. Removing it with a find and replace worked.

@Anton Protopopov's answer also works, as did @DSM's comment about my not having typed df['speed'] = df['speed'].astype(float).
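For anyone who would rather do the find and replace inside pandas, here is a rough sketch. It assumes a cut-down, hypothetical two-column version of the CSV (the id and speed column names are made up for illustration) and reads the column as text first, so it does not depend on how a given pandas version parses "Infinity".

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-column sample modelled on the rows shown in the question.
csv_text = """id,speed
7425616,0
7425615,NaN
7425617,0.14
204958,9.49242000872744e-05
384739,Infinity
"""

# Read 'speed' as plain objects (strings) so no numeric parsing happens yet.
df = pd.read_csv(io.StringIO(csv_text), dtype={"speed": object})

# Swap the problematic spelling for 'inf', convert, and assign the result
# back: astype returns a new Series rather than modifying the column in place.
df["speed"] = df["speed"].replace({"Infinity": "inf"}).astype(float)

print(df["speed"].dtype)
print(df["speed"].tolist())
```

If infinite speeds are not meaningful for the analysis, mapping "Infinity" to NaN instead of 'inf' is an equally reasonable choice.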

Thanks for the help.

1

In my case, calling round() on the column worked.

df['column'] = df['column'].round(2)
