not able to change object to float in pandas dataframe

Question

I just started learning Python. I'm trying to change a column's data type from object to float so that I can take the mean. I have tried changing [] to () and even the quotes (""), and I don't know whether that makes a difference. Please help me figure out what the issue is. Thanks!

My code:

df["normalized-losses"]=df["normalized-losses"].astype(float)

The error I see was attached as an image.

Please format the code and the traceback: select the code and press Ctrl-K (see the formatting help).
– wwii, Commented Aug 19, 2018 at 13:31

Rik · Accepted Answer · 2018-08-19 15:19:24Z

3

Use:

df['normalized-losses'] = df['normalized-losses'][df['normalized-losses'] != '?'].astype(float)

Writing df.normalized-losses makes the interpreter evaluate df.normalized, which doesn't exist: the statement you wrote is parsed as (df.normalized) - (losses.astype(float)). There also appears to be a question mark in your data, which can't be converted to float. The statement above converts only the rows that don't contain a question mark; because the result is assigned back by index, the '?' rows end up as NaN rather than being removed. If you would rather not have NaN values, you can replace the question marks with 0 using:

df['normalized-losses'] = df['normalized-losses'].replace('?', 0.0)
df['normalized-losses'] = df['normalized-losses'].astype(float)
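A minimal sketch comparing the two approaches on a small, made-up column (the sample values are assumptions, not the asker's data). It shows that assigning the filtered result back keeps every row and turns the '?' entries into NaN, while the replace route turns them into 0.0.

```python
import pandas as pd

# Hypothetical sample data: an object column using '?' as a placeholder.
df = pd.DataFrame({"normalized-losses": ["164", "?", "158", "?", "192"]}, dtype=object)

# Approach 1: convert only the non-'?' rows. Assigning the shorter Series back
# aligns on the index, so the '?' rows become NaN instead of disappearing.
filtered = df.copy()
col = filtered["normalized-losses"]
filtered["normalized-losses"] = col[col != "?"].astype(float)

# Approach 2: replace '?' with 0.0 first, then convert the whole column.
zeroed = df.copy()
zeroed["normalized-losses"] = zeroed["normalized-losses"].replace("?", 0.0).astype(float)

print(filtered["normalized-losses"].tolist())  # [164.0, nan, 158.0, nan, 192.0]
print(zeroed["normalized-losses"].tolist())    # [164.0, 0.0, 158.0, 0.0, 192.0]

# mean() skips NaN, so the two averages differ.
print(filtered["normalized-losses"].mean(), zeroed["normalized-losses"].mean())
```

With these sample values, the NaN version averages only the three real numbers, while the zero-filled version divides the same total by five, which pulls the mean down.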

edited Aug 19, 2018 at 15:19

answered Aug 19, 2018 at 13:33

Rik

477 · 4 silver badges · 14 bronze badges


5 Comments

Aarushi Goyal Over a year ago

I am getting this error. What am I doing wrong here?

Aarushi Goyal Over a year ago

ValueError                                Traceback (most recent call last)
<ipython-input-61-4ee2085f959e> in <module>()
----> 1 df["normalized-losses"]=df["normalized-losses"].astype(float)

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)

Aarushi Goyal Over a year ago

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    175                 else:
    176                     kwargs[new_arg_name] = new_arg_value
--> 177             return func(*args, **kwargs)
    178         return wrapper
    179     return _deprecate_kwarg

~\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   4995         # else, only a single dtype is given
   4996         new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 4997                                      **kwargs)
   4998         return self._constructor(new_data).__finalize__(self)
   4999

Harikrishna Over a year ago

Can you add this error to the question, including the command you tried from this answer and the error message it produced?

Rik Over a year ago

I have edited my post. Check whether it still gives any errors.

Josh Friedlander · Accepted Answer · 2018-08-19 14:15:40Z

2

Welcome to Stack Overflow, and good luck on your Python journey! An important part of coding is learning how to interpret error messages. In this case, the traceback is quite helpful: it is telling you that you cannot access normalized on df, since a DataFrame has no attribute of that name.

Of course, you weren't trying to access something called normalized, but rather the normalized-losses column. The way to do that is the one you already used once: df["normalized-losses"].

As to your main problem: if even one of your values can't be converted to a float, the column-wide operation will fail. This is very common. You first need to eliminate all of the non-numeric items in the column; one way to find them is with df[~df['normalized-losses'].str.isnumeric()].
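A small sketch of that lookup on made-up values, assuming the column holds only strings (for non-string values such as NaN, str.isnumeric() returns NaN, which can break the ~ mask). Note that isnumeric() accepts only digit-like characters, so a valid decimal such as '12.5' or a negative such as '-3' is flagged too; the last lines show pd.to_numeric with errors='coerce' as a more forgiving check.

```python
import pandas as pd

# Hypothetical object column: integer strings, a '?' placeholder and one decimal.
df = pd.DataFrame({"normalized-losses": ["164", "?", "158", "12.5"]}, dtype=object)

# True only for strings made entirely of numeric characters ('.' is not one).
is_num = df["normalized-losses"].str.isnumeric()

# Rows whose value is not purely numeric: '?' as expected, but also '12.5'.
non_numeric = df[~is_num]
print(non_numeric["normalized-losses"].tolist())  # ['?', '12.5']

# A more forgiving check: only values that fail numeric parsing become NaN.
unparseable = df[pd.to_numeric(df["normalized-losses"], errors="coerce").isna()]
print(unparseable["normalized-losses"].tolist())  # ['?']
```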

edited Aug 19, 2018 at 14:15

answered Aug 19, 2018 at 13:35

Josh Friedlander

11.8k · 7 gold badges · 42 silver badges · 89 bronze badges

3 Comments

Josh Friedlander Over a year ago

and more on that topic here: stackoverflow.com/questions/21771133/…

Aarushi Goyal Over a year ago

So in normalized-losses, some entries are missing and are represented by a ?. I need to fill those values with the average of normalized-losses. How can I fill them in with the average if I can't compute the average because of the missing values?

Jon Clements Over a year ago

@Aarushi you may wish to use pd.to_numeric(df['column_name'], errors='coerce') and then either .dropna() to get rid of the invalid values, or .fillna(0) to introduce a value that counts towards the number of items in a group but not towards its sum.
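A sketch of that idea applied to the follow-up question about filling '?' with the average, using hypothetical sample values. After coercion the '?' entries are NaN; Series.mean() skips NaN by default (skipna=True), so the average of the real values can be computed first and then passed to fillna().

```python
import pandas as pd

# Hypothetical data with '?' marking missing entries.
df = pd.DataFrame({"normalized-losses": ["164", "?", "158", "?", "192"]}, dtype=object)

# Anything that can't be parsed as a number (here '?') becomes NaN.
df["normalized-losses"] = pd.to_numeric(df["normalized-losses"], errors="coerce")

# mean() ignores NaN by default, so this averages only the three real values.
avg = df["normalized-losses"].mean()

# Fill the missing entries with that average.
df["normalized-losses"] = df["normalized-losses"].fillna(avg)
print(df["normalized-losses"].tolist())
```

Filling with the mean leaves the column's mean unchanged, so a later df["normalized-losses"].mean() returns the same average.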

Harikrishna · Accepted Answer · 2018-08-19 13:47:57Z

1

The expression df.normalized-losses does not refer to the normalized-losses column here: Python reads it as df.normalized minus losses. Replace it with df["normalized-losses"]. Usually, if you try

df["normalized-losses"]=df["normalized-losses"].astype(float)

This should work. It takes the normalized-losses column from the DataFrame, converts it to float, and assigns the result back to the normalized-losses column of the same DataFrame. Sometimes, though, the data needs some cleaning before this statement will succeed.
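A minimal sketch of the "needs some cleaning first" case, using made-up values: astype(float) converts a column whose strings all parse, but a single placeholder such as '?' makes the whole conversion raise a ValueError.

```python
import pandas as pd

# Hypothetical object columns: one clean, one with a single '?' placeholder.
clean = pd.Series(["164", "158", "192"], dtype=object)
dirty = pd.Series(["164", "?", "192"], dtype=object)

# Every value parses, so the whole column converts.
converted = clean.astype(float)
print(converted.tolist())  # [164.0, 158.0, 192.0]

# One unparseable value is enough to make the column-wide conversion fail.
try:
    dirty.astype(float)
    failed = False
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```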

answered Aug 19, 2018 at 13:47

Harikrishna

1,140 · 5 gold badges · 13 silver badges · 32 bronze badges

Comments

Daniel Roseman · Accepted Answer · 2018-08-19 13:31:28Z

0

You can't use - in an attribute or variable name. Perhaps you mean normalized_losses?

answered Aug 19, 2018 at 13:31

Daniel Roseman

602k · 68 gold badges · 910 silver badges · 923 bronze badges

2 Comments

Aarushi Goyal Over a year ago

If I use - in the header and keep it inside quotes (""), should that pose a problem?

user3483203 Over a year ago

@AarushiGoyal no, you'll have no issues if you access it using key lookup. You simply won't be able to use dot notation.
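A short sketch of the difference, on a hypothetical two-row frame: key lookup works with a hyphenated name, dot notation is parsed as a subtraction and fails on df.normalized, and renaming the column to normalized_losses (as suggested in the answer) makes attribute access possible.

```python
import pandas as pd

# Hypothetical frame with a hyphenated column name.
df = pd.DataFrame({"normalized-losses": ["164", "158"]}, dtype=object)

# Key lookup accepts any column name, including one containing '-'.
by_key = df["normalized-losses"]

# Dot notation is parsed as (df.normalized) - losses; the left operand is
# evaluated first, so the missing attribute raises before 'losses' is looked up.
try:
    df.normalized-losses
    dot_failed = False
except AttributeError as exc:
    dot_failed = True
    message = str(exc)

# With an underscore the name is a valid identifier, so dot access works.
renamed = df.rename(columns={"normalized-losses": "normalized_losses"})
by_attr = renamed.normalized_losses
```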
