Changing the datatype of specific dataframe columns - Pandas

Question

After reading my CSV file with read_csv() in Pandas, I want to convert some of the column datatypes to float64 for further processing, since they are currently read as object dtype. When I pass the dtype argument to read_csv, I get an error. This is the call:

import pandas as pd
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":"float64"})

Following is the full trace for the error:

ValueError                                Traceback (most recent call last)
<ipython-input-14-554c18573267> in <module>()
----> 1 file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":"float64"})
  2 #file1 = pd.to_numeric(file_)
  3 file_.values
  4 file_.dtypes

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
703                     skip_blank_lines=skip_blank_lines)
704 
--> 705         return _read(filepath_or_buffer, kwds)
706 
707     parser_f.__name__ = name

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
449 
450     try:
--> 451         data = parser.read(nrows)
452     finally:
453         parser.close()

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
1063                 raise ValueError('skipfooter not supported for iteration')
1064 
-> 1065         ret = self._engine.read(nrows)
1066 
1067         if self.options.get('as_recarray'):

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
1826     def read(self, nrows=None):
1827         try:
-> 1828             data = self._reader.read(nrows)
1829         except StopIteration:
1830             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

ValueError: invalid literal for float(): 11,535,309,570.00

How do I convert the dtype of the columns which have numeric data to float64?

If I only read in the csv, and check the dtype of the columns,

file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv")
file_.dtypes

I get this:

Year                                                     int64
City                                                    object
Return office city center                              float64
Average return logistics                               float64
Inverse return houses                                  float64
DAX                                                     object
MFI Interest Rate Germany                              float64
Inflation Rate                                         float64
GDP (EUR)                                               object
Size of City (km square)                                object
Total Population (Number)                               object
Population under 15 (Number)                            object
Population 15 to under 65 (Number)                      object
Population above 65 (Number)                            object
Total private households (Number)                       object
1 Person households (Number)                            object
2 Person households (Number)                            object
3 Person households (Number)                            object
4 Person households (Number)                            object
5 and more person households (Number)                   object
Total unemployment rate (Rate)                         float64
Total employment (Number)                               object
Available income per inhabitant (Eur)                   object
Total residential building (Number)                     object
Total Apartments (Number)                               object
Total new residential building approvals (Number)       object
Total new residential building completions (Number)     object
Total Migration                                         object
Returns                                                float64
Class                                                  float64
dtype: object

Basically, I want to convert to float64 the columns DAX, GDP (EUR) through 5 and more person households (Number), and Total employment (Number) through Total Migration.

Thanks.

Dave Rosenman · Accepted Answer · 2017-12-12 05:10:12Z

2

So if I want to change the datatype of columns c1 and c3 of the dataframe df to float64, here's what I'd do:

import pandas as pd
import numpy as np
df = pd.DataFrame([["1.2","dan","3"],["1.9","joe","5"]], columns = ["c1","c2","c3"])
print(df)
#    c1   c2 c3
# 0  1.2  dan  3
# 1  1.9  joe  5

print(df.dtypes)

# c1    object
# c2    object
# c3    object
# dtype: object

df[['c1','c3']] = df[['c1','c3']].astype(np.float64)
print(df)
#    c1   c2   c3
# 0  1.2  dan  3.0
# 1  1.9  joe  5.0

print(df.dtypes)

# c1    float64
# c2     object
# c3    float64
# dtype: object
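
The `astype` call above assumes the strings are already plain numbers. If the values contain thousands separators, such as the `11,535,309,570.00` in the question's error message, the commas probably need to be stripped before converting. Here is a minimal sketch under that assumption. The frame is made up (only the column names come from the question), and `to_float` is a hypothetical helper, not a pandas function. It also shows selecting a contiguous run of columns by label with `.loc`:

```python
import pandas as pd

# Illustrative data only: column names borrowed from the question, values made up.
df = pd.DataFrame({
    "City": ["Berlin", "Munich"],
    "DAX": ["11,535,309,570.00", "9,805.55"],
    "GDP (EUR)": ["1,234.5", "2,000"],
    "Returns": [0.1, 0.2],
})

def to_float(col):
    # Hypothetical helper: drop thousands separators, then cast to float64.
    return col.str.replace(",", "", regex=False).astype("float64")

# Label-based slicing with .loc includes both endpoints.
cols = list(df.loc[:, "DAX":"GDP (EUR)"].columns)
df[cols] = df[cols].apply(to_float)
print(df.dtypes)
```

For several separate runs of columns, the lists of labels can be concatenated before the assignment.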

answered Dec 12, 2017 at 5:10

Dave Rosenman


2 Comments

Rahul Bohare Over a year ago

I did file_[["DAX"]] = file_[["DAX"]].astype(np.float64) and I get exactly the same error as in the original question, namely: ValueError: invalid literal for float(): 11,535,309,570.00. This is very weird.

Dave Rosenman Over a year ago

So you basically are trying to convert strings with commas into a float? Or maybe there's a problem with your csv file? Could you post the raw data file? That way I might be able to figure out why you still are having problems.

Tanu · Accepted Answer · 2017-12-13 04:36:54Z

1

I believe you don't need quotes around the data type when reading data from a CSV file; you can pass the NumPy type directly, like this:

import pandas as pd
import numpy as np
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":np.float64})
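
Note that `"float64"` and `np.float64` name the same dtype, so the traceback in the question most likely comes from the thousands separators in the data rather than from the quotes. `read_csv` has a `thousands` parameter for this case. Here is a minimal sketch, assuming an in-memory stand-in for the CSV. The contents are made up, since the real file was not posted, and the column dtype is left to pandas' inference:

```python
import io
import pandas as pd

# Stand-in for the real file; fields that contain commas must be quoted in a CSV.
csv_text = (
    "Year,City,DAX\n"
    '2015,Berlin,"11,535,309,570.00"\n'
    '2016,Munich,"9,805.55"\n'
)

# thousands="," tells the parser to treat commas inside numbers as separators.
file_ = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(file_.dtypes)
```

The same keyword should work when a file path is passed instead of the `StringIO` object.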

edited Dec 13, 2017 at 4:36

answered Dec 12, 2017 at 5:24

Tanu


2 Comments

Rahul Bohare Over a year ago

If I specify it like this, I get a NameError: NameError: name 'float64' is not defined

Tanu Over a year ago

Updated the answer; you need to use NumPy for the conversion to float64.
