Data type conversion issue in Pandas Dataframe

Question

I have a CSV file that looks like this (output of !cat):

,City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp
0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,2010-01-15,2,2,3,2,3,3
1,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Suburban_Urban,2010-07-15,2,2,3,2,2,2
2,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Suburban_Urban,2011-01-15,2,2,2,2,2,2

When I read it in with read_csv I get a DataFrame. All of the ..._exp fields are single-digit numbers that I need to do basic math with. (It was working fine when I was using read_table with another variant of the file.)

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

But when I try to do any math I get a TypeError indicating that the column holds strings, e.g.:

df['Credit_exp'] = df['Credit_exp']/2
TypeError: unsupported operand type(s) for /: 'str' and 'int'

I don't see how to convert it, or how to read it in as an int. I tried specifying field types such as dtype={'Credit_exp': np.int32, ... in the read_csv options, but it did not like that. I also tried a type conversion like df['Credit_exp'] = int(df['Credit_exp']), which just gave me:

TypeError: only length-1 arrays can be converted to Python scalars

So there is something obvious I'm missing...

Which version of pandas do you use? Your sample is working for me with 0.12. Otherwise df['Credit_exp'].apply(int) could do the trick. NB: your division will be euclidean
– Zeugma, Commented Oct 23, 2013 at 22:17
On 12+ dev of pandas, so I'll try apply(int). I still don't get why the dtype={ option does not work in read_csv?
– dartdog, Commented Oct 23, 2013 at 22:23
It seems your raw data is not clean. The "Credit_exp" column may contain some string values. Try data['Credit_exp'].astype('int') and see what error message you get.
– Yeqing Zhang, Commented Oct 23, 2013 at 22:37
df['Credit_exp'] = df['Credit_exp'].apply(int) gives me ValueError: invalid literal for int() with base 10: '\\N'
– dartdog, Commented Oct 23, 2013 at 22:40
@dartdog: that means that one of your Credit_exp values isn't a single digit, it's what looks like a corrupted endline marker.
– DSM, Commented Oct 23, 2013 at 22:41
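
To locate the values that block the conversion, one option is to coerce the column with pd.to_numeric and look at what fails to parse. The following is a minimal sketch, not the asker's actual data: the three-row CSV is a hypothetical stand-in, with a literal \N placed in the last row to mimic the value reported in the error message.

```python
import io

import pandas as pd

# Hypothetical sample: the last row carries a literal \N in Credit_exp,
# standing in for the non-numeric token reported in the error message.
csv_text = (
    ",City,Credit_exp\n"
    "0,Dothan,3\n"
    "1,Dothan,2\n"
    "2,Dothan,\\N\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# A non-numeric dtype here (object, or a string dtype in newer pandas)
# means at least one value could not be parsed as a number.
print(df["Credit_exp"].dtype)

# errors="coerce" turns unparseable values into NaN; the mask then
# selects the original strings that caused the failure.
parsed = pd.to_numeric(df["Credit_exp"], errors="coerce")
bad_values = df.loc[parsed.isna(), "Credit_exp"]
print(bad_values)
```

On a real file the same mask can be applied to each ..._exp column in turn to see which rows need cleaning.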

Zulan · Accepted Answer · 2016-06-07 08:09:19Z

Try the following:

# astype returns a new Series, so assign the result back to the column
df['Credit_exp'] = df['Credit_exp'].astype('int')

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html
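
Note that astype('int') raises a ValueError as soon as it meets a value it cannot parse, such as the '\N' string reported in the comments on the question. A possible workaround, sketched below on a hypothetical three-value Series rather than the real file, is to coerce unparseable values to missing first and then choose how to handle them. The nullable "Int64" dtype used in the second option assumes a reasonably recent pandas (0.24 or later).

```python
import pandas as pd

# Hypothetical column contents: two digits and one stray \N token.
raw = pd.Series(["3", "2", "\\N"], name="Credit_exp")

# Unparseable strings become NaN instead of raising.
numeric = pd.to_numeric(raw, errors="coerce")

# Option 1: drop the missing rows, then use a plain integer dtype.
as_int = numeric.dropna().astype("int64")

# Option 2: keep every row by using the nullable integer dtype,
# which stores the missing entry as <NA>.
as_nullable = numeric.astype("Int64")

print(as_int)
print(as_nullable)
```

Which option fits depends on whether rows with a missing Credit_exp should stay in the analysis.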

answered Oct 23, 2013 at 22:29 by user1827356; edited Jun 7, 2016 at 8:09 by Zulan


2 Comments

dartdog Over a year ago

df['Credit_exp'] = df.Credit_exp.astype('int') gives me: ValueError: invalid literal for long() with base 10: '\\N'

user1827356 Over a year ago

That's because the column contains data that is not an integer, as the exception indicates. It is probably caused by the file formatting.
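
If the stray token is a missing-value marker (\N is how some database exports, MySQL's for example, write NULL; whether that is the origin of this file is an assumption), it can be declared when the file is read. The sketch below uses a hypothetical in-memory CSV in place of the real file and passes the token through the na_values parameter of read_csv. A stray token like this is also a plausible reason the dtype={'Credit_exp': np.int32} attempt was rejected, since an integer column cannot be built from a value that is not a number.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file, with one \N in Credit_exp.
csv_text = (
    ",City,Quradate,Credit_exp\n"
    "0,Dothan,2010-01-15,3\n"
    "1,Dothan,2010-07-15,2\n"
    "2,Dothan,2011-01-15,\\N\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=["Quradate"],
    na_values=["\\N"],  # treat the literal \N token as missing
)

# The column is now numeric (float64, because NaN needs a float dtype),
# so arithmetic works and the missing row simply stays NaN.
df["Credit_exp"] = df["Credit_exp"] / 2
print(df)
```

With the token declared as missing, the division from the question runs without a TypeError.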
