
I have a CSV file that looks like this (output of !cat):

,City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp
0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,2010-01-15,2,2,3,2,3,3
1,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Suburban_Urban,2010-07-15,2,2,3,2,2,2
2,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Suburban_Urban,2011-01-15,2,2,2,2,2,2

When I read it in with read_csv I get a DataFrame. All of the ..._exp fields are single-digit numbers that I need to do basic math with. (It was working great when I was using read_table with another variant of the file.)

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

But when I try to do any math I get a TypeError indicating that the column holds strings, e.g.:

df['Credit_exp'] = df['Credit_exp']/2
TypeError: unsupported operand type(s) for /: 'str' and 'int'

I don't see how to convert it to, or read it in as, an int. I tried specifying field types like dtype={'Credit_exp': np.int32, ...} in the read_csv options, but it did not like that. I also tried a type conversion like df['Credit_exp'] = int(df['Credit_exp']), which just gave me:

TypeError: only length-1 arrays can be converted to Python scalars

So there is something obvious I'm missing...

  • Which version of pandas do you use? Your sample works for me with 0.12. Otherwise df['Credit_exp'].apply(int) could do the trick. NB: your division will be Euclidean. Commented Oct 23, 2013 at 22:17
  • On a 0.12+ dev build of pandas, so I'll try apply(int). I still don't get why dtype={...} does not work in read_csv. Commented Oct 23, 2013 at 22:23
  • It seems your raw data is not clean. The "Credit_exp" column may contain some string values. Try data['Credit_exp'].astype('int') and see what error message you get. Commented Oct 23, 2013 at 22:37
  • df['Credit_exp'] = df['Credit_exp'].apply(int) gives me ValueError: invalid literal for int() with base 10: '\\N' Commented Oct 23, 2013 at 22:40
  • @dartdog: that means that one of your Credit_exp values isn't a single digit, it's what looks like a corrupted endline marker. Commented Oct 23, 2013 at 22:41
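To see exactly which values are blocking the conversion, one option is to coerce the column to numbers and list the entries that fail. This is only a minimal sketch: it assumes a newer pandas that provides pd.to_numeric, and the sample Series is a hypothetical stand-in for df['Credit_exp'], not data from the actual file.

```python
import pandas as pd

# Hypothetical stand-in for df['Credit_exp'] as read from the file (all strings).
# "\\N" in Python source is the two-character string: backslash, N.
credit = pd.Series(["3", "2", "\\N", "2"], name="Credit_exp")

# Coerce to numbers; anything that cannot be parsed becomes NaN.
as_numbers = pd.to_numeric(credit, errors="coerce")

# Keep only the original strings that failed to parse.
bad_values = credit[as_numbers.isna()]
print(bad_values.unique())
```

Running the same two steps on the real column should show whether \N is the only offending token or whether other stray values are present too.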

1 Answer


Try the following:

df.Credit_exp.astype('int')

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html


2 Comments

df['Credit_exp'] = df.Credit_exp.astype('int') gives me: ValueError: invalid literal for long() with base 10: '\\N'
That's because the column contains data that is not an integer, as the exception indicates. It is probably caused by the file formatting.
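The '\\N' in the error message is the repr of the two-character string \N, which some database exports (MySQL, for example) write in place of NULL. If that is what happened to this file, which is an assumption, one way around it is to tell read_csv to treat that token as missing. A minimal sketch, with made-up inline data standing in for the real file:

```python
import io
import pandas as pd

# Made-up sample in the shape of the question's file, with one \N placeholder.
csv_text = (
    ",City,Quradate,Credit_exp\n"
    "0,Dothan,2010-01-15,3\n"
    "1,Dothan,2010-07-15,\\N\n"
    "2,Dothan,2011-01-15,2\n"
)

# na_values makes read_csv parse \N as NaN, so the column is read as numeric
# instead of falling back to strings.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=["Quradate"],
    na_values=["\\N"],
)

# Arithmetic now works; rows that held \N stay NaN.
df["Credit_exp"] = df["Credit_exp"] / 2
print(df["Credit_exp"])
```

Because NaN cannot be stored in a plain integer column, the column comes back as float64. If integers are required, the missing rows would have to be dropped or filled (dropna or fillna) before calling astype('int').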
