Problems handling multiple data types in CSV input with Python

Question

I have a csv dump that I'm trying to import to run analysis on metrics therein, cherry picking certain metrics to look at. Some of the cells are strings and some are numbers. However, I can't get csv.reader to handle the numbers properly. A snippet:

with open('t0.csv', 'rU') as file:
    reader = csv.reader(file, delimiter=",", quotechar='|')
    reader.next() # Burn header row
    for row in reader:
        if row[0] != "": # Burn footer row
            t0_company.extend([unicode(row[3], 'utf-8')])
            t0_revenue.extend([row[9]])
            t0_clicks.extend([row[10]])
            t0_pageviews.extend([row[11]])
            t0_avg_cpc.extend([row[13]])
            t0_monthly_budget.extend([row[16]])

I input another file of the same format for metrics at t1. Then I create two dicts for each metric (one at t0 and the other at t1) with the form metric_dict = {'company': 'metric'} like this:

metric = dict(zip(company, metric))

Running simple math on these metrics is problematic however:

percent_change = float(t1_metric_dict[company]) / float(t0_metric_dict[company]) - 1

Returns errors like:

Traceback (most recent call last):
File "report.py", line 104, in <module>
start_revenue_dict[company], end_revenue_dict[company], float(end_revenue_dict[company]) / float(start_revenue_dict[company]) - 1,
ValueError: could not convert string to float: "6.18"

It seems to pick the same number to complain about every time.

I'm fairly certain the error happens in the division as everything works normally if I swap in a placeholder string as the third element.

I also tried using quoting=csv.QUOTE_NONNUMERIC, changing the second line in the first snippet to

reader = csv.reader(file, delimiter=",", quotechar='|', quoting=csv.QUOTE_NONNUMERIC)

Which gets me this error:

Traceback (most recent call last):
File "report.py", line 30, in <module>
reader.next()
ValueError: could not convert string to float: "Type"

I've tried making sure the csv doesn't have any weird cell types (everything is text) even though I doubt it matters. I'd appreciate any help on this one.

------ Update ------

One of the columns in my input file contains email addresses. As an experiment, I removed all @s from the input docs which changed the error message I'm getting to:

Traceback (most recent call last):
File "report.py", line 129, in <module>
unicode_row = [str(item).encode('utf8') for item in utf8_row]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 2: ordinal not in range(128)

The code it's referencing is the csv out section:

writer = csv.writer(open('output.csv', 'wb'), delimiter=",", quotechar='|')
for utf8_row in report:
    unicode_row = [str(item).encode('utf8') for item in utf8_row]
    writer.writerow(unicode_row)

----- Update #2 -----

As requested, here is the full snippet that's causing problems:

for company in companies_in_both:
report.append([company,
      start_revenue_dict[company], end_revenue_dict[company], float(end_revenue_dict[company]) / float(start_revenue_dict[company]) - 1,
      start_clicks_dict[company], end_clicks_dict[company], float(end_clicks_dict[company]) / float(start_clicks_dict[company]) - 1,
      start_pageviews_dict[company], end_pageviews_dict[company], float(end_pageviews_dict[company]) / float(start_pageviews_dict[company]) - 1,
      start_avg_cpc_dict[company], end_avg_cpc_dict[company], float(end_avg_cpc_dict[company]) / float(start_avg_cpc_dict[company]) - 1,
      start_monthly_budget_dict[company], end_monthly_budget_dict[company], float(end_monthly_budget_dict[company]) / float(start_monthly_budget_dict[company]) - 1])

Unrelated: use list.append instead of list.extend to add a single value to a list. — Lev Levitsky
– Lev Levitsky, Commented May 25, 2012 at 22:10
What's the line that gives you the problem? Probably just a CSV dialect problem... — Colin Dunklau
– Colin Dunklau, Commented May 25, 2012 at 22:17
It's long, I'll paste the full line as an update. I'm using OpenOffice on Windows to create the csvs. — acpigeon
– acpigeon, Commented May 25, 2012 at 22:19
Note that you could probably simplify this down to a dictionary of lists rather than lots of lists, and then use the DictReader() to simplify your code. — Gareth Latty
– Gareth Latty, Commented May 25, 2012 at 22:20
@Lattyware hadn't thought of that, will certainly investigate. Thanks. — acpigeon
– acpigeon, Commented May 25, 2012 at 22:22

John Y · Accepted Answer · 2012-05-25 22:41:54Z

1

Are you sure '|' is the right thing to be using for quotechar? I see you have numerous CSV questions, and all your examples use that, so maybe it really is what you want. But it's astoundingly unusual.

Your ValueError message is telling you that you have double-quotes in your data. That is, instead of doing the equivalent of float("6.18"), Python is trying to do float('"6.18"'), and it's choking.

It may be helpful to give a few complete lines of your actual CSV data, surrounding where the error occurs, as viewed with Notepad or equivalent (provided it's not confidential, legally sensitive, etc.).

answered May 25, 2012 at 22:41

John Y

14.6k2 gold badges51 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

acpigeon Over a year ago

I don't have a good answer for my choice in quotechars other than that it was used in the Python documentation of the CSV module where I stole my original code. My source data does indeed have inconsistent use of quotes, and changing the quote char to reflect that seems to take care of the original problem. Now I know not to blindly use code from strangers on the internet! :O

Collectives™ on Stack Overflow

Problems handling multiple data types in CSV input with Python

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related