0

I have a csv dump that I'm trying to import to run analysis on metrics therein, cherry picking certain metrics to look at. Some of the cells are strings and some are numbers. However, I can't get csv.reader to handle the numbers properly. A snippet:

with open('t0.csv', 'rU') as file:
    reader = csv.reader(file, delimiter=",", quotechar='|')
    reader.next() # Burn header row
    for row in reader:
        if row[0] != "": # Burn footer row
            t0_company.extend([unicode(row[3], 'utf-8')])
            t0_revenue.extend([row[9]])
            t0_clicks.extend([row[10]])
            t0_pageviews.extend([row[11]])
            t0_avg_cpc.extend([row[13]])
            t0_monthly_budget.extend([row[16]])

I input another file of the same format for metrics at t1. Then I create two dicts for each metric (one at t0 and the other at t1) with the form metric_dict = {'company': 'metric'} like this:

metric = dict(zip(company, metric))

Running simple math on these metrics is problematic however:

percent_change = float(t1_metric_dict[company]) / float(t0_metric_dict[company]) - 1

Returns errors like:

Traceback (most recent call last):
File "report.py", line 104, in <module>
start_revenue_dict[company], end_revenue_dict[company], float(end_revenue_dict[company]) / float(start_revenue_dict[company]) - 1,
ValueError: could not convert string to float: "6.18"

It seems to pick the same number to complain about every time.

I'm fairly certain the error happens in the division as everything works normally if I swap in a placeholder string as the third element.

I also tried using quoting=csv.QUOTE_NONNUMERIC, changing the second line in the first snippet to

reader = csv.reader(file, delimiter=",", quotechar='|', quoting=csv.QUOTE_NONNUMERIC)

Which gets me this error:

Traceback (most recent call last):
File "report.py", line 30, in <module>
reader.next()
ValueError: could not convert string to float: "Type"

I've tried making sure the csv doesn't have any weird cell types (everything is text) even though I doubt it matters. I'd appreciate any help on this one.

------ Update ------

One of the columns in my input file contains email addresses. As an experiment, I removed all @s from the input docs which changed the error message I'm getting to:

Traceback (most recent call last):
File "report.py", line 129, in <module>
unicode_row = [str(item).encode('utf8') for item in utf8_row]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 2: ordinal not in range(128)

The code it's referencing is the csv out section:

writer = csv.writer(open('output.csv', 'wb'), delimiter=",", quotechar='|')
for utf8_row in report:
    unicode_row = [str(item).encode('utf8') for item in utf8_row]
    writer.writerow(unicode_row)

----- Update #2 -----

As requested, here is the full snippet that's causing problems:

for company in companies_in_both:
report.append([company,
      start_revenue_dict[company], end_revenue_dict[company], float(end_revenue_dict[company]) / float(start_revenue_dict[company]) - 1,
      start_clicks_dict[company], end_clicks_dict[company], float(end_clicks_dict[company]) / float(start_clicks_dict[company]) - 1,
      start_pageviews_dict[company], end_pageviews_dict[company], float(end_pageviews_dict[company]) / float(start_pageviews_dict[company]) - 1,
      start_avg_cpc_dict[company], end_avg_cpc_dict[company], float(end_avg_cpc_dict[company]) / float(start_avg_cpc_dict[company]) - 1,
      start_monthly_budget_dict[company], end_monthly_budget_dict[company], float(end_monthly_budget_dict[company]) / float(start_monthly_budget_dict[company]) - 1])
6
  • 2
    Unrelated: use list.append instead of list.extend to add a single value to a list. Commented May 25, 2012 at 22:10
  • 2
    What's the line that gives you the problem? Probably just a CSV dialect problem... Commented May 25, 2012 at 22:17
  • It's long, I'll paste the full line as an update. I'm using OpenOffice on Windows to create the csvs. Commented May 25, 2012 at 22:19
  • 1
    Note that you could probably simplify this down to a dictionary of lists rather than lots of lists, and then use the DictReader() to simplify your code. Commented May 25, 2012 at 22:20
  • @Lattyware hadn't thought of that, will certainly investigate. Thanks. Commented May 25, 2012 at 22:22

1 Answer 1

1

Are you sure '|' is the right thing to be using for quotechar? I see you have numerous CSV questions, and all your examples use that, so maybe it really is what you want. But it's astoundingly unusual.

Your ValueError message is telling you that you have double-quotes in your data. That is, instead of doing the equivalent of float("6.18"), Python is trying to do float('"6.18"'), and it's choking.

It may be helpful to give a few complete lines of your actual CSV data, surrounding where the error occurs, as viewed with Notepad or equivalent (provided it's not confidential, legally sensitive, etc.).

Sign up to request clarification or add additional context in comments.

1 Comment

I don't have a good answer for my choice in quotechars other than that it was used in the Python documentation of the CSV module where I stole my original code. My source data does indeed have inconsistent use of quotes, and changing the quote char to reflect that seems to take care of the original problem. Now I know not to blindly use code from strangers on the internet! :O

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.