So I have a CSV with two columns that contain dollar amounts in string format. head -n 5 file.csv reveals the following:

Title,Distributor Long Name,Wk,Estimated Weekend Gross,Cume,Locs Reported,Avg/Loc,Booking Title #
"=""Zero Dark Thirty""","=""Sony""",4,"24,000,000","29,480,807",2937,"8,172","=""66273"""
"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""
"=""Gangster Squad""","=""Warner Bros.""",1,"16,710,000","16,710,000",3103,"5,385","=""66556"""
"=""Django Unchained""","=""The Weinstein Company""",3,"11,065,000","125,399,122",3012,"3,674","=""66122"""

This goes on for about 40 rows. You'll notice two of the columns — the "Estimated Weekend Gross" and "Cume" ones — have their values as strings.

So my question is, is there a way to iterate over only these two columns, convert the string values to integers doing something like row.to_s.gsub(',','').to_i and then overwrite those values to their respective rows in the same CSV?

I tried doing something like this, but I'm not getting a properly formatted CSV.

File.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv') do |row|
    csv << row[0].to_s.gsub('=','').gsub(', The','')
    csv << row[3].to_s.gsub(',','').to_i
    csv << row[4].to_s.gsub(',','').to_i
  end
end

I've also played around with :headers => :integer when doing the block, but it won't let me convert the values from strings to integers. So, what am I missing? Should I store these values and then write a new CSV or is there a simpler way?

3 Answers

3

Aaron, just change the row and write it to your new file, like this:

require 'csv'

File.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv', :headers => true) do |row|
    row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
    row['Cume'] = row['Cume'].delete(',').to_i
    csv << row
  end
end

EDIT: If you want to keep the headers in modified.csv, you can do it like this. It opens the original file twice, so there is probably a shorter way; if someone has a better solution, please share it.

# CSV.read closes the file once it has been read (a bare CSV.open would leave it open)
headers = CSV.read('original.csv', :headers => true).headers
CSV.open('modified.csv', 'w') do |csv|
  csv << headers
  CSV.foreach('original.csv', :headers => true) do |row|
    row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
    row['Cume'] = row['Cume'].delete(',').to_i
    csv << row
  end
end
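
One possible way to avoid reading the original file twice, offered as a minimal sketch rather than a definitive answer: with `return_headers: true`, `CSV.foreach` also yields the header row, which can then be written through untouched. The helper name `convert_amounts` and its `columns` parameter are hypothetical, introduced only for this illustration.

```ruby
require 'csv'

# Hypothetical helper (not from the answer above): copies the CSV at
# input_path to output_path in a single pass, header row included.
def convert_amounts(input_path, output_path, columns = ['Estimated Weekend Gross', 'Cume'])
  CSV.open(output_path, 'w') do |out|
    # return_headers: true makes foreach yield the header row as well
    CSV.foreach(input_path, headers: true, return_headers: true) do |row|
      unless row.header_row?
        # Strip the thousands separators and convert to Integer
        columns.each { |name| row[name] = row[name].to_s.delete(',').to_i }
      end
      out << row
    end
  end
end
```

With the files from the question, this would be called as `convert_amounts('original.csv', 'modified.csv')`.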

2 Comments

This worked out quite well. I'll be on the lookout for a solution that doesn't have as many I/O operations, but this works for my purposes. Thanks @peter!
Thanks for the clear and documented question, I enjoyed it. Could you accept the answer, please?
0

You can get it using this:

sed 's/,\("[^"]*"\)*/|\1/g' file.csv | awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -

I got this output:

"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""

I know it is hard to understand, so I will explain it step by step:

  1. First, put a separator between the fields, taking the quotes into account, with:

    sed 's/,\("[^"]*"\)*/|\1/g' file.csv

And you will get a pipe separator "|" between each field:

"=""Zero Dark Thirty"""|""|4|"24,000,000"|"29,480,807"|2937|"8,172"|""
"=""Haunted House| A"""|""|1|"18,817,000"|"18,817,000"|2160|"8,712"|""
"=""Gangster Squad"""|""|1|"16,710,000"|"16,710,000"|3103|"5,385"|""
"=""Django Unchained"""|""|3|"11,065,000"|"125,399,122"|3012|"3,674"|""
  2. Once you have this output with the pipe as field separator, you can use awk to apply the described filter to fields 4 and 5 (run it after the sed command, because it takes sed's output as input):

    awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -

This removes the quotes and commas from fields 4 and 5, leaving an integer representation, and produces this output:

"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""

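In the output above, the second row still contains "18,817,000" in the Cume column, because the comma inside "Haunted House, A" shifts the field positions when the line is split as plain text. For comparison, here is a minimal Ruby sketch (not part of this sed/awk approach) that assumes the row from the question and shows a CSV-aware parser keeping that title as a single field:

```ruby
require 'csv'

# The second data row from the question, copied verbatim
line = %q{"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""}

# CSV.parse_line honours the quoting, so the comma inside the title
# does not start a new field
fields = CSV.parse_line(line)

gross = fields[3].delete(',').to_i
cume  = fields[4].delete(',').to_i
```

Here `fields` has eight entries, so indexes 3 and 4 line up with "Estimated Weekend Gross" and "Cume" for every row.
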
0

Can you try this:

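require 'csv'

CSV.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv').with_index do |row, index|
    if index.zero?
      csv << row # copy the header row through unchanged
      next
    end
    modified_row = row.clone
    modified_row[0] = row[0].to_s.gsub('=','').gsub(', The','')
    modified_row[3] = row[3].to_s.gsub(',','').to_i
    modified_row[4] = row[4].to_s.gsub(',','').to_i
    csv << modified_row # append the whole row as one array
  end
end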

I changed the output file to be opened with CSV.open rather than File.open, and corrected the appending so that each row is appended as one array instead of as individual values.
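
To illustrate the difference, here is a small sketch in which a `StringIO` stands in for the plain `File` handle from the question (that substitution is an assumption made so the example runs without touching the disk):

```ruby
require 'csv'
require 'stringio'

# Appending values one by one to a plain IO just concatenates their
# string forms: no separators, no line ending
plain = StringIO.new
plain << 'Zero Dark Thirty' << 24000000 << 29480807

# Appending one array per row lets CSV add the separators and line ending
formatted = CSV.generate do |csv|
  csv << ['Zero Dark Thirty', 24000000, 29480807]
end

puts plain.string
puts formatted
```

The first `puts` prints the three values run together, while the second prints them as one comma-separated row.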
