9

I've been trying to get a single column out of a CSV file.

I've gone through the documentation (http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html) but still don't really understand how to use it.

If I use CSV.table, the response is incredibly slow compared to CSV.read. I admit the dataset I'm loading is quite large, which is exactly the reason I only want to get a single column from it.

My request currently looks like this:

@dataTable = CSV.table('path_to_csv.csv')

and when I debug I get a response of

#<CSV::Table mode:col_or_row row_count:2104 >

The documentation says I should be able to use by_col(), but when I try to output

<%= debug @dataTable.by_col('col_name or index') %>

It gives me an "undefined method 'col'" error.

Can somebody explain how I'm supposed to use CSV, and whether there is a way to get columns faster using 'read' instead of 'table'?

I'm using Ruby 1.9.2, which says that its standard CSV library is FasterCSV, so I don't need to use the FasterCSV gem.

1
  • You might be getting it wrong because by_col doesn't take any arguments. Commented May 11, 2011 at 19:32

3 Answers

16

To pluck a column out of a csv I'd probably do something like the following:

require 'csv'
col_data = []
# Read the file row by row and keep only the field at COL_INDEX
CSV.foreach(FILENAME) { |row| col_data << row[COL_INDEX] }

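That should be substantially faster than any operation on a CSV::Table, because CSV.foreach reads one row at a time instead of loading the whole file into a table first.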
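If the file has a header row, the same streaming approach can look a column up by name instead of by position. The following is a minimal sketch, not a definitive implementation: the sample data, the temp file, and the column name 'age' are all made up for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data, written to a temp file so the sketch is self-contained.
sample = Tempfile.new(['sample', '.csv'])
sample.write("name,age\nalice,30\nbob,25\n")
sample.close

# With headers: true, each yielded row is a CSV::Row, so a field can be
# looked up by its header name while the file is still read row by row.
ages = []
CSV.foreach(sample.path, headers: true) { |row| ages << row['age'] }

p ages  # values stay strings because no converters are applied
```

Only the collected column is kept in memory; the rest of each row is discarded as soon as the block returns.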


16

You can get the values from a single column of the CSV file using the following snippet.

@dataTable = CSV.table('path_to_csv.csv')
@dataTable[:columnname]
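Note that CSV.table converts header names to lowercase symbols and numeric fields to numbers, which is why the column is addressed with a symbol above. A small sketch of that behaviour, using an invented two-column file (the header names and values here are assumptions for illustration only):

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file; the headers deliberately contain capitals and a space.
file = Tempfile.new(['people', '.csv'])
file.write("First Name,Age\nalice,30\nbob,25\n")
file.close

# CSV.table reads the file with headers enabled, symbol header conversion,
# and numeric field conversion.
table = CSV.table(file.path)

p table.headers       # headers after conversion to symbols
p table[:age]         # one column, as a flat array of converted values
p table[:first_name]  # the space in "First Name" becomes an underscore
```

Those conversions are applied to every field, which is likely part of why CSV.table is slower than CSV.read on a large file.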

1 Comment

this is the best answer
1

I found that this works for me (I'm using the OP's variable name here):

# headers: true makes CSV.read return a CSV::Table rather than an array of arrays
@dataTable = CSV.read('path_to_csv.csv', headers: true)
@dataTable.by_col!
p @dataTable.values_at('Field1')

This prints all the values in the column Field1 as an array of one-element arrays: [[value1], [value2], [value3], ...]. So

p @dataTable.values_at('Field1').flatten

will print all the values in the column Field1 in a single array.
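For comparison, a CSV::Table can also return a column as a flat array directly, either by header name or through by_col (which, as the comment on the question notes, takes no arguments and is followed by an index). This is a sketch on made-up inline data; the field names and values are assumptions, and the table is parsed with headers enabled so that a CSV::Table is returned.

```ruby
require 'csv'

# Hypothetical inline data standing in for the real file.
data = "Field1,Field2\na,1\nb,2\nc,3\n"
table = CSV.parse(data, headers: true)

# In the default mixed mode, a header name selects a whole column.
by_name = table['Field1']

# by_col returns a column-mode copy of the table; an integer then indexes columns.
by_index = table.by_col[0]

# values_at in column mode yields one small array per row, hence the flatten.
nested = table.by_col.values_at('Field1')

p by_name
p by_index
p nested.flatten
```

All three calls yield the same values here; the first two simply skip the nesting step.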

If you want to loop through all the fields in a table one by one, here's one way to do that. First, switch the table to column mode with by_col!, so that indexes reference columns rather than rows. Then you can do something like this:

@dataTable = CSV.read('path_to_csv.csv', headers: true) # headers: true is needed to get a CSV::Table
@dataTable.by_col!

0.upto(@dataTable.headers.size - 1) do |i|
  p @dataTable.values_at(i).flatten.compact.size # Or whatever you want here
end

This is a way to work up summary values from a CSV file, which can then be used to create a pivot table. If there's a requirement to input data from a CSV file and output summary data in the form of a pivot table, this might be a straightforward way to go.
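As a rough illustration of that kind of summary, the sketch below counts the non-empty values in each column and collects the counts in a hash keyed by header. The data and field names are invented, and counting non-empty values is just one possible summary.

```ruby
require 'csv'

# Hypothetical inline data; the second row leaves Field2 empty.
data = "Field1,Field2\na,1\nb,\nc,3\n"
table = CSV.parse(data, headers: true)

# Build a { header => count of non-nil values } summary, one column at a time.
counts = table.headers.each_with_object({}) do |header, result|
  result[header] = table[header].compact.size
end

p counts
```

Empty unquoted fields are parsed as nil, so compact drops them before counting.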
