1

Out of the 10 columns there in the original CSV, I have 4 columns which I need to make integers (to process with MATLAB later; the other 6 columns already contain integer values). These 4 columns are: (1) platform (2) push (3) timestamp, and (4) udid.

An example input is: #other_column, Android, Y, 10-05-2015 3:59:59 PM, #other_column, d0155049772de9, #other_columns

The corresponding output should be: #other_column, 2, 1, 1431273612198, #other_column, 17923, #other_columns

So, I wrote the following code:

require 'csv'

CSV.open('C:\Users\hp1\Desktop\Datasets\NewColumns2.csv', "wb") do |csv|
  CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map do |row|

        if row['platform']=='Android'
            row['platform']=2
        elsif row['platform']=='iPhone'
            row['platform']=1
        end


        if row['push']=='Y'
            row['push']=1
        elsif row['push']=='N'
            row['push']=0
        end


        row['timestamp'].to_time.to_i

        row['udid'].to_i

        csv<<row
    end
end

Now, the first 3 columns, weekday, platform and push, are having a small number of unique values for the whole file (i.e., 7, 2 and 2 respectively), which is why I used the above approach. However, the other 2 columns, timestamp and udid, are different - they have several values, a few of them common to some rows in the CSV, but there are thousands of unique values. And hence I thought of converting them to integers in the manner I showed above.

Anyhow, none of the columns are getting converted at all. Plus, there is another problem with the datetime column as it is in a format which Ruby apparently does not recognize as a legitimate time format (a sample looks like this: 10-05-2015 3:59:59 PM). So, what should I do? Thanks.

4
  • 1
    Welcome to Stack Overflow. Example input data is needed. Also show your expected output. Also, can you reduce your code to the bare minimum and still show the problem? Commented Sep 17, 2015 at 16:55
  • Thank you. To express the problem with bare minimum code, there are 2 kinds of values in the push column - Y & N, I want to convert the Y to 1 and N to 0. But this code is not working: if row['push']=='Y' row['push']=1 elsif row['push']=='N' row['push']=0 end And other columns have similar issues too, which I have explained in the question details. I'm quite new to Ruby, so I realize this might be a silly issue I'm asking, and maybe the solution is quite easy. Commented Sep 17, 2015 at 17:33
  • Don't explain in a comment, edit your question and add the details into the question as if you'd put them there originally. Putting details into comments makes it much more difficult to figure out the question. And we need example input and what you expect out. Commented Sep 17, 2015 at 17:52
  • Done. Added sample input and output too. Commented Sep 17, 2015 at 18:12

3 Answers 3

1

Edit - Redo, I missed part of the problem with the udids

Problems

  1. You are using map when you don't need to, CSV#foreach already iterates through all of the rows - remove this
  2. Date - include the ruby standard Time library
  3. Unique ids - it sounds like you want to convert the udid into a shorter unique id since there may be more than one entry per mobile device - use an array to make a collection without repeats and use the index of the device udid in the array as your new shorter unique id

I used this as my input csv:

othercol1,platform,push,timestamp,othercol2,udid,othercol3,othercol4,othercol5,othercol6
11,Android, N, 10-05-2015  3:59:59 PM,22, d0155049772de9,33,44,55,66
11,iPhone, N, 10-05-2015  5:59:59 PM,22, d0155044772de9,33,44,55,66
11,iPhone, Y, 10-06-2015  3:59:59 PM,22, d0155049772de9,33,44,55,66
11,Android, Y, 11-05-2015  3:59:59 PM,22, d0155249772de9,33,44,55,66

Here is my output csv:

11,2,0,1431298799,22,1,33,44,55,66
11,1,0,1431305999,22,2,33,44,55,66
11,1,1,1433977199,22,1,33,44,55,66
11,2,1,1431385199,22,3,33,44,55,66

Here is the script I used:

require 'time' # use ruby standard time library to parse for you
require 'csv'

udids = [] # turn the udid in to a shorter unique id 
CSV.open('new.csv', "wb") do |csv|
  CSV.foreach('old.csv', headers: true) do |row|

        if row['platform']=='Android'
            row['platform']=2
        elsif row['platform']=='iPhone'
            row['platform']=1
        end

        if row['push'].strip =='Y'
            row['push']=1
        elsif row['push'].strip =='N'
            row['push']=0
        end

        row['timestamp'] = Time.parse(row['timestamp']).to_i

        # turn the udid in to a shorter unique id
        unless udids.include?(row['udid'])
          udids << row['udid']
        end

        row['udid'] = udids.index(row['udid']) + 1

        csv << row
    end
end
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for these info! But when I try out your suggestions, I get the following error: in '_parse': no implicit conversion of nil into String (TypeError). This is for the timestamp column; and the other columns are being copied as it is, without converting to integers first. Why is that so?
0

This is a wrong usage of map, this is not the function you need. Map is if you want to apply a function to all values in the array, and return the array. What you are doing is iterate, doing some changes, then pushing the modified row into a new array - you can just iterate, no need for the map function to be there: CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true) instead of CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map

About the date, you can use strptime to transform string into date: DateTime.strptime("10-05-2015 3:59:59 PM", "%d-%m-%Y %l:%M:%S %p"). Here the docs: http://ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/DateTime.html

Comments

0

add :converters => :all to your options, so that the dates and numbers are automatically converted. Then, instead of

row['timestamp'].to_time.to_i

which does the conversion but doesn't put it anywhere (it is not in-place), do this:

row['timestamp'] = row['timestamp'].to_time.to_i

note that this only works with converters, otherwise row['timestamp'] is a string and there is no .to_time method.

3 Comments

Thanks. I tried your suggestion; I get the following error: undefined method 'to_time' for nil:NilClass (NoMethodError). Could you suggest a solution?
@user5346597 Are some of the timestamp columns blank? If so you'd need to use something like row['timestamp'] = row['timestamp'].to_time.to_i unless row['timestamp'].nil?
Ok, that worked! But, the new csv does not contain the modified values. It contains identical values as the old csv, meaning they didn't get converted at all. Why is that, if you could elucidate a bit? And also, each row now contains 2 additional columns - they contain 0 and 1 in the exact same order for all the rows.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.