Delete Duplicates in Array based on matching id. Rails

Question

I've got a .csv file that I've read into an array. The fields are comma separated, so I've split each line and built an array of records from them.

Now I'm trying to find records with matching IDs so I can remove duplicates and keep only the last one encountered for each ID.

The import works, but for some reason I can't get a method like uniq to give me the deduplicated list, even though calling .length on the result returns the number of rows I expect.

Any help would be greatly appreciated.

CODE

    lines = []
    i = 0

    file = File.open("./properties.csv", "r")

    elements = Array[]
    element2 = Array[]
    output = Array[]

    while (line = file.gets)
      i += 1
      # use split to break each line into fields on commas
      arr = line.split(',')
      elements.push({ id: arr[0], streetAddress: arr[1], town: arr[2], valuationDate: arr[3], value: arr[4] })
    end

    file.close

    # Group records by id and keep only the groups that contain more than one record
    element2 = elements.group_by { |c| c[:id] }.values.select { |group| group.size > 1 }


    output = (element2.uniq)
    puts output

    puts element2.length

SAMPLE .CSV FILE

ID,Street address,Town,Valuation date,Value
1,1 Northburn RD,WANAKA,1/1/2015,280000
2,1 Mount Ida PL,WANAKA,1/1/2015,280000
3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
1,1 Northburn RD,WANAKA,1/1/2015,330000
2,1 Mount Ida PL,WANAKA,1/1/2015,330000
3,1 Mount Linton AVE,WANAKA,1/1/2015,830000
1,1 Northburn RD,WANAKA,1/1/2016,340000
2,1 Mount Ida PL,WANAKA,1/1/2016,340000
3,1 Mount Linton AVE,WANAKA,1/1/2016,840000
4,1 Kamahi ST,WANAKA,1/1/2016,215000
5,1 Kapuka LANE,WANAKA,1/1/2016,209000
6,1 Mohua MEWS,WANAKA,1/1/2016,620000
7,1 Kakapo CT,WANAKA,1/1/2016,490000
8,1 Mt Gold PL,WANAKA,1/1/2016,1320000
9,1 Penrith Park DR,WANAKA,1/1/2016,1310000

That's true! But correct me if I'm wrong: isn't that a distinction without a difference?
– Tinus Wagner, Commented May 12, 2016 at 4:09
The difference is that using Array[] is just plain bizarre. Using the simplest expression is generally the best.
– tadman, Commented May 12, 2016 at 4:12

Michael Gaskill · Accepted Answer · 2016-05-12 05:33:31Z

5

I've actually switched my approach to using a hash, which seems to automatically remove duplicates and leave the last encountered record intact. Can anyone shed some light on why?

    require 'csv'

    element = {}

    CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
        element[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
    end

    puts element["1"]

    element.each do |key, value|
        puts key 
        puts value
    end

    puts "#{element.length} records returned"
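As a minimal sketch of why the hash version keeps the last record: assigning to a key that already exists replaces the value stored under it, so every later row with the same id overwrites the earlier one. The example below assumes an inline subset of the sample data (the variable name sample is only for illustration) so that it runs without properties.csv.

```ruby
require 'csv'

# Inline subset of the sample data; an assumption made so the sketch is self-contained.
sample = <<~DATA
  ID,Street address,Town,Valuation date,Value
  1,1 Northburn RD,WANAKA,1/1/2015,280000
  2,1 Mount Ida PL,WANAKA,1/1/2015,280000
  1,1 Northburn RD,WANAKA,1/1/2015,330000
  1,1 Northburn RD,WANAKA,1/1/2016,340000
DATA

element = {}

CSV.parse(sample, headers: true, header_converters: :symbol) do |row|
  # Plain assignment: an existing entry for this id is replaced by the current row.
  element[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end

puts element.length        # 2 (ids "1" and "2")
puts element["1"][:value]  # 340000 (from the last row seen for id "1")
```

Note that the keys are strings such as "1", because CSV fields are read as strings unless a converter is supplied.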

To keep the first matching element, instead of the last, you can do a key existence check before assigning the value. This can be done like so:

    CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
      key = row.fields[0]
      if !element.key?(key)
        element[key] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
      end
    end

which can also be written more concisely using the ||= operator:

    CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
      element[row.fields[0]] ||= Hash[row.headers[1..-1].zip(row.fields[1..-1])]
    end

Note that the versions that keep the first record found for a key can also do less work than the version that keeps the last one, especially when the file contains many duplicates. This is because of work avoidance: the hash value, which is built with slice and zip in this code, is only produced the first time a key is seen rather than once for every row.
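The work avoidance can be made visible with a small sketch that counts how often the value hash is actually built. The build lambda, the builds counter, and the inline sample data are hypothetical names introduced only for this illustration; they are not part of the code above.

```ruby
require 'csv'

# Inline subset of the sample data; an assumption made so the sketch is self-contained.
sample = <<~DATA
  ID,Street address,Town,Valuation date,Value
  1,1 Northburn RD,WANAKA,1/1/2015,280000
  2,1 Mount Ida PL,WANAKA,1/1/2015,280000
  1,1 Northburn RD,WANAKA,1/1/2015,330000
  1,1 Northburn RD,WANAKA,1/1/2016,340000
DATA

builds = 0 # how many times the value hash is constructed

# Hypothetical helper: builds the value hash and counts each call.
build = lambda do |row|
  builds += 1
  Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end

first_seen = {}

CSV.parse(sample, headers: true, header_converters: :symbol) do |row|
  # ||= evaluates the right-hand side only when no value is stored for the key yet.
  first_seen[row.fields[0]] ||= build.call(row)
end

puts builds                   # 2 (one build per distinct id, not one per row)
puts first_seen["1"][:value]  # 280000 (the first row for id "1" is kept)
```

With plain assignment instead of ||=, the helper would run once for each of the four data rows. Whether the saving matters in practice depends on how many duplicates the file contains.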

edited May 12, 2016 at 5:33 by Michael Gaskill

answered May 12, 2016 at 4:17 by Tinus Wagner


7 Comments

Tinus Wagner Over a year ago

Anyone know how I could reverse this so the hash only takes the first duplicate entry instead of the last encountered?

Michael Gaskill Over a year ago

Hash uses a unique key as its index. When you use element[row.fields[0]], what you're doing is overwriting the previous value in the hash for that key. This will give you the uniqueness, as long as you're fine with the last id value being the one that gets retained. The new code is generations better than the original, so kudos on coming to that solution! :D

Tinus Wagner Over a year ago

thanks! What if I wanted to retain the first value entered into the hash and then ignore the following?

Michael Gaskill Over a year ago

I updated the answer with some details that show how to do that, and the benefits of preserving the first match instead of the last one. Great follow-up question!

Tinus Wagner Over a year ago

you are an actual legend. How do I give you "boss" points for comments/edits? Thank you so much.
