Ruby newbie here. I've got a product CSV where the first column is a unique SKU and the second column is a product ID that can be duplicated across multiple products (plus many other columns, but these are the pertinent ones). Like:

SKU     | Prod ID
 99     | 10384
100     | 10385
101     | 10385
102     | 10386
103     | 10386
104     | 10387

In the script I'm writing, the first row to use a product ID becomes a 'parent', and any subsequent rows with that product ID get treated differently (i.e., as different sizes).

I'm currently reading in the whole CSV rather than going line by line with foreach, as I assumed I'd need all the data available to find the duplicates.

The issue is that I'm not sure how to identify the first time a product ID is used and then identify any further instances of its use.

My first thought was to somehow identify the duplicates (uniq?) and then create a new column and put a 1 if it's the first time it's occurred and 0 if it's occurred previously. After looking at uniq I'm not sure how I then go back to the main list and mark my 1's and 0's.

Can someone please point me in the direction of the classes/methods I need to be looking at?

Thanks, Liam

Edit for John D: This gives me a hash, but in a 1:1 format (one SKU to one product ID) rather than one product ID to all of its instances:

CSV.foreach(INPUT, :headers => true, :header_converters => :symbol, :col_sep => "|", :quote_char => "\x00") do |csv_obj|
  items[csv_obj.fields[0]] = [csv_obj.fields[1]]
end

which gives: "230709"=>["88507"], "109064"=>["9019"]

2 Answers

You're thinking of the SKU as the unique identifier, which it may in fact be. But if you turn that on its head and think of the ProductID as the unique identifier, then you can build a Hash where the key is the ProductID and the value is an Array of SKUs. Then you'll be able to track which SKUs are associated with which ProductID.

Of course you'll read this in some other way, but the end result would be similar to:

products = {
  10384 => [99],
  10385 => [100, 101],
  10386 => [102, 103],
  10387 => [104]
}

Here's an example of how to construct this Hash:

#!/usr/bin/env ruby
require 'csv'

source = [
  "99|10384",
  "100|10385",
  "101|10385",
  "102|10386",
  "103|10386",
  "104|10387"
].join("\n")

source = CSV.parse(source, :col_sep => "|")

hh = source.inject({}) do |memo, row|
  sku = row[0]
  prod = row[1]

  memo[prod] = [] unless memo.include?(prod) 
  memo[prod] << sku
  memo
end

puts hh
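
Once the Hash exists, the first SKU in each Array is the one that appeared first in the file, so it can serve as the 'parent' from the question. A minimal sketch, assuming a Hash shaped like the one above with String keys and values (as CSV.parse returns them); the names `families`, `parent`, and `children` are made up for illustration:

```ruby
# Hash shaped like the one built above: product ID => SKUs in file order.
products = {
  "10384" => ["99"],
  "10385" => ["100", "101"],
  "10386" => ["102", "103"],
  "10387" => ["104"]
}

# The first SKU in each list was seen first, so treat it as the parent;
# the splat collects any remaining SKUs as children (possibly none).
families = products.map do |prod, (parent, *children)|
  { prod: prod, parent: parent, children: children }
end

families.each do |f|
  puts "#{f[:prod]}: parent=#{f[:parent]} children=#{f[:children].inspect}"
end
```

This relies on Arrays keeping append order and on Hashes keeping insertion order (Ruby 1.9 and later).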

2 Comments

Thanks John, I'm able to create the hash/array structure but I can't get my head around how to merge the duplicates?
Thanks John, the dbl parameter block on the do was new to me, as was inject and include?

.group_by() is relatively new (though it has an older counterpart in Rails), but is awfully convenient and should do most of your heavy lifting.

If you create a class to hold each row and put them in an Array, then you can call the group_by method with a block that just checks each object's Product ID field.

That gives you a Hash, which you can iterate through with .each (or .keys.each if you only need the product IDs).

Assuming a whole bunch of things about your program that are hopefully semi-obvious, something like:

transactionHash = transactions.group_by { |x| x.productId }

Then, you can go through your transaction lists per product with:

transactionHash.each do |prodId,transList|
  # transList has all of your transaction objects per product
end

Again, that assumes you're keeping your transactions in a list of objects. The x.productId would be something like x[1] if you store each transaction in an array, for example.
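
For the array-of-arrays case, here is a rough sketch of how group_by could produce the 1/0 "first occurrence" column described in the question. It assumes pipe-separated data with the SKU in column 0 and the product ID in column 1; the sample data and the names `by_product` and `flagged` are illustrative only:

```ruby
require 'csv'

# Sample data mirroring the question's table (SKU|ProductID).
data = <<~DATA
  99|10384
  100|10385
  101|10385
  102|10386
  103|10386
  104|10387
DATA

rows = CSV.parse(data, col_sep: "|")

# Group whole rows by their product ID (column 1).
by_product = rows.group_by { |row| row[1] }

# Within each group, the first row is the parent (flag 1); the rest get 0.
flagged = by_product.flat_map do |_prod_id, group|
  group.each_with_index.map { |row, i| row + [i.zero? ? 1 : 0] }
end

flagged.each { |row| puts row.join("|") }
```

Note that the rows come back grouped by product ID, which matches the original file order only when duplicates sit next to each other, as they do in the sample.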

3 Comments

Thanks John C; Just trying to work out how the terminology fits. I've got an array of arrays from reading in the csv, but can't get my head around how I call group by on the detailed row array inside the array holding the csv's data.
I avoided specifics, since I didn't want to assume the wrong structure for your data and mislead you, but I've added what I think is generic enough to use. The key is that group_by just takes an expression that explains what defines the groupings.
Thanks John C; this helped me a lot as well. I refactored my existing code to use a class, feed the rows into it, and then use group_by, where previously I'd just been dealing with raw arrays stuffed into variables.
