I am parsing an Excel file using Creek. This is the first row (the header):

{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

and all the other rows are:

[ 
 {"A"=>2019-05-16 00:00:00 +0200, "B"=>"TEXT", "C"=>"INR"}, 
 {"A"=>2019-05-20 00:00:00 +0200, "B"=>"TEXT2", "C"=>"EUR"}
]

My goal is to produce the same array, but with each hash key ("A", "B", "C") replaced by the key of the mapping hash whose regex matches the header value of that column.

For example, the header values match these regexes:

mapping = {
    date: /Date|Data|datum|Fecha/,
    portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
    currency: /Currency|Valuta|Währung|Divisa|Devise/
    }

So I need all data rows to be replaced like this:

[ 
  {"date"=>2019-05-16 00:00:00 +0200, "portfolio_name"=>"TEXT", "currency"=>"INR"}, 
  {"date"=>2019-05-20 00:00:00 +0200, "portfolio_name"=>"TEXT2", "currency"=>"EUR"}
]

2 Answers

Detect the column names in a separate step. The intermediate mapping will look like {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}, and you can then use it to transform the data array.

This is pretty straightforward:

header_mapping = header.transform_values{|v|
  mapping.find{|key,regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}

rows.map{|row|
  row.transform_keys{|k| header_mapping[k].to_s }
}

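This code requires Ruby 2.5+ for the native Hash#transform_* methods (Hash#transform_values was added in 2.4, Hash#transform_keys in 2.5), or ActiveSupport on older versions.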
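
On an older Ruby without ActiveSupport, the same two steps can be written with each_with_object. This is only a minimal sketch: the header and rows variables are assumed to hold the header row and the data rows from the question, and plain strings stand in for the date cells.

```ruby
mapping = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

# Assumed sample input: the header row and the data rows from the question,
# with strings standing in for the parsed dates.
header = { "A" => "Date", "B" => "Portfolio", "C" => "Currency" }
rows = [
  { "A" => "2019-05-16", "B" => "TEXT",  "C" => "INR" },
  { "A" => "2019-05-20", "B" => "TEXT2", "C" => "EUR" }
]

# Step 1: column letter => mapping key, e.g. "A" => :date.
header_mapping = header.each_with_object({}) do |(column, title), result|
  match = mapping.find { |_name, regex| title =~ regex }
  raise "Unknown header field #{title}" unless match
  result[column] = match.first
end

# Step 2: rebuild every row with the mapped names as string keys.
mapped_rows = rows.map do |row|
  row.each_with_object({}) do |(column, value), new_row|
    new_row[header_mapping[column].to_s] = value
  end
end

p mapped_rows
```

The behaviour is meant to match the transform_values / transform_keys version above: an unknown header still raises, and the resulting keys are strings.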

2 Comments

It works! Simple and effective! Thank you very much! Very smart!
It helps more if you supply an explanation why this is the preferred solution and explain how it works. We want to educate, not just provide code.

TL;DR

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

header_row = rows.first

mapped_header_row = header_row.inject({}) do |hash, (k, v)|
  mapped_name = mappings.find do |name, regex|
    v.match? regex
  end&.first

  # Falls back to `v.to_sym` (the header name) when no mapping matches.
  # Depending on your expectations, you could raise an exception here instead.
  hash[k] = mapped_name || v.to_sym
  hash
end

mapped_rows = rows[1..-1].map do |row|
  new_row = {}
  row.each do |k, v|
    new_row[mapped_header_row[k]] = v
  end
  new_row
end

puts mapped_rows
# => [
#      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
#      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
#    ]

Given:

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

Steps:

  1. First, extract the first row to get the column names.

    header_row = rows.first
    puts header_row
    # => {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
    
  2. Next, loop through each (key, value) pair of the header Hash and check whether the value matches any of the regexes in mappings.

    In short, this step converts:

    header_row = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

    into

    mapped_header_row = {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}

    And so...

    mapped_header_row = header_row.inject({}) do |hash, (k, v)|
      mapped_name = mappings.find do |name, regex|
        v.match? regex
      end&.first
    
      # Falls back to `v.to_sym` (the header name) when no mapping matches.
      # Depending on your expectations, you could raise an exception here instead.
      hash[k] = mapped_name || v.to_sym
      hash
    end
    
    puts mapped_header_row
    # => {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}
    

    See Enumerable#inject

    See Enumerable#find

  3. Now that we have mapped_header_row (the mapped name for each column), we can replace the keys of every row from the 2nd to the last: "A", "B", and "C" become :date, :portfolio_name, and :currency respectively.

    # rows[1..-1] means the 2nd element in the array until the last element
    mapped_rows = rows[1..-1].map do |row|
      new_row = {}
      row.each do |k, v|
        new_row[mapped_header_row[k]] = v
      end
      new_row
    end
    
    puts mapped_rows
    # => [
    #      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
    #      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
    #    ]
    

    See Array#map
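
The comment in step 2 mentions raising an exception instead of falling back to the header name. A minimal sketch of that stricter variant might look like the following; the helper names map_header and remap_rows are hypothetical (not part of Creek or Ruby), and plain strings stand in for the Time values.

```ruby
MAPPINGS = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}.freeze

# Hypothetical helper: builds column letter => mapped name,
# raising KeyError when a header value matches none of the regexes.
def map_header(header_row, mappings)
  header_row.each_with_object({}) do |(column, title), result|
    name, _regex = mappings.find { |_name, regex| title.match?(regex) }
    raise KeyError, "no mapping for header #{title.inspect}" if name.nil?
    result[column] = name
  end
end

# Hypothetical helper: treats the first row as the header and
# renames the keys of every remaining row.
def remap_rows(rows, mappings)
  header_row, *data_rows = rows
  mapped_header = map_header(header_row, mappings)
  data_rows.map do |row|
    row.each_with_object({}) do |(column, value), new_row|
      new_row[mapped_header.fetch(column)] = value
    end
  end
end

# Assumed sample input, with strings standing in for the parsed dates.
rows = [
  { "A" => "Date", "B" => "Portfolio", "C" => "Currency" },
  { "A" => "2019-05-16", "B" => "TEXT",  "C" => "INR" },
  { "A" => "2019-05-20", "B" => "TEXT2", "C" => "EUR" }
]

mapped_rows = remap_rows(rows, MAPPINGS)
p mapped_rows
```

Failing early on an unknown header makes a renamed or misspelled column visible immediately, rather than letting it surface later as an unexpected key.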

3 Comments

Instead of supplying a code-only answer, please add commentary about how it works and explain why it's the correct solution. This helps educate the OP so they can reuse the knowledge in case they run into a similar situation.
@theTinMan I try to balance readability and simplicity, and I thought my code was explicit enough thanks to the self-explanatory variable names. But I agree that I failed to describe the whole process flow, so thanks! I'll update my answer! :)
Excellent job! Too many never learn to do that but it really helps the site, and will help your answers gain votes. :-) It's not necessary to say things like "updated" or "edited" though. Instead, simply update or add the text where you would have put it initially, which helps readability. SO has a revision control system that we can see once we hit a certain number of points. It makes it easy to see what's changed and when. Write like you're creating an article for an encyclopedia, kind of friendly and relaxed with clear explanations.
