Ruby replace hash key using a Regex

Question

I am parsing an Excel file using Creek. This is the first row (the header):

{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

and all the other rows are:

[ 
 {"A"=>2019-05-16 00:00:00 +0200, "B"=>"TEXT", "C"=>"INR"}, 
 {"A"=>2019-05-20 00:00:00 +0200, "B"=>"TEXT2", "C"=>"EUR"}
]

My goal is to have the same array, where all hash keys are replaced with key of mapping using a regex expression in the values of the mapping hash.

For example, in the header, the keys match these REGEX:

mapping = {
    date: /Date|Data|datum|Fecha/,
    portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
    currency: /Currency|Valuta|Währung|Divisa|Devise/
    }

So I need all data rows to be replaced like this:

[ 
  {"date"=>2019-05-16 00:00:00 +0200, "portfolio_name"=>"TEXT", "currency"=>"INR"}, 
  {"date=>2019-05-20 00:00:00 +0200, "portfolio_name"=>"TEXT2", "currency"=>"EUR"}
]

Vasfed · Accepted Answer · 2019-06-28 18:51:17Z

4

Detect column names in a separate step. Intermediate mapping will look like {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}, and then you can transform data array.

This is pretty straightforward:

header_mapping = header.transform_values{|v|
  mapping.find{|key,regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}

rows.map{|row|
  row.transform_keys{|k| header_mapping[k].to_s }
}

Code requires Ruby 2.4+ for native Hash#transform_* or ActiveSupport

edited Jun 28, 2019 at 18:51

answered Jun 28, 2019 at 12:53

Vasfed

18.6k10 gold badges56 silver badges59 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

sparkle Over a year ago

It works! Simple and effective! Thank you very much! Very smart!

the Tin Man Over a year ago

It helps more if you supply an explanation why this is the preferred solution and explain how it works. We want to educate, not just provide code.

the Tin Man · Accepted Answer · 2019-06-30 22:26:52Z

1

TL:DR;

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

header_row = rows.first

mapped_header_row = header_row.inject({}) do |hash, (k, v)|
  mapped_name = mappings.find do |mapped_name, regex|
    v.match? regex
  end&.first

  # defaults to `v.to_sym` (Header Name), if not in mappings
  # you can also raise an Exception here instead if not in mappings, depending on your expectations
  hash[k] = mapped_name || v.to_sym 
  hash
end

mapped_rows = rows[1..-1].map do |row|
  new_row = {}
  row.each do |k, v|
    new_row[mapped_header_row[k]] = v
  end
  new_row
end

puts mapped_rows
# => [
#      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
#      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
#    ]

Given:

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

Steps:

We first extract the first row, to get the column names.

header_row = rows.first
puts header_row
# => {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

We need to loop through each of the Hash pairs: (key, value), and we need to find if the "value" corresponds to any of our mappings variable.

In short for this step, we need to somehow convert (i.e.):

header_row = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

into

mapped_header_row = {"A"=>"date", "B"=>"portfolio_name", "C"=>"currency"}

And so...

mapped_header_row = header_row.inject({}) do |hash, (k, v)|
  mapped_name = mappings.find do |mapped_name, regex|
    v.match? regex
  end&.first

  # defaults to `v.to_sym` (Header Name), if not in mappings
  # you can also raise an Exception here instead if not in mappings, depending on your expectations
  hash[k] = mapped_name || v.to_sym 
  hash
end

puts mapped_header_row
# => {"A"=>"date", "B"=>"portfolio_name", "C"=>"currency"}

See inject

See find

Now that we have the mapped_header_row (or the "mapped" labels / names for each column), then we can just simply update all of the "keys" of 2nd row until the last row, with the "mapped" name: the keys being "A", "B", and "C"... to be replaced correspondingly with "date", "portfolio_name", and "currency"

# row[1..-1] means the 2nd element in the array until the last element
mapped_rows = rows[1..-1].map do |row|
  new_row = {}
  row.each do |k, v|
    new_row[mapped_header_row[k]] = v
  end
  new_row
end

puts mapped_rows
# => [
#      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
#      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
#    ]

See map

edited Jun 30, 2019 at 22:26

the Tin Man

161k44 gold badges222 silver badges308 bronze badges

answered Jun 28, 2019 at 13:12

Jay-Ar Polidario

6,60318 silver badges30 bronze badges

3 Comments

the Tin Man Over a year ago

Instead of supplying a code-only answer, please add commentary about how it works and explain why it's the correct solution. This helps educate the OP so they can reuse the knowledge in case they run into a similar situation.

Jay-Ar Polidario Over a year ago

@theTinMan I try to balance out readability and simplicity, and I thought I was very explicit enough on how I wrote my code by using self-explanatory variables. But I agree that I failed to describe the whole process flow, so thanks! I'll update my answer! :)

the Tin Man Over a year ago

Excellent job! Too many never learn to do that but it really helps the site, and will help your answers gain votes. :-) It's not necessary to say things like "updated" or "edited" though.Instead simply update or add the text where you would have put it in initially, which helps in readability. SO has a revision control system that we can see once we hit a certain number of points. It makes it easy to see what's changed and when. Write like you're creating an article for an encyclopedia, kind of friendly and relaxed with clear explanations.

Collectives™ on Stack Overflow

Ruby replace hash key using a Regex

2 Answers 2

2 Comments

TL:DR;

Given:

Steps:

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

TL:DR;

Given:

Steps:

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related