ruby remove same value multiple keys in hash

Question

From the following hash:

hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120, "e"=>400, "f"=>430, "g"=>500}

I want to remove every ("key", "value") pair whose value is the same as, or within 50 of, the value of another pair. For example, "a"=>100 and "c"=>100 should be removed because they have the same value. "d"=>120 should be removed along with them, because the difference between 100 and 120 is 20. "e"=>400 and "f"=>430 should also be removed, because their difference is 30.

I should be left with only:

{"b"=>200, "g"=>500}

The above is just an example; in reality, I have a hash with 33,000 keys.

and "e"=>180 would mean the resulting hash should be empty? – Uri Agassi, Commented Apr 21, 2014 at 20:29
A difference of 50? From what value? Which key that has the same value should stay? So many questions... – squiguy, Commented Apr 21, 2014 at 20:29
This question really mutated from when I first looked at it. This is totally not what I first read. – Sqeaky, Commented Apr 24, 2014 at 20:58

Cary Swoveland · Accepted Answer · 2014-04-22 05:06:17Z

Two of the hash's key/value pairs, k1=>v1 and k2=>v2, are both to be deleted if (v1-v2).abs <= 50. This includes pairs for which v1 == v2, so we need not consider that case separately. I would do this by first constructing an array of keys to keep, then creating a hash composed of the corresponding key/value pairs from the original hash.

Code

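# Keys to keep: all keys minus those whose value is within 50 of a neighbouring value
keys_to_keep = hash.keys -
  hash.sort_by { |_,v| v }
      .each_cons(2)
      .each_with_object([]) { |((k1,v1),(k2,v2)),a|
        a << k1 << k2 if (v1-v2).abs <= 50 }

# Rebuild a hash from the kept keys and their original values
keys_to_keep.zip(hash.values_at(*keys_to_keep)).to_h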

Explanation

hash = {"a"=>100,"b"=>200,"c"=>100,"d"=>120}

Sort by hash values:

b = hash.sort_by { |_,v| v }
  #=> [["a", 100], ["c", 100], ["d", 120], ["b", 200]]

Next, use Enumerable#each_cons to construct an enumerator over all adjacent pairs of elements of b:

c = b.each_cons(2)
  #=> #<Enumerator:
  # [["a", 100], ["c", 100], ["d", 120], ["b", 200]]:each_cons(2)>

To view the contents of this enumerator:

c.to_a
  #=> [[["a", 100], ["c", 100]],
  #    [["c", 100], ["d", 120]],
  #    [["d", 120], ["b", 200]]]

Now build an array of the keys to be deleted (duplicates are OK):

d = c.each_with_object([]) {
  |((k1,v1),(k2,v2)),a| a << k1 << k2 if (v1-v2).abs <= 50 }
  #=> ["a", "c", "c", "d"]

To compute d, consider the first value passed to the block by the enumerator c:

k1 => "a"
v1 => 100
k2 => "c"
v2 => 100

Since

(100 - 100).abs <= 50

keys k1 and k2 are added to the array of keys to be deleted (block variable a). The next value passed to the block is:

k1 => "c"
v1 => 100
k2 => "d"
v2 => 120

Since

(100 - 120).abs <= 50

the keys "c" and "d" are also added to a. The third value does not add any keys to a since

(120 - 200).abs > 50
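
Checking only neighbouring pairs of the sorted array is enough: if two values differ by at most 50, every value sorted between them is also within 50 of its neighbours, so both ends still get flagged. Here is a minimal sketch that cross-checks this against a brute-force comparison of all pairs. The method names close_keys_sorted and close_keys_brute and the limit parameter are hypothetical, introduced only for illustration:

```ruby
# Keys flagged by the sort-and-compare-neighbours approach.
def close_keys_sorted(hash, limit = 50)
  hash.sort_by { |_, v| v }
      .each_cons(2)
      .each_with_object([]) { |((k1, v1), (k2, v2)), a| a << k1 << k2 if (v1 - v2).abs <= limit }
      .uniq
end

# Keys flagged by comparing every pair of entries (quadratic; for checking only).
def close_keys_brute(hash, limit = 50)
  hash.to_a
      .combination(2)
      .each_with_object([]) { |((k1, v1), (k2, v2)), a| a << k1 << k2 if (v1 - v2).abs <= limit }
      .uniq
end

hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120}
close_keys_sorted(hash).sort == close_keys_brute(hash).sort
  #=> true
```

The brute-force version is useful only as a check on small inputs; with tens of thousands of keys the sorted version is the one to run.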

Now construct an array of keys to keep by using array difference (Array#-), which removes every occurrence of each key listed in d:

e = hash.keys
  #=> ["a", "b", "c", "d"]

keys_to_keep = e - d
  #=> ["b"]

Pull out the values for the keys to keep, using Hash#values_at:

f = hash.values_at(*keys_to_keep)
  #=> [200]

Construct an array of key/value pairs for keys to keep:

g = keys_to_keep.zip(f)
  #=> [["b", 200]]

Convert to a hash:

g.to_h # Array#to_h requires Ruby v2.1+
  #=> {"b"=>200}

or

Hash[g]
  #=> {"b"=>200}
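
Putting the steps together, here is a minimal sketch that wraps the approach in a method and applies it to the full hash from the question. The method name remove_close_values and the limit parameter are hypothetical. It collects the keys to delete in a Set (an assumption made with the 33,000-key case in mind) and filters with Hash#reject instead of zip and to_h, so it is a variation on the code above rather than the same code:

```ruby
require 'set'

# Returns a new hash without any pair whose value is within `limit`
# of another pair's value. The original hash is not modified.
def remove_close_values(hash, limit = 50)
  to_delete = Set.new
  # After sorting by value, only neighbouring pairs need to be compared.
  hash.sort_by { |_, v| v }.each_cons(2) do |(k1, v1), (k2, v2)|
    to_delete << k1 << k2 if (v1 - v2).abs <= limit
  end
  hash.reject { |k, _| to_delete.include?(k) }
end

hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120, "e"=>400, "f"=>430, "g"=>500}
remove_close_values(hash)
  #=> {"b"=>200, "g"=>500}
```

The sort dominates the running time, and the Set keeps each membership test in the final filter cheap.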

Uri Agassi · Answer · 2014-04-22 04:39:10Z

Try this:

multiple_values = hash.group_by { |k, v| v }.select { |v, i| i.length > 1 }.map { |v, i| v }

hash.delete_if { |k, v| multiple_values.any? { |i| v < i + 50 && v > i - 50 } }

The first line builds a histogram of the values (it groups entries by value) and filters out every value that has only one entry.
This gives us a list of all the values that have more than one key associated with them.
The second line removes every pair whose value is less than 50 away from one of these duplicated values.
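
To see what the two passes do, here is a sketch that runs them on the sample hash from the question. It works on a copy (the dup call is an addition, not part of the answer) because Hash#delete_if modifies its receiver, and it uses keys in place of the final map, which gives the same list. Only values near a duplicated value are removed, so "e" and "f" survive even though they differ by 30:

```ruby
hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120, "e"=>400, "f"=>430, "g"=>500}

# Pass 1: values that occur under more than one key.
multiple_values = hash.group_by { |_, v| v }
                      .select { |_, pairs| pairs.length > 1 }
                      .keys
  #=> [100]

# Pass 2: drop every pair whose value is strictly within 50 of such a value.
result = hash.dup.delete_if { |_, v| multiple_values.any? { |i| v < i + 50 && v > i - 50 } }
  #=> {"b"=>200, "e"=>400, "f"=>430, "g"=>500}
```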

edited Apr 22, 2014 at 4:39

answered Apr 21, 2014 at 20:36

3 Comments

Sqeaky Over a year ago

Could you explain this? I think it is building a histogram, then putting the duplicates from that into multiple_values, but I am not certain if I am right.

Uri Agassi Over a year ago

@Sqeaky - you are right. I've tried to clarify the explanation.

user1631306 Over a year ago

@UriAgassi I have edited my question. Can you please help me fix it? The current solution looks for duplicate values and then applies a +/-50 window. I need to apply that +/-50 check in any case.
