
From the following hash:

hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120, "e"=>400, "f"=>430, "g"=>500}

I want to remove every key/value pair whose value is either equal to, or within 50 of, the value of some other pair. For example, "a"=>100 and "c"=>100 should be removed because they have the same value, and "d"=>120 should be removed along with them because the difference between 100 and 120 is 20. "e"=>400 and "f"=>430 should also be removed because their difference is 30.

I should be left with only

{"b"=>200, "g"=>500}

The above is just an example; in reality, my hash has 33,000 keys.

  • And "e"=>180 would mean the resulting hash should be empty? Commented Apr 21, 2014 at 20:29
  • A difference of 50? From what value? Which key that has the same value should stay? So many questions... Commented Apr 21, 2014 at 20:29
  • Yes, "e"=>180 would mean the hash would be empty. Commented Apr 21, 2014 at 20:31
  • @squiguy I edited the question. I hope it's clear now. Commented Apr 21, 2014 at 20:33
  • This question really mutated since I first looked at it. This is totally different from what I first read. Commented Apr 24, 2014 at 20:58

2 Answers


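Two of the hash's key/value pairs, k1=>v1 and k2=>v2, are both to be deleted if (v1-v2).abs <= 50. This includes pairs for which v1 == v2, so we need not consider that case separately. Once the pairs are sorted by value, it is enough to compare neighbours: if v1 <= v2 and v2 - v1 <= 50, the value sorted immediately after v1 and the value sorted immediately before v2 both lie between v1 and v2, so v1 and v2 are each within 50 of an adjacent value. I would first construct an array of keys to keep, then create a hash composed of the corresponding key/value pairs from the original hash.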
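The claim that only neighbouring values need to be compared after sorting can be checked against a brute-force version that compares every pair of entries. This is a minimal sketch: the method names `delete_close_brute` and `delete_close_sorted` and the constant `LIMIT` are hypothetical, and "close" is assumed to mean `(v1 - v2).abs <= 50`, as in this answer. The brute-force method is quadratic, so it is meant as a cross-check on small inputs, not for a 33,000-key hash.

```ruby
# Hypothetical threshold: two values are "close" if they differ by at most LIMIT.
LIMIT = 50

# Quadratic reference: compare every pair of entries.
def delete_close_brute(hash, limit = LIMIT)
  doomed = {}
  hash.to_a.combination(2) do |(k1, v1), (k2, v2)|
    doomed[k1] = doomed[k2] = true if (v1 - v2).abs <= limit
  end
  hash.reject { |k, _| doomed.key?(k) }
end

# Sort by value and compare only adjacent entries, as in the answer.
def delete_close_sorted(hash, limit = LIMIT)
  doomed = hash.sort_by { |_, v| v }
               .each_cons(2)
               .each_with_object({}) { |((k1, v1), (k2, v2)), h|
                 h[k1] = h[k2] = true if (v1 - v2).abs <= limit }
  hash.reject { |k, _| doomed.key?(k) }
end

# Cross-check both versions on random small hashes (fixed seed for repeatability).
rng = Random.new(42)
500.times do
  h = (1..8).map { |i| ["k#{i}", rng.rand(0..400)] }.to_h
  raise "mismatch for #{h}" unless delete_close_brute(h) == delete_close_sorted(h)
end
```

If the loop finishes without raising, both methods agreed on every sampled hash, which is consistent with the neighbour argument above.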

Code

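# Keys of any adjacent (sorted-by-value) pair whose values differ by at most 50
# are removed from the list of all keys.
keys_to_keep = hash.keys -
  hash.sort_by { |_,v| v }
      .each_cons(2)
      .each_with_object([]) {
        |((k1,v1),(k2,v2)),a| a << k1 << k2 if (v1-v2).abs <= 50 }

# Build a hash from the kept keys and their values (Array#to_h requires Ruby 2.1+).
keys_to_keep.zip(hash.values_at(*keys_to_keep)).to_h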
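As a quick check, the same two statements can be run on the full seven-key hash from the question; only the `hash` literal and the `result` variable are added here. "b" and "g" survive because their nearest values in sorted order (120 and 430) are 80 and 70 away.

```ruby
hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120, "e"=>400, "f"=>430, "g"=>500}

# Keys belonging to any adjacent (sorted-by-value) pair within 50 are subtracted.
keys_to_keep = hash.keys -
  hash.sort_by { |_, v| v }
      .each_cons(2)
      .each_with_object([]) { |((k1, v1), (k2, v2)), a|
        a << k1 << k2 if (v1 - v2).abs <= 50 }

# Rebuild a hash from the surviving keys (Array#to_h requires Ruby 2.1+).
result = keys_to_keep.zip(hash.values_at(*keys_to_keep)).to_h
# result == {"b"=>200, "g"=>500}
```

Sorting dominates the cost (O(n log n)), and Array#- compares elements using their hash and eql? methods, so this approach should stay practical for a hash of around 33,000 keys.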

Explanation

hash = {"a"=>100,"b"=>200,"c"=>100,"d"=>120}

Sort by hash values:

b = hash.sort_by { |_,v| v }
  #=> [["a", 100], ["c", 100], ["d", 120], ["b", 200]]

Next, use Enumerable#each_cons to create an enumerator that yields all adjacent pairs of elements of b:

c = b.each_cons(2)
  #=> #<Enumerator:
  # [["a", 100], ["c", 100], ["d", 120], ["b", 200]]:each_cons(2)>

To view the contents of this enumerator:

c.to_a
  #=> [[["a", 100], ["c", 100]],
  #    [["c", 100], ["d", 120]],
  #    [["d", 120], ["b", 200]]]

Now build an array of the keys to be deleted (duplicates are OK):

d = c.each_with_object([]) {
  |((k1,v1),(k2,v2)),a| a << k1 << k2 if (v1-v2).abs <= 50 }
  #=> ["a", "c", "c", "d"]

To compute d, consider the first value passed to the block by the enumerator c:

k1 => "a"
v1 => 100
k2 => "c"
v2 => 100

Since

(100 - 100).abs <= 50

keys k1 and k2 are added to the array of keys to be deleted (block variable a). The next value passed to the block is:

k1 => "c"
v1 => 100
k2 => "d"
v2 => 120

Since

(100 - 120).abs <= 50

the keys "c" and "d" are also added to a. The third value does not add any keys to a since

(120 - 200).abs > 50

Now construct an array of keys to keep by using set difference:

e = hash.keys
  #=> ["a", "b", "c", "d"]

keys_to_keep = e - d
  #=> ["b"]

Pull out the values for the keys to keep, using Hash#values_at:

f = hash.values_at(*keys_to_keep)
  #=> [200]

Construct an array of key/value pairs for keys to keep:

g = keys_to_keep.zip(f)
  #=> [["b", 200]]

Convert to a hash.

g.to_h # Ruby v2.1+
  #=> {"b"=>200}

or

Hash[g]
  #=> {"b"=>200}
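On Ruby 2.5 or later, Hash#slice can replace the values_at/zip/to_h steps, since it returns a new hash containing only the listed keys. Alternatively, the keys to delete (the array d) can be rejected directly. The sketch reuses the example's values; `kept`, `kept_alt` and `doomed` are illustrative names, and a Set is used so that each lookup stays cheap on a large hash.

```ruby
require 'set'

hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120}
d = ["a", "c", "c", "d"]            # keys to delete, as computed above
keys_to_keep = hash.keys - d

# Hash#slice (Ruby 2.5+) keeps only the listed keys
kept = hash.slice(*keys_to_keep)

# Or reject the keys to delete; duplicates in d do not matter
doomed = Set.new(d)
kept_alt = hash.reject { |k, _| doomed.include?(k) }
```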


Try this:

multiple_values = hash.group_by { |k, v| v }.select { |v, i| i.length > 1 }.map { |v, i| v }

hash.delete_if { |k, v| multiple_values.any? { |i| v < i + 50 && v > i - 50 } }

The first line builds a histogram of the values (grouping entries by value) and discards the values that have only one entry.
This gives a list of all the values that have more than one key associated with them.
The second line removes every key whose value differs by less than 50 from one of these values.
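Running these two lines on the full hash from the question shows the limitation raised in the comments: only values near a duplicated value are removed, so "e"=>400 and "f"=>430 survive even though they differ by only 30. The window is also open, so a value exactly 50 away from a duplicate is kept.

```ruby
hash = {"a"=>100, "b"=>200, "c"=>100, "d"=>120, "e"=>400, "f"=>430, "g"=>500}

# Values shared by more than one key (only 100 here)
multiple_values = hash.group_by { |k, v| v }.select { |v, i| i.length > 1 }.map { |v, i| v }

# Delete keys whose value lies strictly inside (dup - 50, dup + 50)
hash.delete_if { |k, v| multiple_values.any? { |i| v < i + 50 && v > i - 50 } }

# multiple_values == [100]
# hash == {"b"=>200, "e"=>400, "f"=>430, "g"=>500}; "e" and "f" are kept
```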

3 Comments

Could you explain this? I think it is building a histogram, then putting the duplicates from that into multiple_values, but I am not certain I am right.
@Sqeaky - you are right. I've tried to clarify the explanation.
@UriAgassi I have edited my question. Can you please help me fix it? The current solution looks for duplicate values and then applies a +/-50 window around them. I need to apply the +/-50 window in every case.
