No idea if this is actually faster than your approach, but its runtime should be O(N+M), where N and M are the sizes of a and b. (That assumes hash lookup and insert are amortized O(1), which isn't strictly true since hashing is usually a function of key size; in the example, though, all keys are single characters.) Your approach of looping #delete_at over #index results has substantial extra data motion and looks like it may be worst-case O(N^2 * M).
def ary_subset?(a, b) # true iff a is a subset of b
  a_counts = a.reduce(Hash.new(0)) { |m, v| m[v] += 1; m }
  b_counts = b.reduce(Hash.new(0)) { |m, v| m[v] += 1; m }
  a_counts.all? { |a_key, a_ct| a_ct <= b_counts[a_key] }
end
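A quick usage sketch with example inputs of my own (not taken from the question); the def is repeated so the snippet runs standalone:

```ruby
def ary_subset?(a, b) # true iff a is a subset of b
  a_counts = a.reduce(Hash.new(0)) { |m, v| m[v] += 1; m }
  b_counts = b.reduce(Hash.new(0)) { |m, v| m[v] += 1; m }
  a_counts.all? { |a_key, a_ct| a_ct <= b_counts[a_key] }
end

p ary_subset?(%w[a b d], %w[a b c d e]) # => true
p ary_subset?(%w[a d d], %w[a b c d e]) # => false: b has only one "d"
```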
The OP asked for the fastest way, so I whipped up a little micro-benchmark available at this gist.
I tested the OP's original approach (op_del), my reduce-counts version (ct), a variant that reuses the count accumulator (ct_acc), and the MultiSet approach (mset). EDIT: I also added the very concise variant that compares per-element counts (slow_ct). I ran each variant against the OP's small array input example (s), larger sets of cardinality 10,000 (b), and a small set against a big set (sb). (I had to reduce the iteration count for the big-set cases by an order of magnitude to get slow_ct to complete in reasonable time.) Results here:
                 user     system      total        real
s_op_del     1.850000   0.000000   1.850000 (  1.853931)
s_ct         2.260000   0.000000   2.260000 (  2.264028)
s_ct_acc     1.700000   0.000000   1.700000 (  1.706881)
s_mset       5.460000   0.000000   5.460000 (  5.484833)
s_slow_ct    1.720000   0.000000   1.720000 (  1.731367)
b_op_del     0.310000   0.000000   0.310000 (  0.312804)
b_ct         0.120000   0.000000   0.120000 (  0.123329)
b_ct_acc     0.100000   0.000000   0.100000 (  0.101532)
b_mset       0.310000   0.000000   0.310000 (  0.319697)
b_slow_ct   82.910000   0.000000  82.910000 ( 83.013747)
sb_op_del    0.710000   0.020000   0.730000 (  0.734022)
sb_ct        0.050000   0.000000   0.050000 (  0.054416)
sb_ct_acc    0.040000   0.000000   0.040000 (  0.059032)
sb_mset      0.110000   0.000000   0.110000 (  0.117027)
sb_slow_ct   0.010000   0.000000   0.010000 (  0.011287)
The reduce-counts variant that reuses the count accumulator (ct_acc) is the clear winner. Multiset was disappointingly slow.
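For readers who don't want to open the gist, here is roughly what those two variants look like; the method names and exact code are my reconstruction, not the benchmarked source:

```ruby
# ct_acc (assumed shape): reuse one count hash instead of building two.
# Increment for b, decrement for a; a is a subset iff no count goes negative.
def ary_subset_acc?(a, b)
  counts = Hash.new(0)
  b.each { |v| counts[v] += 1 }
  a.each { |v| counts[v] -= 1 }
  counts.each_value.all? { |ct| ct >= 0 }
end

# slow_ct (assumed shape): the concise one-liner. Array#count rescans the
# whole array for every element of a, which is why it blows up on big inputs.
def ary_subset_slow?(a, b)
  a.all? { |v| a.count(v) <= b.count(v) }
end
```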
If the A array had another "d" element, it wouldn't be a subset. In your question, you could give examples of strings that should be considered subsets and ones that should not.