Ruby Difference in Array Including Duplicates

Ruby difference in array including duplicates

I am so glad you asked. I would like to see such a method added to the class Array in some future version of Ruby, as I have found many uses for it:

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

A description of the method and links to some of its applications are given here.

By way of example:

a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]

a - b          #=> [1]
a.difference b #=> [1,2,3,2]

Ruby v2.7 gave us the method Enumerable#tally, allowing us to replace the first line of the method with

h = other.tally

Difference Between Arrays Preserving Duplicate Elements in Ruby

I recently proposed that such a method, Ruby#difference, be added to Ruby's core. For your example, it would be written:

a = [1,2,2,3,4,5,5,5,5]
b = [5]

a.difference b
  #=> [1,2,2,3,4,5,5,5]

The example I've often given is:

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

a.difference b
  #=> [1, 3, 2, 2]

I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.

As shown at the links, the method could be written as follows:

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

Comparing array elements including duplicates

This method iterates once over both arrays.
For each array, it creates a hash with the number of occurences of each element.

It then checks that for every unique element in subset, there are at least as many elements in superset.

class Array
  def count_by
    each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  end

  def subset_of?(superset)
    superset_counts = superset.count_by
    count_by.all? { |k, count| superset_counts[k] >= count }
  end
end

[1, 2, 3, 3, "abc", "de", "f"].count_by
#=> {1=>1, 2=>1, 3=>2, "abc"=>1, "de"=>1, "f"=>1}

[1, 2, 3, 3].count_by
#=> {1=>1, 2=>1, 3=>2}

[1, 2, 3, 3].subset_of? [1, 2, 3, 3, "abc", "de", "f"]
#=> true
[2, 2, "abc"].subset_of? [1, 2, 3, 3, "abc", "de", "f"]
#=> false

If you don't want to patch the Array class, you could define :

count_by(array) and subset_of?(array1, array2).

Ruby Array Comparison Without Deleting Duplicates

to verify ary2 is included in ary1

(ary2 - ary1).empty?

should equal [1, 1, 2, 3, 5]

ary2.select { |e| ary1.include?(e) }

Compare arrays and remove duplicates, in Ruby?

It's just set difference or subtraction and you can write it as such. Operator overloading can be a bliss :)

a is what it is.

a
[[2, 1], [3, 3], [7, 2], [5, 6]]

b = b - a
[[6, 7], [9, 9], [4, 3]]

c = c - b - a # or c - (a + b)
[[1, 1], [2, 2]]

d = d - c - b - a # or d - (a + b + c)
[[3, 1]]

Ruby - Optimize the comparison of two arrays with duplicates

A similar question was posted a few weeks ago and I got the accepted answer with something like:

def is_subset?(a,b)
  !a.find{|x| a.count(x) > b.count(x)}
end

Benchmark Update

require 'benchmark'

def random_char
  ('a'..'z').to_a.sample
end

A = 8.times.map{random_char}
B = 8.times.map{random_char}

def ary_subset?(a,b) # true iff a is subset of b
  a_counts = a.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  b_counts = b.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  a_counts.all? { |a_key,a_ct| a_ct <= b_counts[a_key] }
end

Benchmark.bm do |x|
  x.report('me')     {100000.times{is_subset?(A,B)}}
  x.report('dbenhur'){100000.times{ary_subset?(A,B)}}
end

       user     system      total        real
me  0.375000   0.000000   0.375000 (  0.384022)
dbenhur  2.558000   0.000000   2.558000 (  2.550146)

Ruby: Remove all instances of a duplicate value inside an array

p arr.group_by(&:itself).reject{|k,v|v.count>1}.keys

Output

[2, 3, 5]

Check if two arrays have the same contents (in any order)

This doesn't require conversion to set:

a.sort == b.sort

Ruby - array intersection (with duplicates)

(array1 & array2).flat_map { |n| [n]*[array1.count(n), array2.count(n)].min }
  #=> [2,2,2,3,4,8]

The steps:

a = array1 & array2 
  #=> [2, 3, 4, 8]

The first element of a (2) is passed to the block and assigned to the block variable:

n = 2

and the block calculation is performed:

[2]*[array1.count(2), array2.count(2)].min
  #=> [2]*[4,3].min
  #=> [2]*3
  #=> [2,2,2]

so 2 is mapped to [2,2,2]. The calculations are similar for the remaining three elements of a. As I am using flat_map, this returns [2,2,2,3,4,8].

Do you have trouble remembering how Enumerable#flat_map differs from Enumerable#map? Suppose I had used map rather than flat_map. Then

a.map { |n| [n]*[array1.count(n), array2.count(n)].min }
  #=> [[2, 2, 2], [3], [4], [8]]

flat_map does nothing more that put a splat in front of each of those arrays:

[*[2, 2, 2], *[3], *[4], *[8]]
  #=> [2, 2, 2, 3, 4, 8]

If the arrays array1 and array2 are large and efficiency is a concern, we could do a bit of O(N) pre-processing:

def cnt(arr)
  arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
end

cnt1 = cnt(array1)
  #=> {2=>4, 3=>2, 4=>1, 5=>1, 6=>1, 7=>1, 8=>1, 9=>1} 
cnt2 = cnt(array2)
  #=> {2=>3, 3=>1, 4=>4, 8=>2, 0=>3} 

(array1 & array2).flat_map { |n| [n]*[cnt1[n], cnt2[n]].min }
  #=> [2,2,2,3,4,8]

Ruby Difference in Array Including Duplicates