How to Efficiently Extract Repeated Elements in a Ruby Array

How can I efficiently extract repeated elements in a Ruby array?

Inspired by Ilya Haykinson's answer:

def repeated(array)
  counts = Hash.new(0)
  array.each { |val| counts[val] += 1 }
  counts.reject { |_val, count| count == 1 }.keys
end
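As a quick check of the method above, here is a usage sketch. The sample array is made up for illustration, and the method is repeated so the snippet runs on its own.

```ruby
# Counting-hash approach from the answer above, restated so this runs standalone.
def repeated(array)
  counts = Hash.new(0)                           # missing keys default to 0
  array.each { |val| counts[val] += 1 }          # one pass to count occurrences
  counts.reject { |_val, count| count == 1 }.keys
end

sample = %w[a b c b a d]   # hypothetical sample data
result = repeated(sample)
#=> ["a", "b"]
```

The result follows the order in which each value was first seen, because Ruby hashes keep insertion order.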

How to find and return a duplicate value in array

a = ["A", "B", "C", "B", "A"]
a.detect{ |e| a.count(e) > 1 }

I know this isn't a very elegant answer, but I love it. It's a beautiful one-liner, and it works perfectly fine unless you need to process a huge data set: count rescans the whole array for each element, so the worst case is quadratic.

Looking for a faster solution? Here you go!

def find_one_using_hash_map(array)
  map = {}
  dup = nil
  array.each do |v|
    map[v] = (map[v] || 0) + 1

    if map[v] > 1
      dup = v
      break
    end
  end

  return dup
end

It's linear, O(n), but now you have several lines of code to maintain, test cases to write, and so on.
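If the line count is the main objection, the same early-exit idea can probably be kept to one line with a Set. This is a sketch, not part of the original answer: the method name first_duplicate is made up, and it relies on Set#add? returning nil when the element is already in the set.

```ruby
require 'set'

# Hypothetical helper: returns the first value whose repeat is encountered,
# or nil when the array has no duplicates.
def first_duplicate(array)
  seen = Set.new
  array.find { |v| !seen.add?(v) }   # add? is nil for an already-seen value
end

a = ["A", "B", "C", "B", "A"]
found = first_duplicate(a)
#=> "B"
```

Like find_one_using_hash_map, this returns "B" for the sample array (the first repeat reached while scanning), whereas the detect/count one-liner returns "A". A nil result is ambiguous if nil itself can be a duplicated value.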

If you need an even faster solution, maybe try C instead.

And here is the gist comparing different solutions: https://gist.github.com/naveed-ahmad/8f0b926ffccf5fbd206a1cc58ce9743e

How to get repeated elements from a Ruby array?

arr = [1,2,3,1,5,2]
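# Test the group size rather than e[1][1], which misses repeated nil or false values.
arr.group_by { |e| e }.select { |_, group| group.size > 1 }.keys

Pretty ugly... but it does the job in a single pass, without rescanning the array for every element.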

Remove duplicate elements from array in Ruby

array = array.uniq

uniq returns a new array with the duplicates removed, keeping the first occurrence of each element. Use uniq! to modify the array in place.

This is one of many beauties of the Ruby language.
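For illustration, a short sketch with a made-up array showing which occurrence uniq keeps and how the in-place variant uniq! behaves:

```ruby
array = [3, 1, 3, 2, 1]            # hypothetical sample data

deduped = array.uniq               # new array; the original is left untouched
#=> [3, 1, 2]

untouched = array.dup
#=> [3, 1, 3, 2, 1]

array.uniq!                        # removes duplicates in place
#=> [3, 1, 2]

no_change = [1, 2].uniq!           # uniq! returns nil when nothing was removed
#=> nil
```

Because uniq! can return nil, chaining further calls onto it is risky; uniq is the safer choice in a method chain.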

How do I detect duplicate values within an array in Ruby?

You can build a hash that stores the number of times each element occurs, iterating over the array just once.

h = Hash.new(0)
['a','b','b','c'].each{ |e| h[e] += 1 }

This should leave h as:

{"a"=>1, "b"=>2, "c"=>1}
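From that hash, the duplicated values themselves are one more step away. A minimal sketch, continuing the same example (the variable name duplicates is just for illustration):

```ruby
h = Hash.new(0)
['a', 'b', 'b', 'c'].each { |e| h[e] += 1 }

# Keep only the entries counted more than once, then take their keys.
duplicates = h.select { |_element, count| count > 1 }.keys
#=> ["b"]
```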

Fast way to find duplicate in large array

Your code is taking an eon to execute because it calls count for each element, giving it a computational complexity of O(n²).

arr = [*1..35000, 1, 34999]

If you want to know which values appear in the array at least twice...

require 'set'

uniq_set = Set.new
arr.each_with_object(Set.new) { |x,dup_set| uniq_set.add?(x) || dup_set.add(x) }.to_a
#=> [1, 34999]

Set lookups (implemented with a hash under the covers) are extremely fast.

See Set#add? and Set#add.
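To make the `add?(x) || add(x)` idiom easier to follow, here is the same logic unrolled on a small made-up array. Set#add? returns the set itself (truthy) when the element is new and nil when it is already present, so the right-hand side only runs for repeats.

```ruby
require 'set'

small = [1, 2, 1, 3, 2, 1]   # hypothetical small input
uniq_set = Set.new
dup_set  = Set.new

small.each do |x|
  # First sighting: add? returns the set, so dup_set is not touched.
  # Repeat: add? returns nil, so x goes into dup_set instead.
  uniq_set.add?(x) || dup_set.add(x)
end

seen_values = uniq_set.to_a
#=> [1, 2, 3]
repeated_values = dup_set.to_a
#=> [1, 2]
```

A value seen three or more times is still recorded only once, because dup_set is itself a set.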

If you want to know how many times each value that appears at least twice occurs in the array...

arr.each_with_object(Hash.new(0)) { |x,h| h[x] += 1 }.select { |_,v| v > 1 }
#=> {1=>2, 34999=>2}

This uses a counting hash (see note 1 below). See Hash::new when it takes a default value as an argument.

If you want to know the indices of values that appear in the array at least twice...

arr.each_with_index.
    with_object({}) { |(x,i),h| (h[x] ||= []) << i }.
    select { |_,v| v.size > 1 }
#=> {1=>[0, 35000], 34999=>[34998, 35001]}

When the hash h does not already have a key x,

(h[x] ||= []) << i
#=> (h[x] = h[x] || []) << i
#=> (h[x] = nil || []) << i
#=> (h[x] = []) << i
#=> [] << i where [] is now h[x]

1. Ruby v2.7 gave us the method Enumerable#tally, allowing us to write arr.tally.select { |_,v| v > 1 }.
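For completeness, the tally variant from the note, run against the same array. This assumes Ruby 2.7 or later.

```ruby
arr = [*1..35000, 1, 34999]

# tally builds the counting hash in one call (Ruby 2.7+).
counts = arr.tally.select { |_value, count| count > 1 }
#=> {1=>2, 34999=>2}

values = counts.keys
#=> [1, 34999]
```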

How to get the indexes of duplicate elements in a Ruby array

duplicates = arr.each_with_index.group_by(&:first).inject({}) do |result, (val, group)|
  next result if group.length == 1
  result.merge(val => group.map { |pair| pair[1] })
end

This will return a hash where the keys will be the duplicate elements and the values will be an array containing the index of each occurrence.
For your test input, the result is:

{"A"=>[0, 6], "X"=>[1, 2]}

If all you care about is the indices, you can call duplicates.values.flatten to get an array with just the indices.
In this case: [0, 6, 1, 2]
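To see what the pipeline is working with, here is a sketch on a made-up input (not the asker's original array, which is not reproduced here). It shows the intermediate grouping and an equivalent formulation with select and transform_values, which assumes Ruby 2.4 or later.

```ruby
arr = %w[A X X B A]   # hypothetical input

# Each element is paired with its index, then pairs are grouped by element.
grouped = arr.each_with_index.group_by(&:first)
#=> {"A"=>[["A", 0], ["A", 4]], "X"=>[["X", 1], ["X", 2]], "B"=>[["B", 3]]}

# Drop singleton groups and keep only the index from each pair.
duplicates = grouped
  .select { |_val, group| group.length > 1 }
  .transform_values { |group| group.map(&:last) }
#=> {"A"=>[0, 4], "X"=>[1, 2]}

indices = duplicates.values.flatten
#=> [0, 4, 1, 2]
```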

How to count duplicates in Ruby Arrays

This will yield the duplicate elements as a hash with the number of occurrences of each duplicate item. Let the code speak:

#!/usr/bin/env ruby

class Array
  # monkey-patched version
  def dup_hash
    # The trailing inject rebuilds a Hash on Ruby 1.8, where Hash#select
    # returned an array of pairs; on 1.9+ select already returns a Hash.
    inject(Hash.new(0)) { |h,e| h[e] += 1; h }.select {
      |_k,v| v > 1 }.inject({}) { |r, e| r[e.first] = e.last; r }
  end
end

# un-monkey-patched version
def dup_hash(ary)
  ary.inject(Hash.new(0)) { |h,e| h[e] += 1; h }.select {
    |_k,v| v > 1 }.inject({}) { |r, e| r[e.first] = e.last; r }
end

p dup_hash([1, 2, "a", "a", 4, "a", 2, 1])
# {1=>2, 2=>2, "a"=>3} (key order on Ruby 1.9+, where hashes keep insertion order)

p [1, 2, "Thanks", "You're welcome", "Thanks",
   "You're welcome", "Thanks", "You're welcome"].dup_hash
# {"Thanks"=>3, "You're welcome"=>3}

Find a Duplicate in an array Ruby

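Array#difference comes to the rescue yet again. (I confess that @user123's answer is more straightforward, unless you pretend that Array#difference is already a built-in method. Array#difference is probably the more efficient of the two, as it avoids the repeated invocations of count.) See my answer here for a description of the method and links to its use. Note that the Array#difference meant here is a custom method. Ruby 2.6 later added a built-in Array#difference that behaves like Array#-, so on current Rubies the custom version has to be defined (or renamed) before the examples below produce the results shown.
In a nutshell, it differs from Array#- as illustrated in the following example: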

a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]

a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]

One day I'd like to see it as a built-in.

For the present problem, if:

arr = [1,2,3,4,3,4]

the duplicate elements are given by:

arr.difference(arr.uniq).uniq
#=> [3, 4]
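The definition of the custom method is not reproduced above. A minimal sketch that is consistent with the example outputs might look like the following. The name multiset_difference is made up here, and it is written as a plain method rather than a monkey patch so that it does not clash with the built-in Array#difference in Ruby 2.6+.

```ruby
# Hypothetical helper: removes one occurrence from `a` for each occurrence
# of the same value in `b`, preserving the order of what remains.
def multiset_difference(a, b)
  remaining = b.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1   # use up one occurrence from b
      true                # and drop this element of a
    else
      false
    end
  end
end

a = [1, 2, 3, 4, 3, 2, 4, 2]
b = [2, 3, 4, 4, 4]
diff = multiset_difference(a, b)
#=> [1, 3, 2, 2]

arr = [1, 2, 3, 4, 3, 4]
dups = multiset_difference(arr, arr.uniq).uniq
#=> [3, 4]
```

Subtracting arr.uniq removes exactly one copy of every value, so whatever is left must have appeared more than once; the final uniq collapses values that appeared three or more times.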

