Remove Duplicate Elements from Array in Ruby



array = array.uniq

uniq returns a new array containing only the first occurrence of each element, in their original order.

This is one of many beauties of the Ruby language.
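uniq also has an in-place counterpart, uniq!, which modifies the receiver instead of returning a new array. A minimal sketch of the difference, using made-up sample arrays. One gotcha is that uniq! returns nil when there was nothing to remove, so its return value shouldn't be chained or assigned blindly:

```ruby
# Hypothetical sample data, for illustration only.
numbers = [3, 1, 3, 2, 1]

# uniq returns a new array and leaves the original untouched.
deduped = numbers.uniq
p deduped  # prints [3, 1, 2]
p numbers  # prints [3, 1, 3, 2, 1]

# uniq! removes duplicates in place and returns the receiver itself...
result = numbers.uniq!
p numbers  # prints [3, 1, 2]

# ...but returns nil when there was nothing to remove.
already_unique = [1, 2, 3]
p already_unique.uniq!  # prints nil
```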

Remove duplicate from an array in ruby

To just remove duplicates based on :name, simply try:

output = input.uniq { |x| x[:name] }
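The question's input isn't reproduced here, so this sketch assumes an array of hashes with :name and :score keys (the :score key is taken from the edit below; the values are made up). When a block is given, uniq compares the block's results, and the first entry seen for each name wins:

```ruby
# Hypothetical input: an array of hashes with :name and :score keys.
input = [
  { name: "alice", score: 3 },
  { name: "bob",   score: 7 },
  { name: "alice", score: 9 },
]

# Keep the first entry for each :name; the later "alice" is dropped.
output = input.uniq { |x| x[:name] }
p output  # keeps alice (score 3) and bob (score 7)
```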


Edit: Since you added a sorting requirement in the comments, here's how to select the entry with the highest score for each name. I see you already got an answer for "standard" Ruby above; note that group_by and max_by are themselves core Ruby methods, so this version doesn't need Rails either:

output = input.group_by { |x| x[:name] }
              .map { |_name, entries| entries.max_by { |e| e[:score] } }

A little explanation may be in order; the first line groups the entries by name so that each name gets its own array of entries. The second line goes through the groups, name by name, and maps each name group to the entry with the highest score.
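Here are the two steps run separately on made-up data, again assuming hashes with :name and :score keys. group_by keeps the groups in the order each name first appears, so the result is ordered the same way:

```ruby
# Hypothetical input with :name and :score keys (made-up data).
input = [
  { name: "alice", score: 3 },
  { name: "bob",   score: 7 },
  { name: "alice", score: 9 },
  { name: "bob",   score: 2 },
]

# Step 1: group entries by name, e.g. { "alice" => [...], "bob" => [...] }.
groups = input.group_by { |entry| entry[:name] }

# Step 2: from each group, keep the entry with the highest score.
output = groups.map { |_name, entries| entries.max_by { |entry| entry[:score] } }
p output  # alice (score 9), then bob (score 7)
```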


Ruby: Remove all instances of a duplicate value inside an array


p arr.group_by(&:itself).reject { |_k, v| v.count > 1 }.keys

Output

[2, 3, 5]
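The one-liner above assumes arr already exists, and the original input isn't shown. As a hedged sketch, here is a hypothetical arr chosen so the snippet prints the same [2, 3, 5], alongside an alternative that uses Enumerable#tally (Ruby 2.7 or later) in place of group_by(&:itself):

```ruby
# Hypothetical input: 1 and 4 appear twice, everything else once.
arr = [1, 1, 2, 3, 4, 4, 5]

# The group_by version from above (Object#itself needs Ruby 2.2+).
via_group_by = arr.group_by(&:itself).reject { |_k, v| v.count > 1 }.keys

# tally version (Ruby 2.7+): count each value, keep those seen exactly once.
via_tally = arr.tally.select { |_value, count| count == 1 }.keys

p via_group_by  # prints [2, 3, 5]
p via_tally     # prints [2, 3, 5]
```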

Remove duplicates from array in Ruby and perform an operation on a specific index


arr.each_with_object(Hash.new(0)) { |(*a,n),h| h[a] += n }.map(&:flatten)
#=> [["A", "Red", 15], ["B", "Red", 3], ["B", "Blue", 5], ["C", "Blue", 3],
# ["C", "Black", 1], ["D", nil, 9]]
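The answer never shows arr itself, but its contents can be read off the enumerator printed later in this answer. Here is a self-contained version with arr reconstructed from that output (no other data is assumed):

```ruby
# arr reconstructed from the Enumerator output shown in this answer.
arr = [["A", "Red", 7], ["A", "Red", 8], ["B", "Red", 3],
       ["B", "Blue", 2], ["B", "Blue", 3], ["C", "Blue", 3],
       ["C", "Black", 1], ["D", nil, 4], ["D", nil, 5]]

# Sum the last column for each distinct [letter, colour] prefix,
# then turn each [key, sum] pair back into one flat row.
result = arr.each_with_object(Hash.new(0)) { |(*a, n), h| h[a] += n }.map(&:flatten)
p result
```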

The first step of the calculation is:

h = arr.each_with_object(Hash.new(0)) { |(*a,n),h| h[a] += n }
#=> {["A", "Red"]=>15, ["B", "Red"]=>3, ["B", "Blue"]=>5,
# ["C", "Blue"]=>3, ["C", "Black"]=>1, ["D", nil]=>9}

This uses the form of Hash::new that takes an argument, called the default value. All that means is that when Ruby's parser expands h[a] += n to

h[a] = h[a] + n

h[a] on the right returns h's default value, 0, if h does not have a key a. For example, when h is empty,

h[["A", "Red"]] = h[["A", "Red"]] + 7 #=> 0 + 7 =>  7
h[["A", "Red"]] = h[["A", "Red"]] + 8 #=> 7 + 8 => 15

h does not have a key ["A", "Red"] in the first expression, so h[["A", "Red"]] on the right returns the default value, 0, whereas h does have that key in the second expression so the default value does not apply.
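A quick check of the behaviour just described: reading a missing key returns the default without storing anything, and only the assignment half of += creates the key. The key and numbers below mirror the example above:

```ruby
h = Hash.new(0)

# Reading a missing key returns the default but does NOT store it.
p h[["A", "Red"]]       # prints 0
p h.key?(["A", "Red"])  # prints false

# Only the assignment in h[a] = h[a] + n actually creates the key.
h[["A", "Red"]] += 7
h[["A", "Red"]] += 8
p h[["A", "Red"]]       # prints 15
p h.size                # prints 1
```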

h.map(&:flatten) is shorthand for

h.map { |a| a.flatten }

When the block variable a is set equal to the first key-value pair of h,

a #=> [["A", "Red"], 15]

So

a.flatten
#=> ["A", "Red", 15]
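flatten is recursive. That is fine here because the keys contain only strings and nil. If a key could itself contain an array, appending the sum to the key only goes one level deep. A small sketch comparing the two, where the nested key is purely hypothetical:

```ruby
# Two key-value pairs like those in the h built above.
h = { ["A", "Red"] => 15, ["D", nil] => 9 }

flat     = h.map(&:flatten)                  # flatten is recursive
appended = h.map { |key, sum| key + [sum] }  # adds the sum one level deep
p flat      # prints [["A", "Red", 15], ["D", nil, 9]]
p appended  # prints [["A", "Red", 15], ["D", nil, 9]]

# Hypothetical key containing a nested array: here the two differ.
nested = { ["A", ["x", "y"]] => 1 }
p nested.map(&:flatten)                  # prints [["A", "x", "y", 1]]
p nested.map { |key, sum| key + [sum] }  # prints [["A", ["x", "y"], 1]]
```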

To understand the block variables |(*a,n),h|, it helps to construct the enumerator

enum = arr.each_with_object(Hash.new(0))
#=> #<Enumerator: [["A", "Red", 7], ["A", "Red", 8], ["B", "Red", 3],
# ["B", "Blue", 2], ["B", "Blue", 3], ["C", "Blue", 3],
# ["C", "Black", 1], ["D", nil, 4], ["D", nil, 5]]
# :each_with_object({})>

We now generate the first value from the enumerator (using Enumerator#next) and assign values to the block variables:

(*a,n),h = enum.next
#=> [["A", "Red", 7], {}]
a #=> ["A", "Red"]
n #=> 7
h #=> {}

The way the array returned by enum.next is broken up into constituent elements that are assigned to the block variables is called array decomposition. It is a powerful and highly useful technique.
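Array decomposition works in ordinary multiple assignment too, not just in block parameters. Here is the same pattern outside a block, plus a second, made-up example with the splat in a different position:

```ruby
# The same nested pattern as the block variables |(*a, n), h|.
(*a, n), h = [["A", "Red", 7], {}]
p a  # prints ["A", "Red"]
p n  # prints 7
p h  # prints {}

# The splat can sit elsewhere in the inner pattern (hypothetical data).
(first, *rest), other = [[1, 2, 3], :x]
p first  # prints 1
p rest   # prints [2, 3]
p other  # prints :x
```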

Remove duplicates in Ruby Array

The code for most Ruby methods can be found in the ruby-doc.org API documentation. If you mouse over a method's documentation, a "click to toggle source" button appears. The code is in C, but it's very easy to understand.

if (RARRAY_LEN(ary) <= 1)
    return rb_ary_dup(ary);

if (rb_block_given_p()) {
    hash = ary_make_hash_by(ary);
    uniq = rb_hash_values(hash);
}
else {
    hash = ary_make_hash(ary);
    uniq = rb_hash_values(hash);
}

If the array has at most one element, return a copy of it. Otherwise, turn the elements into hash keys (or, when a block is given, key them by the block's results), then turn the hash's values back into an array. By a documented quirk of Ruby hashes, "Hashes enumerate their values in the order that the corresponding keys were inserted", this technique preserves the original order of the elements in the Array. In other languages it may not.
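The same strategy can be sketched in plain Ruby. This is a rough illustration of the idea rather than MRI's actual implementation, and hash_uniq is a hypothetical name. Each element, or its block result when a block is given, becomes a hash key the first time it is seen, and the hash's values come back in insertion order:

```ruby
# Hypothetical helper mirroring the strategy of the C code above:
# the first element seen for each key wins, and hash insertion order
# preserves the original ordering.
def hash_uniq(array)
  seen = {}
  array.each do |element|
    key = block_given? ? yield(element) : element
    seen[key] = element unless seen.key?(key)
  end
  seen.values
end

plain = hash_uniq([2, 3, 4, 5, 1, 2, 4, 5])
p plain      # prints [2, 3, 4, 5, 1]

by_letter = hash_uniq(%w[apple avocado banana]) { |w| w[0] }
p by_letter  # prints ["apple", "banana"]
```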

Alternatively, use a Set. A set never contains duplicates. Requiring set adds the method to_set to all Enumerable objects, which includes Arrays. However, a Set is usually implemented on top of a Hash, so you're really doing the same thing. If you want a unique collection and you don't need the elements to be ordered, you should probably make a set and use that instead:

require "set"
unique = array.to_set
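A short sketch of working with the resulting set, reusing the sample array from the sort example in this answer. Set#add? is handy when you want to know whether an element was new. The require is needed on Ruby versions before 3.2, where Set is not loaded by default:

```ruby
require "set" # needed before Ruby 3.2

array = [2, 3, 4, 5, 1, 2, 4, 5]
unique = array.to_set

p unique.size         # prints 5
p unique.include?(4)  # prints true

# Set#add? returns nil when the element is already present.
p unique.add?(4)      # prints nil
p unique.size         # prints 5

# Convert back when an Array is needed.
back = unique.to_a
```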

Alternatively, sort the Array and loop through it pushing each element onto a new Array. If the last element on the new Array matches the current element, discard it.

array = [2, 3, 4, 5, 1, 2, 4, 5]
uniq = []

# This copies the whole array and the duplicates, wasting
# memory. And sort is O(n log n).
array.sort.each { |e|
  uniq.push(e) if e != uniq[-1]
}

puts uniq.inspect  # prints [1, 2, 3, 4, 5]

This method is best avoided because it is slower and uses more memory than the other methods. The sort is what makes it slower: sorting is O(n log n), meaning that as the array gets bigger, sorting time grows faster than the array does. It also requires you to copy the whole array, duplicates included, unless you are willing to alter the original data by sorting in place with sort!. Sorting also discards the original order of the elements, and it raises an error if the elements can't be compared with each other.

The other methods take O(n) time (hash lookups are constant time on average) and O(n) memory, meaning they scale linearly as the array gets bigger. They also never copy the duplicates, which can use substantially less memory.

How to remove duplicate entries from array of arrays based on nested value?

Use the uniq method with a block (Ruby >= 1.9.2):

array = [
["John, Doe", "/manager/consumer/123456?status=1", {:data=>{:id=>123456}, :class=>""}],
["Jane, smith", "/manager/consumer/7891011?status=1", {:data=>{:id=>7891011}, :class=>""}],
["William, Smith", "/manager/consumer/12131415?status=1", {:data=>{:id=>1211415}, :class=>""}],
["John, Doe", "/manager/consumer/123456?status=1", {:data=>{:id=>123456}, :class=>""}]
]

array.uniq { |_name, _url, hash| hash[:data][:id] }

When two entries share an id, uniq keeps only the first one and drops the rest, so think about what should happen when the id is the same but the rest of the data is not.
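If the most recent entry for each id should win instead, one common approach is to reverse, apply uniq, and reverse again. A sketch on made-up rows that share an id but differ in their URL. The by_id lambda is a hypothetical helper that assumes the same row layout as above:

```ruby
# Hypothetical rows: id 1 appears twice with a different status in the URL.
rows = [
  ["John, Doe",   "/manager/consumer/1?status=1", { data: { id: 1 }, class: "" }],
  ["Jane, Smith", "/manager/consumer/2?status=1", { data: { id: 2 }, class: "" }],
  ["John, Doe",   "/manager/consumer/1?status=2", { data: { id: 1 }, class: "" }],
]

# Hypothetical helper: pull the id out of the third column.
by_id = ->(row) { row[2][:data][:id] }

# uniq keeps the FIRST row for each id.
first_wins = rows.uniq(&by_id)

# Reverse, uniq, reverse again to keep the LAST row for each id instead.
last_wins = rows.reverse.uniq(&by_id).reverse

p first_wins.map { |row| row[1] }
p last_wins.map { |row| row[1] }
```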

NOTE: if for some reason you are running Ruby older than 1.9.2, uniq will ignore the block. For that reason ActiveSupport had a uniq_by method (which was removed in version 4.0.2).

How can I remove duplicates in an array without using `uniq`?

The problem is that the inner loop is an infinite loop:

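while true                               # nothing ever breaks out, so this never ends
  sorted.delete_if {|i| i = i + count}   # assignment, not comparison: always truthy, so every element is deleted
  count += 1
end #while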

You can probably make this approach work, but as written it isn't eliminating duplicates.

One way to do this would be:

numbers = [1, 4, 2, 4, 3, 1, 5]
target = []
numbers.each {|x| target << x unless target.include?(x) }
puts target.inspect
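The include? check in the loop above rescans target for every element, so it is O(n²) on large arrays. If the exercise only forbids calling uniq itself, Array's set operators are another option: both | and & discard duplicates and keep first-occurrence order. Whether they count as "without uniq" depends on the exercise, since they also rely on hashing internally:

```ruby
numbers = [1, 4, 2, 4, 3, 1, 5]

# Array#| (union) returns elements from both arrays without duplicates.
via_union = numbers | []

# Array#& (intersection) also removes duplicates, keeping the receiver's order.
via_intersection = numbers & numbers

p via_union         # prints [1, 4, 2, 3, 5]
p via_intersection  # prints [1, 4, 2, 3, 5]
```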

To add it to the Array class:

class ::Array
  def my_uniq
    target = []
    self.each {|x| target << x unless target.include?(x) }
    target
  end
end

Now you can do:

numbers = [1, 4, 2, 4, 3, 1, 5]
numbers.my_uniq  #=> [1, 4, 2, 3, 5]

