Find Keep Duplicates in Ruby Hashes


I tested this and it will do exactly what you want:

b = a.group_by { |h| h[:name] }.values.select { |group| group.size > 1 }.flatten

However, you might want to look at some of the intermediate objects produced in that calculation and see if those are more useful to you.
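For instance, with a hypothetical array of hashes (the variable `a` and the `:name` key come from the question; the sample data here is invented), the intermediate `grouped` hash can be inspected before flattening:

```ruby
# Sample data (hypothetical) illustrating the approach above.
a = [
  { name: "Dan", age: 30 },
  { name: "Dan", age: 31 },
  { name: "Eve", age: 25 }
]

# Intermediate object: entries grouped by :name.
grouped = a.group_by { |h| h[:name] }
# grouped => {"Dan"=>[{:name=>"Dan", :age=>30}, {:name=>"Dan", :age=>31}],
#             "Eve"=>[{:name=>"Eve", :age=>25}]}

# Keep only groups with more than one member, then flatten back to an array.
b = grouped.values.select { |group| group.size > 1 }.flatten
# b => [{:name=>"Dan", :age=>30}, {:name=>"Dan", :age=>31}]
```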

Find duplicates in array of hashes on specific keys

In terms of efficiency you might want to try this:

grouped = csv_arr.group_by{|row| [row[:user],row[:section]]}
filtered = grouped.values.select { |a| a.size > 1 }.flatten

The first statement groups the records by the :user and :section keys. The result is:

{[1, 123]=>[{:user=>1, :role=>"staff", :section=>123}, {:user=>1, :role=>"exec", :section=>123}],
[2, 456]=>[{:user=>2, :role=>"staff", :section=>456}, {:user=>2, :role=>"exec", :section=>456}],
[3, 123]=>[{:user=>3, :role=>"staff", :section=>123}],
[3, 789]=>[{:user=>3, :role=>"staff", :section=>789}]}

The second statement only selects the values of the groups with more than one member and then it flattens the result to give you:

[{:user=>1, :role=>"staff", :section=>123},
{:user=>1, :role=>"exec", :section=>123},
{:user=>2, :role=>"staff", :section=>456},
{:user=>2, :role=>"exec", :section=>456}]

This could improve the speed of your operation, but memory-wise I can't say what the effect would be with a large input; it depends on your machine, its resources, and the size of the file.
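The two statements run end to end (with a `csv_arr` reconstructed here from the grouped output shown above, so it is an assumption, not the asker's actual data):

```ruby
# csv_arr reconstructed from the grouped output shown above (hypothetical).
csv_arr = [
  { user: 1, role: "staff", section: 123 },
  { user: 1, role: "exec",  section: 123 },
  { user: 2, role: "staff", section: 456 },
  { user: 2, role: "exec",  section: 456 },
  { user: 3, role: "staff", section: 123 },
  { user: 3, role: "staff", section: 789 }
]

# Group by the [user, section] pair, then keep only the repeated pairs.
grouped  = csv_arr.group_by { |row| [row[:user], row[:section]] }
filtered = grouped.values.select { |rows| rows.size > 1 }.flatten
# filtered contains only the rows whose [user, section] pair occurs more than once
```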

How to detect duplicate keys in hash and add prefix to the duplicate?

It seems straightforward. I have attached a code snippet:

names = %w[David John Alex Sam Caleb David John Alex Sam]
numbers = %w[1 2 3 4 5 6 7 8 9]

key_pair = {}
names.each_with_index do |name, index|
  name = "A-#{name}" if key_pair[name]
  key_pair[name] = numbers[index]
end

It generates the expected output:

{"David"=>"1", "John"=>"2", "Alex"=>"3", "Sam"=>"4", "Caleb"=>"5", "A-David"=>"6", "A-John"=>"7", "A-Alex"=>"8", "A-Sam"=>"9"}
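Note that this handles at most one repeat per name: a third "David" would produce "A-David" again and overwrite the second entry. If names can occur more than twice, counting occurrences avoids that collision (a hypothetical extension, not part of the original answer):

```ruby
names = %w[David John David David]
numbers = %w[1 2 3 4]

seen = Hash.new(0)   # occurrence counter per name
key_pair = {}
names.each_with_index do |name, index|
  # First occurrence keeps the bare name; repeats get a numbered prefix.
  key = seen[name].zero? ? name : "A#{seen[name]}-#{name}"
  seen[name] += 1
  key_pair[key] = numbers[index]
end
# => {"David"=>"1", "John"=>"2", "A1-David"=>"3", "A2-David"=>"4"}
```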

Finding duplicate keys across hashes?

You could build another hash to store each key and its hashes:

keys = Hash.new { |hash, key| hash[key] = [] }
a.each_key { |k| keys[k] << :a }
b.each_key { |k| keys[k] << :b }
c.each_key { |k| keys[k] << :c }

More precisely, keys maps each key to an array of symbols naming the hashes that contain it. It looks like this after running the above code:

keys
#=> {"A"=>[:a, :b],
# "B"=>[:a, :b],
# "C"=>[:a],
# "D"=>[:b, :c],
# "E"=>[:b, :c],
# "F"=>[:c]}

To get your expected output:

keys.each do |key, hashes|
  next if hashes.size < 2
  hashes.each { |hash| puts "#{key} is also in #{hash}" }
end

Prints:

A is also in a
A is also in b
B is also in a
B is also in b
D is also in b
D is also in c
E is also in b
E is also in c

Ruby: array of hashes - how to remove duplicates based on the hash key which is an array

So long as your kappa function produces the same value for (u, p) as for (p, u), you can do this:

@result = @user_array.each_with_object({}) do |u, h|
  @user_array.each do |p|
    next if u == p

    h[[u, p].sort] ||= kappa(u, p, "ipf")
  end
end

That populates the values once and only once. If you want the last value to stick instead, change ||= to =.
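A minimal runnable sketch, with a stand-in symmetric kappa (the real kappa, @user_array, and the "ipf" mode belong to the question; the definitions here are assumptions for illustration):

```ruby
@user_array = [3, 1, 2]

# Stand-in for the question's kappa; any function symmetric in u and p works here.
def kappa(u, p, _mode)
  u * p
end

@result = @user_array.each_with_object({}) do |u, h|
  @user_array.each do |p|
    next if u == p

    # Sorting the pair makes [u, p] and [p, u] hit the same key,
    # so each unordered pair is computed and stored exactly once.
    h[[u, p].sort] ||= kappa(u, p, "ipf")
  end
end
```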

Ruby Hash with duplicate keys?

This would kinda defeat the purpose of a hash, wouldn't it?

If you want a key to point to multiple elements, make it point to an array:

h = Hash.new { |h,k| h[k] = [] }
h[:foo] << :bar
h #=> {:foo=>[:bar]}
h[:foo] << :baz
h #=> {:foo=>[:bar, :baz]}

ruby convert array to hash preserve duplicate key

I hope you like this:

ary = [
  "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8", "refs/heads/elab",
  "d38a9a26ef887c08b306bdab210b39882f58e587", "refs/heads/elab_6.1",
  "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/master",
  "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
]

array_hash = ary.each_slice(2).with_object(Hash.new { |h, k| h[k] = [] }) do |(k, v), hash|
  hash[k] << v
end

# The main advantage is that you don't lose any data; everything is kept, and you
# can use it as you need. I think it is a better approach for your situation.
array_hash
# => {"19d97e408ee3f993745b053e281ac9dc69519e06"=>
# ["refs/heads/auto", "refs/heads/master"],
# "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>["refs/heads/callout_hooks"],
# "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>["refs/heads/elab"],
# "d38a9a26ef887c08b306bdab210b39882f58e587"=>["refs/heads/elab_6.1"],
# "906dfe6eebff832baf0f92683d751432fcc98ab7"=>["refs/heads/regression"]}

Ruby array of hashes max value of duplicates

This should work (assuming your array is called a):

a.group_by{|el| el[:tax_id]}.values.map{|el| el.max_by{|x| x[:created]}}

As pointed out in the comment, this assumes the max is computed with a simple string comparison; you might want to convert :created to Date or DateTime if you want a true date comparison. The following will add a :created_date and use that to compute the max:

a.each{|el| el.merge!( {created_date: Date.strptime(el[:created], '%m/%d/%Y')})}.group_by{|el| el[:tax_id]}.values.map{|el| el.max_by{|x| x[:created_date]}}

This works as follows:

  1. We iterate through the array, adding the :created_date for each hash;
  2. We group by :tax_id; this returns something of the form {"tax_id_1" => [{…}, …], "tax_id_2" => …};
  3. We get the values only, as we do not care about the tax_ids;
  4. For each array with the same tax_id, we keep only the one with the maximum :created_date.
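The steps above, run end to end on hypothetical sample data (the :created strings follow the '%m/%d/%Y' format assumed by the answer):

```ruby
require 'date'

# Hypothetical sample data; :created uses the '%m/%d/%Y' format assumed above.
a = [
  { tax_id: "T1", created: "01/15/2020" },
  { tax_id: "T1", created: "03/02/2021" },
  { tax_id: "T2", created: "11/30/2019" }
]

latest = a.each { |el| el.merge!(created_date: Date.strptime(el[:created], '%m/%d/%Y')) }
          .group_by { |el| el[:tax_id] }   # {"T1"=>[...], "T2"=>[...]}
          .values
          .map { |group| group.max_by { |x| x[:created_date] } }
# latest keeps one entry per tax_id: the one with the most recent date
```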

Ruby: Hash - Sorting/Adjusting Duplicate Values and Storing Back with Key?

input = { 1456 => 1,
          1532 => 50,
          1892 => 2,
          1092 => 5,
          1487 => 10,
          5641 => 5,
          1234 => 2,
          1687 => 1 }

values = input.sort_by { |a,b| b }.map { |a,b| a }
# => [1456, 1687, 1892, 1234, 1092, 5641, 1487, 1532]
Hash[*values.flat_map.with_index(1) { |a,i| [a,i] }]
# => {1456=>1, 1687=>2, 1892=>3, 1234=>4, 1092=>5, 5641=>6, 1487=>7, 1532=>8}

All the necessary information is contained in the values array.
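The same re-ranking can be written a little more directly with map.with_index (an equivalent sketch, not from the original answer; for tied values, the relative order of ties is whatever sort_by produces):

```ruby
input = { 1456 => 1, 1532 => 50, 1892 => 2, 1092 => 5,
          1487 => 10, 5641 => 5, 1234 => 2, 1687 => 1 }

# Sort the keys by their old value, then assign ranks 1..n as the new values.
ranked = input.sort_by { |_, value| value }
              .map
              .with_index(1) { |(key, _), rank| [key, rank] }
              .to_h
```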


