Reduce Hash Values

Use Enumerable#reduce if you're OK with getting nil when the hash happens to be empty:

H.values.reduce(:+) # => 3
Hash.new.values.reduce(:+) # => nil

To safely get 0 when the hash is empty, use:

H.values.reduce(0) { |sum,x| sum + x } # or...
H.reduce(0) { |sum,(key,val)| sum + val } # ...if you need to inspect the key
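
On Ruby 2.4+, Enumerable#sum is another option; its initial value defaults to 0, so an empty hash is handled without a block:

H.values.sum # => 3
Hash.new.values.sum # => 0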

Here's a quick benchmark, for kicks. Note that it appears to be slightly faster to reduce just the values rather than the values from the key/value pairs:

require 'benchmark'

size = 1_000
hash = Hash[*Array.new(size * 2) { rand }]

N = 10_000
Benchmark.bm(24) do |x|
  x.report('H.values.reduce(:+)')      { N.times { hash.dup.values.reduce(:+) } }
  x.report('H.values.reduce(0) {...}') { N.times { hash.dup.values.reduce(0) { |sum, x| sum + x } } }
  x.report('H.reduce(0) {...}')        { N.times { hash.dup.reduce(0) { |sum, (_, v)| sum + v } } }
end

                               user     system      total        real
H.values.reduce(:+)        4.510000   0.080000   4.590000 (  4.595229)
H.values.reduce(0) {...}   4.660000   0.080000   4.740000 (  4.739708)
H.reduce(0) {...}          5.160000   0.070000   5.230000 (  5.241916)

Reduce hash with key, value and index as block parameters

Maybe something like this?

h = { "a" => 1, "b" => 2 }

h.each_with_index.reduce([]) { |memo, ((k, v), i)| puts [k, v, i].inspect }
#=> ["a", 1, 0]
#=> ["b", 2, 1]
#=> nil

All you need is the right destructuring of the block parameters: ((k, v), i).

Keep in mind that with reduce we always have to return the accumulator at the end of the block. That's a bit of extra overhead: if the last expression in the block isn't the memo object, reduce won't return the desired result.
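
For example, to actually accumulate the triples you'd end the block with the memo (Array#<< returns the receiver, so it works out naturally here):

h.each_with_index.reduce([]) { |memo, ((k, v), i)| memo << [k, v, i] }
#=> [["a", 1, 0], ["b", 2, 1]]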

The same thing can be achieved by chaining each_with_index with with_object, like so:

h.each_with_index.with_object([]) { |((k, v), i), memo| puts [k, v, i].inspect }
#=> ["a", 1, 0]
#=> ["b", 2, 1]
#=> []

See the array on the last line of the output? That's our memo object coming back, unlike the reduce version above, which returned nil.
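
If you'd rather collect the triples than print them, just append to the memo; with_object hands the memo back regardless of what the block returns:

h.each_with_index.with_object([]) { |((k, v), i), memo| memo << [k, v, i] }
#=> [["a", 1, 0], ["b", 2, 1]]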

How to reduce hash value's length?

No, hash values cannot be compressed. By design their bits are highly random and have maximum entropy, so there is no redundancy to compress.

If you want to make the hash values easier to read for users you can use different tricks, such as:

  • Displaying fewer digits. Instead of 32 digits just show 16.

  • Using a different base. For instance, if you used base 62, with all the uppercase and lowercase letters plus the digits 0-9 as digits, then you could show a 128-bit hash using 22 letters and digits instead of 32 hex digits (a minimal sketch follows this list):

    log₆₂(2¹²⁸) ≈ 21.5

  • Adding whitespace or punctuation. You'll commonly see CD keys printed with dashes like AX7T4-BZ41O-JK3FF-QOZ96. It's easier for users to read this than 20 digits all jammed together.
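
Here's a rough Ruby sketch of those three tricks (the BASE62 alphabet and the encode62 helper are made up for illustration, not from any library; MD5 is used only because it conveniently yields a 128-bit value):

require 'digest'

# Alphabet for an illustrative base-62 encoding: 0-9, A-Z, a-z.
BASE62 = [*'0'..'9', *'A'..'Z', *'a'..'z'].freeze

# Hypothetical helper: encode a non-negative integer in base 62.
def encode62(n)
  return BASE62[0] if n.zero?
  out = ''
  while n > 0
    n, r = n.divmod(62)
    out = BASE62[r] + out
  end
  out
end

hex = Digest::MD5.hexdigest('example')  # a 128-bit hash as 32 hex digits
hex[0, 16]                              # trick 1: show fewer digits
encode62(hex.to_i(16))                  # trick 2: base 62, about 22 characters
hex.scan(/.{4}/).join('-')              # trick 3: dashes every 4 characters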

Reducing an array of hashes into new hash

This will work:

arr.each_with_object({}) do |obj, hash|
  %i[all_sales direct_sales referred_sales].each do |sym|
    hash[sym] = hash[sym].to_i + obj[sym]
  end
end

It's a single iteration; you could replace the inner loop with three separate lines (one per key), but it's a bit cleaner this way in my opinion.

Note: to_i is called on the previous value of hash[sym] because it is initially nil, and nil.to_i == 0. Alternatively, you can initialize all unknown counts to 0, like this:

arr.each_with_object(Hash.new(0)) do |obj, hash|
  %i[all_sales direct_sales referred_sales].each do |sym|
    hash[sym] += obj[sym]
  end
end
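
For illustration, assuming arr looks something like the sample below (the figures are made up), both versions produce the same summed hash:

arr = [
  { all_sales: 10, direct_sales: 4, referred_sales: 6 },
  { all_sales: 20, direct_sales: 15, referred_sales: 5 }
]

arr.each_with_object(Hash.new(0)) do |obj, hash|
  %i[all_sales direct_sales referred_sales].each { |sym| hash[sym] += obj[sym] }
end
#=> {:all_sales=>30, :direct_sales=>19, :referred_sales=>11}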

How to reduce array of hashes with duplicate keys to nested hash?

def combine(arr)
  arr.group_by { |g| g[:foo] }
     .map { |_, a| { foo: a.first[:foo], bar: a.map { |g| g[:bar] } } }
end

combine arr_with_dup_hsh_keys
#=> [{:foo=>"dup", :bar=>[1, 2, 3, 4, 5]}]

arr_with_dup_hsh_keys1 = [
  { foo: "dup", bar: 1 },
  { foo: "dup", bar: 2 },
  { foo: "soup", bar: 3 },
  { foo: "dup", bar: 4 },
  { foo: "soup", bar: 5 }
]

combine arr_with_dup_hsh_keys1
#=> [{:foo=>"dup", :bar=>[1, 2, 4]}, {:foo=>"soup", :bar=>[3, 5]}]

See Enumerable#group_by and note that

arr_with_dup_hsh_keys1.group_by { |g| g[:foo] }
#=> {"dup"=>[{:foo=>"dup", :bar=>1}, {:foo=>"dup", :bar=>2},
#            {:foo=>"dup", :bar=>4}],
#    "soup"=>[{:foo=>"soup", :bar=>3}, {:foo=>"soup", :bar=>5}]}

You could alternatively write the following.

def combine(arr)
  arr.each_with_object({}) do |g, h|
    f = g.merge(bar: [g[:bar]])
    h.update(f[:foo] => f) { |_, o, n| { foo: o[:foo], bar: o[:bar] + n[:bar] } }
  end.values
end

combine arr_with_dup_hsh_keys1
#=> [{:foo=>"dup", :bar=>[1, 2, 4]}, {:foo=>"soup", :bar=>[3, 5]}]

This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. See the doc for an explanation of the three block variables (the first being the common key, which I've represented with an underscore to signify that it's not used in the block calculation).
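
Here's that block form of Hash#update in isolation, with toy hashes unrelated to the data above:

{ a: 1, b: 2 }.update(b: 3, c: 4) { |_key, old, new| old + new }
#=> {:a=>1, :b=>5, :c=>4}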

Ruby range.reduce with hash accumulator

The block passed to reduce should return the new accumulator. In your case

(1..5).reduce({}) { |hash, i| hash["#{i}"] = i }

the block returns i, which is an integer, so on the second iteration you will try to call []= on an integer. What you need is this:

(1..5).reduce({}) { |hash, i| hash["#{i}"] = i; hash }
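
Alternatively, each_with_object sidesteps the need to return the hash explicitly (note that its block parameters come in the opposite order):

(1..5).each_with_object({}) { |i, hash| hash["#{i}"] = i }
#=> {"1"=>1, "2"=>2, "3"=>3, "4"=>4, "5"=>5}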

Hash and reduce to bucket algorithm

Your colleague is simply wrong.

If a hash works well, all hash values should be equally likely, with a relationship that is not obvious from the input data.

When you take the hash mod some value, you map equally likely hash inputs onto a smaller number of output buckets. The result is no longer perfectly even, because some buckets can be produced by more inputs than others. As long as the number of buckets is small relative to the range of hash values, this discrepancy is small: it is on the order of (number of buckets) / (number of hash values). Since the number of buckets is typically under 10^6 and the number of hash values is more than 10^19, the discrepancy is very small indeed. And if the number of buckets exactly divides the range of hash values, there is no discrepancy at all.
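
A toy illustration of that discrepancy in Ruby (the numbers are deliberately tiny so the unevenness is visible; with realistic hash ranges it is negligible):

range   = 2**4  # pretend the hash function has only 16 possible outputs
buckets = 6     # 6 does not divide 16, so the mapping is slightly uneven

counts = Hash.new(0)
(0...range).each { |h| counts[h % buckets] += 1 }
counts #=> {0=>3, 1=>3, 2=>3, 3=>3, 4=>2, 5=>2}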

Primality doesn't enter into it, except that you get the best distribution when the number of buckets divides the range of the hash function. Since the range of the hash function is usually a power of 2, a prime number of buckets is unlikely to do anything for you.

Most efficient way to cross compare millions of hash values in a list

Assuming the subtraction is just regular subtraction, try sorting first. Sorting is O(n log n), which is much better than the O(n²) of comparing every pair.

That way you can iterate once with two pointers, finding groups of hashes that are all close to each other. This is O(n·k), with n being the number of hashes and k being the average number of matches per hash.

The pseudocode would look something like this:

hashes_list.sort(reverse=True)  # large to small
count = len(hashes_list)
i = 0
while i < count:
    j = i + 1
    while j < count and hashes_list[i] - hashes_list[j] < threshold:
        # do something with the close pair (i, j)
        j += 1
    i += 1
You might be able to skip the check in some cases. For example, if elements 0 through 10 are all within the threshold of element 0, then every pair among elements 1 through 10 is also within the threshold (the list is sorted), so "# do something" could be called for each of those pairs without another check.


