Reduce Hash Values
Use Enumerable#reduce, if you're OK with getting nil when the hash happens to be empty:
H = { a: 1, b: 2 }           # assumed example hash (values summing to 3)
H.values.reduce(:+)          # => 3
Hash.new.values.reduce(:+)   # => nil
To safely get 0 when the hash is empty, use:
H.values.reduce(0) { |sum, x| sum + x }       # or...
H.reduce(0) { |sum, (key, val)| sum + val }   # ...if you need to inspect the key
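As an aside, not part of the original answer: on Ruby 2.4+, Enumerable#sum handles the empty case for you, returning 0:

H.values.sum              # => 3
H.sum { |key, val| val }  # => 3
Hash.new.values.sum       # => 0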
Here's a quick benchmark, for kicks. Note that it appears to be slightly faster to reduce just the values than to reduce over the key/value pairs:
                               user     system      total        real
H.values.reduce(:+)        4.510000   0.080000   4.590000 (  4.595229)
H.values.reduce(0) {...}   4.660000   0.080000   4.740000 (  4.739708)
H.reduce(0) {...}          5.160000   0.070000   5.230000 (  5.241916)
require 'benchmark'

size = 1_000
hash = Hash[*Array.new(size * 2) { rand }]  # 1,000 random key/value pairs
N = 10_000

Benchmark.bm(24) do |x|
  x.report('H.values.reduce(:+)')      { N.times { hash.dup.values.reduce(:+) } }
  x.report('H.values.reduce(0) {...}') { N.times { hash.dup.values.reduce(0) { |sum, x| sum + x } } }
  x.report('H.reduce(0) {...}')        { N.times { hash.dup.reduce(0) { |sum, (_, v)| sum + v } } }
end
Reduce hash with key, value and index as block parameters
Maybe something like this?:
h = { "a" => 1, "b" => 2 }  # assumed input, matching the output below
h.each_with_index.reduce([]) { |memo, ((k, v), i)| puts [k, v, i].inspect }
#=> ["a", 1, 0]
#=> ["b", 2, 1]
#=> nil
All you need is the nested destructuring in the block parameters: ((k, v), i).
Keep in mind that with reduce, the block must return the accumulator at the end of each iteration, or you won't get the desired result. That's a bit of extra overhead, unless the last operation in the block happens to be one on the memo object that returns the object itself.
The same thing can be achieved with each_with_index chained with with_object, like so:
h.each_with_index.with_object([]) { |((k, v), i), memo| puts [k, v, i].inspect }
#=> ["a", 1, 0]
#=> ["b", 2, 1]
#=> []
See the array on the last line of output? That's our memo object being returned, which isn't the same as the nil we got from reduce above.
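For completeness, here's a sketch of the reduce version that actually builds and returns the array, with the memo explicitly returned as the last expression in the block:

h.each_with_index.reduce([]) { |memo, ((k, v), i)| memo << [k, v, i]; memo }
#=> [["a", 1, 0], ["b", 2, 1]]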
How to reduce hash value's length?
No, hash values cannot be compressed. By design their bits are highly random and have maximum entropy, so there is no redundancy to compress.
If you want to make the hash values easier to read for users you can use different tricks, such as:
Displaying fewer digits. Instead of 32 digits just show 16.
Using a different base. For instance, if you used base 62 using all the uppercase and lowercase letters plus numbers 0-9 as digits then you could show a 128-bit hash using 22 letters+digits versus 32 hex digits:
log₆₂(2¹²⁸) ≈ 21.5
Adding whitespace or punctuation. You'll commonly see CD keys printed with dashes like AX7T4-BZ41O-JK3FF-QOZ96. It's easier for users to read this than 20 digits all jammed together.
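Here's a minimal Ruby sketch of the base-62 idea, using an MD5 digest as the 128-bit hash and a homemade base62_digest helper (both purely for illustration):

require 'digest'

BASE62 = [*'0'..'9', *'A'..'Z', *'a'..'z'].freeze

# Re-encode a 128-bit hex digest as base 62 (illustrative only).
def base62_digest(input)
  n = Digest::MD5.hexdigest(input).to_i(16)  # digest as a 128-bit integer
  s = ''
  while n > 0
    s.prepend(BASE62[n % 62])
    n /= 62
  end
  s
end

Digest::MD5.hexdigest('hello').length  # => 32 hex digits
base62_digest('hello').length          # => at most 22 characters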
Reducing an array of hashes into new hash
This will work:
arr.each_with_object({}) do |obj, hash|
  %i[all_sales direct_sales referred_sales].each do |sym|
    hash[sym] = hash[sym].to_i + obj[sym]
  end
end
It's a single pass; you could unroll the inner loop into three separate lines, but it's a bit cleaner this way in my opinion.
Note: we call to_i when reading the previous value of hash[sym] because initially it is nil, and nil.to_i == 0. Alternatively, you can initialize all unknown counts to 0 with a default value, like this:
arr.each_with_object(Hash.new(0)) do |obj, hash|
  %i[all_sales direct_sales referred_sales].each do |sym|
    hash[sym] += obj[sym]
  end
end
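For example, with a made-up arr of sales hashes (illustrative data, not from the question):

arr = [
  { all_sales: 10, direct_sales: 4, referred_sales: 6 },
  { all_sales: 5,  direct_sales: 2, referred_sales: 3 }
]

arr.each_with_object(Hash.new(0)) do |obj, hash|
  %i[all_sales direct_sales referred_sales].each { |sym| hash[sym] += obj[sym] }
end
#=> {:all_sales=>15, :direct_sales=>6, :referred_sales=>9}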
How to reduce array of hashes with duplicate keys to nested hash?
def combine(arr)
  arr.group_by { |g| g[:foo] }.map { |_, a| { foo: a.first[:foo], bar: a.map { |g| g[:bar] } } }
end
combine arr_with_dup_hsh_keys
#=> [{:foo=>"dup", :bar=>[1, 2, 3, 4, 5]}]
arr_with_dup_hsh_keys1 = [
{ foo: "dup", bar: 1 },
{ foo: "dup", bar: 2 },
{ foo: "soup", bar: 3 },
{ foo: "dup", bar: 4 },
{ foo: "soup", bar: 5 }
]
combine arr_with_dup_hsh_keys1
#=> [{:foo=>"dup", :bar=>[1, 2, 4]}, {:foo=>"soup", :bar=>[3, 5]}]
See Enumerable#group_by and note that
arr_with_dup_hsh_keys1.group_by { |g| g[:foo] }
#=> {"dup"=> [{:foo=>"dup", :bar=>1}, {:foo=>"dup", :bar=>2},
# {:foo=>"dup", :bar=>4}],
# "soup"=>[{:foo=>"soup", :bar=>3}, {:foo=>"soup", :bar=>5}]}
You could alternatively write the following.
def combine(arr)
  arr.each_with_object({}) do |g, h|
    f = g.merge(bar: [g[:bar]])
    h.update(f[:foo] => f) { |_, o, n| { foo: o[:foo], bar: o[:bar] + n[:bar] } }
  end.values
end
combine arr_with_dup_hsh_keys1
#=> [{:foo=>"dup", :bar=>[1, 2, 4]}, {:foo=>"soup", :bar=>[3, 5]}]
This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. See the doc for an explanation of the three block variables (the first being the common key, which I've represented with an underscore to signify that it's not used in the block calculation).
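A tiny illustration of that update form, with made-up hashes:

{ a: 1 }.update({ a: 10, b: 2 }) { |_key, old, new| old + new }
#=> {:a=>11, :b=>2}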
Ruby range.reduce with hash accumulator
The block in reduce should return the new accumulator. In your case,
(1..5).reduce({}) { |hash, i| hash["#{i}"] = i }
the block returns i, which is an integer, so on the second iteration you will try to call []= on an integer. What you need is this:
(1..5).reduce({}) { |hash, i| hash["#{i}"] = i; hash }
#=> {"1"=>1, "2"=>2, "3"=>3, "4"=>4, "5"=>5}
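As a side note (a common idiom, not part of the original answer), each_with_object sidesteps the need to return the accumulator yourself:

(1..5).each_with_object({}) { |i, hash| hash[i.to_s] = i }
#=> {"1"=>1, "2"=>2, "3"=>3, "4"=>4, "5"=>5}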
Hash and reduce to bucket algorithm
Your colleague is simply wrong.
If a hash works well, all hash values should be equally likely, with a relationship that is not obvious from the input data.
When you take the hash mod some value, you are then mapping equally likely hash inputs to a reduced number of output buckets. The result is now not evenly distributed to the extent that outputs can be produced by different numbers of inputs. As long as the number of buckets is small relative to the range of hash values, this discrepancy is small. It is on the order of # of buckets / # of hash values. Since the number of buckets is typically under 10^6 and the number of hash values is more than 10^19, this is very small indeed. But if the number of buckets divides the range of hash values, there is no discrepancy.
Primality doesn't enter into it, except insofar as you get the best distribution when the number of buckets divides the range of the hash function. Since the range of the hash function is usually a power of 2, a prime number of buckets is unlikely to do anything for you.
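A quick empirical sketch of this in Ruby (SHA-256 standing in for a well-mixed hash; the key and bucket counts are chosen arbitrarily):

require 'digest'

# Bucket n_keys well-mixed hash values into n_buckets and report the
# smallest and largest bucket sizes.
def bucket_spread(n_keys, n_buckets)
  counts = Hash.new(0)
  n_keys.times do |i|
    h = Digest::SHA256.hexdigest(i.to_s).to_i(16)
    counts[h % n_buckets] += 1
  end
  counts.values.minmax
end

bucket_spread(100_000, 1024)  # power-of-two bucket count
bucket_spread(100_000, 1021)  # prime bucket count: no better in practice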
Most efficient way to cross compare millions of hash values in a list
Assuming the subtraction is just regular subtraction, try sorting first. Sorting has O(n log n) time complexity, which is much better than O(n²).
That way you could iterate once with two pointers, finding groups of hashes that are all close to each other. That's O(n·k), with n being the number of hashes and k the average number that match.
The pseudocode would look something like this (note the bounds check on j, which the naive version needs to avoid running off the end of the list):

hashes_list.sort(reverse=True)  # large to small
count = len(hashes_list)
i = 0
while i < count:
    j = i + 1
    while j < count and hashes_list[i] - hashes_list[j] < threshold:
        # do something with the close pair (i, j)
        j += 1
    i += 1
You might be able to skip the check in some cases. For example, if items 0 through 10 are all within the threshold of item 0, then items 1 through 10 are also within the threshold of one another (since the list is sorted), and the "do something" step could be run for each of those pairs without another check.