Ruby Inject with Initial Being a Hash

Your block needs to return the accumulating hash:

['a', 'b'].inject({}) {|m,e| m[e] = e; m }

Instead, it's returning the string 'a' after the first pass, which becomes m in the next pass and you end up calling the string's []= method.
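To see the failure concretely, here is a minimal sketch (the variable names broken_error and fixed are illustrative, not from the question) of what happens when the block ends with the assignment instead of the hash:

```ruby
# Broken: the block's last expression is the assignment, which evaluates to
# the assigned value ('a'), so the second pass calls String#[]= on 'a'.
broken_error =
  begin
    ['a', 'b'].inject({}) { |m, e| m[e] = e }
    nil
  rescue IndexError => err
    err
  end

# Fixed: the block returns the accumulating hash.
fixed = ['a', 'b'].inject({}) { |m, e| m[e] = e; m }

p broken_error.class #=> IndexError
p fixed
```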

Is #inject on hashes considered good style?

Beauty is in the eye of the beholder. Those with some functional programming background will probably prefer the inject-based method (as I do), because it has the same semantics as the fold higher-order function, which is a common way of calculating a single result from multiple inputs. If you understand inject, then you should understand that the function is being used as intended.

As one reason why this approach seems better (to my eyes), consider the lexical scope of the hash variable. In the inject-based method, hash only exists within the body of the block. In the each-based method, the hash variable inside the block needs to agree with some execution context defined outside the block. Want to define another hash in the same function? Using the inject method, it's possible to cut-and-paste the inject-based code and use it directly, and it almost certainly won't introduce bugs (ignoring whether one should use C&P during editing - people do). Using the each method, you need to C&P the code, and rename the hash variable to whatever name you wanted to use - the extra step means this is more prone to error.
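The scoping difference can be sketched as follows. The input and the names words, lengths_each and lengths_inject are made up for illustration:

```ruby
words = %w[apple kiwi]

# each-based: the hash must be declared outside the block, and the name used
# inside the block has to match that outer variable.
lengths_each = {}
words.each { |word| lengths_each[word] = word.length }

# inject-based: the accumulator name (hash) exists only inside the block,
# so the expression can be reused elsewhere without renaming anything.
lengths_inject = words.inject({}) { |hash, word| hash[word] = word.length; hash }

p lengths_each == lengths_inject #=> true
```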

Initialize a hash in the inject method to prevent a value being nil


Solutions

You can use each_with_object:

array = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]

empty_hash = Hash.new{ |h, k| h[k] = [] }
hash = array.each_with_object(empty_hash) do |small_hash, hash|
  small_hash.each do |k, v|
    hash[k] << v
  end
end

p hash
#=> {:a=>[1, 4, 7], :b=>[2, 5, 8], :c=>[3, 6, 9]}

A shorter, but more unusual, version is:

hash = array.each_with_object(Hash.new{ [] }) do |small_hash, hash|
  small_hash.each {|k, v| hash[k] <<= v }
end

p hash
#=> {:a=>[1, 4, 7], :b=>[2, 5, 8], :c=>[3, 6, 9]}

Both return {:a=>[1], :b=>[2]} for [{a: 1}, {b: 2}], as the OP specified.

<<= ?

hash[k] <<= v is an unusual (and probably inefficient) trick. It is equivalent to:

hash[k] = (hash[k] << v)

The assignment is needed because the default block in Hash.new{ [] } never stores anything in the hash: it builds a new array on every lookup of a missing key, and that array is discarded unless you assign it back yourself:

h = Hash.new{ [] }
p h[:a] << 1
#=> [1]
p h[:a]
#=> []
p h[:a] <<= 1
#=> [1]
p h[:a]
#=> [1]

Ruby: inject issue when turning array into hash

Just because Ruby is dynamically and implicitly typed doesn't mean that you don't have to think about types.

The type of Enumerable#inject without an explicit accumulator (this is usually called reduce) is something like

reduce :: [a] → (a → a → a) → a

or in a more Rubyish notation I just made up

Enumerable[A]#inject {|A, A| A } → A

You will notice that all the types are the same: the element type of the Enumerable, the two argument types of the block, the return type of the block, and the return type of the overall method.

The type of Enumerable#inject with an explicit accumulator (this is usually called fold) is something like

fold :: [b] → a → (a → b → a) → a

or

Enumerable[B]#inject(A) {|A, B| A } → A

Here you see that the accumulator can have a different type than the element type of the collection.

These two rules generally get you through all Enumerable#inject-related type problems:

  1. the type of the accumulator and the return type of the block must be the same
  2. when not passing an explicit accumulator, the type of the accumulator is the same as the element type
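As a small illustration of the two rules (the arrays and variable names here are made up for the example):

```ruby
# Rule 2: no explicit accumulator, so the accumulator starts as the first
# element and has the same type as the elements (Integer).
total = [1, 2, 3].inject { |acc, n| acc + n }

# Rule 1: with an explicit accumulator (0), the block must return the
# accumulator's type (Integer), even though the elements are Strings.
total_length = %w[a bb ccc].inject(0) { |acc, word| acc + word.length }

p total        #=> 6
p total_length #=> 6
```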

In this case, it is Rule #1 that bites you. When you do something like

acc[key] = value

in your block, assignments evaluate to the assigned value, not the receiver of the assignment. You'll have to replace this with

acc.tap { acc[key] = value }
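A minimal sketch of the tap version in context, assuming a hypothetical array of key/value pairs named pairs:

```ruby
pairs = [[:dog, 'Fido'], [:cat, 'Whiskers']] # hypothetical input

result = pairs.inject({}) do |acc, (key, value)|
  # tap yields acc to its block and then returns acc itself, so the inject
  # block returns the hash rather than the assigned value.
  acc.tap { acc[key] = value }
end

p result == { dog: 'Fido', cat: 'Whiskers' } #=> true
```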

See also Why Ruby inject method cannot sum up string lengths without initial value?


BTW: you can use destructuring bind to make your code much more readable:

a.inject({}) {|r, (key, value)| r[key] = value; r }

Can someone explain how inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h works?

A more elaborate way to write it would be:

total = Hash.new(0)
string.each_cons(1).each{|bigram| total[bigram] += 1}

inject lets you inject a start value (here Hash.new(0); the default of 0 means we can safely use the += operator on keys that are not in the hash yet), and whatever the block returns is injected into the next iteration. So in this case we have to explicitly return the hash (total) so that it can be updated in the next step.
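The whole chain from the question can be sketched end to end. The input string and the way the bigrams are built (adjacent characters via each_char.each_cons(2)) are assumptions for this example, since the question's own setup is not shown here:

```ruby
text = 'banana' # hypothetical input

# Assumption: a bigram is a pair of adjacent characters.
bigrams = text.each_char.each_cons(2).map(&:join)

counts = bigrams
  .inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }
  .sort_by { |_key, value| value } # array of [key, value] pairs, ascending by count
  .reverse                         # highest count first
  .to_h                            # back to a hash

p counts.values #=> [2, 2, 1]
```

The relative order of bigrams with the same count is not guaranteed, because sort_by is not a stable sort.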

A simple example is adding all values of an array:

[1,4,5,23,2,66,123].inject(0){|sum, value| sum + value}

We start with 0; the first iteration computes 0 + 1, and that result is injected into the next iteration as sum.

Note: in your original code, instead of using while loops and maintaining counters, you could more easily iterate over the arrays as follows:

alphabet.each do |single_char|
  single_char_count = file.count(single_char)
  print "#{single_char} = #{single_char_count} "
  alphabet.each do |second_char|
    two_chars = single_char + second_char
    # do something with two_chars
    alphabet.each do |third_char|
      three_chars = single_char + second_char + third_char
      # do something with three_chars
    end
  end
end

I am guessing that whether iterating with each_cons (for sizes 1, 2 and 3) or using file.scan is more efficient depends on the size of the file.

Using inject with an array of hashes

If you don't pass an argument to inject, the memo object for the first iteration is the first element of the enumerable, which is a hash in this case, so the block ends up calling + on a hash. You just have to pass 0 as the argument to inject:

array = [{lol: 1}, {lol: 2}]
array.inject(0) { |sum, h| sum + h[:lol] }
# => 3

Create key in hash with inject method in Ruby


key = 'en.countries.new_one'

key.split(".").inject(y) do |h, k|
  h.key?(k) ? h[k] : h[k] = "x value"
end
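The snippet above does not show what y is. As a sketch, assume y is a nested hash (for example, one loaded from a YAML locale file) in which the intermediate keys already exist. The contents of y below are invented for illustration:

```ruby
# Hypothetical nested hash; only the last key ('new_one') is missing.
y = { 'en' => { 'countries' => { 'fr' => 'France' } } }

key = 'en.countries.new_one'

# Each step descends one level; at the last segment the key is missing,
# so the assignment branch runs and stores the new value in place.
leaf = key.split('.').inject(y) do |h, k|
  h.key?(k) ? h[k] : h[k] = 'x value'
end

p leaf                            #=> "x value"
p y['en']['countries']['new_one'] #=> "x value"
```

Note that this only behaves well when every intermediate key is present. A missing intermediate key would be set to the string and the next step would then call key? on a String.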

Need a simple explanation of the inject method

You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.

At the end of the process, inject returns the accumulator, which in this case is the sum of all the values in the array, or 10.
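The code being described is not reproduced here; a minimal sketch consistent with the description (accumulator named result, starting at 0, final sum of 10) might look like this, with the array contents being an assumption:

```ruby
# Assumed input: the numbers 1 through 4, with the accumulator starting at 0.
# Steps: 0 + 1 = 1, 1 + 2 = 3, 3 + 3 = 6, 6 + 4 = 10.
sum = [1, 2, 3, 4].inject(0) { |result, element| result + element }

p sum #=> 10
```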

Here's another simple example to create a hash from an array of objects, keyed by their string representation:

[1,"a",Object.new,:hi].inject({}) do |hash, item|
  hash[item.to_s] = item
  hash
end

In this case, we are defaulting our accumulator to an empty hash, then populating it each time the block executes. Notice we must return the hash as the last line of the block, because the result of the block will be stored back in the accumulator.

Complicated ruby inject method

Assuming a is an array,

The function first counts the occurrences of each element:

a = ['a', 'b', 'c', 'b']
a.inject({}) { |a,b|
  # a: the result hash, initially the empty hash (`{}`) passed to inject
  # b: each element of the array
  a[b] = a[b].to_i + 1 # increase the count of the item
  a # the return value of this block is used as the `a` argument of the block
    # in the next iteration
}
# => {"a"=>1, "b"=>2, "c"=>1}

Then it keeps only the items that occur more than once:

...reject{ |a,b|
  # a: key of the hash entry, b: value of the hash entry (count)
  b == 1 # entries that match this condition (occurred only once) are filtered out
}.keys
# => ["b"]

So a name like get_duplicated_items would describe the function's purpose better than function_name.
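Putting the two steps together, the whole function might be sketched like this. The original method body is only shown in fragments, so the parameter name items and the block variable names are assumptions:

```ruby
def get_duplicated_items(items)
  # Step 1: count the occurrences of each item.
  counts = items.inject({}) do |result, item|
    result[item] = result[item].to_i + 1 # nil.to_i is 0 for unseen items
    result
  end

  # Step 2: drop items seen only once and return the remaining keys.
  counts.reject { |_item, count| count == 1 }.keys
end

p get_duplicated_items(['a', 'b', 'c', 'b']) #=> ["b"]
```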

issue with using inject to convert array to hash


why do I not need to initialize data_hash as an empty hash?

You do, implicitly. The value passed to inject, i.e. {}, becomes the initial value for hsh, which eventually becomes the value of data_hash. According to the documentation:

At the end of the iteration, the final value of memo is the return value for the method.

Let's see what happens if we don't pass {}:

If you do not explicitly specify an initial value for memo, then the first element of collection is used as the initial value of memo.

The first element of your collection is the array ['dog', 'Fido']. If you omit {}, then inject would use that array as the initial value for hsh. The subsequent call to hsh[v[0]] = v[1] would fail, because of:

hsh = ['dog', 'Fido']
hsh['cat'] = 'Whiskers'
#=> TypeError: no implicit conversion of String into Integer

why do I have to add hsh in the last line

Again, let's check the documentation:

[...] the result [of the specified block] becomes the new value for memo.

inject expects you to return the new value for hsh at the end of the block.

if not it will result in an error.

That's because an assignment like hsh[v[0]] = v[1] returns the assigned value, e.g. 'Fido'. So if you omit the last line, 'Fido' becomes the new value for hsh:

hsh = 'Fido'
hsh['cat'] = 'Whiskers'
#=> IndexError: string not matched

There's also each_with_object, which works similarly to inject but assumes that you want to mutate the same object within the block. It therefore doesn't require you to return it at the end of the block (note that the block's argument order is reversed):

data_hash = data_arr.each_with_object({}) do |v, hsh|
  hsh[v[0]] = v[1]
end
#=> {"dog"=>"Fido", "cat"=>"Whiskers", "fish"=>"Fluffy"}

or using array decomposition:

data_hash = data_arr.each_with_object({}) do |(k, v), hsh|
  hsh[k] = v
end
#=> {"dog"=>"Fido", "cat"=>"Whiskers", "fish"=>"Fluffy"}

Although to convert your array to a hash you can simply use Array#to_h, which is

[...] interpreting ary as an array of [key, value] pairs

data_arr.to_h
#=> {"dog"=>"Fido", "cat"=>"Whiskers", "fish"=>"Fluffy"}
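As a quick check that the three approaches agree, here is a sketch in which data_arr is reconstructed from the outputs shown above (the question's actual array is not reproduced here, so treat it as an assumption):

```ruby
# Reconstructed input: an array of [key, value] pairs.
data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]

# inject: the block must return the hash explicitly.
with_inject = data_arr.inject({}) do |hsh, v|
  hsh[v[0]] = v[1]
  hsh
end

# each_with_object: the memo object comes second and is returned automatically.
with_object = data_arr.each_with_object({}) { |(k, v), hsh| hsh[k] = v }

# Array#to_h: interprets each element as a [key, value] pair.
with_to_h = data_arr.to_h

p with_inject == with_object && with_object == with_to_h #=> true
```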

