How to Build a Ruby Hash Out of Two Equally-Sized Arrays

How to build a Ruby hash out of two equally-sized arrays?

h = Hash[a.zip b] # => {:baz=>1, :bof=>2, :bar=>"world", :foo=>"hello"}

...damn, I love Ruby.

Create a hash using two arrays

irb(main):001:0> a = ["x", "y"]; b = [2, 4]
=> [2, 4]
irb(main):002:0> Hash[a.zip(b)]
=> {"x"=>2, "y"=>4}
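On newer Rubies the same result can be written without Hash[]. A minimal sketch, assuming Ruby 2.1 or later (where Array#to_h is available) and reusing the arrays from the irb session above; the variable names are only for illustration:

```ruby
a = ["x", "y"]
b = [2, 4]

via_brackets  = Hash[a.zip(b)]          # the form used above
via_to_h      = a.zip(b).to_h           # pairs converted directly to a hash
via_transpose = [a, b].transpose.to_h   # same pairs, built by transposing
```

One difference worth knowing: transpose raises an IndexError when the arrays differ in length, while zip pads the missing values with nil.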

Combine two unequal arrays to hash

product might be what you're looking for:

numbers.product(letters).map { |n, l| {number: n, letter: l} }
# => [{:number=>1, :letter=>"q"}, {:number=>1, :letter=>"w"}, {:number=>1, :letter=>"e"}, {:number=>1, :letter=>"r"}, {:number=>2, :letter=>"q"}, {:number=>2, :letter=>"w"}, {:number=>2, :letter=>"e"}, {:number=>2, :letter=>"r"}]
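For a self-contained run, here is a sketch with the inputs filled in. The values of numbers and letters are an assumption, reconstructed from the output shown above:

```ruby
numbers = [1, 2]            # assumed input
letters = %w[q w e r]       # assumed input

# product yields every (number, letter) combination, 2 * 4 = 8 pairs
pairs = numbers.product(letters).map { |n, l| { number: n, letter: l } }
```

Note that this builds one hash per combination rather than a single hash, so it fits when every element of one array should be paired with every element of the other.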

Make hash from two arrays of different length, distribute elements evenly

0) Problem description:

I have two sets and I want to define a mapping between them.

First is a set of unique strings, named SS or "strings", and its item is called "a string".

The first set is finite, it consists of NStrings items.

Each string in the first set may consist of any number of words, denoted by NumWords(string).

Thus, the first set also provides a statistical property: the average word count per string, denoted by TargetAVG.

The second is a set of unique numbers, named KK or "keys", and its item is called "a key".

The second set is finite, it consists of NKeys items.

The exact value of those numbers is irrelevant, they are used just as unique identifiers.

It is guaranteed that the second set has more entries than the first.

I want to generate mapping MM between the first and second set.

Every item of the second set (keys) should be assigned exactly one item of the first set (strings).

This mapping must use each item of the first set (strings) at least once.

Any item of the first set (strings) may be used multiple times.

Thus, the mapping also yields a statistical property: the number of uses of a given item from the first set (strings), denoted by NumUses(string).

I want to generate a mapping such that the word counts of the strings assigned to the keys
average out to TargetAVG (or as close as possible), where each string counts toward the
average once for every time the mapping uses it.

1) restate:

problem:

selecting a fixed number of differently-valued items from a fixed set of unique items to best fit a target total worth. The number of items to be selected is greater than the number of unique items, thus some items must be selected many times.

extra restriction:

each item must be selected at least once.

where:

items = SS

target item count = NKeys

item value = NumWords(item) * NumUses(item)

target total worth = TargetAVG * NKeys (= estimated total amount of words in the whole mapping)

2) let's try to reduce the problem complexity:

There are more keys than strings + each string MUST be used at least once + each key must be used exactly once.

Thus, a properly generated mapping will contain a subset that will consist of every one of the strings mapped to different keys.

Thus, NStrings of the keys are already partially solved, because we know they must be matched one-to-one to each one of the strings;
we just do not know the order. For example, we know that some 30 out of 70 keys must be paired 1-to-1 with each one of 30 strings,
but we do not know which key goes with which string. However, the exact order of assignment was never important, so we can simply
map them straight through: first to first, second to second, ... 30th to 30th.

And this is exactly what we do to reduce the problem.

Therefore:

-) we can always reduce, because there are more keys than strings

-) because of this, we will always be left with some leftover keys, exactly (NKeys-NStrings)

-) the partial solution guarantees that "each item must be selected at least once"

Sanity check:

The partial solution has used up NStrings of the keys, and we are left with (NKeys-NStrings) keys.

The final solution must achieve an average equal to TargetAVG.

We have already used each of the NStrings strings once, over the first NStrings keys.

This means that our partial solution is guaranteed to have internally an average of "TargetAVG".

We are left with some keys.

This means that the mapping for the rest of the keys also should have average of "TargetAVG", or as close as possible.

We have fulfilled the requirement, so we may now use any of the strings any number of times, even zero.

Everything sounds great.
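The reduction and the sanity check can be sketched in a few lines of Ruby. Everything here is illustrative: the sample strings, the sample keys and the num_words helper are assumptions, and words are simply taken to be whitespace-separated tokens:

```ruby
strings = ["one", "two words", "three word string"]  # hypothetical SS
keys    = [101, 102, 103, 104, 105]                  # hypothetical KK

# NumWords(string): assumed to be the count of whitespace-separated tokens
num_words = ->(s) { s.split.size }

target_avg = strings.sum { |s| num_words.call(s) }.to_f / strings.size

# Partial solution: first key to first string, second to second, ...
partial       = keys.first(strings.size).zip(strings).to_h
leftover_keys = keys.drop(strings.size)

# Each string is used exactly once, so the partial average is TargetAVG
partial_avg = partial.values.sum { |s| num_words.call(s) }.to_f / partial.size
```

What remains is to assign strings to leftover_keys so that their average also lands as close to target_avg as possible.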

3) remaining problem:

problem type:

selecting a fixed number of valued items to best fit a target total worth. Any item may be selected any number of times.

where:

items = SS

target item count = (NKeys-NStrings)

item value = NumWords(item) * NumUses(item)

target total worth = TargetAVG * (NKeys-NStrings) (= estimated total amount of words in the leftover mapping)

The important thing is that we want the sum closest to the given value "S" using exactly "X" picks.

It means that it is not a general knapsack packing problem, but rather a kind of subclass of it, something like a
Change-making problem. Let's see if it fits that:

We need to pay out an amount of cash using the fewest differently-valued coins.

=>
We need to split a specified amount of words between some strings of different word count with exactly X picks.

On top of that, we want a "best-approximate" result in case the ideal one is impossible.

Knapsack problems are NP-hard in general, and getting an exact or best-possible result is usually either hard or very time-consuming. After spending some time on Google, I have not found any ready-to-use algorithm that would solve the change-making problem with exactly N picks; that class of problem is probably simply known under some other name, which I cannot recall now. I strongly suggest that you search, or ask a question on how to classify such a problem. If someone more fluent in algorithmic nomenclature answers, you might even instantly find a working solution.

Other things to consider: how serious is your need for the "best result", and how close does it really need to be? How many keys will there be relative to the number of strings? How much will the word count of the strings vary? Any extra conditions on that may help in dropping the knapsack approach and using some naive methods that happen to be safe under those conditions.

For example, if the number of remaining keys (NKeys-NStrings) is low, just run a complete exponential search that checks all possibilities and you will surely get the best result.
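A minimal sketch of such an exhaustive search, using Ruby's built-in Array#repeated_combination. The method name best_by_brute_force and the sample numbers are hypothetical; word_counts stands for the word counts of the strings, picks for (NKeys-NStrings) and target for the desired total word count:

```ruby
# Tries every multiset of `picks` word counts and keeps the one whose
# total is closest to `target`. Only practical for small inputs.
def best_by_brute_force(word_counts, picks, target)
  word_counts.repeated_combination(picks)
             .min_by { |combo| (combo.sum - target).abs }
end

best = best_by_brute_force([1, 3, 5], 3, 11)
```

The number of candidates grows very quickly with both the number of strings and the number of picks, which is why this only suits the "low" case.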

Otherwise, if you do not need the very best result, (NKeys-NStrings) is high, and the word counts are relatively evenly distributed, then you could probably just do a simple greedy assignment; the several items that were wrongly assigned would put the average only a little off (several items divided by the high NKeys-NStrings = a low fraction of the average).

In other cases, or if you really need the best match, you probably will need to get into "dynamic programming" or "integer linear programming" that can generate approximate solutions for similar problems.
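As one possible illustration of the dynamic programming route, here is a sketch that tracks which totals are reachable with exactly k picks. It is not a known named algorithm from the literature, just a straightforward table; closest_picks and the sample values are hypothetical, and word_counts is assumed to be non-empty:

```ruby
# Returns indices into word_counts (length == picks) whose word counts
# sum as close to target as possible. Items may repeat.
def closest_picks(word_counts, picks, target)
  # reachable[k] maps each total reachable with exactly k picks
  # to the index of one item that was picked last to reach it.
  reachable = [{ 0 => nil }]
  picks.times do |k|
    layer = {}
    reachable[k].each_key do |sum|
      word_counts.each_with_index do |w, idx|
        layer[sum + w] ||= idx
      end
    end
    reachable << layer
  end

  best = reachable[picks].keys.min_by { |sum| (sum - target).abs }

  # Walk back through the layers to recover the picked items.
  chosen = []
  sum = best
  picks.downto(1) do |k|
    idx = reachable[k][sum]
    chosen << idx
    sum -= word_counts[idx]
  end
  chosen
end

word_counts = [1, 3, 5]
chosen = closest_picks(word_counts, 3, 11)
total  = chosen.sum { |idx| word_counts[idx] }
```

The table holds at most one entry per distinct total per layer, so the work is bounded by picks times the number of distinct totals times the number of strings, rather than by the number of combinations.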

If I have any more thoughts on that, I'll add them and leave a comment, but honestly I doubt I will. I've written everything I could from memory, and I could give you more pointers only if I actually stuck my nose into the algo-books again, which I sadly have too little time for right now :) Drop me a note if you happen to find the correct classification of the problem!

Create a Hash from two arrays of different sizes and iterate until none of the keys are empty

Here's an elegant one. You can "loop" the shorter array:

longer  = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']

longer.zip(shorter.cycle).to_h # => {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
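To see why cycle is needed, compare it with a plain zip, which pads the missing values with nil. A short sketch reusing the arrays above (padded and cycled are illustrative names):

```ruby
longer  = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']

padded = longer.zip(shorter).to_h        # keys 4..7 map to nil
cycled = longer.zip(shorter.cycle).to_h  # shorter restarts once exhausted
```

zip only consumes as many elements as the receiver has, so the infinite enumerator returned by cycle is safe here.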

How to stitch together two arrays based on a common set of keys in Ruby

Let arr1 and arr2 be your two arrays. Due to the fact that they are the same size and that for each index i, arr1[i][i] and arr2[i][i] are the values of the key i of the hashes arr1[i] and arr2[i], the desired result can be obtained quite easily:

arr2.each_with_index.with_object({}) do |(g,i),h|
  (h[g[i]] ||= []) << arr1[i][i]
end
#=> {
# "magento2-base"=>[
# "pmet-add-install-module-timings.patch",
# "pmet-fix-module-loader-algorithm.patch",
# "pmet-stop-catching-sample-data-errrors-during-install.patch"
# ],
# "magento/module-sample-data"=>[
# "pmet-change-sample-data-load-order.patch"
# ],
# ...
# "magento/module-staging"=>[
# "pmet-fix-invalid-module-dependencies.patch",
# "pmet-staging-preview-js-fix.patch"
# ],
# "magento/module-customer"=>[
# "pmet-visitor-segment.patch"
# ]
# }

The fragment

h[g[i]] ||= []

is effectively expanded to

h[g[i]] = h[g[i]] || []  # *

If the hash h has no key g[i],

h[g[i]] #=> nil

so * becomes

h[g[i]] = nil || [] #=> []

after which

h[g[i]] << "cat"
#=> ["cat"]

(which works with "dog" as well). The above expression can instead be written:

arr2.each_with_index.with_object(Hash.new {|h,k| h[k]=[]}) do |(g,i),h|
  h[g[i]] << arr1[i][i]
end

This uses the form of Hash::new that employs a block (here {|h,k| h[k]=[]}) that is called when the hash is accessed with a key it does not contain.

An alternative method is:

arr2.each_with_index.with_object({}) do |(g,i),h|
  h.update(g[i]=>[arr1[i][i]]) { |_,o,n| o+n }
end

This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are in both hashes being merged.

A third way is to use Enumerable#group_by:

arr2.each_with_index.group_by { |h,i| arr2[i][i] }.
  transform_values { |a| a.map { |_,i| arr1[i][i] } }
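To compare the four variants side by side, here is a runnable sketch on made-up data. The contents of arr1 and arr2 are hypothetical; they only follow the shape described above, where the i-th hash in each array has the key i. transform_values needs Ruby 2.4 or later:

```ruby
# Hypothetical sample data: the i-th hash of each array has the single key i.
arr1 = [{ 0 => "a.patch" }, { 1 => "b.patch" }, { 2 => "c.patch" }]
arr2 = [{ 0 => "pkg/one" }, { 1 => "pkg/two" }, { 2 => "pkg/one" }]

by_or_equals = arr2.each_with_index.with_object({}) do |(g, i), h|
  (h[g[i]] ||= []) << arr1[i][i]
end

by_default_block = arr2.each_with_index.with_object(Hash.new { |h, k| h[k] = [] }) do |(g, i), h|
  h[g[i]] << arr1[i][i]
end

by_update = arr2.each_with_index.with_object({}) do |(g, i), h|
  h.update(g[i] => [arr1[i][i]]) { |_, o, n| o + n }
end

by_group = arr2.each_with_index
               .group_by { |g, i| g[i] }
               .transform_values { |pairs| pairs.map { |_, i| arr1[i][i] } }
```

All four build the same mapping from each arr2 value to the list of arr1 values found at the same positions.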

Merge two arrays into a Hash

Assuming you have two equal-length arrays x and y:

x = [:key1, :key2, :key3]
y = [:value1, :value2, :value3]
z = {}
x.each_with_index { |key,index| z[key] = y[index] }

puts z

=> {:key1=>:value1, :key2=>:value2, :key3=>:value3}

Is that what you are looking for?

If not, then maybe this:

x = [:key1, :key2, :key3]
y = [:value1, :value2, :value3]
z = []
x.each_with_index { |key,index| z << { date: key, minutes: y[index]} }

puts z

{:date=>:key1, :minutes=>:value1}
{:date=>:key2, :minutes=>:value2}
{:date=>:key3, :minutes=>:value3}
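Both results can also be produced without an explicit index by zipping the arrays first. A short sketch using the same x and y (as_hash and as_records are illustrative names):

```ruby
x = [:key1, :key2, :key3]
y = [:value1, :value2, :value3]

# One hash mapping each key to its value
as_hash = x.zip(y).to_h

# One small hash per pair, as in the second variant above
as_records = x.zip(y).map { |date, minutes| { date: date, minutes: minutes } }
```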

How to separate an array of hashes into different arrays if a key is equal?

Given:

tst=[
{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>5.0},
{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.84},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>0.0},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.4}
]

You can use .group_by to get a hash of elements by key. In this case, group by the value stored under "beneficiary_document" in the block and you will get a hash of arrays keyed by that value (two arrays in this case).

You can do:

tst.group_by { |h| h["beneficiary_document"] }
# {"43991028"=>[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>5.0}, {"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>0.0}], "71730550"=>[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.84}, {"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.4}]}

To see it pretty printed:

require "pp"
PP.pp(tst.group_by {|h| h["beneficiary_document"] },$>,120)
{"43991028"=>
[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>5.0},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>0.0}],
"71730550"=>
[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.84},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.4}]}

You can also achieve the same result with a hash whose default block returns a new array, then iterate over tst with .each and push each hash into the array for its key:

h = Hash.new { |hash, key| hash[key] = [] }
tst.each { |eh| h[eh["beneficiary_document"]].push(eh) }

Or, combine that into a single statement:

tst.each_with_object(Hash.new { |h,k| h[k]=[] }) { |g,h|
  h[g["beneficiary_document"]].push(g) }

All three methods create identical hashes. The first, .group_by, is the easiest.
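If you want the separate arrays themselves rather than a hash, calling .values on the grouped hash gives them. A small sketch on shortened, made-up rows (the "id" and "doc" keys are stand-ins for the real column names):

```ruby
rows = [
  { "id" => 1, "doc" => "A" },
  { "id" => 2, "doc" => "B" },
  { "id" => 3, "doc" => "A" }
]

# One array per distinct "doc" value, in order of first appearance
groups = rows.group_by { |row| row["doc"] }.values
```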


