Is There an Equivalent to 'Array::Sample' for Hashes

Is there an equivalent to `Array::sample` for hashes?


Hash[original_hash.to_a.sample(n)]

In Ruby 2.1 and later, where Array#to_h is available, this is shorter:

original_hash.to_a.sample(n).to_h
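
As a minimal sketch of the approach above, the conversion can be wrapped in a small helper. The name sample_pairs and the example data are assumptions made for illustration, not part of the original answer:

```ruby
# Hypothetical helper: returns up to n randomly chosen key/value pairs as a Hash.
def sample_pairs(hash, n)
  # to_a yields [key, value] pairs, Array#sample picks n of them without
  # repetition, and to_h (Ruby 2.1+) rebuilds a Hash from the pairs.
  hash.to_a.sample(n).to_h
end

original_hash = { a: 1, b: 2, c: 3, d: 4 }
subset = sample_pairs(original_hash, 2)
puts subset.size  # prints 2
```

If n is larger than the hash, Array#sample simply returns every pair, so the result holds the same pairs as the original hash.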

How to separate an array of hashes into different arrays if a key is equal?

Given:

tst=[
{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>5.0},
{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.84},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>0.0},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.4}
]

You can use .group_by to get a hash of elements by key. In this case, have the block return h["beneficiary_document"], and you will get a hash whose values are arrays of the elements sharing that value -- two arrays in this case.

You can do:

tst.group_by { |h| h["beneficiary_document"] }
# {"43991028"=>[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>5.0}, {"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>0.0}], "71730550"=>[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.84}, {"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.4}]}

To see it pretty printed:

require "pp"
PP.pp(tst.group_by {|h| h["beneficiary_document"] },$>,120)
{"43991028"=>
[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>5.0},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"43991028", "calification_by_qualifier"=>0.0}],
"71730550"=>
[{"user_id"=>2, "user_name"=>"Pepo", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.84},
{"user_id"=>3, "user_name"=>"Carlos", "beneficiary_document"=>"71730550", "calification_by_qualifier"=>3.4}]}

You can also achieve the same result with a hash whose default proc creates an empty array for each new key, then iterate over tst and push each hash into the array for its key:

h=Hash.new { |h,k| h[k]=[] }
tst.each { |eh| h[eh["beneficiary_document"]].push(eh) }

Or, combine that into a single statement:

tst.each_with_object(Hash.new { |h,k| h[k]=[] }) { |g,h|
h[g["beneficiary_document"]].push(g)}

All three methods create identical hashes. The first, .group_by, is the easiest.
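
To check that claim, here is a small sketch that runs the three approaches side by side and compares the results. The records array and its "id"/"doc" keys are made-up stand-ins for tst, chosen to keep the example short:

```ruby
# Hypothetical, shortened stand-in for the tst array above.
records = [
  { "id" => 1, "doc" => "A" },
  { "id" => 2, "doc" => "B" },
  { "id" => 3, "doc" => "A" }
]

# 1. group_by
by_group_by = records.group_by { |r| r["doc"] }

# 2. Hash with a default proc, filled in a separate loop
by_default_proc = Hash.new { |hash, key| hash[key] = [] }
records.each { |r| by_default_proc[r["doc"]] << r }

# 3. The same idea as a single expression
by_each_with_object =
  records.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |r, acc|
    acc[r["doc"]] << r
  end

# Hash#== compares contents only, so the default procs do not matter here.
puts by_group_by == by_default_proc && by_default_proc == by_each_with_object  # prints true
```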

How to reference a hash of array of hashes in order to compare values

The problem: each arrayref may contain groups of hashrefs with the same spacer value. In each such group, the hashref with the lowest energy value needs to be identified, and it then replaces that group.

Most of the work is done in partition_equal(), which identifies the groups of hashrefs with equal spacers.

use warnings;
use strict;
use List::Util qw(reduce);
use Data::Dump qw(dd);

# Test data: two groups of equal-spacer hashrefs, in the first array only
my %hash = (
kA => [
{ 'energy' => -4.3, 'spacer' => 'AGGCACC' },
{ 'energy' => -2.3, 'spacer' => 'AGGCACC' },
{ 'energy' => -3.3, 'spacer' => 'CAGT' },
{ 'energy' => -1.5, 'spacer' => 'GTT' },
{ 'energy' => -2.5, 'spacer' => 'GTT' },
],
kB => [
{ 'energy' => -4.4, 'spacer' => 'CAGT' },
{ 'energy' => -4.1, 'spacer' => 'GTT' },
{ 'energy' => -4.1, 'spacer' => 'TTG' },
],
);
#dd \%hash;

for my $key (keys %hash) {
my ($spv, $unique) = partition_equal($hash{$key});
next if not $spv;
# Extract minimum-energy hashref from each group and add to arrayref
# $unique, so that it can eventually overwrite this key's arrayref
foreach my $spacer (keys %$spv) {
my $hr_min = reduce {
$a->{energy} < $b->{energy} ? $a : $b
} @{$spv->{$spacer}};
push @$unique, $hr_min;
}
# new: unique + lowest-energy ones for each equal-spacer group
$hash{$key} = $unique if keys %$spv;
}
dd \%hash;

# Sort array and compare neighbouring elements (hashrefs)
sub partition_equal {
my $ra = shift;
my @sr = sort { $a->{spacer} cmp $b->{spacer} } @$ra;

# %spv: spacer value => [ hashrefs with it ], ...
# @unique: hashrefs with unique spacer values
my (%spv, @unique);

# Process first and last separately, to not have to test for them
($sr[0]{spacer} eq $sr[1]{spacer})
? push @{$spv{$sr[0]{spacer}}}, $sr[0]
: push @unique, $sr[0];
for my $i (1..$#sr-1) {
if ($sr[$i]{spacer} eq $sr[$i-1]{spacer} or
$sr[$i]{spacer} eq $sr[$i+1]{spacer})
{
push @{$spv{$sr[$i]{spacer}}}, $sr[$i]
}
else { push @unique, $sr[$i] }
}
($sr[-1]{spacer} eq $sr[-2]{spacer})
? push @{$spv{$sr[-1]{spacer}}}, $sr[-1]
: push @unique, $sr[-1];

return if not keys %spv;
return \%spv, \@unique;
}

Output


{
kA => [
{ energy => -3.3, spacer => "CAGT" },
{ energy => -2.5, spacer => "GTT" },
{ energy => -4.3, spacer => "AGGCACC" },
],
kB => [
{ energy => -4.4, spacer => "CAGT" },
{ energy => -4.1, spacer => "GTT" },
{ energy => -4.1, spacer => "TTG" },
],
}

The order inside the arrayrefs is not maintained: the new arrayref first holds the hashrefs with unique spacer values, then the lowest-energy hashref from each original group with equal spacer values.

The sub sorts input by spacer values, so that it can identify equal ones by simply iterating through the sorted array and comparing only neighbors. This should be reasonably efficient.

Hashing the arrangement of an array

An equivalent hash goes something like this for [ 40, 20, 10, 30 ]

  1. 40 is greater than 3 of the subsequent values
  2. 20 is greater than 1 of the subsequent values
  3. 10 is greater than 0 of the subsequent values
  4. 30 has nothing after it, so ignore it.

That is a pair of nested loops taking O(N^2) time (about N*N/2 comparisons, so roughly 4*4/2 when there are 4 items). Just a few lines of code.

Pack 3,1,0, either the way you did it or with anatolyg's slightly tighter:

3 * 3! +
1 * 2! +
0 * 1!

which equals 20. It needs to be stored in the number of bits needed to distinguish 4! = 24 arrangements, namely 5 bits.

I'm pretty sure this is optimal for space. I have not thought of a faster way to compute it other than O(N^2).
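
The counting and packing steps above can be sketched in Ruby as follows. The method name arrangement_rank is made up for this illustration, and the sketch assumes all values are distinct:

```ruby
# Hypothetical helper: packs the arrangement of an array into a single integer
# using the factorial weights shown above (assumes distinct values).
def arrangement_rank(values)
  n = values.length
  factorial = 1  # becomes (n - 1 - i)! as we move from right to left
  rank = 0
  (n - 2).downto(0) do |i|
    factorial *= (n - 1 - i)
    # Count how many of the subsequent values are smaller than values[i].
    smaller_after = values[(i + 1)..-1].count { |v| v < values[i] }
    rank += smaller_after * factorial
  end
  rank
end

puts arrangement_rank([40, 20, 10, 30])  # prints 20 (3 * 3! + 1 * 2! + 0 * 1!)
```

The inner count is what makes this O(N^2). A fully sorted array ranks 0 and a fully reversed one ranks N! - 1, which is why the result fits in the number of bits needed for N!.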

How big is your N? For N=100, you need about 525 bits (log2 of 100!) and about 5K operations. 5K operations might take several microseconds in C++, and probably less than a millisecond even in an interpreted language.

Hashes of Hashes with array values

I think you actually want

%hash = (
'4' => {
'10' => [ 2, 5 ],
'5' => [ 2 ],
'3' => [ 2, 8 ],
},
'9' => {
'4' => [ 3 ],
},
);

Solution:

my %hash;
while (<>) {
my @F = split;
push @{ $hash{ $F[0] }{ $F[1] } }, $F[2];
}

Thanks to autovivification, that will automatically create the hashes and arrays as needed.

You can always use join ',' afterwards if you really do want strings instead of arrays.

for my $k1 (keys(%hash)) {
    # Dereference the inner hashref with %{ ... }, not ${ ... }
    for my $k2 (keys(%{ $hash{$k1} })) {
        $hash{$k1}{$k2} = join(',', @{ $hash{$k1}{$k2} });
    }
}

What is the best way to convert an array to a hash in Ruby

NOTE: For a concise and efficient solution, please see Marc-André Lafortune's answer below.

This answer was originally offered as an alternative to approaches using flatten, which were the most highly upvoted at the time of writing. I should have clarified that I didn't intend to present this example as a best practice or an efficient approach. Original answer follows.


Warning! Solutions using flatten will not preserve Array keys or values!

Building on @John Topley's popular answer, let's try:

a3 = [ ['apple', 1], ['banana', 2], [['orange','seedless'], 3] ]
h3 = Hash[*a3.flatten]

This throws an error:

ArgumentError: odd number of arguments for Hash
from (irb):10:in `[]'
from (irb):10

The constructor was expecting an even number of arguments (e.g. 'k1','v1','k2','v2'). What's worse is that a different Array which flattened to an even length would just silently give us a Hash with incorrect values.

If you want to use Array keys or values, you can use map:

h3 = Hash[a3.map {|key, value| [key, value]}]
puts "h3: #{h3.inspect}"

This preserves the Array key:

h3: {["orange", "seedless"]=>3, "apple"=>1, "banana"=>2}
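
As a brief sketch of a shorter route to the same result, assuming Ruby 2.1 or later, Array#to_h converts an array of [key, value] pairs directly and leaves Array keys intact:

```ruby
a3 = [['apple', 1], ['banana', 2], [['orange', 'seedless'], 3]]

# to_h treats each inner two-element array as one [key, value] pair,
# so nothing is flattened and the Array key survives as a key.
h3 = a3.to_h

puts h3[['orange', 'seedless']]  # prints 3
puts h3.size                     # prints 3
```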

To obtain random pair from Hash

To get a random element from a Hash and have it returned as a Hash, you could simply patch Hash like this:

class Hash
  def sample(n)
    Hash[to_a.sample(n)]
  end
end

Then call like

h = {a: 1, b: 2, c: 3} 
h.sample(1)
#=> {:b=>2}
h.sample(2)
#=> {:b=>2, :a=>1}

Note: I used Hash::[] for compatibility purposes. In Ruby 2.1 and later you could use to_h instead.

Other than that, I think there might be a few more issues with your code and its return values.

If I were to refactor your code, the sample method above would not be needed. I would instead go with something like:

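def get_available_key
  if(generated_keys.empty?)
    {"error" => "404. No keys available"}
  else
    # sample without an argument returns a single key, not a one-element array,
    # so the delete and merge! below operate on the key itself
    new_key = @generated_keys.keys.sample
    @generated_keys.delete(new_key)
    @blocked_keys.merge!({new_key => Time.now})[new_key]
  end
end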

This way it will always respond with a Hash object for handling purposes and it need not worry about multidimensional arrays at all.

I would also change the initial code to be more like this

def create_new_key
  key = SecureRandom.urlsafe_base64
  purged_keys.include?(key) ? create_new_key : key
end

def generate_key
  key = create_new_key
  #add new key to hashes that maintain records
  @generated_keys.merge!({key => Time.now})
  @all_keys.merge!(@generated_keys) { |key, v1, v2| v1 }
  key
end

def add_to_key_chain(length)
  @generated_keys ||= {}
  length.times do
    create_new_key
  end
end

Although I don't know what the purged_keys method looks like.

How to replace values in an array of hashes properly in Perl?

This clearly has to do with storing references on the array, instead of independent data. How that comes about isn't clear since details aren't given, but the following discussion should help.

Consider these two basic examples.

First, place a hash (reference) on an array, first changing a value each time

use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
# use Storable qw(dclone);

my %h = ( a => 1, b => 2 );

my @ary_w_refs;

for my $i (1..3) {
$h{a} = $i;
push @ary_w_refs, \%h; # almost certainly WRONG

# push @ary_w_refs, { %h }; # *copy* data
# push @ary_w_refs, dclone \%h; # may be necessary, or just safer
}

dd $_ for @ary_w_refs;

I use Data::Dump for displaying complex data structures, for its simplicity and default compact output. There are other modules for this purpose, Data::Dumper being in the core (installed).

The above prints


{ a => 3, b => 2 }
{ a => 3, b => 2 }
{ a => 3, b => 2 }

See how the value for key a, which we changed in the hash each time and so supposedly set to a different value (1, 2, 3) for each array element, is the same in the end, and equal to the one we assigned last? (This appears to be the case in the question.)

This is because we pushed a reference to the same hash %h for each element. Even though we change the value for that key every time through the loop, in the end each element holds only a reference to that one hash.

So when the array is queried after the loop we can only get what is in the hash (at key a it's the last assigned number, 3). The array doesn't have its own data, only a pointer to hash's data. (Thus hash's data can be changed by writing to the array as well, as seen in the example below.)

Most of the time, we want a separate, independent copy. Solution? Copy the data.

Naively, instead of

push @ary_w_refs, \%h;

we can do

push @ary_w_refs, { %h };

Here {} is a constructor for an anonymous hash, so %h inside gets copied. So actual data gets into the array and all is well? In this case, yes, where hash values are plain strings/numbers.

But what if the hash values themselves are references? Then those references get copied, and @ary_w_refs again does not have its own data! We'll have the exact same problem. (Try the above with the hash being ( a => [1..10] ))

If we have a complex data structure, carrying references for values, we need a deep copy. One good way to do that is to use a library, and Storable with its dclone is very good

use Storable qw(dclone);
...

push @ary_w_refs, dclone \%h;

Now array elements have their own data, unrelated (but at the time of copy equal) to %h.

This is a good thing to do with a simple hash/array as well, to be safe from future changes, whereby the hash is changed but we forget about the places where it's copied (or the hash and its copies don't even know about each other).

Another example. Let's populate an array with a hashref, and then copy it to another array

use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);

my %h = ( a => 1, b => 2 );

my @ary_src = \%h;
say "Source array: ", pp \@ary_src;

my @ary_tgt = $ary_src[0];
say "Target array: ", pp \@ary_tgt;

$h{a} = 10;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";

$ary_src[0]{b} = 20;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";

$ary_tgt[0]{a} = 100;
dd \%h;

(For simplicity I use arrays with only one element.)

This prints


Source array: [{ a => 1, b => 2 }]
Target array: [{ a => 1, b => 2 }]
Target array: [{ a => 10, b => 2 }] (after hash change)
Target array: [{ a => 10, b => 20 }] (after hash change)
{ a => 100, b => 20 }

That "target" array, which supposedly was merely copied off of a source array, changes when the distant hash changes! And when its source array changes. Again, it is because a reference to the hash gets copied, first to one array and then to the other.

In order to get independent data copies, again, copy the data, each time. I'd again advise to be on the safe side and use Storable::dclone (or an equivalent library of course), even with simple hashes and arrays.

Finally, note a slightly sinister last case -- writing to that array changes the hash! This (second-copied) array may be far removed from the hash, in a function (in another module) that the hash doesn't even know of. This kind of an error can be a source of really hidden bugs.

Now if you clarify where references get copied, with a more complete (simple) representation of your problem, we can offer a more specific remedy.


An important and often-used way of using a reference correctly is when the structure being referenced is declared as a lexical variable every time through the loop:

for my $elem (@data) { 
my %h = ...
...
push @results, \%h; # all good
}

That lexical %h is introduced anew every time so the data for its reference on the array is retained, as the array persists beyond the loop, independently for each element.

It is also more efficient doing it this way since the data in %h isn't copied, like it is with { %h }, but is just "re-purposed," so to say, from the lexical %h that gets destroyed at the end of iteration to the reference in the array.

This of course may not always be suitable, if a structure to be copied naturally lives outside of the loop. Then use a deep copy of it.

The same kind of a mechanism works in a function call

sub some_func {
...
my %h = ...
...
return \%h; # good
}

my $hashref = some_func();

Again, the lexical %h goes out of scope as the function returns and it doesn't exist any more, but the data it carried and a reference to it is preserved, since it is returned and assigned so its refcount is non-zero. (At least returned to the caller, that is; it could've been passed yet elsewhere during the sub's execution so we may still have a mess with multiple actors working with the same reference.) So $hashref has a reference to data that had been created in the sub.

Recall that if a function receives a reference (as an argument when it is called, or during its execution from other subs that return references), then changes and returns it, we again have data changed in some caller, potentially far removed from this part of the program flow.

This is done often of course, with larger pools of data which can't just be copied around all the time, but then one needs to be careful and organize the code (to be as modular as possible, for one) so as to minimize the chance of errors.

This is a loose use of the word "pointer" for what a reference does, but if one were to refer to C, I'd say that it's a bit of a "dressed" C pointer.

In a different context, {} can be a block.

What is Ruby's equivalent of Python's hash()?

Ruby has #hash on most objects, including Array, but these values are not unique and will eventually collide.

For any serious use I'd strongly suggest using something like SHA2-256 or stronger as these are cryptographic hashes designed to minimize collisions.

For example:

require 'digest/sha2'

array = %w[ a b c ]

array.hash
# => 3218529217224510043 (varies between Ruby processes)

# join yields "abc" here; it drops element boundaries, so prefer a
# serialization such as array.inspect when those matter
Digest::SHA2.hexdigest(array.join)
# => "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

That value is, for practical purposes, unique. SHA2-256 collisions are extremely unlikely due to the sheer size of that hash: 256 bits vs. the 64-bit #hash value. That's not 4x stronger, it's 6.2 octodecillion times stronger. That number may as well be a "zillion" given how it has 57 zeroes in it.
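
One way to turn this into a reusable helper is sketched below. The name array_digest is an assumption for illustration, and serializing with Marshal.dump is only one possible choice; the serialized form can differ between Ruby versions, so treat the digests as stable within one Ruby version rather than across them:

```ruby
require 'digest'

# Hypothetical helper: a content-based digest that, unlike Array#hash,
# does not change from one Ruby process to the next.
def array_digest(array)
  # Marshal.dump keeps element boundaries, so ["ab", "c"] and ["a", "bc"]
  # serialize differently (a plain join would merge both into "abc").
  Digest::SHA256.hexdigest(Marshal.dump(array))
end

puts array_digest(%w[a b c]) == array_digest(%w[a b c])   # prints true
puts array_digest(%w[a b c]) == array_digest(%w[c b a])   # prints false
puts array_digest(%w[a b c]).length                       # prints 64
```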

Perl, loop through array of hashes and print a specific hash element based on criteria

Let's start by avoiding needless copies of hashes and arrays. We're also going to use better variable names than AoH, href and key.

my $school = decode_json($json);
my $depts = $school->{grade3}{departments};

Now we want the departments that have the value of $dept_name for name. grep is a good tool for filtering.

my $dept_name = 'class1';
my @matching_depts = grep { $_->{name} eq $dept_name } @$depts;

Then, it's just a question of iterating over the matching departments, and printing the desired values.

for my $dept (@matching_depts) {
say $dept->{allowedsubjects};
}

Except not quite. That prints

                 <-- Blank line
general
biology
physics
chemistry

Fix:

for my $dept (@matching_depts) {
say substr($dept->{allowedsubjects}, 1);
}

Alternative fix:

for my $dept (@matching_depts) {
my @subjects = grep { $_ ne "" } split /\n/, $dept->{allowedsubjects};
for my $subject (@subjects) {
say $subject;
}
}

