How to Group Numbers into Different Buckets in Ruby

A simple one-liner, given an array items:

items.inject(Hash.new(0)) {|hash, item| hash[item] += 1; hash}

How it works:

Hash.new(0) creates a new Hash where accessing undefined keys returns 0.

inject(foo) iterates through the array with the given block, threading an accumulator through each call: on the first iteration the block receives foo, and on each subsequent iteration it receives the return value of the previous iteration. Counting [1, 2, 2], for example, the accumulator goes from an empty Hash.new(0) to {1=>1}, then {1=>1, 2=>1}, then {1=>1, 2=>2}. Note the trailing ; hash in the block: the block must return the hash itself, because hash[item] += 1 returns the incremented count, not the hash.

Another way to write it would be:

hash = Hash.new(0)
items.each {|item| hash[item] += 1}
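
On Ruby 2.7 and later, Enumerable#tally does the same counting in a single call:

items = [1, 2, 2, 3]
items.tally
#=> {1=>1, 2=>2, 3=>1}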

Ruby Array to Histogram: How to group numbers by range?

values = [1, 7, 2, 8, 2]
values.group_by { |x| x / 3 }.map { |k, vs| [(3*k..3*k+2), vs] }.to_h
#=> {0..2=>[1, 2, 2], 6..8=>[7, 8]}
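
To see how this works, look at the intermediate result: group_by buckets each value by its integer quotient, and the map then turns each quotient k back into the range 3*k..3*k+2 it stands for:

values.group_by { |x| x / 3 }
#=> {0=>[1, 2, 2], 2=>[7, 8]}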

If you really need the empty ranges, I don't think a clean one-liner is possible. But this should do:

grouped = values.group_by { |x| x / 3 }
min, max = grouped.keys.minmax
(min..max).map { |n| [(3*n..3*n+2), grouped.fetch(n, [])] }.to_h
#=> {0..2=>[1, 2, 2], 3..5=>[], 6..8=>[7, 8]}

Ruby group by range, array of numbers?

Not sure what you're trying to do but maybe this helps?

values = [0.0, 0.1, 0.2, 0.3]
values.group_by { |v| (v * 10).ceil - 1 }

That returns a hash; each value v lands in bucket n when n < v * 10 <= n + 1, which is why the exact multiple 0.0 falls into bucket -1:

{-1=>[0.0], 0=>[0.1], 1=>[0.2], 2=>[0.3]}

What is the best Ruby algorithm to divide an array into groups with a minimum and maximum length?

I would approach it like this:

# count of original items
count = 61

# max bucket size
max = 20

# number of buckets needed (ceiling division)
groups = (count / max) + (count % max > 0 ? 1 : 0)

# this will be the final result: one size counter per bucket
result = []

# create the buckets
groups.times { result.push(0) }

# iterate over the original items and distribute them round-robin into the buckets
count.times do |n|
  result[n % groups] += 1
end

p result

Given the count of 61, it prints [16, 15, 15, 15]. I've explained the purpose of each statement in the snippet itself.
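
If you prefer to compute the sizes directly, the same distribution falls out of integer division; this one-liner is my own equivalent, not from the original answer:

(0...groups).map { |i| count / groups + (i < count % groups ? 1 : 0) }
#=> [16, 15, 15, 15]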

Rails SQL - create buckets, and get counts of records in each bucket

Here is another SQL-based solution.

Obtain bucket for each salary like this:

select
  FLOOR(salary/10000) as bucket
from jobs

Use GROUP BY to do the counting:

select
  bucket,
  count(*)
from (
  select FLOOR(salary/10000) as bucket
  from jobs
) as t1
GROUP BY bucket

Finally, add the ranges instead of bucket number:

select
  CONCAT('$', FORMAT(bucket*10000,0), ' - $', FORMAT((bucket+1)*10000-1,0)) as `range`,
  job_count
from (
  select bucket, count(*) as job_count
  from (
    select FLOOR(salary/10000) as bucket
    from jobs
  ) as t1
  GROUP BY bucket
) as t2

Note that the functions used are MySQL-specific, and `range` is backtick-quoted because RANGE is a reserved word in MySQL 8.0+. YMMV.
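
If you're in Rails, a rough ActiveRecord equivalent of the GROUP BY step (assuming a Job model over the jobs table; the output values are illustrative):

Job.group("FLOOR(salary/10000)").count
#=> {0=>12, 1=>30, 2=>7}   (bucket number => job count)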

How to split (chunk) a Ruby array into parts of X elements?

Take a look at Enumerable#each_slice:

foo = (1..10).map(&:to_s)
foo.each_slice(3).to_a
#=> [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["10"]]
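
Without a block, each_slice returns an Enumerator (hence the .to_a above); with a block it yields each chunk in turn:

foo.each_slice(3) { |chunk| p chunk }
# ["1", "2", "3"]
# ["4", "5", "6"]
# ["7", "8", "9"]
# ["10"]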

Iteration in Ruby

You should check this answer, which gives this example:

# sample array
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end
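
That prints:

aa appears 1 times
bb appears 3 times
cc appears 2 times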

Group Hash by values in ruby

You can group a hash by its values:

h1 = {
  "admin_milestones"=>"1",
  "users_milestones"=>"0",
  "admin_goals"=>"1",
  "users_goals"=>"0",
  "admin_tasks"=>"1",
  "users_tasks"=>"0",
  "admin_messages"=>"1",
  "users_messages"=>"0",
  "admin_meetings"=>"1",
  "users_meetings"=>"0"
}

h2 = h1.group_by { |k, v| v }

It will produce a hash grouped by its values like this:

h2 = {"1"=>[["admin_milestones", "1"], ["admin_goals", "1"], ["admin_tasks", "1"], ["admin_messages", "1"], ["admin_meetings", "1"]], 
"0"=>[["users_milestones", "0"], ["users_goals", "0"], ["users_tasks", "0"], ["users_messages", "0"], ["users_meetings", "0"]]}

Spread objects evenly over multiple collections

I believe this is a variant of the bin packing problem, and as such it is NP-hard. Your answer is essentially a variant of the first-fit-decreasing heuristic, which is a pretty good heuristic. That said, I believe the following will give better results.

  • Sort each individual bucket in descending size order, using a balanced binary tree.
  • Calculate average size.
  • Sort the buckets with size less than average (the "too-small buckets") in descending size order, using a balanced binary tree.
  • Sort the buckets with size greater than average (the "too-large buckets") in order of the size of their greatest elements, using a balanced binary tree (so the bucket with {9, 1} would come first and the bucket with {8, 5} would come second).
  • Pass1: Remove the largest element from the bucket with the largest element; if this reduces its size below the average, then replace the removed element and remove the bucket from the balanced binary tree of "too-large buckets"; else place the element in the smallest bucket, and re-index the two modified buckets to reflect the new smallest bucket and the new "too-large bucket" with the largest element. Continue iterating until you've removed all of the "too-large buckets."
  • Pass2: Iterate through the "too-small buckets" from smallest to largest, and select the best-fitting elements from the largest "too-large bucket" without causing it to become a "too-small bucket;" iterate through the remaining "too-large buckets" from largest to smallest, removing the best-fitting elements from them without causing them to become "too-small buckets." Do the same for the remaining "too-small buckets." The results of this variant won't be as good as they are for the more complex variant because it won't shift buckets from the "too-large" to the "too-small" category or vice versa (hence the search space will be smaller), but this also means that it has much simpler halting conditions (simply iterate through all of the "too-small" buckets and then halt), whereas the complex variant might cause an infinite loop if you're not careful.

The idea is that by moving the largest elements in Pass1 you make it easier to more precisely match up the buckets' sizes in Pass2. You use balanced binary trees so that you can quickly re-index the buckets or the trees of buckets after removing or adding an element, but you could use linked lists instead (the balanced binary trees would have better worst-case performance but the linked lists might have better average-case performance). By performing a best-fit instead of a first-fit in Pass2 you're less likely to perform useless moves (e.g. moving a size-10 object from a bucket that's 5 greater than average into a bucket that's 5 less than average - first-fit would blindly perform the move, best-fit would either query the next "too-large bucket" for a better-sized object or else would remove the "too-small bucket" from the bucket tree).
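
Here is a simplified Ruby sketch of the two-pass idea, assuming buckets of positive element sizes and using plain linear lookups instead of balanced binary trees (so re-indexing costs O(n) rather than O(log n)); the rebalance helper, the single best-fit sweep in Pass 2, and the sample buckets are my own illustration, not the original algorithm verbatim:

def rebalance(buckets)
  avg = buckets.sum(&:sum).fdiv(buckets.size)

  # Pass 1: repeatedly move the largest element held by any too-large
  # bucket into the currently smallest bucket; retire a donor once
  # giving up its largest element would drop it below average.
  too_large = buckets.select { |b| b.sum > avg }
  until too_large.empty?
    donor = too_large.max_by(&:max)
    largest = donor.max
    if donor.sum - largest < avg
      too_large.reject! { |b| b.equal?(donor) }   # can't give any more
    else
      donor.delete_at(donor.index(largest))
      buckets.min_by(&:sum) << largest
      too_large.reject! { |b| b.equal?(donor) } if donor.sum <= avg
    end
  end

  # Pass 2: for each too-small bucket, best-fit: take the element whose
  # size is closest to the remaining deficit from the largest bucket
  # that can spare it without itself dropping below average.
  buckets.sort_by(&:sum).each do |receiver|
    next if receiver.sum >= avg
    deficit = avg - receiver.sum
    donor = buckets.max_by(&:sum)
    next if donor.equal?(receiver)
    candidate = donor.select { |x| donor.sum - x >= avg }
                     .min_by { |x| (x - deficit).abs }
    next if candidate.nil?
    donor.delete_at(donor.index(candidate))
    receiver << candidate
  end
  buckets
end

# buckets of element sizes; sums are 10, 13, 2 and 3 (average 7)
p rebalance([[9, 1], [8, 5], [2], [3]])
#=> [[9], [8], [2, 5], [3, 1]]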


