Ruby, Value Bucketing, Beautify Code

One way is to use case

result = case age
 when 0..12 then 1
 when 13..17 then 2
 when 18..24 then 3
 when 25..29 then 4
 # ... and so on for the remaining ranges
 else 0
end

Another way is to drop the redundant && from each condition. Every elsif is reached only when all earlier tests have failed, so the lower bound is already guaranteed:

if age < 0 
  0
elsif age < 13
  1
elsif age < 18
  2
elsif age < 25
  3
elsif age < 30
  4
elsif age < 35
  5
elsif age < 40
  6
elsif age < 50
  7
elsif age < 65
  8
else
  9
end
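
A third option, sketched here under a few assumptions, keeps the thresholds in an array so that adding a bucket means adding one number. The method name age_bucket and the limits array are illustrative; the limits are copied from the elsif chain above.

```ruby
# Minimal sketch: table-driven version of the elsif chain.
# `age_bucket` is a hypothetical name, not taken from the original question.
def age_bucket(age)
  # Exclusive upper bounds of buckets 0..8, copied from the elsif chain.
  limits = [0, 13, 18, 25, 30, 35, 40, 50, 65]
  # Index of the first limit the age falls below; limits.size (9) if none.
  limits.find_index { |limit| age < limit } || limits.size
end

age_bucket(12) # => 1
age_bucket(40) # => 7
```

find_index scans the limits in order, so it does the same comparisons as the elsif chain, just driven by data instead of code.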

Rails SQL - create buckets, and get counts of records in each bucket

Here is another SQL-based solution.

Obtain bucket for each salary like this:

select 
  FLOOR(salary/10000) as bucket
from jobs

Use GROUP BY to do the counting:

select 
  bucket, 
  count(*)
from (
    select FLOOR(salary/10000) as bucket
    from jobs
) as t1
GROUP BY bucket

Finally, add the ranges instead of bucket number:

select 
  CONCAT('$', FORMAT(bucket*10000,0), ' - $', FORMAT((bucket+1)*10000-1,0)) as `range`, 
  job_count
from ( 
    select bucket, count(*) as job_count
    from (
        select FLOOR(salary/10000) as bucket
        from jobs
    ) as t1
    GROUP BY bucket
) as t2

Note that the functions used here are the MySQL versions; other databases may name them differently or expect different arguments.
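
For data that is already in memory, the same bucketing can be sketched in plain Ruby. The method name salary_buckets, the sample salaries and the default bucket size are assumptions for illustration, and the labels omit the thousands separators that FORMAT adds.

```ruby
# Minimal sketch: count salaries per fixed-width bucket, like the SQL above.
# `salary_buckets` is a hypothetical helper, not part of Rails.
def salary_buckets(salaries, bucket_size = 10_000)
  # Integer division floors, which matches FLOOR(salary/10000) for integers.
  counts = salaries.each_with_object(Hash.new(0)) do |salary, tally|
    tally[salary / bucket_size] += 1
  end

  # Turn bucket numbers into "$low - $high" labels, ordered by bucket.
  counts.sort.map do |bucket, job_count|
    low  = bucket * bucket_size
    high = (bucket + 1) * bucket_size - 1
    ["$#{low} - $#{high}", job_count]
  end
end

salary_buckets([8_500, 12_000, 15_500, 30_000])
# => [["$0 - $9999", 1], ["$10000 - $19999", 2], ["$30000 - $39999", 1]]
```

For large tables the SQL version is likely the better choice, since it avoids loading every row into Ruby.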

Retrieve hash from the file in Ruby

Do not store it like this. Store it in a YAML file:

access_key: XXXXXXXXXXXXXXXXXXX 
secret_access_key: XXXXXXXXXXXXXXXXXXX
bucket: XXXXXXXXXXXXXXXXXXX 
name_prefix: my_videos/178/4406/ 
x-amz-security-token: AQoDYXdzEBQa4AK5TxvWJM/xsONxl/9ZDVxJc0s9CY+A/yrbhF73fK8ZWxlEibuMiEGEzzJ+UcfXBKdOu7oJR2X8l9HqhAD5JmZ2+JJuZjVG9hqP1RPkoQysxXBCeGdOVqOSPk0kW/5sPUG4bjiBbP8WGR9ibRkEq3tGfYazC/UuAZIJDUe+R8FSZay2Izx8BZj3XwPWjF3DsSaWcTIbsRQlMlEmQHD6n7BDv022hNfX13Zf4U18lzft8Sv98etslTC3pbmRd6AbM1I6rK6hn6fJKmrcHYHD3OCAcC2JDWzsv270gBzv1wY4Uma3f/3HapMIQ5Xb7TU7hlhdHDYjo76FgPRLUPTnw9bXKuWHjG9LVONJuu1aqymlY9iEwASq7Ugk/8w6IMGsRxSeFlbhI689HThukObsQKCpUk2URQwL21fu7/fExUWA5pU5LPwvDgxo0V4Q7JplNwdnXS62Dt3PEjDmuxfXIM3mjZsF 
expires: 1999196123

And then just load it with:

require 'yaml'

my_hash = YAML.load_file('/path/to/yaml/file')

my_hash['access_key']     #=> 'XXXXXXXXXXXXXXXXXXX'
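
If the values currently live in a Ruby hash, a round trip like the following sketch can produce the YAML file and read it back. The helper name round_trip and the placeholder values are assumptions; Tempfile stands in for a real path.

```ruby
require 'yaml'
require 'tempfile'

# Minimal sketch: dump a hash to a YAML file, then load it back.
# `round_trip` is a hypothetical helper used only for this illustration.
def round_trip(settings)
  Tempfile.create(['settings', '.yml']) do |file|
    file.write(settings.to_yaml)
    file.flush                   # make sure the data is written before reading
    YAML.load_file(file.path)    # Tempfile.create returns the block's value
  end
end

# Placeholder values only.
settings = { 'access_key' => 'XXXX', 'bucket' => 'XXXX', 'expires' => 1999196123 }
loaded = round_trip(settings)
loaded['expires'] # => 1999196123
```

String keys are used on purpose, so the loaded hash is accessed the same way as my_hash['access_key'] above.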

What is the proper way to format is_a? Integer?

Try this:

    hackerrank[543121] = 100 # store new key-value pair
    hackerrank.keep_if { |x,y| x.is_a?(Integer) && x.odd? } # keep if key is an integer and not even

Or keep your code and apply @seph's suggestion (https://stackoverflow.com/a/31018495/2545197):

    hackerrank.store(543121, 100) # store new key-value pair
    hackerrank.keep_if { |x, y| x.is_a?(Integer) } # keep if key is an integer
    hackerrank.delete_if { |x, y| x % 2 == 0 } # delete if key is even
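
A non-mutating variant is sketched below with Hash#select. The contents of hackerrank are made up here, since the original hash is not shown; the point is that is_a?(Integer) has to come first so that odd? is never called on a non-integer key.

```ruby
# Hypothetical data: keys of mixed types.
hackerrank = { 'abc' => 1, 2 => 20, 7.5 => 3, 9 => 90 }
hackerrank[543121] = 100

# && short-circuits, so odd? only runs when the key is an Integer.
odd_integer_keys = hackerrank.select do |key, _value|
  key.is_a?(Integer) && key.odd?
end
# odd_integer_keys now holds only the pairs for keys 9 and 543121;
# hackerrank itself is left unchanged.
```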

ArgumentError: :Bucket must not be blank

You override bucket_name with an empty-string default here:

def list_bucket_objects(s3_resource, bucket_name:'')
                                                 ^

Remove the second parameter and you should be good to go.
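
To see why the default matters, here is a small sketch that does not touch the AWS SDK at all. Both method names are hypothetical. It only illustrates that a keyword argument with a '' default silently yields a blank name, whereas a required keyword (one possible alternative to removing the parameter) fails at the call site.

```ruby
# Hypothetical stand-ins for the real method; no S3 calls are made.
def with_default(bucket_name: '')
  bucket_name
end

def with_required(bucket_name:)
  bucket_name
end

blank = with_default                             # "" and no error yet
named = with_required(bucket_name: 'my-bucket')  # "my-bucket"

begin
  with_required                                  # keyword left out on purpose
rescue ArgumentError => e
  missing_keyword_error = e                      # raised immediately by Ruby
end
```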

Ruby Weighted Round Robin

Define a method that adds items to two counters according to the weights passed as parameters:

def fill_with_ratio(weighting1, weighting2, items)

  b1 = 0
  b2 = 0
  ratio = weighting1.fdiv weighting2
  steps = []

  #step 1 empty buckets
  if b1 + b2 == 0
    if ratio <= 1.0
      b2 += 1
    else
      b1 += 1
    end
  end

  steps << { step: 1, b1: b1, b2: b2, ratio: b1.fdiv(b2).round(2) }

  #steps 2 to items
  (items-1).times.with_index(2) do |_,i|

    r1 = b1.succ.fdiv b2
    r2 = b1.fdiv b2.succ

    if (r1 - ratio).abs <= (r2 - ratio).abs
      b1 += 1
    else
      b2 += 1
    end

    steps << { step: i, b1: b1, b2: b2, ratio: b1.fdiv(b2).round(2) }

  end
  steps
end

The if expressions decide which variable to increment by one in order to get as close as possible to the defined distribution. The steps array only records the state after each addition; it can be omitted without affecting how the counts are computed.

Key methods: Integer#fdiv, Integer#times, Enumerator#with_index, Integer#succ and Float#abs.
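
As a rough sketch of the same logic without the steps array, the variant below returns only the final counts. The name fill_counts is an assumption, and the decision rule is copied from fill_with_ratio above.

```ruby
# Minimal sketch: same decision rule as fill_with_ratio, but returns [b1, b2].
# `fill_counts` is a hypothetical name used only for this illustration.
def fill_counts(weighting1, weighting2, items)
  ratio = weighting1.fdiv(weighting2)

  # First item: goes to bucket 2 when the target ratio is at most 1.
  b1, b2 = ratio <= 1.0 ? [0, 1] : [1, 0]

  # Remaining items: pick the increment whose ratio lands closer to the target.
  (items - 1).times do
    if (b1.succ.fdiv(b2) - ratio).abs <= (b1.fdiv(b2.succ) - ratio).abs
      b1 += 1
    else
      b2 += 1
    end
  end

  [b1, b2]
end

fill_counts(60, 40, 5)  # => [3, 2]
fill_counts(30, 40, 21) # => [9, 12]
```

Dividing by a zero count is safe here because fdiv returns Infinity rather than raising, which simply makes that candidate lose the comparison.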

Example One:

require 'pp' #pp prints everything nicely.

pp fill_with_ratio(60, 40, 100)
#[{:step=>1, :b1=>1, :b2=>0, :ratio=>Infinity},
# {:step=>2, :b1=>1, :b2=>1, :ratio=>1.0},
# {:step=>3, :b1=>2, :b2=>1, :ratio=>2.0},
# {:step=>4, :b1=>2, :b2=>2, :ratio=>1.0},
# {:step=>5, :b1=>3, :b2=>2, :ratio=>1.5},
# .
# .
# .
# {:step=>98, :b1=>59, :b2=>39, :ratio=>1.51},
# {:step=>99, :b1=>59, :b2=>40, :ratio=>1.48},
# {:step=>100, :b1=>60, :b2=>40, :ratio=>1.5}]

Example Two:

pp fill_with_ratio(30, 40, 21)
#[{:step=>1, :b1=>0, :b2=>1, :ratio=>0.0},
# {:step=>2, :b1=>1, :b2=>1, :ratio=>1.0},
# {:step=>3, :b1=>1, :b2=>2, :ratio=>0.5},
# {:step=>4, :b1=>2, :b2=>2, :ratio=>1.0},
# {:step=>5, :b1=>2, :b2=>3, :ratio=>0.67},
# {:step=>6, :b1=>3, :b2=>3, :ratio=>1.0},
# {:step=>7, :b1=>3, :b2=>4, :ratio=>0.75},
# {:step=>8, :b1=>3, :b2=>5, :ratio=>0.6},
# {:step=>9, :b1=>4, :b2=>5, :ratio=>0.8},
# {:step=>10, :b1=>4, :b2=>6, :ratio=>0.67},
# {:step=>11, :b1=>5, :b2=>6, :ratio=>0.83},
# {:step=>12, :b1=>5, :b2=>7, :ratio=>0.71},
# {:step=>13, :b1=>6, :b2=>7, :ratio=>0.86},
# {:step=>14, :b1=>6, :b2=>8, :ratio=>0.75},
# {:step=>15, :b1=>6, :b2=>9, :ratio=>0.67},
# {:step=>16, :b1=>7, :b2=>9, :ratio=>0.78},
# {:step=>17, :b1=>7, :b2=>10, :ratio=>0.7},
# {:step=>18, :b1=>8, :b2=>10, :ratio=>0.8},
# {:step=>19, :b1=>8, :b2=>11, :ratio=>0.73},
# {:step=>20, :b1=>9, :b2=>11, :ratio=>0.82},
# {:step=>21, :b1=>9, :b2=>12, :ratio=>0.75}]

Why is calling a variable twice in a conditional statement necessary for a ranged value?

Computer languages are not English. Humans are great at guessing which subject is repeated in a compound sentence; computers are not. You always have to state explicitly what you are testing on both sides of a logical operator.

That's because in the expression structure <left> and <right>, both left and right are always independent expressions. Expressions can build on expressions on expressions, but a computer programming language will not just re-use (a part of) the left expression in the right expression.

So yes, you have to explicitly name grade again.

Or you could use a different expression form. You could use a chained comparison expression; Python lets you collapse any expression of the form <foo> <comparison1> <shared> and <shared> <comparison2> <bar> into <foo> <comparison1> <shared> <comparison2> <bar>, and the shared expression will be executed just once.

So if you turned

grade >= 80 and grade <= 89

into

80 <= grade and grade <= 89

you can replace that with

80 <= grade <= 89
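
Ruby, by contrast, has no chained comparisons, so 80 <= grade <= 89 does not work there. A minimal sketch of the equivalent range test follows; the method name b_grade? is hypothetical.

```ruby
# Minimal sketch: inclusive range test in Ruby.
# `b_grade?` is a hypothetical name used only for this illustration.
def b_grade?(grade)
  grade.between?(80, 89)   # same as 80 <= grade && grade <= 89
end

b_grade?(85)          # => true
b_grade?(90)          # => false
(80..89).cover?(85)   # => true, the Range-based spelling of the same test
```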

However, because each preceding test has already handled the higher grades (for example, the grade > 89 case), you can safely drop the upper-bound tests:

def grade_converter(grade):
    if grade >= 90:
        return "A"
    elif grade >= 80:
        return "B"
    elif grade >= 70:
        return "C"
    elif grade >= 65:
        return "D"
    else:
        return "F"

Last but not least, you can use a Computer Science trick. Rather than test each grade band separately, one by one, you could use bisection; this always works when your options are sorted.

Instead of starting at the highest value, start in the middle; that divides the possibilities in 2. From there, you keep halving the possibilities until you have the right grade band. This means you need at most about log2(N) tests for N possibilities, while starting at the top grade requires up to N tests. For 5 grade bands that's not much of a difference (at most 3 tests, vs up to 4), but when N becomes really large you'll quickly notice it; if you had 1 million options, you could find the matching option in at most 20 steps, guaranteed.

The Python library includes an optimised implementation in the bisect module:

import bisect

def grade_converter(grade):
    grades = ['F', 'D', 'C', 'B', 'A']
    limits = [65, 70, 80, 90]
    return grades[bisect.bisect(limits, grade)]
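
For comparison, a Ruby sketch of the same bisection can use Array#bsearch_index (available since Ruby 2.3). The grades and limits are copied from the Python version; treating a nil result as "past the last limit" is this sketch's way of mirroring what bisect.bisect returns.

```ruby
# Minimal sketch: Ruby counterpart of the bisect-based grade_converter.
def grade_converter(grade)
  grades = %w[F D C B A]
  limits = [65, 70, 80, 90]
  # Binary search for the first limit above the grade; nil means none is.
  index = limits.bsearch_index { |limit| grade < limit } || limits.size
  grades[index]
end

grade_converter(64) # => "F"
grade_converter(65) # => "D"
grade_converter(95) # => "A"
```

bsearch_index requires the block to flip from false to true exactly once along the array, which holds here because limits is sorted in ascending order.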
