Why Does Ruby Hash a Fixnum N to 2N+1

Why does Ruby hash a FixNum n to 2n+1?

Integers are objects, so they have an object_id. But there is an infinite number of integers. Seemingly, no room for other objects. How does Ruby pull this off?

10.times{|i| puts i.object_id}

Output:

1
3
5
7
9
11
13
15
17
19

Integers take all odd object_id's, the rest of the objects go in between, they use the even numbers. The conversion from object_id (and hash) to integer (and vice versa) is very easy: chop the rightmost 1 bit (or add it).

Some `Fixnum` properties

In Ruby, most objects require memory to store their class and instance variables. Once this memory is allocated, Ruby represents each object by this memory location. When the object is assigned to a variable or passed to a function, it is the location of this memory that is passed, not the data at this memory. Singleton methods make use of this. When you define a singleton method, Ruby silently replaces the objects class with a new singleton class. Because each object stores its class, Ruby can easily replace an object's class with a new class that implements the singleton methods (and inherits from the original class).

This is no longer true for objects that are immediate values: true, false, nil, all symbols, and integers that are small enough to fit within a Fixnum. Ruby does not allocate memory for instances of these objects, it does not internally represent the objects as a location in memory. Instead, it infers the instance of the object based on its internal representation. What this means is twofold:

  1. The class of each object is no longer stored in memory at a particular location, and is instead implicitly determined by the type of immediate object. This is why Fixnums cannot have singleton methods.

  2. Immediate objects with the same state (e.g., two Fixnums of integer 2378) are actually the same instance. This is because the instance is determined by this state.

To get a better sense of this, consider the following operations on a Fixnum:

>> x = 3 + 7
=> 10
>> x.object_id == 10.object_id
=> true
>> x.object_id == (15-5).object_id
=> true

Now, consider them using strings:

>> x = "a" + "b"
=> "ab"
>> x.object_id == "ab".object_id
=> false
>> x.object_id == "Xab"[1...3].object_id
=> false
>> x == "ab"
=> true
>> x == "Xab"[1...3]
=> true

The reason the object ids of the Fixnums are equal is that they're immediate objects with the same internal representation. The strings, on the other hand, exist in allocated memory. The object id of each string is the location of its object state in memory.

Some low-level information

To understand this, you have to understand how Ruby (at least 1.8 and 1.9) treat Fixnums internally. In Ruby, all objects are represented in C code by variables of type VALUE. Ruby imposes the following requirements for VALUE:

  1. The type VALUE is is the smallest integer of sufficient size to hold a pointer. This means, in C, that sizeof(VALUE) == sizeof(void*).

  2. Any non-immediate object must be aligned on a 4-byte boundary. This means that any object allocated by Ruby will have address 4*i for some integer i. This also means that all pointers have zero values in their two least significant bits.

The first requirement allows Ruby to store both pointers to objects and immediate values in a variable of type VALUE. The second requirement allows Ruby to detect Fixnum and Symbol objects based on the two least significant bits.

To make this more concrete, consider the internal binary representation of a Ruby object z, which we'll call Rz in a 32-bit architecture:

MSB                                   LSB
3 2 1
1098 7654 3210 9876 5432 1098 7654 32 10
XXXX XXXX XXXX XXXX XXXX XXXX XXXX AB CD

Ruby then interprets Rz, the representation of z, as follows:

  1. If D==1, then z is a Fixnum. The integer value of this Fixnum is stored in the upper 31 bits of the representation, and is recovered by performing an arithmetic right shift to recover the signed integer stored in these bits.

  2. Three special representations are tested (all with D==0)

    • if Rz==0, then z is false
    • if Rz==2, then z is true
    • if Rz==4, then z is nil
  3. If ABCD == 1110, then 'z' is a Symbol. The symbol is converted into a unique ID by right-shifting the eight least-significant bits (i.e., z>>8 in C). On 32-bit architectures, this allows 2^24 different IDs (over 10 million). On 64-bit architectures, this allows 2^48 different IDs.

  4. Otherwise, Rz represents an address in memory for an instance of a Ruby object, and the type of z is determined by the class information at that location.

Avoid automatic conversion from Fixnum to Bignum in Ruby

If you want a number to wrap at a certain level, you may need to manually limit it:

def next_random(n)
(1234567890 * n + 12345) % 0x7FFFFFFF
end

You can pick whatever limit you want, where that example is 32-bit signed.

I don't think you're able to lock numerical calculations to an arbitrary range.

Why does the FixNum#to_s method in in Ruby only accept radixes from 2 to 36?

As others have pointed out, radix < 2 is troublesome to render. and there's no conventional agreement on what characters to use for radixes larger than ['0'..'9'] + ['a'..'z'], which is why the standard method doesn't support radix outside those limits.

If you really want a custom radix representation, you would need to define the alphabet of symbols to use for the digits. Here's a little module that will give you the capability.

module CustomRadix
# generate string representation of integer, using digits from custom alphabet
# [val] a value which can be cast to integer
# [digits] a string or array of strings representing the custom digits
def self.custom_radix val, digits

digits = digits.to_a unless digits.respond_to? :[]
radix = digits.length
raise ArgumentError, "radix must have at least two digits" if radix < 2

i = val.to_i
out = []
begin
rem = i % radix
i /= radix
out << digits[rem..rem]
end until i == 0

out.reverse.join
end

# can be used as mixin, eg class Integer; include CustomRadix; end
# 32.custom_radix('abcd') => "caa" (200 base 4) equiv to 32.to_s(4).tr('0123','abcd')
def custom_radix digits
CustomRadix.custom_radix self, digits
end
end

example use:

$ irb
>> require '~/custom_radix'
=> true
>> CustomRadix.custom_radix(12345,'0'..'9')
=> "12345"
>> CustomRadix.custom_radix(12345,'.-')
=> "--......---..-"
>> funny_hex_digits = ('0'..'9').to_a + ('u'..'z').to_a
=> ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "u", "v", "w", "x", "y", "z"]
>> CustomRadix.custom_radix(255, funny_hex_digits)
=> "zz"
>> class Integer; include CustomRadix; end
=> Integer
>> (2**63).custom_radix(funny_hex_digits)
=> "8000000000000000"
>> (2**64+2**63+2**62).custom_radix(funny_hex_digits)
=> "1w000000000000000"
>> base64_digits = ('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a << '+' << '/'
=> ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "+", "/"]
>> 123456.custom_radix(base64_digits)
=> "eJA"

Ruby max integer

Ruby automatically converts integers to a large integer class when they overflow, so there's (practically) no limit to how big they can be.

If you are looking for the machine's size, i.e. 64- or 32-bit, I found this trick at ruby-forum.com:

machine_bytes = ['foo'].pack('p').size
machine_bits = machine_bytes * 8
machine_max_signed = 2**(machine_bits-1) - 1
machine_max_unsigned = 2**machine_bits - 1

If you are looking for the size of Fixnum objects (integers small enough to store in a single machine word), you can call 0.size to get the number of bytes. I would guess it should be 4 on 32-bit builds, but I can't test that right now. Also, the largest Fixnum is apparently 2**30 - 1 (or 2**62 - 1), because one bit is used to mark it as an integer instead of an object reference.

ruby TypeError - no implicit conversion of Fixnum into String?

This code is extremely risky and I can't see a reason for doing this in the first place. Remove the eval and you get this very ordinary code:

File.open('nagai.txt', 'a+') do |f|
f.puts parts[params[:salutation]]
end

The error comes from trying to concatenate a Fixnum/Integer to a String in the process of constructing the code you then eval. This code is invalid and yields the same error:

"1" + 1

Ruby isn't like other languages such as JavaScript, PHP or Perl which arbitrarily convert integers to strings and vice-versa. There's a hard separation between the two and any conversion must be specified with things like .to_s or .to_i.

That fixed version should be equivalent. If you need to defer this to some later point in time, you can write a method:

def write_nagai(params)
File.open('nagai.txt', 'a+') do |f|
f.puts parts[params[:salutation]]
end
end

What does the bracket operator do to a FixNum in Ruby?

You might be a little confused about what this is doing internally, but that's normal when dealing with Ruby because it's quite unlike other scripting languages. Unless you've used SmallTalk it might even seem insane.

When Ruby sees the following code:

x = 6
x[1]

What it's actually doing is this:

x.send(:[], 6) # Send :[] method call to x with arguments [ 6 ]

The x object is free to interpret that however it wants, and the behaviour is typically (though not always) defined by the class x belongs to if x is a normal instance.

In this case it's returning the bit at a given index, equivalent to x & (1 << 6) >> 6.

Sometimes the [] method does several things:

string = "brackets"

# Retrieve a single character
string[1]
# => "r"

# Retrieve a substring
string[5,2]
# => "et"

# Perform a pattern match
string[/[^aeiou]+/]
# => "br"

This also does some pretty crazy things since you can apply it to a Proc as well:

fn = lambda { |x| x + 1 }

# Conventional (explicit) call
fn.call(2)
# => 3

# Square bracket method
fn[5]
# => 6

Since Ruby leans very heavily on Duck Typing, this means that you can write a Proc to fill in where you'd normally have a Hash or an Array and the method receiving your object is none the wiser.

It's this flexibility in leaving the meaning of x[...] for your own class instance x up to you that makes it pretty powerful.

For example:

class Bracketeer
def [](i)
"%d brackets" % i
end
end

bracketeer = Bracketeer.new

bracketeer[6]
# => "6 brackets"

This simple notation often comes in handy when you're trying to create a minimal interface for a class of yours. In many cases you can use something simple like [] to replace what would be a more verbose method like find_by_id or cache_fetch.

Multiply all even indexed integers by two

To change it to an Array, you could do

a = 4408041234567901
arr = a.to_s.chars.map(&:to_i)
# => [4, 4, 0, 8, 0, 4, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1]

You can also multiply alternate numbers by 2

arr = a.to_s.chars.map.with_index {|n,i| i.even? ? n.to_i * 2 : n.to_i }
# => [8, 4, 0, 8, 0, 4, 2, 2, 6, 4, 10, 6, 14, 9, 0, 1]

Improving a little bit, you can use a Hash to find the number to be multiplied.

h = {true => 2, false => 1}
a.to_s.each_char.map.with_index {|n,i| n.to_i * h[i.even?]}

EDIT

I can explain each step, But it will be better if you can try to figure it out on your own. Open irb, type a.to_s and check the output. Then type a.to_s.chars and inspect the output and so on..

How are Symbols faster than Strings in Hash lookups?

There's no obligation for hash to be equivalent to object_id. Those two things serve entirely different purposes. The point of hash is to be as deterministic and yet random as possible so that the values you're inserting into your hash are evenly distributed. The point of object_id is to define a unique object identifier, though there's no requirement that these be random or evenly distributed. In fact, randomizing them is counter-productive, that'd just make things slower for no reason.

The reason symbols tend to be faster is because the memory for them is allocated once (garbage collection issues aside) and recycled for all instances of the same symbol. Strings are not like that. They can be constructed in a multitude of ways, and even two strings that are byte-for-byte identical are likely to be different objects. In fact, it's safer to presume they are than otherwise unless you know for certain they're the same object.

Now when it comes to computing hash, the value must be randomly different even if the string changes very little. Since the symbol can't change computing it can be optimized more. You could just compute a hash of the object_id since that won't change, for example, while the string needs to take into account the content of itself, which is presumably dynamic.

Try benchmarking things:

require 'benchmark'

count = 100000000

Benchmark.bm do |bm|
bm.report('Symbol:') do
count.times { :symbol.hash }
end
bm.report('String:') do
count.times { "string".hash }
end
end

This gives me results like this:

       user     system      total        real
Symbol: 6.340000 0.020000 6.360000 ( 6.420563)
String: 11.380000 0.040000 11.420000 ( 11.454172)

Which in this most trivial case is easily 2x faster. Based on some basic testing the performance of the string code degrades O(N) as the strings get longer but the symbol times remain constant.



Related Topics



Leave a reply



Submit