How to Redefine Fixnum's + (Plus) Method in Ruby and Keep Original + Functionality

How can I redefine Fixnum's + (plus) method in Ruby and keep original + functionality?

Use alias_method. Alias Fixnum's + to something else, then refer to it in the new +:

class Fixnum
  alias_method :old_add, :+

  def +(other)
    self.old_add(other) * 2
  end
end
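
With that monkey patch loaded, every Fixnum addition goes through the new method, while the alias keeps the original reachable (a hypothetical IRB session):

2 + 3         # => 10, i.e. (2 + 3) * 2
2.old_add(3)  # => 5, the original behavior via the alias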

Ruby division infinity/NaN should return 0

Very similar to: How can I redefine Fixnum's + (plus) method in Ruby and keep original + functionality?

class Float
  alias_method :old_div, :/

  def /(y)
    return 0.0 if self == 0.0 && y == 0.0 # would otherwise be NaN
    return 0.0 if y == 0.0                # would otherwise be +/-Infinity
    self.old_div(y)
  end
end
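
With that patch in place, the IEEE special results are mapped to zero while ordinary division is untouched (a hypothetical session):

0.0 / 0.0  # => 0.0 (would have been NaN)
1.0 / 0.0  # => 0.0 (would have been Infinity)
3.0 / 2.0  # => 1.5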

I know the code above might not be exactly what you want. Feel free to customize it the way you want =)

Ruby overload + operator

An overloaded operator is resolved based on the class of its first operand, so if you wanted to overload the addition of plain integers, something like this should work:

class Fixnum
  def +(other)
    self * other
  end
end
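
After this redefinition, + on integers silently multiplies; note that, unlike the alias_method approach above, the original addition is no longer reachable:

2 + 3  # => 6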

I do not recommend actually doing this, by the way.

Why can't Ruby use most of the 2^X numbers as object IDs?

First, it is very important to make a couple of things crystal clear:

  1. There are exactly two guarantees Ruby makes about object IDs. These two guarantees are the only things you are allowed to rely on. You must not make any assumptions about object IDs other than these two guarantees:

    • An object has the same ID for its entire lifetime.
    • No two objects have the same ID at the same time.
    • [Note: this means in particular that different objects can have the same ID at different times, i.e. that IDs can be recycled.]
  2. An object ID is an opaque identifier. You must not make any assumptions about its structure or about any particular value.

  3. Any particular implementation of object IDs is a private internal implementation detail of a specific version of a specific implementation running in a specific environment at a specific moment. There is no guarantee that the results will be the same with a different implementation. There is no guarantee that the results will be the same with a different version of the same implementation. There is no guarantee that the results will be the same with the same version of the same implementation running in a different environment. In fact, there is not even a guarantee that the results will be the same between two runs of the same code on the same version of the same implementation in the same environment.

  4. ObjectSpace::_id2ref is an abomination. It should not even exist. It most certainly should not be used. It breaks object-orientation, it breaks encapsulation, it breaks safety.

Just as an example: unfortunately, you don't say which version of which implementation you are running in which environment. However, it looks like you are running YARV 2.6.3 in a 64-bit environment.

If you were to run that exact same code on the exact same version of YARV in a 32-bit environment, you would get different results. If you were to run that exact same code on an older version of YARV (pre-2.0) in the exact same environment, you would get different results.

Let's address the first, implicit, assumption which I think I see in your question. You seem to think that any ID should resolve to an object. It's easy to see that this cannot be true: there are infinitely many IDs, but for every run of a program, there are only finitely many objects, so there will always be infinitely many IDs which don't resolve to an object.

This already explains most of your results, namely the ones for 4, 16, 32, 64, 256, 512, and 1024.
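
For example, on the YARV 2.6.3 build identified above, asking for one of those unused IDs simply fails (the exact message varies by version):

ObjectSpace._id2ref(4)  # raises RangeError ("... is not id value")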

So, with that out of the way, here's a high-level explanation of why there seems to be some sort of structure to the IDs, and what that structure is. (But let me remind you again that this explanation only applies to 64-bit systems, not to 32-bit systems, it only applies to YARV, it only applies to YARV 2.0 or newer, and it is quite possible that it will no longer apply to YARV 3.0.)

In YARV, the developers made the decision that the object ID is the same thing as the memory address of the object header. This makes it easy to ensure the "rules" of object IDs: you can't have multiple objects at the same memory address at the same time, and an object will not change its memory address.

(Actually, it turns out that the second one is already a quite severe restriction: many modern high-performance garbage collectors depend on being able to move objects around in memory. This is not possible if you assume that object ID == memory address. Which means you will not be able to use any of those high-performance algorithms.)

On pretty much all modern machines, memory access is word-aligned. While it is possible to address individual bytes, that is generally slower or more awkward. So, we can basically assume that if we allocate memory, it will be aligned on a word boundary. Which means that all memory addresses will be divisible by 8 on 64-bit systems and by 4 on 32-bit systems, or in other words, that all memory addresses will end in 3 (64-bit) or 2 (32-bit) zero bits. Or, in other words: 87.5% (75%) of the address space is unused.

On the other hand, it would be quite a waste to represent Integers as full-blown Ruby objects:

  • They are immutable, which means we don't have to store any state.
  • They can't have instance variables, which means we don't have to store an instance variable table.
  • They can't have a singleton class, which means we don't have to store a __klass__ pointer.
  • They can't be extended.
  • And so on …

What this means is that we can optimize the representation of Integers by not storing them as objects at all. All we need is some special case in the engine, so that if someone asks for the class of, say, 42, instead of trying to look at 42's __klass__ pointer, the engine "magically" knows to just return the Integer class.
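
You can watch that special case from Ruby itself; on any recent YARV, for example:

42.class            # => Integer, without ever touching an object header
42.singleton_class  # raises TypeError (can't define singleton)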

Once we have that in place, we can do a really cool trick, which is actually as old as the very first LISP and Smalltalk VMs, and it is called a tagged pointer representation. Normally, the value of a variable is a pointer to the object (header), but with a tagged pointer representation we can store the value of the object inside the pointer to the object itself!

All we need to do is to have some sort of tag on the pointer that tells the engine that this is actually not a pointer but a value disguised as a pointer. In some older machines, especially those specifically designed for running high-level languages, pointers did have a tag field specifically for holding, e.g. type information or access control. Modern machines don't have that, but we have those unused bits we can (ab)use as tag bits.

And that is what YARV is doing: When the last bit of a pointer is 1, then it's not actually a pointer, it's an Integer. In particular, an Integer is encoded in YARV by shifting it one bit to the left and setting the last bit to 1. This allows us to encode a 63-bit Integer in a 64-bit pointer, and do native integer arithmetic on it with no object overhead and only a little bit of bit-shifting overhead.

And if you think about what this encoding means:

  • shifting one bit to the left is equivalent to multiplying by two
  • setting the last bit to 1 is equivalent to incrementing by 1

Then you can explain the first pattern: a small Integer with value n is encoded as the "quasi-pointer" 2n + 1, and since "memory address" and object ID are the same in YARV (even though this is not actually a memory address, because there is no object which could have an address), it will have the object ID 2n + 1.
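
You can verify this from within Ruby on the 64-bit YARV versions described here:

42.object_id             # => 85, which is 2 * 42 + 1
(42 << 1) | 1            # => 85, the tagged quasi-pointer computed by hand
ObjectSpace._id2ref(85)  # => 42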

Integers that don't fit into 63 bits (31 bits) are allocated as objects like any other object. In different engines, these immediate integers have different names, e.g. in the Smalltalk-80 VM they are called SmallInts; in YARV they are called Fixnums (and the ones that don't fit into a Fixnum are called Bignums). They actually used to be different subclasses of a fully-abstract Integer class in older versions of YARV, but this was considered a mistake. (It's really an internal optimization and should not be visible to the programmer.) In current versions of YARV, Fixnum and Bignum are aliases for Integer, and using them gives a deprecation warning.

This explains your result for 1. If you had tried out ObjectSpace._id2ref(3), the result would have been 1, and ObjectSpace._id2ref(5) would have been 2, and so on.

And we are still using only 62.5% of the address space (on a 64-bit system)!

So, let's think about what else we might want to represent in this way.

YARV has a very similar optimization for Floats. Floating-point numbers that fit into 62 bits are called flonums and are represented similarly, with a tag of 10 at the end. (YARV does not use flonums on 32-bit platforms.)

This explains your result for ObjectSpace._id2ref(2). If you had tried ObjectSpace._id2ref(6), the result would have been -2.0.

And a similar trick is also played for Symbols. I won't explain it here in detail, because a) I don't actually fully know how it works, and b) it is slightly more complex, because the value being encoded isn't directly the Symbol value, rather it is an index into the Symbol table. However, that explains your result for 128.

Now, lastly, there is a completely different part of the address space that is also unused: the low addresses. On most modern operating systems, the low addresses are reserved for mapping the kernel memory directly into the user process in order to speed up the user space ↔︎ kernel space transition.

Plus, there is another reason the very low addresses are kept free: in C, it is illegal to dereference a NULL pointer. Now, one way of implementing this would be for the runtime to track all pointer dereferences and check whether they are dereferencing the NULL pointer. But there is an easier way: just give the NULL pointer an actual memory address, but one that is never allocated. That way, you don't have to do anything: if the code tries to dereference the pointer, the address doesn't exist, and the MMU will take care of raising an error. So, most C compilers compile the NULL pointer to the actual memory address 0, and in order to make sure that there is never any real data allocated at that address, they keep a whole area around address 0 free.

This means that the low addresses are never used, and we can (ab)use them to represent even more "interesting" objects. Now, YARV uses the very low addresses to represent the following objects:

  • false at address 0, which has the additional advantage that 0 is considered false in C.
  • nil at address 8 (4 in 32-bit).
  • true at address 20 (2 in 32-bit).
  • Qundef (a special internal value inside the engine that denotes an undefined value) at address 52 (6 in 32-bit).
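
On a 64-bit build of the YARV versions discussed here, you can confirm those addresses directly (Qundef, being internal, is not observable from Ruby code):

false.object_id  # => 0
nil.object_id    # => 8
true.object_id   # => 20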

And that explains your number 8.

This also means that your 4, 16, 32, 64, 256, 512, and 1024 will probably never resolve to an object, because they are in the low address range where the C library will simply never allocate memory.

As a closing remark, I want to repeat one last time that all of this is a private internal implementation detail of a specific version of YARV running in a specific environment. You must not rely on any of this, ever.

When flonums were introduced in YARV, and on some platforms nil no longer had object ID 4, this did break some code, and it did cause some confusion (as evidenced e.g. by questions on Stack Overflow), even though the YARV developers are allowed to change object IDs at will, because there are no guarantees being made about any particular ID values or the structure of IDs. Please, do not make the same mistake.

Besides dynamic typing, what makes Ruby more flexible than Java?

Dynamic typing doesn't come close to covering it. For one big example, Ruby makes metaprogramming easy in a lot of cases. In Java, metaprogramming is either painful or impossible.

For example, take Ruby's normal way of declaring properties:

class SoftDrink
  attr_accessor :name, :sugar_content
end

# Now we can do...
can = SoftDrink.new
can.name = 'Coke'        # Not a direct ivar access; calls can.name=('Coke')
can.sugar_content = 9001 # Ditto

This isn't some special language syntax — it's a method on the Module class, and it's easy to implement. Here's a sample implementation of attr_accessor:

class Module
  def attr_accessor(*symbols)
    symbols.each do |symbol|
      define_method(symbol) { instance_variable_get("@#{symbol}") }
      define_method("#{symbol}=") { |val| instance_variable_set("@#{symbol}", val) }
    end
  end
end
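
As a quick sanity check, the re-implementation behaves just like the built-in version (Beverage is a made-up example class):

class Beverage
  attr_accessor :brand  # now served by our own Module#attr_accessor
end

can = Beverage.new
can.brand = 'Fanta'
can.brand  # => "Fanta"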

This kind of functionality allows you a lot of, yes, flexibility in how you express your programs.

A lot of what seem like language features (and which would be language features in most languages) are just normal methods in Ruby. For another example, here we dynamically load dependencies whose names we store in an array:

dependencies = %w(yaml haml hpricot sinatra couchfoo)
block_list = %w(couchfoo) # Wait, we don't really want CouchDB!
dependencies.each { |mod| require mod unless block_list.include?(mod) }

How to write a switch statement in Ruby

Ruby uses the case expression instead.

case x
when 1..5
  "It's between 1 and 5"
when 6
  "It's 6"
when "foo", "bar"
  "It's either foo or bar"
when String
  "You passed a string"
else
  "You gave me #{x} -- I have no idea what to do with that."
end

Ruby compares the object in the when clause with the object in the case clause using the === operator. For example, it evaluates 1..5 === x, and not x === 1..5.
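
The direction matters because the two sides implement === differently:

(1..5) === 3  # => true,  Range#=== asks whether the range covers 3
3 === (1..5)  # => false, Integer#=== is ordinary equality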

This allows for sophisticated when clauses as seen above. Ranges, classes and all sorts of things can be tested for rather than just equality.
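
Since === is just a method, anything that implements it usefully can appear in a when clause. For instance, Regexp#=== performs a match and Proc#=== calls the proc:

case "2021-06-01"
when /\A\d{4}-\d{2}-\d{2}\z/ then "looks like an ISO date"
when ->(s) { s.empty? }      then "an empty string"
else "something else"
end
# => "looks like an ISO date"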

Unlike switch statements in many other languages, Ruby’s case does not have fall-through, so there is no need to end each when with a break. You can also specify multiple matches in a single when clause like when "foo", "bar".


