How to Use Pointers in Ruby

Is it possible to use pointers in Ruby?

In Ruby, (almost) every variable is in fact a reference/pointer to an object, e.g.

a = [0, 1, 23]
b = a
a << 42
p b

will give [0, 1, 23, 42] because a and b are pointing to the same object.

So in fact, you are using pointers all the time.
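One way to check that two variables really name the same object is to compare identities rather than contents. A minimal sketch (the variable c and the use of dup are added here only for contrast):

```ruby
a = [0, 1, 23]
b = a        # b refers to the same Array; no copy is made
c = a.dup    # c refers to a new Array with equal contents

p a.equal?(b)                 # prints true  (same object)
p a.equal?(c)                 # prints false (different object)
p a == c                      # prints true  (equal contents)
p a.object_id == b.object_id  # prints true
```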

If you want to do pointer arithmetic as in C, this is not possible with Ruby.

Ruby and pointers compared to other languages?

In general, Ruby variables hold references, not pointers to the values. When you assign:

a = [3]
b = a

b references the same array as a. It's the same with strings.

Worth reading:
https://robertheaton.com/2014/07/22/is-ruby-pass-by-reference-or-pass-by-value/

EDIT: Assigning a new object to a does not change what b refers to:

a = [1]
b = a
b
# => [1]
a = [2]
a
# => [2]
b
# => [1]

Pointer in Ruby

All Ruby variable references are essentially pointers (but not pointers-to-pointers), in C parlance.

You can mutate an object (assuming it's not immutable), and all variables that reference it will thus be pointing at the same (now mutated) object. But the only way to change which object a variable is referring to is with direct assignment to that variable -- and each variable is a separate reference; you can't alias a single reference with two names.
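The difference between mutating an object and rebinding a variable shows up clearly with method arguments. A small sketch, where rebind and mutate are hypothetical helper names chosen for illustration:

```ruby
# Assigning to the parameter rebinds only the local name inside the method.
def rebind(list)
  list = [99]
  list
end

# Mutating the object is visible through every reference to it.
def mutate(list)
  list << 99
  list
end

original = [1, 2]

rebind(original)
p original   # prints [1, 2]

mutate(original)
p original   # prints [1, 2, 99]
```

Since the caller's variable and the parameter are separate references, nothing done to the parameter itself can make the caller's variable point elsewhere.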

How to assign a pointer to a variable in Ruby

In Ruby, every variable is just a pointer/reference to an instance; there isn't anything else.

a = { foo: "Hello" }
b = a[:foo]
b.gsub!('e', 'a')
puts a.inspect
# => { foo: "Hallo" }

So in your example b is not assigned a copy of "Hello", it really is a reference to the same instance stored in the hash.

When you assign to a variable though, you are replacing the reference/pointer stored in the variable with something else.

What you seem to be trying to do is go one level deeper: you want to have a pointer to a reference (kind of a P** in C, I guess). In Ruby you can solve this with any kind of additional object which can store your value. How about an array?

a = { foo: [1] }
b = a[:foo]
b[0] = 2
puts a.inspect
# => { foo: [2] }

ruby variable as same object (pointers?)

class Ref
  def initialize val
    @val = val
  end

  attr_accessor :val

  def to_s
    @val.to_s
  end
end

a = Ref.new(4)
b = a

puts a   #=> 4
puts b   #=> 4

a.val = 5

puts a   #=> 5
puts b   #=> 5

When you do b = a, b points to the same object as a (they have the same object_id).

When you do a = some_other_thing, a will point to another object, while b remains unchanged.

For Fixnum (merged into Integer since Ruby 2.4), nil, true and false, you cannot change the value without changing the object_id. However, you can change other objects (strings, arrays, hashes, etc.) without changing object_id, because mutating methods modify the object in place instead of assigning (=) a new one.

Example with strings:

a = 'abcd'
b = a

puts a  #=> abcd
puts b  #=> abcd

a.upcase!          # changing a

puts a  #=> ABCD
puts b  #=> ABCD

a = a.downcase     # assigning a

puts a  #=> abcd
puts b  #=> ABCD

Example with arrays:

a = [1]
b = a

p a  #=> [1]
p b  #=> [1]

a << 2            # changing a

p a  #=> [1, 2]
p b  #=> [1, 2]

a += [3]          # assigning a

p a  #=> [1, 2, 3]
p b  #=> [1, 2]
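Example with integers, as a sketch of the immutable case described above: there is no mutating method to call, so the only way to "change" a is assignment, and b is unaffected.

```ruby
a = 1
b = a

p a.equal?(b)  #=> true

a += 1             # assigning a: a now refers to the Integer 2

p a  #=> 2
p b  #=> 1
p a.equal?(b)  #=> false
```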

How does Ruby differentiate VALUE with value and pointer?

There are many different implementations of Ruby. The Ruby Language Specification doesn't prescribe any particular internal representation for objects – why should it? It's an internal representation, after all!

For example, JRuby doesn't represent objects as C pointers at all, it represents them as Java objects. IronRuby represents them as .NET objects. Opal represents them as ECMAScript objects. MagLev represents them as Smalltalk objects.

However, there are indeed some implementations that use the strategy you describe. The now-abandoned MRI did it that way, and YARV and Rubinius do it too.

This is actually a very old trick, dating back to at least the 1960s. It's called a tagged pointer representation, and like the name suggests, you need to tag the pointer with some additional metadata in order to know whether or not it is actually a pointer to an object or an encoding of some other datatype.

Some CPUs have special tag bits specifically for that purpose. (For example, on the AS/400, the CPU doesn't even have pointers, it has 128bit object references, even though the original CPU was only 48bit wide, and the newer POWER-based CPUs 64 bit; the extra bits are used to encode all sorts of metadata like type, owner, access restrictions, etc.) Some CPUs have tag bits for other purposes that can be "abused" for this purpose. However, most modern mainstream CPUs don't have tag bits.

But, you can use a trick! On many modern CPUs, unaligned memory accesses (accessing an address that does not start at a word boundary) are really slow (on some, they aren't even possible at all), which means that on a 32bit CPU, all pointers that are realistically being used, end with two 00 bits and on 64 bit CPUs with three 000 bits. You can use these bits as tag bits: pointers that end with 00 are indeed pointers, pointers that end with 01, 10, or 11 are an encoding of some other data type.

In MRI, the pointers ending in 1 were used to encode 31/63 bit Fixnums. In YARV, they are used to encode 31/63 bit Fixnums, i.e. integers that are encoded as actual machine integers according to the formula 2n+1 (arithmetically speaking) or (n << 1) | 1 (as a bit pattern). On 64 bit platforms, YARV also uses pointers that end in 10 to encode 62 bit flonums using a similar scheme. (If you ever wondered why the object_id of a Fixnum in YARV is 2n+1, now you know: YARV uses the memory address for the object ID, and 2n+1 is the "memory address" of n.)
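The 2n+1 encoding can be modelled with ordinary integer arithmetic. This is only a sketch of the bit manipulation described above, not the VM's actual code; the method names encode_fixnum, decode_fixnum and fixnum_word? are made up for illustration:

```ruby
# Encode an integer n as a tagged word: shift left one bit, set the low tag bit.
def encode_fixnum(n)
  (n << 1) | 1
end

# Decode with an arithmetic right shift, which drops the tag bit and keeps the sign.
def decode_fixnum(word)
  word >> 1
end

# A word is a tagged integer if its lowest bit is 1.
def fixnum_word?(word)
  (word & 1) == 1
end

p encode_fixnum(21)                  # prints 43
p decode_fixnum(43)                  # prints 21
p decode_fixnum(encode_fixnum(-7))   # prints -7
p fixnum_word?(0x1000)               # prints false (aligned, so it looks like a pointer)
```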

Now, what about nil, false and true? Well, there is no space for them in our current scheme. However, the very low memory addresses are usually reserved for the operating system kernel, which means that a pointer like 0 or 2 or 4 cannot realistically occur in a program. YARV uses that space to encode nil, false and true: false is encoded as 0 (which is convenient because that's also the encoding of false in C), nil is encoded as 0b1000 and true is encoded as 0b10100 (in older versions, before the introduction of flonums, false was 0, true was 0b10 and nil was 0b100).
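Putting the pieces together, a VM has to look at the bit pattern before it can decide what a word means. The following is a toy model of that decision for the 64 bit scheme described above, written in Ruby purely for illustration; classify and the constant names are hypothetical, and a real VM does this in C with more cases:

```ruby
# Bit patterns for the special constants, as given in the text above.
FALSE_WORD = 0b00000
NIL_WORD   = 0b01000
TRUE_WORD  = 0b10100

# Decide what kind of value a machine word encodes.
# The special constants are checked first, because nil's pattern ends in 000
# and would otherwise be mistaken for an aligned pointer.
def classify(word)
  if word == FALSE_WORD
    :false
  elsif word == NIL_WORD
    :nil
  elsif word == TRUE_WORD
    :true
  elsif (word & 0b1) == 0b1
    :fixnum     # lowest bit set
  elsif (word & 0b11) == 0b10
    :flonum     # lowest two bits are 10
  elsif (word & 0b111) == 0
    :pointer    # aligned address of a heap object
  else
    :other
  end
end

p classify(43)          # prints :fixnum
p classify(0b1000)      # prints :nil
p classify(0x7f0010)    # prints :pointer
```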

Theoretically, there is a lot of space there to encode other objects as well, but YARV doesn't do that. Some Smalltalk or Lisp VMs, for example, encode ASCII or BMP Unicode character objects there, or some often used objects such as the empty list, empty array, or empty string.

There is still some piece missing, though: without an object header, with just the bare bit pattern, how can the VM access the class, the methods, the instance variables, etc.? Well, it can't. Those have to be special-cased and hardcoded into the VM. The VM simply has to know that a pointer ending in 1 is an encoded Fixnum and has to know that the class is Fixnum and the methods can be found there. And as for instance variables? Well, you could store them separately from the objects in a dictionary on the side. Or you go the Ruby route and simply disallow them altogether.

Ruby and pointers

Everything in Ruby is already a reference.

Why not just maintain a room index?

rooms[room.number] = room

Then you can get anything with rooms[i]. I would keep the index up to date incrementally by simply modifying the initialize method of Room.

def initialize
  rooms[self.number] = self
  . . .
end

This won't take up much space because the array is just an index, it doesn't actually have copies of the rooms. Each reference obtained from the array is essentially the same thing as a reference obtained via any other mechanism in your program, and the only real difference between the reference and a classic pointer is a bit of overhead for garbage collection.

If rooms are ever deleted (other than just before exit) you will want to set rooms[x] = nil on deletion.
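Here is one way the index could look as runnable code. It is only a sketch: storing the array on the Room class, the rooms reader and the remove method are assumptions made for the example, not something from the question.

```ruby
class Room
  # Class-level index: slot n holds a reference to room number n, not a copy.
  @rooms = []

  class << self
    attr_reader :rooms
  end

  attr_reader :number

  def initialize(number)
    @number = number
    Room.rooms[number] = self   # keep the index up to date on creation
  end

  # Clear the slot so the index no longer keeps a deleted room alive.
  def self.remove(number)
    Room.rooms[number] = nil
  end
end

hall = Room.new(2)
p Room.rooms[2].equal?(hall)   # prints true
Room.remove(2)
p Room.rooms[2]                # prints nil
```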

I don't see why you need to create the data structure first and then index the rooms, but FWIW you should be able to do that recursive enumeration and use the room's presence in the room index array as the been-here flag. I'm not sure why it didn't work before, but it really has to if written carefully.

Ruby, pointer to method

So, basically you need the ticket.request method to also take the arguments you want passed to the subsequent method call? It is possible, if so:

def ticket.request(request, *args)
  if respond_to?(request)
    send(request, *args)
  end
end
puts [
  ticket.price,            # direct ===> 3.5
  ticket.request("price"), # safe   ===> 3.5
  ticket.request("name"),  # it does not exist ===> nil
  ticket.request("triple", 10)
]

*args in a method definition works as "take the rest of the arguments and store them in the args array". In a method call it works the opposite way, as "take the args array and turn it into an argument list passed to the method".

Note that I deleted the redundant self keywords, as self is the default receiver of the message. I also used the send method, as it's used more commonly.
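For completeness, a self-contained version that can be run as-is. The ticket object and its price and triple methods are stand-ins invented here, since the original definitions are not shown:

```ruby
ticket = Object.new

def ticket.price
  3.5
end

def ticket.triple(n)
  n * 3
end

# Forward the call (and any extra arguments) only if the method exists.
def ticket.request(request, *args)
  send(request, *args) if respond_to?(request)
end

p ticket.request("price")       # prints 3.5
p ticket.request("name")        # prints nil
p ticket.request("triple", 10)  # prints 30
```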

How to use pointers to Ruby objects safely from within C-based extension?

The link that matt offers is really good. It would have saved me days if I had found it before.

You can keep references to Ruby Strings and pointers into them. I would suggest freezing the String. Then every attempt to change the string will fail. There is a function Data_Wrap_Struct() that lets you wrap your own data structure into a Ruby object. Besides the data structure and the class of the structure, the function takes two function arguments. One of them (mark) is used to show the garbage collector where your structure references other Ruby objects.
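Seen from the Ruby side, freezing behaves like this. A small sketch only; the string contents are arbitrary, and the C side of the extension is not shown:

```ruby
name = "room-1".freeze     # freeze the String so it cannot be modified in place

begin
  name << "!"              # any in-place mutation now raises
rescue RuntimeError => e   # FrozenError (Ruby 2.5+) is a subclass of RuntimeError
  puts "rejected: #{e.class}"
end

p name          # prints "room-1"
p name.frozen?  # prints true
```

Note that freezing protects the object, not the variable: assigning a different String to name is still allowed.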

What took me some time to understand is that the garbage collector really does scan the stacks of all Ruby threads looking for references to Ruby objects. So keeping VALUEs on the stack is also a safe way to keep objects referenced.

Can this code be written in a way which is compatible with both MRI 1.8 and 1.9?

The basic API for extensions didn't change very much (I think) from 1.8 to 1.9. But I've used only 1.9 so far.

can I use malloc and free the same as I would in a "regular" C-based project?

Sure, I cannot think of any reason why this should not be possible, as long as you don't expect the garbage collector to take care of the allocated memory.

I had a hard time mixing in C++ code compiled with a different version of gcc than the one the Ruby interpreter was compiled with. If you experience strange startup behavior, I would check for compiler version differences.