What Are the *Actual* Steps in Ruby's Method Lookup

Method lookup for class singleton methods in Ruby

When subclassing, not only is Bar.superclass set to Foo, but the same holds true for the singleton classes:

Bar.singleton_class.superclass == Foo.singleton_class  # => true

So you're not really confused. The actual lookup is:

  1. Start with obj's singleton class.
  2. Look for instance methods down the ancestor list:
    • prepended modules (Ruby 2.0)
    • the class itself
    • included modules
  3. Repeat #2 with superclass.
  4. Repeat #1 but this time looking for method_missing

Ruby method lookup path for an object

That other post makes it seem confusing, but it really isn't. If you are interested in such things, you should read "Metaprogramming Ruby". Until then, the basic rule is one step to the right and up:

          Object (superclass)
^
|
Parent class A(superclass)
^
|
Parent class B(superclass)
^
|
obj -> object's class

2) Singleton classes are inserted between the obj and the object's class:

          Object
^
|
Parent class A(superclass)
^
|
Parent class B(superclass)
^
|
object's class(superclass)
^
|
obj -> obj's singleton_class

3) Included modules are inserted immediately above the class that does the including:

          Object
^
|
Parent class A
^
|
Module included by Parent Class B
^
|
Parent class B
^
|
object's class
^
|
obj -> obj's singleton_class

Edit:

Please point out any flaws

p method_lookup_chain(Class)

--output:--
[#<Class:Class>, #<Class:Module>, #<Class:Object>, #<Class:BasicObject>]

But...

class Object
def greet
puts "Hi from an Object instance method"
end
end

Class.greet

--output:--
Hi from an Object instance method

And..

class Class
def greet
puts "Hi from a Class instance method"
end
end

Class.greet

--output:--
Hi from a Class instance method

The lookup path for a method called on a class actually continues past BasicObject's singleton class(#<Class:BasicObject>):

class BasicObject
class <<self
puts superclass
end
end

--output:--
Class

The full lookup path for a method called on Class looks like this:

                  Basic Object                 
^
|
Object
^
|
Module
^
|
Class
^
|
BasicObject BasicObject's singleton class
| ^
| |
Object Object's singleton class
| ^
| |
Module Module's singleton class
| ^
| |
Class ---> Class's singleton class

The lookup starts in Class's singleton class and then goes up the hierarchy on the right. "Metaprogramming Ruby" claims there is a unified lookup theory for all objects, but the lookup for methods called on a class does not fit the diagram in 3).

You have the same problem here:

class A 
end

class B < A
end

p method_lookup_chain(B)

--output:--
[#<Class:B>, #<Class:A>, #<Class:Object>, #<Class:BasicObject>]

It should be this:

                  Basic Object                 
^
|
Object
^
|
Module
^
|
Class
^
|
BasicObject BasicObject's singleton class
| ^
| |
Object Object's singleton class
| ^
| |
A A's singleton class
| ^
| |
B.greet --> B's singleton class

One thing you need to keep in mind: the lookup path of any method called on a class has to include Class somewhere because ALL classes inherit from Class.

Ruby method lookup in regards to globals/scopes

Ruby's method lookup algorithm is actually really simple:

  • retrieve the class pointer of the receiver
  • if the method is there, invoke it
  • otherwise retrieve the superclass pointer, and repeat

That's it.

If the algorithm comes to a point where there is no more superclass, but it still hasn't found the method yet, it will restart the whole process again, with method_missing as the message and the name of the original message prepended to the arguments. But that's it. That's the whole algorithm. It is very small and very simple, and it has to be very small and very simple because method lookup is the single most often executed operation in an object-oriented language.

Note: I am completely ignoring Module#prepend / Module#prepend_features since I just don't know enough about how it works. I only know what it does, and that's good enough for me.

Also note: I am ignoring performance optimizations such as caching the result of a method lookup in something like a Polymorphic Inline Cache.

Okay, but here's the trick: where exactly do those class and superclass pointers point to? Well, they do not point to what the Object#class and Class#superclass methods return. So, let's step back a little.

Every object has a class pointer that points to the class of the object. And every class has a superclass pointer that points to its superclass.

Let's start a running example:

class Foo; end

Now, we have class Foo, and its superclass pointer points to Object.

foo = Foo.new

And our object foo's class pointer points to Foo.

def foo.bar; end

Now things start to get interesting. We have created a singleton method. Well, actually, there is no such thing as a singleton method, it's really just a normal method in the singleton class. So, how does this work? Well, now the class pointer points to foo's singleton class and foo's singleton class's superclass pointer points to Foo! In other words, the singleton class was inserted in between foo and its "real" class Foo.

However, when we ask foo about its class, it still responds Foo:

foo.class #=> Foo

The Object#class method knows about singleton classes, and simply skips over them, following the superclass pointer until it finds a "normal" class, and returns that.

Next complication:

module Bar; end

class Foo
include Bar
end

What happens here? Ruby creates a new class (let's call it Barʹ), called an include class. This class's method table pointer, class variable table pointer, and constant table pointer point to Bar's method table, class variable table, and constant table. Then, Ruby makes Barʹ's superclass pointer point to Foo's current superclass, and then makes Foo's superclass pointer point to Barʹ. In other words, including a module creates a new class that gets inserted as the superclass of the class the module is included into.

There's a slight complication here: you can also include modules into modules. How does that work? Well, Ruby simply keeps track of the modules that were included into a module. And then, when the module is included into a class, it will recursively repeat the steps above for every included module.

And that's all you need to know about the Ruby method lookup:

  • find the class
  • follow the superclass
  • singleton classes insert above objects
  • include classes insert above classes

Now let's look at some of your questions:

When calling methods explicitly on a class, there are lots of illustrations on the order in which the classes, and modules included by them are searched (and thus exactly what super calls in each case). But when not explicitly calling a method, e.g. a plain func args rather than self.func args what is the search order?

The same. self is the implicit receiver, if you don't specify a receiver, the receiver is self. And parentheses are optional. In other words:

func args

is exactly the same as

self.func(args)

Why does in my example below, the member method calling func find the member method before the global, but func2 finds the global without method_missing being called?

There is no such thing as a "global method" in Ruby. There is also no such thing as a "member method". Every method is an instance method. Period. There are no global, static, class, singleton, member methods, procedures, functions, or subroutines.

A method defined at the top-level becomes a private instance method of class Object. Test inherits from Object. Run the steps I outlined above, and you will find exactly what is going on:

  • Retrieve x's class pointer: Test
  • Does Test have a method called func: Yes, so invoke it.

Now again:

  • Retrieve x's class pointer: Test
  • Does Test have a method called func2: No!
  • Retrieve Test's superclass pointer: Object
  • Does Object have a method called func2: Yes, so invoke it.

And when the global is instead an module/class/type, why is the member with the same name not found at all?

Again, there is no global here, there are no members here. This also doesn't have anything to do with modules or classes. And Ruby doesn't have (static) types.

Math

is a reference to a constant. If you want to call a method with the same name, you have to ensure that Ruby can tell that it's a method. There are two things that only methods can have: a receiver and arguments. So, you can either add a receiver:

self.Math

or arguments:

Math()

and now Ruby knows that you mean the method Math and not the constant Math.

The same applies to local variables, by the way. And to setters. If you want to call a setter instead of assigning a local variable, you need to say

self.func = 'setter method'

Ruby Method Lookup (comparison with JavaScript)

  1. Does obj1 and obj2 in the Ruby code each own a copy of the some_method method? Or is it similar to JavaScript where both objects have access to some_method via another object (in this case, via the Child class)?

You don't know. The Ruby Language Specification simply says "if you do this, that happens". It does, however, not prescribe a particular way of making that happen. Every Ruby implementation is free to implement it in the way it sees fit, as long as the results match those of the spec, the spec doesn't care how those results were obtained.

You can't tell. If the implementation maintains proper abstraction, it will be impossible for you to tell how they do it. That is just the nature of abstraction. (It is, in fact, pretty much the definition of abstraction.)


  1. Similarly, when inheritance is taken into account in Ruby, does each Ruby object have a copy of all of the class and superclass methods of the same name?

Same as above.

There are a lot of Ruby implementations currently, and there have been even more in the past, in various stages of (in)completeness. Some of those implement(ed) their own object models (e.g. MRI, YARV, Rubinius, MRuby, Topaz, tinyrb, RubyGoLightly), some sit on top of an existing object model into which they are trying to fit (e.g. XRuby and JRuby on Java, Ruby.NET and IronRuby on the CLI, SmallRuby, smalltalk.rb, Alumina, and MagLev on Smalltalk, MacRuby and RubyMotion on Objective-C/Cocoa, Cardinal on Parrot, Red Sun on ActionScript/Flash, BlueRuby on SAP/ABAP, HotRuby and Opal.rb on ECMAScript)

Who is to say that all of those implementations work exactly the same?

My gut tells me that Ruby objects DO NOT have separate copies of the methods inherited from their class, mixed-in modules, and superclasses. Instead, my gut is that Ruby handles method lookup similarly to JavaScript, where objects check if the object itself has the method and if not, it looks up the method in the object's class, mixed-in modules, and superclasses until the lookup reaches BasicObject.

Despite what I wrote above, that is a reasonable assumption, and is in fact, how the implementations that I know about (MRI, YARV, Rubinius, JRuby, IronRuby, MagLev, Topaz) work.

Just think about what it would mean if it weren't so. Every instance of the String class would need to have its own copy of all of String's 116 methods. Think about how many Strings there are in a typical Ruby program!

ruby -e 'p ObjectSpace.each_object(String).count'
# => 10013

Even in this most trivial of programs, which doesn't require any libraries, and only creates one single string itself (for printing the number to the screen), there are already more than 10000 strings. Every single one of those would have its own copies of the over 100 String methods. That would be huge waste of memory.

It would also be a synchronization nightmare! Ruby allows you to monkeypatch methods at any time. What if I redefine a method in the String class? Ruby would now have to update every single copy of that method, even across different threads.

And I actually only counted public methods defined directly in String. Taking into account private methods, the number of methods is even bigger. And of course, there is inheritance: strings wouldn't just need a copy of every method in String, but also a copy of every method in Comparable, Object, Kernel, and BasicObject. Can you imagine every object in the system having a copy of require?

No, the way it works in most Ruby implementations is like this. An object has an identity, instance variables, and a class (in statically typed pseudo-Ruby):

struct Object
object_id: Id
ivars: Dictionary<Symbol, *Object>
class: *Class
end

A module has a method dictionary, a constant dictionary, and a class variable dictionary:

struct Module
methods: Dictionary<Symbol, *Method>
constants: Dictionary<Symbol, *Object>
cvars: Dictionary<Symbol, *Object>
end

A class is like a module, but it also has a superclass:

struct Class
methods: Dictionary<Symbol, *Method>
constants: Dictionary<Symbol, *Object>
cvars: Dictionary<Symbol, *Object>
superclass: *Class
end

When you call a method on an object, Ruby will look up the object's class pointer and try to find the method there. If it doesn't, it will look at the class's superclass pointer and so on, until it reaches a class which has no superclass. At that point it will actually not give up, but try to call the method_missing method on the original object, passing the name of the method you tried to call as an argument, but that's just a normal method call, too, so it follows all the same rules (except that if a call to method_missing reaches the top of the hierarchy, it will not try to call it again, that would result in an infinite loop).

Oh, but we ignored one thing: singleton methods! Every object needs to have its own method dictionary as well. Actually, rather, every object has its own private singleton class in addition to its class:

struct Object
object_id: Id
ivars: Dictionary<Symbol, *Object>
class: *Class
singleton_class: Class
end

So, method lookup starts first in the singleton class, and only then goes to the class.

And what about mixins? Oh, right, every module and class also needs a list of its included mixins:

struct Module
methods: Dictionary<Symbol, *Method>
constants: Dictionary<Symbol, *Object>
cvars: Dictionary<Symbol, *Object>
mixins: List<*Module>
end

struct Class
methods: Dictionary<Symbol, *Method>
constants: Dictionary<Symbol, *Object>
cvars: Dictionary<Symbol, *Object>
superclass: *Class
mixins: List<*Module>
end

Now, the algorithm goes: look first in the singleton class, then the class and then the superclass(es), where however, "look" also means "after you look at the method dictionary, also look at all the method dictionaries of the included mixins (and the included mixins of the included mixins, and so forth, recursively) before going up to the superclass".

Does that sound complicated? It is! And that's not good. Method lookup is the single most often executed algorithm in an object-oriented system, it needs to be simple and lightning fast. So, what some Ruby implementations (e.g. MRI, YARV) do, is to decouple the interpreter's internal notion of what "class" and "superclass" mean from the programmer's view of those same concepts.

An object no longer has both a singleton class and a class, it just has a class:

struct Object
object_id: Id
ivars: Dictionary<Symbol, *Object>
class: *Class
singleton_class: Class
end

A class no longer has a list of included mixins, just a superclass. It may, however, be hidden. Note also that the Dictionaries become pointers, you'll see why in a moment:

struct Class
methods: *Dictionary<Symbol, *Method>
constants: *Dictionary<Symbol, *Object>
cvars: *Dictionary<Symbol, *Object>
superclass: *Class
visible?: Bool
end

Now, the object's class pointer will always point to the singleton class, and the singleton class's superclass pointer will always point to the object's actual class. If you include a mixin M into a class C, Ruby will create a new invisible class M′ which shares its method, constant and cvar dictionaries with the mixin. This mixin class will become the superclass of C, and the old superclass of C will become the superclass of the mixin class:

M′ = Class.new(
methods = M->methods
constants = M->constants
cvars = M->cvars
superclass = C->superclass
visible? = false
)

C->superclass = *M'

Actually, it's little bit more involved, since it also has to create classes for the mixins that are included in M (and recursively), but in the end, what we end up with is a nice linear method lookup path with no side-stepping into singleton classes and included mixins.

Now, the method lookup algorithm is just this:

def lookup(meth, obj)
c = obj->class

until res = c->methods[meth]
c = c->superclass
raise MethodNotFound, meth if c.nil?
end

res
end

Nice and clean and lean and fast.

As a trade-off, finding out the class of an object or the superclass of a class is slightly more difficult, because you can't simply return the class or superclass pointer, you have to walk the chain until you find a class that is not hidden. But how often do you call Object#class or Class#superclass? Do you even call it at all, outside of debugging?

Unfortunately, Module#prepend doesn't fit cleanly into this picture. And Refinements really mess things up, which is why many Ruby implementations don't even implement them.

How does Ruby solve the method lookup in 'diamond' mixin?

My understanding is that:

  • include and prepend establishes a partial relation (ancestor relation) among the relevant modules/classes at each step, and
  • An ancestor relation cannot be contradicted in a later step.

Let us start with:

module M1; end
module M2; end
module M3; end
class Base; end

and follow each step. The first three steps should be trivial:

Base.ancestors
#=> [Base, Object, Kernel, BasicObject]

M1.prepend M3
M1.ancestors
# => [M3, M1]

M2.prepend M3
M2.ancestors
#=> [M3, M2]

Now, your crucial first step is Base.include M1. This inserts the ancestors of M1 (the whole [M3, M1] chunk) just to the right of Base before Object:

Base.include M1
Base.ancestors
#=> [Base, M3, M1, Object, Kernel, BasicObject]

The next step is Base.prepend M2. This attempts to insert the ancestors of M2 (the whole [M3, M2] chunk) just to the left of Base. But notice that that would cause a contradictory relation between Base and M3.

Base.prepend M2
Base.ancestors
#=> Cannot be [M3, M2, Base, M3, M1, Object, Kernel, BasicObject]

Since it has already been established that M3 appears on the right side of Base, the best it can do to place [M3, M2] is to place it on the right side of Base:

Base.prepend M2
Base.ancestors
#=> [Base, M3, M2, M1, Object, Kernel, BasicObject]

It may appear that placing M2 on the right side of Base contradicts the intention of Base.prepend M2. But that can be canceled/modified to fit on the spot, whereas an already established relation between Base and M3 cannot be canceled in a later spot.

In fact, when there is no way to satisfy the relations already established, then an error is raised:

module M4; end
module M5; end
M4.include M5
M5.include M4 #>> ArgumentError: cyclic include detected

objects' method lookup path (modules & method override)

I'm not 100% I understand your question but I try my best...

Ruby actually remembers which modules are included into a class and merges these modules into the lookup path for methods. You can ask a class for its included methods my using A.included_modules

Methods from included modules are put on top of the modules defined in the current class. Please see this example:

class X
def foo
puts 'Hi from X'
end
end

module AnotherFoo
def foo
puts "Hi from AnotherFoo"
super
end
end

class Y < X
include AnotherFoo
end

y = Y.new
y.foo
# Hi from another foo
# Hi from X

class Y < X
include AnotherFoo
def foo
puts "Hi from Y"
super
end
end

y.foo
# Hi from Y
# Hi from another foo
# Hi from X

You can see the method resolution order: the child class -> the included modules -> the parent class. You can also see that modules are always included only once. So if a class already includes a module, it will not be re-included again.

Nested singleton class method lookup

Much of this explanation is based on How Ruby Method Dispatch Works by James Coglan, a little of the Ruby Hacking Guide, and just a smidge of source.

To begin with a summary, the ancestry looks like this:

                                                           +----------------+
| |
+--------------------------- Module ~~~~~~~~~~~~~~> #<Class:Module> |
| ^ ^ |
| | | |
| Class ~~~~~~~~~~~~~~~> #<Class:Class> |
| ^ ^ |
| | | |
| BasicObject ~~~~~> #<Class:BasicObject> ~~> #<Class:#<Class:BasicObject>> |
| ^ ^ ^ |
| | Kernel | | |
| | ^ | | |
| | | | +-----------------------|----------------+
| +-----+----+ | | |
| | | v |
+-------> Object ~~~~~~> #<Class:Object> ~~~~~~~~> #<Class:#<Class:Object>>
^ ^ ^
| | |
Foo ~~~~~~~~> #<Class:Foo> ~~~~~~~~~~> #<Class:#<Class:Foo>>

---> Parent
~~~> Singleton class

Let's start from the beginning and build out. BasicObject is the root of everything - if you check BasicObject.superclass, you get nil. BasicObject is also an instance of Class. Yes, that gets circular, and there's a special case in the code to deal with it. When A is an instance of B, A.singleton_class is a child of B, so we get this:

                           Class
^
|
BasicObject ~~~~~> #<Class:BasicObject>

Object inherits from BasicObject. When A inherits from B, A is a child of B and A.singleton_class is a child of B.singleton_class. Object also includes Kernel. When A includes B, B is inserted as the first ancestor of A (after A itself, but before A.superclass).

                           Class
^
|
BasicObject ~~~~~> #<Class:BasicObject
^ ^
| Kernel |
| ^ |
| | |
+-----+----+ |
| |
Object ~~~~~~> #<Class:Object>

Kernel is an instance of Module. It's the only instance of Module we'll see, and its singleton class doesn't appear in any ancestry chains, so I won't draw beyond it.

Now we get down to Foo, which inherits from Object (though you don't need to write < Object). We can already figure out what Foo and its singleton class are children of.

                           Class
^
|
BasicObject ~~~~~> #<Class:BasicObject>
^ ^
| Kernel |
| ^ |
| | |
+-----+----+ |
| |
Object ~~~~~~> #<Class:Object>
^ ^
| |
Foo ~~~~~~~~> #<Class:Foo>

Now Class inherits from Module, and Module inherits from Object, so add Module and the appropriate singleton classes. Because Module < Object and Object < BasicObject and BasicObject.instance_of?(Class), this is where the drawing gets a little funky. Remember you just stop traversing upwards whenever you hit BasicObject.

                                                           +----------------+
| |
+--------------------------- Module ~~~~~~~~~~~~~~> #<Class:Module> |
| ^ ^ |
| | | |
| Class ~~~~~~~~~~~~~~~> #<Class:Class> |
| ^ |
| | |
| BasicObject ~~~~~> #<Class:BasicObject> |
| ^ ^ |
| | Kernel | |
| | ^ | |
| | | | +----------------------------------------+
| +-----+----+ | |
| | | v
+-------> Object ~~~~~~> #<Class:Object>
^ ^
| |
Foo ~~~~~~~~> #<Class:Foo>


Related Topics



Leave a reply



Submit