What Is the Significance of the New Reference Classes

What is the significance of the new Reference Classes?

The request for documentation for ReferenceClasses comes up every now and then, for example on the r-devel list. The best answer so far is to actually look at what help(ReferenceClasses) gives you---which is a pretty decent start.

Then there are a few presentations:

John's presentation in November 2010 at Stanford
Martin Morgan in November 2010 from the BioConductor Europe meetings
Romain and myself mention it in the Google Tech Talk on Rcpp

And as noted by mdsummer in the comment, R5 was a short-lived joke. There was already another R-related project called R5, and John much prefers ReferenceClasses. And I reckon they are here to stay. People already use them, e.g. Jeff Horner in his new Rack package.

why do I need to create a reference of A class, and then an object of B class

The usage of "Human" and "Alien" is terrible here. Instead of "Human", think "Animal". Instead of "Alien", think "Dog".

The terminology isn't great either. The "Object" is the literal object itself: the physical Dog <-> the bits associated with it in memory. The "Reference" is the variable, h. It references the object Dog. h is not a "reference of Animal and object of Dog", as the video says with Human/Alien. It's a "reference to a Dog object". However, the variable "h" itself is not forced to reference only Dogs. In fact, it can reference any Animal.

For example, I can write the code:

Animal mypet = new Dog();
mypet = new Cat();

If I wrote the line Dog mypet, then I would be forced to only write mypet = new Dog() or mypet = getDogFromShelter(myNeighborhoodShelter). It would not let me write mypet = new Cat().

Cats are cool, so that would be terrible. Hence, we write Animal mypet to allow the variable mypet to reference any animal. Dog, Cat, Elephant will all be available. However, because the variable is only known to hold an Animal, I am not allowed to do any Dog-specific things to mypet.

mypet.bark() will not work if mypet is declared as an Animal. Not all Animals can bark. mypet.eat(Food) will work, since all Animals can eat. If I want my pet to bark, because I know it is a Dog right now, then I can do

((Dog) mypet).bark(); // Will throw a run-time error if mypet is not a Dog!
// This is best to avoid, so just make mypet a Dog type if it must bark.
// If you must make an Animal bark, use if (!(mypet instanceof Dog)) to handle the error properly.

The cast above checks at run time that mypet is a Dog before letting it bark.
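A minimal sketch of the instanceof check mentioned above, in case it helps. The class SafeCastDemo, the helper barkIfDog and the return strings are made up for illustration; Animal, Dog and Cat are bare stand-ins for the classes in this answer.

```java
// Hypothetical demo class; Animal, Dog and Cat are minimal stand-ins.
class SafeCastDemo {
    static class Animal {
        String eat() { return "munch"; }
    }
    static class Dog extends Animal {
        String bark() { return "Woof"; }
    }
    static class Cat extends Animal { }

    // Returns the bark if the pet is a Dog, or a fallback message otherwise.
    static String barkIfDog(Animal pet) {
        if (pet instanceof Dog) {
            Dog dog = (Dog) pet;   // safe: the check above guarantees the cast succeeds
            return dog.bark();
        }
        return "This animal cannot bark";
    }

    public static void main(String[] args) {
        Animal mypet = new Dog();
        System.out.println(barkIfDog(mypet));   // Woof
        mypet = new Cat();
        System.out.println(barkIfDog(mypet));   // This animal cannot bark

        try {
            ((Dog) mypet).bark();               // unchecked downcast of a Cat
        } catch (ClassCastException e) {
            System.out.println("Cast failed: mypet is not a Dog");
        }
    }
}
```

The guarded version never throws; the bare cast at the end is only there to show what the run-time error looks like when the guess about the type is wrong.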

This can be implemented in code by writing

class Animal {
    int health = 100;
    void eat(Food f) {
        health += f.value;
    }
}

class Dog extends Animal { // States that "All Dogs are Animals"
    // The word "extends" allows you to write Animal a = new Dog();
    // "extends" also lets you do "Dog a = new Dog(); a.eat()"
    Dog() { health = 150; } // Dogs are strong (sets the inherited field; redeclaring health here would hide Animal's)
    void bark() {
        scareNearbyAnimals();
    }
}

class Poodle extends Dog {
    // Both of these will work:
    // Dog mydog = new Poodle();
    // Animal mypet = new Poodle();
    int glamor = 50; // glamorous
}

The video mixed up Object vs Reference, so I'll make it more explicit with the following code

Dog a = new Dog();
Dog b = a;

a and b both reference the same object in this instance. If Dog uses a lot of memory, then b = a does not cause more memory to be allocated.

b.hasEaten(); // False
a.eat();
b.hasEaten(); // True
b = new Dog(); // Now they are different. a does not affect b

a.eat() allowed the object to eat. The bits in memory have changed: the hunger value has been reset. b.hasEaten() checks the hunger value of the same Dog that a used when it was eating. b = new Dog() will separate them, so that a and b reference distinct dog objects. They will then no longer be coupled as they were before.
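Here is a small runnable sketch of the a/b behaviour described above. The Dog class is a guess at the simplest thing that fits (a boolean flag instead of a hunger value), and AliasDemo is just a hypothetical wrapper so it compiles on its own.

```java
// Hypothetical demo class; Dog is reduced to a single "has eaten" flag.
class AliasDemo {
    static class Dog {
        private boolean eaten = false;
        void eat() { eaten = true; }
        boolean hasEaten() { return eaten; }
    }

    public static void main(String[] args) {
        Dog a = new Dog();
        Dog b = a;                          // copies the reference, not the object
        System.out.println(b.hasEaten());   // false
        a.eat();
        System.out.println(b.hasEaten());   // true: a and b refer to the same Dog
        System.out.println(a == b);         // true

        b = new Dog();                      // b now refers to a second, separate Dog
        System.out.println(a == b);         // false
        System.out.println(b.hasEaten());   // false: the new Dog has not eaten
    }
}
```

Note that == on references compares identity (same object), which is exactly the question being asked here.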

What are classes, references, and objects?

If you like housing metaphors:

a class is like the blueprint for a house. Using this blueprint, you can build as many houses as you like.
each house you build (or instantiate, in OO lingo) is an object, also known as an instance.
each house also has an address, of course. If you want to tell someone where the house is, you give them a card with the address written on it. That card is the object's reference.
If you want to visit the house, you look at the address written on the card. This is called dereferencing.

You can copy that reference as much as you like, but there's just one house -- you're just copying the card that has the address on it, not the house itself.

In Java, you cannot access objects directly; you can only use references. Java does not copy or assign objects to each other. But you can copy and assign references to variables so they refer to the same object. Java methods are always pass-by-value, but the value could be an object's reference. So, if I have:

Foo myFoo = new Foo();     // 1
callBar(myFoo);            // 2
myFoo.doSomething();       // 4

void callBar(Foo foo) {
    foo = new Foo();       // 3
}

Then let's see what's happening.

Several things are happening in line 1. new Foo() tells the JVM to build a new house using the Foo blueprint. The JVM does so, and returns a reference to the house. You then copy this reference to myFoo. This is basically like asking a contractor to build you a house. He does, then tells you the house's address; you write this address down.
In line 2, you give this address to another method, callBar. Let's jump to that method next.
Here, we have a reference Foo foo. Java is pass-by-value, so the foo in callBar is a copy of the myFoo reference. Think of it like giving callBar its very own card with the house's address on it. What does callBar do with this card? It asks for a new house to be built, and then uses the card you gave it to write that new house's address. Note that callBar now can't get to the first house (the one we built in line 1), but that house is unchanged by the fact that a card that used to have its address on it, now has some other house's address on it.
Back in the first method, we dereference myFoo to call a method on it (doSomething()). This is like looking at the card, going to the house whose address is on the card, and then doing something in that house. Note that our card with myFoo's address is unchanged by the callBar method -- remember, we gave callBar a copy of our reference.

The whole sequence would be something like:

Ask JVM to build a house. It does, and gives us the address. We copy this address to a card named myFoo.
We invoke callBar. Before we do, we copy the address written on myFoo to a new card, which we give to callBar. It calls that card foo.
callBar asks the JVM for another house. It creates it, and returns the new house's address. callBar copies this address to the card we gave it.
Back in the first method, we look at our original, unchanged card; go to the house whose address is on our card; and do something there.
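The sequence above can be checked with a small program. This is only a sketch: the visits counter and the visit method are not in the original snippet and are added purely to make the effects visible, and PassByValueDemo is a hypothetical wrapper class.

```java
// Hypothetical demo class built around the Foo/callBar example.
class PassByValueDemo {
    static class Foo {
        int visits = 0;              // made-up field, only there to observe changes
        void doSomething() { visits++; }
    }

    // Overwrites its own copy of the card; the caller's card is untouched.
    static void callBar(Foo foo) {
        foo = new Foo();
        foo.doSomething();           // happens in the new house, which is then discarded
    }

    // Follows the address on the card and changes the house the caller also sees.
    static void visit(Foo foo) {
        foo.doSomething();
    }

    public static void main(String[] args) {
        Foo myFoo = new Foo();
        Foo sameCard = myFoo;                   // second card with the same address
        callBar(myFoo);
        System.out.println(myFoo == sameCard);  // true: myFoo still points at the first house
        System.out.println(myFoo.visits);       // 0: the first house was never visited
        visit(myFoo);
        System.out.println(myFoo.visits);       // 1: changes made through a copied card are visible
    }
}
```

So reassigning the parameter has no effect on the caller, while calling a method through the parameter does affect the shared object.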

What is the meaning of Reference of type base and object of derive class

If you take a look at the classes (not their instances), then I would rather draw the picture like this:

Sample Image

This means the Dog class usually has more methods and properties than the Animal class (for instance, a dog can bark (method) and has four legs (property)). And of course, additional memory has to be reserved when this class is instantiated. Imagine that the base class's methods and properties are created first, then the derived methods and properties are created in memory:

class Dog : Animal 
{ 
    public Dog() 
    { 
        legs = 4;
        Console.WriteLine("Dog constructor"); 
    } 


    public int legs { get; private set; }

    public void bark()
    {
         Console.WriteLine("grrrwoof!"); 
    }
}

If you instantiate a Dog and assign it to an Animal reference variable as you did, then this reference can only access the methods an Animal has. Despite this, the entire Dog object is still kept in memory:

Dog d = new Dog();
Animal a = (Animal)d;

In other words, d is able to do the following:

Console.WriteLine(String.Format("Number of legs: {0}", d.legs.ToString())); 
d.bark();

but a can't do that, because those "features" are not defined within the Animal class.

What is now important to know is that not all kinds of casts are allowed. It is always allowed to cast from a Dog to an Animal, because this is safe, but you can't cast an Animal to a Dog implicitly, so the following code does not compile:

Dog dogRef2 = a; // not allowed

If you know what you're doing (i.e. if you know for sure that a contains an instance of Dog), then you are allowed to cast explicitly as follows:

Dog dogRef2 = (Dog)a; // allowed

and you can access the properties and methods afterwards:

dogRef2.bark(); // works

This works, because the compiler and the runtime always store the methods and properties in the same structured way in memory and also create an internal descriptor to find it when it is referenced.

Note that this isn't always safe, for instance if you try the following:

Animal a = new Animal();
Dog dogRef2 = (Dog)a; // Invalid cast exception

Why? Because new Animal() hasn't created the method bark and the property legs, it has just created an instance of Animal (contains neither the property legs nor the method bark).
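The snippets in this answer are C#. As a rough Java analogue (Java throws ClassCastException where C# throws InvalidCastException), the following sketch shows that the upcast does not create or shrink anything: both variables refer to the same Dog, and only a plain Animal fails the type test. UpcastDemo is a hypothetical wrapper class, and legs is a plain field here because Java has no properties.

```java
// Hypothetical demo class; a Java analogue of the C# Animal/Dog example.
class UpcastDemo {
    static class Animal { }
    static class Dog extends Animal {
        final int legs = 4;
        String bark() { return "grrrwoof!"; }
    }

    public static void main(String[] args) {
        Dog d = new Dog();
        Animal a = d;                                        // upcast: no new object is created
        System.out.println(a == d);                          // true
        System.out.println(a.getClass().getSimpleName());    // Dog
        // a.bark();  // would not compile: bark() is not declared on Animal

        Dog again = (Dog) a;                                 // explicit downcast succeeds
        System.out.println(again.legs + " " + again.bark()); // 4 grrrwoof!

        Animal plain = new Animal();
        System.out.println(plain instanceof Dog);            // false: casting plain to Dog would fail
    }
}
```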

More info: If you want to find out more about the internal structure (how objects are created and stored), check out this link. Here is an example for a memory layout, taken from there:
EEClasses

You can see that linked lists are used to build up the chain from the base class's instance objects to the derived class's instance objects.

Inheritance Reference Classes

Use named arguments rather than parsing ...; make sure that the
default constructor works when invoked with no arguments

Part.initialize<-function(..., var1=vector(), var2=character()){
    callSuper(..., var1=var1, var2=as.character(var2))
}

Part<-setRefClass(Class = "Part"
                 ,fields = c(var1 = "ANY", var2 = "character")
                 ,methods = list(initialize=Part.initialize))

Only interpret arguments for the class under consideration

A.initialize<-function(..., var3=list()){
    callSuper(..., var3=as.list(var3))
}

A<-setRefClass(Class = "A"
               ,contains = "Part"
               ,fields = list(var3 = "list")
               ,methods = list(initialize=A.initialize))

Simple test cases

Part()
A()
A(var3=list(a=1))

Non-standard set-functions in R Reference Classes

I don't think you can do this with your desired syntax.

Note that you will get the same error if you run any assignment like that, e.g.

a_person$hello("first") <- "John"

so it's really a basic problem.

What does work, is the following syntax:

name(a_person, "first") <- "John"

Altogether you could then have something like below:

PersonRCGen <- setRefClass("Person",
                  fields = list(
                    fullname = "list",
                    gender = "character"
                  ),
                  methods = list(
                    initialize = function(...) {
                      initFields(...)
                    },
                    name = function(x) {
                      .self$fullname[[x]]
                    }
                  )
)

setGeneric("name<-", function(x, y, value) standardGeneric("name<-"))
setMethod("name<-", sig = "ANY", function(x, y, value) {
  UseMethod("name<-")
})
# some extras
"name<-.default" <- function(x, y, value) {
  stop(paste("name assignment (name<-) method not defined for class", class(x)))
}
"name<-.list" <- function(x, y, value) {
  x[[y]] <- value
  return(x)
}
# and here specifically
"name<-.Person" <- function(x, y, value) {
  x$fullname[[y]] <- value
  return(x)
}

# example to make use of the above
a_person <- PersonRCGen$new(
  fullname = list(
    first = "Jane",
    last = "Doe"
  ),
  gender = "F"
)

a_person$name("first")
#> [1] "Jane"
name(a_person, "middle") <- "X."
a_person$name("middle")
#> [1] "X."

I'm aware this is not exactly what you want but I hope it helps.

Constructor reference classes

Do not write an initialize method; write a public constructor instead
(.Part is a constructor to be used by your code only, not by the user). The public constructor's job is to transform user arguments to a consistent form for class methods

.Part<-setRefClass(Class = "Part"
                  ,fields = c(var1 = "ANY", var2 = "character"))

Use setOldClass to enable class dispatch

setOldClass(c("XMLInternalElementNode", "XMLInternalNode",
              "XMLAbstractNode"))

Write your public constructor as an S4 generic and methods

setGeneric("Part", function(x, ...) standardGeneric("Part"))

setMethod("Part", "missing", function(x, ...) {
    .Part()
})

setMethod("Part", "XMLInternalNode", function(x, ...) {
    attr<-xmlAttrs(x)
    var1 <- if (!is.na(attr["var1"])) attr["var1"] else vector()
    var2 <- if (!is.na(attr["var2"])) attr["var2"] else character()
    .Part(var1=var1, var2=var2, ...)
})

setMethod("Part", "ANY", function(x, var2, ...) {
    .Part(var1=x, var2=var2, ...)
})

Add a copy constructor if desired

setMethod("Part", "Part", function(x, ...) x$copy())

or if your own initialize method does something additional and conforms to the contract of the default initialize method (which acts as a copy constructor too) use

setMethod("Part", "Part", function(x, ...) .Part(x, ...))

Add any common code shared by constructors to the initialize method, being sure that your initialize method acts as a copy constructor and works when invoked without any arguments.

Make sure that simple test cases work

library(XML)
Part()
Part(TRUE, "var2")
txt <- "<doc> <part var2=\"abc\"/> </doc>"
node <- xmlTreeParse(txt, useInternalNodes = TRUE)[["//part"]]
p1 <- Part(node)
p2 <- Part(p1)
p1$var2 <- "xyz"
p2$var2            ## "abc"
