What Is "{" Class in R

Types and classes of variables

In R, every object has a mode and a class. The former represents how an object is stored in memory (numeric, character, list, function, and so on), while the latter represents its abstract type. For example:

d <- data.frame(V1=c(1,2))
class(d)
# [1] "data.frame"
mode(d)
# [1] "list"
typeof(d)
# [1] "list"

As you can see, data frames are stored in memory as lists, but they carry the class data.frame. The class allows methods to be defined for them, so that generic functions such as print can be given custom behavior.

typeof (and storage.mode) will usually give the same information as mode, but not always. Case in point:

typeof(c(1,2))
# [1] "double"
mode(c(1,2))
# [1] "numeric"

The reasoning behind this can be found in the R Language Definition:

The R specific function typeof returns the type of an R object

Function mode gives information about the mode of an object in the sense of Becker, Chambers & Wilks (1988), and is more compatible with other implementations of the S language

The source quoted above also contains a list of all native R basic types (vectors, lists, etc.) and compound objects (factors and data frames), as well as some examples of how mode, typeof, and class are related for each type.
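To make the relationship concrete, here is a small sketch that tabulates class, mode, typeof, and storage.mode for a few base R objects. The helper name describe is made up for this illustration and is not part of R.

```r
# Hypothetical helper (not part of base R): report the four
# descriptions of an object side by side.
describe <- function(x) {
  c(class        = class(x)[1],
    mode         = mode(x),
    typeof       = typeof(x),
    storage.mode = storage.mode(x))
}

objs <- list(
  double     = c(1, 2),
  integer    = 1:2,
  character  = "a",
  factor     = factor("a"),
  data.frame = data.frame(V1 = c(1, 2)),
  closure    = function(x) x
)

# One row per object, one column per description.
res <- t(sapply(objs, describe))
print(res)
```

The rows for double, factor, and data.frame are the ones where the four columns disagree, which is exactly where the distinction matters.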

How to extract just one of classes of object with multiple classes

class() returns an unnamed character vector, which you usually subset using numeric indices x[i], e.g. class(b)[3] to obtain "double".

However, you could apply string matching and write your own my_class() function, based on a vector of valid class names.

valid <- c("data.frame", "double", "character")

my_class <- function(x) {k <- class(x);k[k %in% valid]}

my_class(a)
# [1] "data.frame"

my_class(b)
# [1] "double"

Data:

a <- tibble::as_tibble(data.frame())
b <- haven::labelled()
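If you do not have tibble and haven installed, the same idea can be tried with stand-in objects. This is only a sketch: the class vectors below are assumptions that mimic what those packages produce, and intersect() is swapped in for the %in% subsetting above.

```r
# Assumed stand-ins that mimic the class vectors of a tibble and a
# haven-labelled vector, so that no packages are required.
a <- structure(list(), class = c("tbl_df", "tbl", "data.frame"))
b <- structure(numeric(0), class = c("haven_labelled", "vctrs_vctr", "double"))

valid <- c("data.frame", "double", "character")

# Keep only the classes that appear in `valid`.
my_class <- function(x) intersect(class(x), valid)

print(my_class(a))
print(my_class(b))

# Positional subsetting and a plain membership test, for comparison.
print(class(b)[3])
print(inherits(b, "double"))
```

Matching by name is more robust than class(b)[3], because it does not depend on the position of the class in the vector.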

`UseMethod()` vs `inherits()` to determine an object's class in R

OK, there is some background to be covered to answer this question (in my view)...

Within R, the class of an object is explicit in situations where you have user-defined object structures or an object such as a factor vector or data frame where other attributes play an important part in the handling of the object itself—for example, level labels of a factor vector, or variable names in a data frame, are modifiable attributes that play a primary role in accessing the observations of each object.

Note, however, that elementary R objects such as vectors, matrices, and arrays, are implicitly classed, which means the class is not identified with the attributes function. Whether implicit or explicit, the class of a given object can always be retrieved using the attribute-specific function class.

When a generic function foo is applied to an object with class attribute c("first", "second"), the system searches for a function called foo.first and, if it finds it, applies it to the object. If no such function is found, a function called foo.second is tried. If no class name produces a suitable function, the function foo.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
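The search order described above can be checked with a short sketch. The generic foo and its methods are hypothetical and only return their own name, so you can see which one was picked.

```r
# Hypothetical generic and methods, named after the description above.
foo <- function(x, ...) UseMethod("foo")
foo.second  <- function(x, ...) "foo.second"
foo.default <- function(x, ...) "foo.default"

obj <- structure(list(), class = c("first", "second"))

# No foo.first exists yet, so dispatch falls through to foo.second.
r1 <- foo(obj)

# Once foo.first is defined, it takes precedence.
foo.first <- function(x, ...) "foo.first"
r2 <- foo(obj)

# No class attribute: the implicit class is tried, then the default.
r3 <- foo(1:3)

# The implicit class of an integer vector includes "numeric" for dispatch.
foo.numeric <- function(x, ...) "foo.numeric"
r4 <- foo(1:3)

print(c(r1, r2, r3, r4))
```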

The function class returns the vector of names of classes an object inherits from.

class<- sets the classes an object inherits from.

inherits() indicates whether its first argument inherits from any of the classes specified in the what argument. Method dispatch takes place based on the class of the first argument to the generic function. If which is TRUE then an integer vector of the same length as what is returned. Each element indicates the position in the class(x) matched by the element of what; zero indicates no match. If which is FALSE then TRUE is returned by inherits if any of the names in what match with any class.

All but inherits() are primitive functions.
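A quick sketch of inherits() with and without the which argument, using a made-up object with two classes:

```r
x <- structure(list(), class = c("first", "second"))

# TRUE if any name in `what` matches any class of x.
any_match  <- inherits(x, c("other", "second"))
no_match   <- inherits(x, "other")

# With which = TRUE, the position in class(x) is returned for each
# element of `what`; zero means no match.
positions  <- inherits(x, c("second", "other", "first"), which = TRUE)

print(any_match)
print(no_match)
print(positions)
```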

Considerations

OK, so let us now consider your examples in reverse order...

foo <- function (x) UseMethod('foo')

foo.list <- function (x) {
  # Foo the list
}
foo.numeric <- function (x) {
  # Foo the numeric
}

Now, if we use the function methods():

> methods(foo)
[1] foo.list    foo.numeric
see '?methods' for accessing help and source code
> getS3method('foo','list')
function (x) {
  # Foo the list
}

Thus we have a generic function foo and two associated methods, foo.list and foo.numeric. In other words, we now know that the generic foo has methods that handle list and numeric arguments.

OK, now let's consider your first example...

function (x) {
  if (inherits(x, 'list')) {
    # Foo the list
    print(paste0("List: ", x))
  } else if (inherits(x, 'numeric')) {
    # Foo the numeric
    print(paste0("Numeric: ", x))
  } else {
    # Throw an error
    print(paste0("Unhandled - Sorry!"))
  }
}

The problem is that this is not an S3 generic; it is an ordinary R function. If you run methods() against foo, it returns "no methods found":

> methods(foo)
no methods found
> getS3method('foo','list')
Error in getS3method("foo", "list") : no function 'foo' could be found

So what is happening in this example? inherits() simply tests the class of the argument x against the given class name; no method dispatch is involved.

So your first example simply looks up the class of the function argument x; no S3 generic or method is created or exists.
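One practical consequence is that the two versions do not always agree. Here is a minimal sketch, with the hypothetical names foo_if and foo_s3, showing a case where they differ: an integer vector has class "integer", so inherits(x, "numeric") is FALSE, while S3 dispatch also tries "numeric" as part of the implicit class.

```r
# Hypothetical names, used only for this comparison.
foo_if <- function(x) {
  if (inherits(x, "list")) "list"
  else if (inherits(x, "numeric")) "numeric"
  else "unhandled"
}

foo_s3 <- function(x) UseMethod("foo_s3")
foo_s3.list    <- function(x) "list"
foo_s3.numeric <- function(x) "numeric"
foo_s3.default <- function(x) "unhandled"

# A double vector: both approaches agree.
print(c(foo_if(1.5), foo_s3(1.5)))

# An integer vector: the inherits() chain misses it, dispatch does not.
print(c(foo_if(1L), foo_s3(1L)))
```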

What are the advantages to each approach? Are there performance implications?

OK, I am biased here but an object’s class is one of the most useful attributes for describing an entity in R. Every object you create is identified, either implicitly or explicitly, with at least one class. R is an object-oriented programming language, meaning entities are stored as objects and have methods that act upon them.

So the second approach is the way to go, in my opinion. Why? Because you are using the language construct as intended. The first approach, where you use inherits() explicitly, feels like a hack. Readability is key to comprehension, so I worry that a person reading the first example might ask, "Why did the programmer take this approach, and what am I missing?" Unnecessary complexity impedes code comprehension, so keeping it simple is an advantage.

In reference to code performance, a chain of if-else checks is generally going to be faster than a method lookup, though the two approaches are not equivalent, so I feel the performance question is tricky to answer in this context.
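If you want to check this on your own machine, a rough timing sketch along these lines may help. The function names and the iteration count are assumptions, and the result will vary with R version and hardware, so treat it as an experiment, not a benchmark.

```r
# Hypothetical functions for the two approaches.
foo_if <- function(x) {
  if (inherits(x, "list")) "list"
  else if (inherits(x, "numeric")) "numeric"
  else "unhandled"
}

foo_s3 <- function(x) UseMethod("foo_s3")
foo_s3.list    <- function(x) "list"
foo_s3.numeric <- function(x) "numeric"
foo_s3.default <- function(x) "unhandled"

n <- 1e5  # arbitrary number of repetitions

t_if <- system.time(for (i in seq_len(n)) foo_if(1.5))
t_s3 <- system.time(for (i in seq_len(n)) foo_s3(1.5))

# Elapsed seconds for each approach.
timings <- c(if_else = t_if[["elapsed"]], s3 = t_s3[["elapsed"]])
print(timings)
```

For most code the difference is unlikely to matter compared with the work done inside the methods themselves.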

I hope the above points you in the right direction. Stay safe, good karma flying your way.

A couple of book recommendations here:

  1. The R Inferno by Patrick Burns
  2. Advanced R by Hadley Wickham
  3. R for Everyone: Advanced Analytics and Graphics by Jared P. Lander

How to custom print/show variables (with custom class) in my R package

Here is a small explanation. Adding to the amazing answer posted by @nya:

First, you are dealing with S3 classes. With these classes, we can have one method manipulating the objects differently depending on the class the object belongs to.

Below is a simple class and how it operates:

  1. The class contains numbers,
  2. Its values are to be printed like 1K, 2K, 100K, 1M,
  3. The values can be manipulated numerically.

-- Let's call the class my_numbers.

Now we will define the class constructor:

my_numbers = function(x) structure(x, class = c('my_numbers', 'numeric'))

Note that we added the class 'numeric', i.e. the class my_numbers INHERITS from the numeric class.

We can create an object of the said class as follows:

b <- my_numbers(c(100, 2000, 23455, 24567654, 2345323))
b
[1] 100 2000 23455 24567654 2345323
attr(,"class")
[1] "my_numbers" "numeric"

Nothing special has happened. Only a class attribute has been added to the vector. You can easily strip off the attribute by calling c(b) (or unclass(b)):

c(b)
[1] 100 2000 23455 24567654 2345323

Without its class attribute, b is just a normal vector of numbers.

Note that the class attribute could have been added in any of the following ways (and many more):

class(b) <- c('my_numbers', 'numeric')
attr(b, 'class') <- c('my_numbers', 'numeric')
attributes(b) <- list(class = c('my_numbers', 'numeric'))
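As a sanity check, here is a small sketch confirming that these routes produce the same object, and that unclass() gets you back to the plain vector. The variable names b1 to b4 are just for illustration.

```r
x <- c(100, 2000, 23455)

# Four ways of attaching the same class attribute.
b1 <- structure(x, class = c("my_numbers", "numeric"))
b2 <- x; class(b2) <- c("my_numbers", "numeric")
b3 <- x; attr(b3, "class") <- c("my_numbers", "numeric")
b4 <- x; attributes(b4) <- list(class = c("my_numbers", "numeric"))

all_same <- identical(b1, b2) && identical(b1, b3) && identical(b1, b4)
print(all_same)

# The object inherits from "numeric", and unclass() recovers the vector.
print(inherits(b1, "numeric"))
print(identical(unclass(b1), x))
```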

Where is the magic?

I will write a simple function with recursion. Don't worry about the function implementation. We will just use it as an example.

my_numbers_print = function(x, ..., digs = 2, d = 1, L = c('', 'K', 'M', 'B', 'T')){
  ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
         sprintf(paste0('%.', digs, 'f%s'), x, L[d]))
}

my_numbers_print(b)
[1] "100.00" "2.00K" "23.45K" "24.57M" "2.35M"

There is still no magic. That's just a normal function called on b.

Instead of calling the function my_numbers_print, we could write another function with the name print.my_numbers, i.e. method.class_name (note that I added the parameter quote = FALSE):

print.my_numbers = function(x, ..., quote = FALSE){
  print(my_numbers_print(x), quote = quote)
}

b
[1] 100.00 2.00K 23.45K 24.57M 2.35M

Now b has been printed nicely. We can still do math on b:

b^2
[1] 10.00K 4.00M 550.14M 603.57T 5.50T

Can we add b to a data frame?

data.frame(b)
         b
1      100
2     2000
3    23455
4 24567654
5  2345323

b is displayed as plain numbers instead of in its custom format. That is because data frames are printed via another generic function, i.e. format, not print, so that is the function we need to provide a method for.

Ideally, the correct way to do this is to create a format method first and then have the print method call it. (This is becoming too long, so see the summary.)



Summary: Everything Put Together

# Create a my_numbers class definition function
my_numbers = function(x) structure(x, class = c('my_numbers', 'numeric'))

# format the numbers
format.my_numbers = function(x, ..., digs = 1, d = 1, L = c('', 'K', 'M', 'B', 'T')){
  ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
         sprintf(paste0('%.', digs, 'f%s'), x, L[d]))
}

# print the numbers
print.my_numbers = function(x, ...) print(format(x), quote = FALSE)

# ensure class is maintained after extraction to allow for sort/order etc
'[.my_numbers' = function(x, ..., drop = FALSE) my_numbers(NextMethod('['))

b <- my_numbers(c(2000, 100, 20, 23455, 24567654, 2345323))

data.frame(x = sort(-b) / 2)
       x
1 -12.3M
2  -1.2M
3 -11.7K
4  -1.0K
5  -50.0
6  -10.0
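To see what the '[' method buys you, here is a stripped-down sketch that keeps only the constructor and the extraction method from the summary. It compares subsetting through the method with subsetting the unclassed vector.

```r
# Constructor and extraction method, as in the summary above.
my_numbers <- function(x) structure(x, class = c("my_numbers", "numeric"))
`[.my_numbers` <- function(x, ..., drop = FALSE) my_numbers(NextMethod("["))

b <- my_numbers(c(2000, 100, 20))

# Subsetting goes through `[.my_numbers`, so the class is re-attached.
with_method <- class(b[1:2])

# Subsetting the unclassed vector gives a plain numeric vector.
without_class <- class(unclass(b)[1:2])

# sort() subsets internally, so the class survives sorting as well.
sorted <- sort(b)

print(with_method)
print(without_class)
print(class(sorted))
```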

Why do I see integer rather than Vector for the class of an R vector

In R, "class" is an attribute of an object. However, in the R language definition, a vector cannot have attributes other than "names" (this is really why a "factor" is not a vector). The function class here is giving you the implicit class of the vector, which is derived from its "mode".

From ?vector:

 ‘is.vector’ returns ‘TRUE’ if ‘x’ is a vector of the specified
mode having no attributes _other than names_. It returns ‘FALSE’
otherwise.

From ?class:

 Many R objects have a ‘class’ attribute, a character vector giving
the names of the classes from which the object _inherits_. If the
object does not have a class attribute, it has an implicit class,
‘"matrix"’, ‘"array"’ or the result of ‘mode(x)’ (except that
integer vectors have implicit class ‘"integer"’).

See Here for a bit more on the "mode" of a vector, and get yourself acquainted with another amazing R object: NULL.

To understand the "factor" issue, try your second column:

c2 <- df[, 2]

attributes(c2)
#$levels
#[1] "a" "b" "c"
#
#$class
#[1] "factor"

class(c2)
#[1] "factor"

is.vector(c2)
#[1] FALSE
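The same points can be reproduced without the original data frame. The following is a sketch with made-up objects; the "unit" attribute is purely hypothetical and only there to show the effect of a non-names attribute.

```r
x <- 1:3
no_attrs  <- is.null(attributes(x))   # no class attribute is stored
impl_cls  <- class(x)                 # implicit class
x_mode    <- mode(x)

# Names are the one attribute that is.vector() tolerates.
x_named <- x
names(x_named) <- c("a", "b", "c")
named_ok <- is.vector(x_named)

# Any other attribute makes is.vector() return FALSE.
x_attr <- x
attr(x_attr, "unit") <- "cm"          # hypothetical extra attribute
attr_ok <- is.vector(x_attr)

# A factor carries "levels" and "class" attributes.
f <- factor(c("a", "b", "c"))
factor_ok <- is.vector(f)

print(c(no_attrs = no_attrs, named_ok = named_ok,
        attr_ok = attr_ok, factor_ok = factor_ok))
print(c(class = impl_cls, mode = x_mode))
```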

