Types and classes of variables
In R every "object" has a mode and a class. The former represents how an object is stored in memory (numeric, character, list and function) while the latter represents its abstract type. For example:
d <- data.frame(V1=c(1,2))
class(d)
# [1] "data.frame"
mode(d)
# [1] "list"
typeof(d)
# [1] "list"
As you can see, data frames are stored in memory as a list, but they are wrapped into data.frame objects. The class is what allows methods to be attached to an object, so that generic functions such as print can be overloaded with custom behavior.
typeof (like storage.mode) will usually give the same information as mode, but not always. Case in point:
typeof(c(1,2))
# [1] "double"
mode(c(1,2))
# [1] "numeric"
The reasoning behind this can be found here:
The R specific function typeof returns the type of an R object
Function mode gives information about the mode of an object in the sense of Becker, Chambers & Wilks (1988), and is more compatible with other implementations of the S language
The link that I posted above also contains a list of all native R basic types (vectors, lists etc.) and all compound objects (factors and data.frames), as well as some examples of how mode, typeof and class are related for each type.
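To see the three functions side by side, here is a minimal sketch that tabulates typeof, mode and class for a few objects. The list objs and the table overview are names made up for this illustration; everything else is base R.

```r
# A handful of objects, from basic vectors to compound types.
objs <- list(
  integer    = 1L,
  double     = 1.5,
  character  = "a",
  list       = list(1, "a"),
  closure    = mean,
  builtin    = sum,
  factor     = factor(c("x", "y")),
  data.frame = data.frame(V1 = c(1, 2))
)

# One row per object, one column per function.
overview <- data.frame(
  typeof = vapply(objs, typeof, character(1)),
  mode   = vapply(objs, mode, character(1)),
  class  = vapply(objs, function(o) paste(class(o), collapse = ", "), character(1)),
  row.names = names(objs),
  stringsAsFactors = FALSE
)
overview
```

The factor row is a good one to look at: it is stored as an integer vector (typeof), has mode numeric, and only the class says "factor".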
How to extract just one of classes of object with multiple classes
class() returns an unnamed character vector, which you usually subset using numeric indices x[i], e.g. class(b)[3] to obtain "double".
However, you could apply string matching and write your own my_class() function, which is based on a vector of valid class definitions.
valid <- c("data.frame", "double", "character")
my_class <- function(x) {
  k <- class(x)
  k[k %in% valid]
}
my_class(a)
# [1] "data.frame"
my_class(b)
# [1] "double"
Data:
a <- tibble::as_tibble(data.frame())
b <- haven::labelled()
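A variation on the same idea, as a sketch only: the helper below takes the valid classes as an argument instead of reading a global vector. The name pick_class and the stand-in objects a2 and b2 are hypothetical; the class vectors are set by hand so that the example runs without tibble or haven installed.

```r
# Hypothetical helper: like my_class(), but `valid` is passed in.
pick_class <- function(x, valid = c("data.frame", "double", "character")) {
  # intersect() keeps the order of class(x) and returns character(0) on no match
  intersect(class(x), valid)
}

# Stand-ins built with base R only; class vectors are assigned manually.
a2 <- structure(list(), class = c("tbl_df", "tbl", "data.frame"))
b2 <- structure(double(), class = c("haven_labelled", "vctrs_vctr", "double"))

pick_class(a2)                 # "data.frame"
pick_class(b2)                 # "double"
pick_class(a2, valid = "tbl")  # "tbl"
pick_class(1:3)                # character(0), because "integer" is not in valid
```

Returning character(0) rather than failing makes it easy to test the result with length().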
`UseMethod()` vs `inherits()` to determine an object's class in R
OK, there is some background to be covered to answer this question (in my view)...
Within R, the class of an object is explicit in situations where you have user-defined object structures or an object such as a factor vector or data frame where other attributes play an important part in the handling of the object itself—for example, level labels of a factor vector, or variable names in a data frame, are modifiable attributes that play a primary role in accessing the observations of each object.
Note, however, that elementary R objects such as vectors, matrices, and arrays, are implicitly classed, which means the class is not identified with the attributes function. Whether implicit or explicit, the class of a given object can always be retrieved using the attribute-specific function class.
When a generic function foo is applied to an object with class attribute c("first", "second"), the system searches for a function called foo.first and, if it finds it, applies it to the object. If no such function is found, a function called foo.second is tried. If no class name produces a suitable function, the function foo.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
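The search order described above can be tried out directly. This is a small sketch; the generic describe and its methods are invented for the example.

```r
# A generic with a method for "second" and a default, but none for "first" yet.
describe <- function(x) UseMethod("describe")
describe.second  <- function(x) "second method"
describe.default <- function(x) "default method"

obj <- structure(list(), class = c("first", "second"))

# No describe.first exists, so dispatch falls through to describe.second.
res_before <- describe(obj)

# Once describe.first is defined it takes precedence.
describe.first <- function(x) "first method"
res_after <- describe(obj)

# No class attribute: the implicit class is tried, then the default method.
res_plain <- describe(1:3)

c(res_before, res_after, res_plain)
# [1] "second method"  "first method"   "default method"
```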
The function class prints the vector of names of classes an object inherits from.
class<- sets the classes an object inherits from.
inherits() indicates whether its first argument inherits from any of the classes specified in the what argument. Method dispatch takes place based on the class of the first argument to the generic function. If which is TRUE then an integer vector of the same length as what is returned. Each element indicates the position in the class(x) matched by the element of what; zero indicates no match. If which is FALSE then TRUE is returned by inherits if any of the names in what match with any class.
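A short sketch of the which argument in action, using a made-up object x with two classes:

```r
x <- structure(list(), class = c("first", "second"))

inherits(x, "second")                             # TRUE
inherits(x, "third")                              # FALSE
inherits(x, c("third", "first"))                  # TRUE: any match is enough

# which = TRUE reports the position of each name within class(x), 0 for no match
pos <- inherits(x, c("second", "third"), which = TRUE)
pos
# [1] 2 0
```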
All but inherits() are primitive functions.
Considerations
OK, so let us now consider your examples in reverse order...
foo <- function (x) UseMethod('foo')
foo.list <- function (x) {
  # Foo the list
}
foo.numeric <- function (x) {
  # Foo the numeric
}
Now, if we use the function methods():
methods(foo)
[1] foo.list foo.numeric
see '?methods' for accessing help and source code
> getS3method('foo','list')
function (x) {
# Foo the list
}
Thus we have a generic function foo and two associated methods, foo.list and foo.numeric. We now know that the generic foo has methods to handle list and numeric objects.
OK, now let's consider your first example...
foo <- function (x) {
  if (inherits(x, 'list')) {
    # Foo the list
    print(paste0("List: ", x))
  } else if (inherits(x, 'numeric')) {
    # Foo the numeric
    print(paste0("Numeric: ", x))
  } else {
    # Throw an error
    stop("Unhandled - Sorry!")
  }
}
The problem is that this is not an S3 generic with methods; it is an ordinary R function. If you run methods() against foo, it returns "no methods found":
> methods(foo)
no methods found
> getS3method('foo','list')
Error in getS3method("foo", "list") : no function 'foo' could be found
So what is happening in this example? inherits() is matching the class of the parameter x against the given class names. Method dispatch, which takes place based on the class of the first argument to a generic function, is not involved: your first example simply looks up the class of x in an if/else chain, and no S3 generic or method is created.
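One practical difference between the two approaches is worth a sketch. inherits() works on class(x), whereas S3 dispatch also walks the implicit class, so the two can disagree for an integer vector. The generic kind below is invented for the illustration.

```r
kind <- function(x) UseMethod("kind")
kind.numeric <- function(x) "numeric method"
kind.default <- function(x) "default method"

# Dispatch: an integer vector has implicit class c("integer", "numeric"),
# so kind.numeric is found even though class(1L) is just "integer".
kind(1L)                  # "numeric method"
kind(1.5)                 # "numeric method"

# inherits() only looks at class(x).
class(1L)                 # "integer"
inherits(1L, "numeric")   # FALSE
inherits(1.5, "numeric")  # TRUE
```

So an if/else chain built on inherits(x, 'numeric') would send integers to the "unhandled" branch, while the UseMethod version treats them as numeric.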
What are the advantages to each approach? Are there performance implications?
OK, I am biased here but an object’s class is one of the most useful attributes for describing an entity in R. Every object you create is identified, either implicitly or explicitly, with at least one class. R is an object-oriented programming language, meaning entities are stored as objects and have methods that act upon them.
So the second approach is the way to go in my opinion. Why? Because you are truly using the language construct as intended. The first approach, where you use inherits() explicitly, feels like a hack. Readability is key to comprehension from my personal perspective, so I worry that a person reading the first example might be led to ask "Why did they (the programmer) take said approach, what am I missing?". Unnecessary complexity impedes code comprehension, so keeping it simple is an advantage.
In reference to code performance, an if-else parser is generally going to be faster than an object lookup model though a lookup model is not equivalent to a class mapping process so I feel the performance question is tricky to answer in this context. Why? The two approaches are different.
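If you want numbers rather than opinions, one way to measure it yourself is a rough timing loop like the sketch below. The functions tag_s3 and tag_if are hypothetical stand-ins for the two styles, and the timings will depend on your machine and R version, so no results are shown here.

```r
# S3 style
tag_s3 <- function(x) UseMethod("tag_s3")
tag_s3.list    <- function(x) "list"
tag_s3.numeric <- function(x) "numeric"

# Hand-written branching style
tag_if <- function(x) {
  if (inherits(x, "list")) "list"
  else if (inherits(x, "numeric")) "numeric"
  else stop("unhandled class")
}

x <- c(1, 2)
n <- 1e4  # number of repetitions; increase for more stable timings

system.time(for (i in seq_len(n)) tag_s3(x))
system.time(for (i in seq_len(n)) tag_if(x))
```

For most code the difference is unlikely to matter next to the work done inside the method bodies, but measuring is better than guessing.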
I hope the above points you in the right direction. Stay safe, good karma flying your way.
A couple of Book recommendations here:
- R Inferno by Patrick Burns
- Advanced R by Hadley Wickham
- R for Everyone: Advanced Analytics and Graphics
How to custom print/show variables (with custom class) in my R package
Here is a small explanation. Adding to the amazing answer posted by @nya:
First, you are dealing with S3 classes. With these classes, we can have one method manipulating the objects differently depending on the class the object belongs to.
Below is a simple class and how it operates:
- The class contains numbers,
- The class values are to be printed like 1K, 2K, 100K, 1M,
- The values can be manipulated numerically.
Let's call the class my_numbers.
Now we will define the class constructor:
my_numbers = function(x) structure(x, class = c('my_numbers', 'numeric'))
Note that we added the class 'numeric', i.e. the class my_numbers INHERITS from the numeric class.
We can create an object of the said class as follows:
b <- my_numbers(c(100, 2000, 23455, 24567654, 2345323))
b
[1] 100 2000 23455 24567654 2345323
attr(,"class")
[1] "my_numbers" "numeric"
Nothing special has happened. Only an attribute of class has been added to the vector. You can easily remove/strip off the attribute by calling c(b)
c(b)
[1] 100 2000 23455 24567654 2345323
Underneath, b is just a normal vector of numbers.
Note that the class attribute could have been added by any of the following (and many more ways):
class(b) <- c('my_numbers', 'numeric')
attr(b, 'class') <- c('my_numbers', 'numeric')
attributes(b) <- list(class = c('my_numbers', 'numeric'))
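A quick sketch to check that these routes really end up with the same object. The variable names b1, b2, b3 and cls are only for this illustration.

```r
cls <- c("my_numbers", "numeric")
v   <- c(100, 2000, 23455)

b1 <- v; class(b1) <- cls          # via class<-
b2 <- v; attr(b2, "class") <- cls  # via attr<-
b3 <- structure(v, class = cls)    # via structure(), as in the constructor

identical(b1, b2)          # TRUE
identical(b1, b3)          # TRUE
inherits(b1, "numeric")    # TRUE
identical(unclass(b1), v)  # TRUE: stripping the class gives back the plain vector
```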
Where is the magic?
I will write a simple function with recursion. Don't worry about the function implementation. We will just use it as an example.
my_numbers_print = function(x, ..., digs=2, d = 1, L = c('', 'K', 'M', 'B', 'T')){
ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
sprintf(paste0('%.',digs,'f%s'), x, L[d]))
}
my_numbers_print(b)
[1] "100.00" "2.00K" "23.45K" "24.57M" "2.35M"
There is no magic still. That's the normal function called on b.
Instead of calling the function my_numbers_print, we could write another function with the name print.my_numbers, i.e. method.class_name. (Note that I added the parameter quote = FALSE.)
print.my_numbers = function(x, ..., quote = FALSE){
print(my_numbers_print(x), quote = quote)
}
b
[1] 100.00 2.00K 23.45K 24.57M 2.35M
Now b has been printed nicely. We can still do math on b
b^2
[1] 10.00K 4.00M 550.14M 603.57T 5.50T
Can we add b to a dataframe?
data.frame(b)
b
1 100
2 2000
3 23455
4 24567654
5 2345323
b is displayed as plain numbers instead of using our custom printing. That is because data frames are printed through another function, i.e. the format function, which does not yet have a my_numbers method.
Ideally, the correct way to do this is to create a format function and then the print function. (Becoming too long)
Summary : Everything Put Together
# Create a my_numbers class definition function
my_numbers = function(x) structure(x, class = c('my_numbers', 'numeric'))
# format the numbers
format.my_numbers = function(x,...,digs =1, d = 1, L = c('', 'K', 'M', 'B', 'T')){
ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
sprintf(paste0('%.',digs,'f%s'), x, L[d]))
}
#printing the numbers
print.my_numbers = function(x, ...) print(format(x), quote = FALSE)
# ensure class is maintained after extraction to allow for sort/order etc
'[.my_numbers' = function(x, ..., drop = FALSE) my_numbers(NextMethod('['))
b <- my_numbers(c(2000, 100, 20, 23455, 24567654, 2345323))
data.frame(x = sort(-b) / 2)
x
1 -12.3M
2 -1.2M
3 -11.7K
4 -1.0K
5 -50.0
6 -10.0
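As an aside, the recursion with Recall is not essential. Below is a sketch of a non-recursive alternative that picks the suffix with findInterval instead. This is a swapped-in technique, not the approach used above, and the name format_si is made up; unlike the recursive version, it caps at the last suffix in L for very large values.

```r
format_si <- function(x, digs = 1, L = c("", "K", "M", "B", "T")) {
  x <- unclass(x)                         # work on the bare numbers
  breaks <- 1000^seq_len(length(L) - 1)   # 1e3, 1e6, 1e9, 1e12
  d <- findInterval(abs(x), breaks)       # 0 below 1000, 1 for thousands, ...
  sprintf(paste0("%.", digs, "f%s"), x / 1000^d, L[d + 1])
}

format_si(c(2000, 100, 24567654))
# [1] "2.0K"  "100.0" "24.6M"
```

It could be dropped into format.my_numbers as the body if you prefer to avoid recursion.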
Why do I see integer rather than Vector for the class of an R vector
In R, "class" is an attribute of an object. However, in the R language definition, a vector cannot have attributes other than "names" (this is really why a "factor" is not a vector). The function class here is giving you the implicit class of the vector, which is derived from its "mode".
From ?vector:
‘is.vector’ returns ‘TRUE’ if ‘x’ is a vector of the specified
mode having no attributes _other than names_. It returns ‘FALSE’
otherwise.
From ?class:
Many R objects have a ‘class’ attribute, a character vector giving
the names of the classes from which the object _inherits_. If the
object does not have a class attribute, it has an implicit class,
‘"matrix"’, ‘"array"’ or the result of ‘mode(x)’ (except that
integer vectors have implicit class ‘"integer"’).
See Here for a bit more on the "mode" of a vector, and get yourself acquainted with another amazing R object: NULL.
To understand the "factor" issue, try your second column:
c2 <- df[, 2]
attributes(c2)
#$levels
#[1] "a" "b" "c"
#
#$class
#[1] "factor"
class(c2)
#[1] "factor"
is.vector(c2)
#[1] FALSE
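The same points can be checked on a plain integer vector without needing df. A small sketch, with v and f as throwaway names:

```r
v <- 1:3
class(v)        # "integer": the implicit class
attributes(v)   # NULL: no class attribute is actually stored
is.vector(v)    # TRUE

names(v) <- c("a", "b", "c")
is.vector(v)    # TRUE: names are the one attribute that is allowed

attr(v, "unit") <- "cm"
is.vector(v)    # FALSE: any other attribute disqualifies it
class(v)        # still "integer"

f <- factor(c("a", "b"))
typeof(f)       # "integer": stored as integers
is.vector(f)    # FALSE: it carries levels and class attributes
```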