What's the Difference Between Integer Class and Numeric Class in R

There are multiple classes that are grouped together as "numeric" classes, the 2 most common of which are double (for double precision floating point numbers) and integer. R will automatically convert between the numeric classes when needed, so for the most part it does not matter to the casual user whether the number 3 is currently stored as an integer or as a double. Most math is done using double precision, so that is often the default storage.
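As a rough illustration of the point above, the following base R sketch shows how the two storage types report themselves and how they mix. The variable names x and y are only for illustration.

```r
x <- 3    # a plain literal is stored as a double
y <- 3L   # the L suffix asks for an integer

class(x)        # "numeric"
class(y)        # "integer"
typeof(x)       # "double"
typeof(y)       # "integer"

# Both count as "numeric" in the broad sense
is.numeric(x)   # TRUE
is.numeric(y)   # TRUE

# Mixing the two silently promotes the result to double
typeof(x + y)   # "double"
x == y          # TRUE
```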

Sometimes you may want to store a vector specifically as integers if you know that the values will never be converted to doubles (for example, when they are used as ID values or for indexing), since integers require less storage space (typically 4 bytes per element rather than 8). But if they are going to be used in any math that will convert them to double, then it will probably be quickest to just store them as doubles to begin with.
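A quick way to see the storage difference is to compare two vectors of the same length with object.size. This is only a sketch; ids_int and ids_dbl are made-up names, and the exact byte counts depend on your platform.

```r
n <- 1e6

ids_int <- integer(n)   # one million integers
ids_dbl <- numeric(n)   # one million doubles

# The double vector is expected to take roughly twice the space
object.size(ids_int)
object.size(ids_dbl)

as.numeric(object.size(ids_dbl)) / as.numeric(object.size(ids_int))
```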

Integer vs Numeric Datatype in R

This has been discussed before; see http://r.789695.n4.nabble.com/Integer-vs-numeric-td847329.html
From help(":"):


Value:

For numeric arguments, a numeric vector. This will be of type
'integer' if 'from' is integer-valued and the result is
representable in the R integer type, otherwise of type '"double"'
(aka 'mode' '"numeric"').
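The rule quoted above can be checked directly at the console. A small sketch, using only base R (the name big is just for illustration):

```r
typeof(1:3)      # "integer"
typeof(1.0:3)    # "integer": 'from' is integer-valued, even though typed as a double
typeof(1.5:3)    # "double": 'from' is not integer-valued

# If the result cannot be represented in R's integer type,
# the sequence falls back to double
big <- .Machine$integer.max
typeof(big:(big + 1))   # "double"
```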

Benefits of using integer values for constants rather than numeric values (e.g. 1L vs 1) in R

These are some of the use cases in which I explicitly use the L suffix when declaring constants. Of course these are not strictly "canonical" (or the only ones), but they should give you an idea of the rationale behind them. I added, for each case, a "necessary" flag; you will see that the necessary ones arise only when you interface with other languages (like C).

  • Logical type conversion (not necessary)

Instead of using the classic as.integer, I add 0L to a logical vector to make it integer. Of course you could just add 0, but the result would then be a double, which requires more memory (typically 8 bytes per element instead of 4) and a conversion.
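For instance (a minimal sketch; flags is a made-up vector):

```r
flags <- c(TRUE, FALSE, TRUE)

typeof(flags + 0L)          # "integer"
typeof(flags + 0)           # "double"
typeof(as.integer(flags))   # "integer"

# Adding 0L gives the same result as the explicit conversion
identical(flags + 0L, as.integer(flags))   # TRUE
```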

  • Manipulating the result of a function that returns integer (not necessary)

Say, for instance, that you want to retrieve the elements of a vector that come right after an NA. You could use:

which(is.na(vec)) + 1L

Since which returns an integer vector, adding 1L preserves the type and avoids an implicit conversion. Nothing bad will happen if you omit the L, since this is just a small optimization. The same applies to match, for instance: if you want to post-process the result of such a function, it's a good habit to preserve the type if possible.
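A small sketch of the difference, with a made-up vector vec:

```r
vec <- c(10, NA, 30, 40, NA, 60)   # hypothetical data

idx <- which(is.na(vec)) + 1L
typeof(idx)                        # "integer"
vec[idx]                           # 30 60

# Without the L the indices still work, but they are now doubles
typeof(which(is.na(vec)) + 1)      # "double"

# match() also returns integers, so the same habit applies
typeof(match("b", letters) + 1L)   # "integer"
```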

  • Interfacing C (necessary)

From ?integer:

Integer vectors exist so that data can be passed to C or Fortran
code which expects them, and so that (small) integer data can be
represented exactly and compactly.

C is much stricter regarding data types. This implies that, if you pass a vector to a C function, you cannot rely on C to do the conversions. Say that you want to replace the elements after an NA with some value, say 42. You find the positions of the NA values at the R level (as we did before with which) and then pass the original vector and the vector of indices to C. The C function will look like:

SEXP replaceAfterNA (SEXP X, SEXP IND) {
...
int *ind = INTEGER(IND);
...
for (i=0; i<l; i++) {
//make here the replacement
}
}

and then, from the R side:

...
ind <- which(is.na(x)) + 1L
.Call("replaceAfterNA", x, ind)
...

If you omit the L in the first line above, you will receive an error like:

INTEGER() cannot be applied to double vectors

since C is expecting an integer type.
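One way to guard against this on the R side is to check or coerce the type before handing the vector to compiled code. The sketch below does not call any C code; it only shows the type check, and the names ind_ok, ind_bad and ind_safe are made up for illustration.

```r
x <- c(1, NA, 3, NA, 5)   # hypothetical data

ind_ok  <- which(is.na(x)) + 1L   # stays integer
ind_bad <- which(is.na(x)) + 1    # silently becomes double

is.integer(ind_ok)    # TRUE
is.integer(ind_bad)   # FALSE

# Defensive coercion before passing the indices to compiled code
ind_safe <- as.integer(ind_bad)
is.integer(ind_safe)  # TRUE
```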

  • Interfacing Java (necessary)

Same as before. If you use the rJava package and want R to call your own custom Java classes and methods, you have to be sure that an integer is passed when the Java method requires an integer. Not adding a specific example here, but it should be clear why you may want to use the L suffix in constants in these cases.

Addendum

The previous cases were about when you may want to use L. Even if it is, I guess, much less common, it might be useful to add a case in which you don't want the L. This may arise if there is a danger of integer overflow. The *, + and - operators preserve the type if both operands are integer. For example:

#this overflows
31381938L*3231L
#[1] NA
#Warning message:
#In 31381938L * 3231L : NAs produced by integer overflow

#this does not
31381938L*3231
#[1] 1.01395e+11

So, if you are doing operations on an integer variable which might overflow, it's important to cast it to double to avoid any risk. Adding or subtracting a constant without the L to or from that variable is as good an occasion as any to make the cast.
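The same effect can be seen at the edge of the integer range. A minimal sketch using .Machine$integer.max from base R (the names big, overflowed and safe are only for illustration; the warning is suppressed here just to keep the output short):

```r
big <- .Machine$integer.max   # 2147483647

# integer + integer overflows to NA (normally with a warning)
overflowed <- suppressWarnings(big + 1L)
is.na(overflowed)             # TRUE

# A constant without the L promotes the sum to double, so no overflow
safe <- big + 1
typeof(safe)                  # "double"
safe                          # 2147483648

# An explicit cast works just as well
typeof(as.double(big) * 3231L)   # "double"
```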

How can an object have two different classes in R?

When you type a 0, R understands this as a numeric (double). An integer has to be typed as 0L. And of course, : is documented to return integer values "if from is integer-valued and the result is representable in the R integer type".
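This is why two objects that print the same can still report different classes. A short sketch:

```r
identical(0, 0L)       # FALSE: same value, different storage type
0 == 0L                # TRUE: the comparison converts before comparing

class(0:3)             # "integer"
class(c(0, 1, 2, 3))   # "numeric"

identical(0:3, c(0, 1, 2, 3))           # FALSE
isTRUE(all.equal(0:3, c(0, 1, 2, 3)))   # TRUE
```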

R: integer versus numeric

Division using the / operator will always return a "numeric", i.e. the equivalent of a C "double". The numerator and denominator are first coerced to numeric and then the division is done. If you want integer division, you can use %/%. If you want to create an integer, you can use trunc, floor, round(x, 0) or as.integer. These options are not all equivalent: trunc and as.integer both drop the fractional part (rounding toward zero), while floor rounds down, so they differ for negative non-integer values. Only as.integer returns a value of type integer; trunc, floor and round still return "numeric" even though the printed representation looks like an integer. I do not think you need to worry as long as you will be happy with "double"/"numeric" results. Heck, we even allow division by 0.
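A short console sketch of these operators and conversions (x is just an example value):

```r
typeof(7L / 2L)      # "double", even with two integer operands
7L / 2L              # 3.5
typeof(7L %/% 2L)    # "integer"
7L %/% 2L            # 3

x <- -2.7
trunc(x)        # -2
floor(x)        # -3
round(x, 0)     # -3
as.integer(x)   # -2

# Only as.integer changes the storage type
typeof(trunc(x))        # "double"
typeof(floor(x))        # "double"
typeof(round(x, 0))     # "double"
typeof(as.integer(x))   # "integer"

1 / 0           # Inf
```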

Your 'aa' variable was classed as "numeric" despite being entered as a bunch of integers. It would have been classed as "integer" had you used:

aa <- 1:8  # sequences are integer class.

It sounds as though you will not be too surprised by R FAQ 7.31 ("Why doesn't R think these numbers are equal?").
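To make both points concrete, here is a small sketch. The original definition of aa is not shown in the question, so the c(...) call below is an assumption about how it was entered; bb is a made-up name.

```r
aa <- c(1, 2, 3, 4, 5, 6, 7, 8)   # assumed: whole numbers typed without L
class(aa)                         # "numeric"

bb <- 1:8
class(bb)                         # "integer"

# The kind of surprise FAQ 7.31 is about: doubles are not exact
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: comparison with a tolerance
```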

What's the difference between a double and a numeric?

I guess this has to do with converting your data.frame into a tibble. Replicating your code on the mtcars dataset, we get:

library(dplyr)  # provides %>%, as_tibble(), mutate() and select()

mtcars %>%
  as_tibble() %>%
  mutate(year = as.double(seq(1956, 2009, 0.25)[1:nrow(mtcars)])) %>%
  dplyr::select(year) %>%
  head()

# year
# <dbl>
# 1 1956
# 2 1956.
# 3 1956.
# 4 1957.
# 5 1957
# 6 1957.

Here's the difference if we comment out as_tibble():

# year
# 1 1956.00
# 2 1956.25
# 3 1956.50
# 4 1956.75
# 5 1957.00
# 6 1957.25

Swapping as.double with as.numeric does not change anything.
From ?double:

as.double is a generic function. It is identical to as.numeric. 
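This can be confirmed in base R without any tibble involved. The sketch below reuses the seq() call from the example above; the name year is only for illustration.

```r
# The two names refer to the same function
identical(as.double, as.numeric)   # TRUE

year <- seq(1956, 2009, 0.25)[1:6]
typeof(year)   # "double"
class(year)    # "numeric"

# Converting with either function gives the same vector
identical(as.double(year), as.numeric(year))   # TRUE

# The stored value keeps its decimals regardless of how it is printed
year[2] == 1956.25   # TRUE
```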

What is difference between Numeric and Integer types, and how I can use timestamp level type in Mondrian?

Well, Integer type can contain only whole numbers, like 5 or 123.
Numeric type can contain decimal numbers like 15.39.

About the second question: I think you're asking about Time dimensions, not Timestamp (which is a type). In this case it's better to refer to the Mondrian documentation:

http://mondrian.pentaho.com/documentation/schema.php#Time_dimensions


