Why Would R Use the "L" Suffix to Denote an Integer

Why is "L" used as a suffix?

I've never seen it written down, but I theorise it is for two reasons:

  1. Because R handles complex numbers, which may be specified using the
    suffix "i", and this would be too similar to "I".

  2. Because R's integers are 32-bit long integers and "L" therefore appears to be sensible shorthand for referring to this data type.

The value a long integer can take depends on the word size. R does not natively support integers with a word length of 64 bits. Integers in R are signed and have a word length of 32 bits; because R reserves the lowest bit pattern (−2,147,483,648) for NA_integer_, the usable range is −2,147,483,647 to 2,147,483,647. Larger values are stored as double.

This wiki page has more information on common data types, their conventional names and ranges.

And also from ?integer

Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
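As a rough illustration of the limits quoted above, the following sketch checks how R stores whole numbers on either side of the integer range. The variable name big is only illustrative, and the values assume the 32-bit integers described in ?integer.

```r
# With the L suffix, the largest representable value is stored as an integer.
typeof(2147483647L)          # "integer"

# A whole number beyond the integer range is stored as a double.
big <- 2147483648
typeof(big)                  # "double"

# Doubles still represent such whole numbers exactly ...
big + 1 == 2147483649        # TRUE

# ... but only up to 2^53; beyond that, neighbouring integers collapse.
2^53 == 2^53 + 1             # TRUE
```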



Why do 1.0L and 1.1L return different types?

The reason that 1.0L and 1.1L return different data types is that returning an integer for 1.1 would lose information, whilst for 1.0 it would not (but you might want to know that you no longer have a floating point numeric). Buried deep within the lexical analyser (/src/main/gram.c:4463-4485) is this code (part of the function NumericValue()), which actually creates an int data type from a double input that is suffixed by an ASCII "L":

/* Make certain that things are okay. */
if(c == 'L') {
    double a = R_atof(yytext);
    int b = (int) a;
    /* We are asked to create an integer via the L, so we check that the
       double and int values are the same. If not, this is a problem and we
       will not lose information and so use the numeric value.
    */
    if(a != (double) b) {
        if(GenerateCode) {
            if(seendot == 1 && seenexp == 0)
                warning(_("integer literal %s contains decimal; using numeric value"), yytext);
            else {
                /* hide the L for the warning message */
                *(yyp-2) = '\0';
                warning(_("non-integer value %s qualified with L; using numeric value"), yytext);
                *(yyp-2) = (char)c;
            }
        }
        asNumeric = 1;
        seenexp = 1;
    }
}
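The effect of that check can be seen from the R side. This is a minimal sketch: literal_type is a hypothetical helper, not part of R, which parses a literal from text so that the parser's warnings can be kept out of the output.

```r
# Hypothetical helper: parse a literal from text and report its storage type.
# suppressWarnings() is only there to keep the output quiet.
literal_type <- function(src) {
  suppressWarnings(typeof(eval(parse(text = src))))
}

literal_type("1L")     # "integer"
literal_type("1.0L")   # "integer": the value is whole, so nothing is lost
literal_type("1.1L")   # "double":  the fraction would be lost, so L is ignored
literal_type("1e3L")   # "integer": 1000 is a whole number
literal_type("0x10L")  # "integer": hexadecimal 10 is 16
```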

Clarification of L in R

This answer is a summary of the comments above. It is basically just pointers to various help texts, but as evident from OP's attempt with ?L, it is not always easy to find the relevant help text. I was expecting to find something about L in ?as.integer, but no. Hopefully this answer is more useful than a pile of comments.

  • In the R language definition you will find: "We can use the L suffix
    to qualify any number with the intent of making it an explicit integer"

  • From ?NumericConstants: "[...] All other numeric constants start
    with a digit or period and are either a decimal or hexadecimal
    constant optionally followed by L"

    "An numeric constant immediately followed by L is regarded as an
    integer number when possible (and with a warning if it contains a
    ".")."

    "You can combine the "0x" prefix with the "L" suffix".

  • You may also find it useful to check the examples on floating point
    vs. integers in the section "Two Kinds Revisited" here: "Put capital
    L (as in “long”) after a number to make R create it as an integer".

  • Not specifically about L, but always relevant in the floating point
    vs. integers context is FAQ7.31: "Why doesn’t R think these numbers are equal?".


Threads with discussions about the efficiency of L:

Threads on R-help where others have struggled to find documentation about L, with a possible explanation of why the letter L, and why L vs as.integer in terms of efficiency.

  1. Difference between 10 and 10L

First William Dunlap:

Why not 10I for integer? Perhaps because "I" and "l" look too similar, perhaps because "i" and "I" sound too similar. The "L" does not mean "long": integers are 4 bytes long.

Then Brian Ripley:

Actually it does: this notation dates from the C language on 16-bit
computers where integers were 16-bits and longs were 32-bit (and R has
no 'long' type).

The author of this in R never explained why he chose the notation, but
it is shorter than as.integer(10), and more efficient as the coercion is
done at parse time.


  2. The L Word

    Discussion about the efficiency in different situations, with some benchmarks.

  3. R history: Why 'L; in suffix character ‘L’ for integer constants?

  4. More discussions here.
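Brian Ripley's remark that the coercion is done at parse time can be checked by inspecting the parsed expressions without evaluating them. This is only a sketch, and the variable names are illustrative.

```r
# 10L is already an integer constant in the parsed expression.
parsed_literal <- quote(10L)
typeof(parsed_literal)              # "integer"

# as.integer(10) is parsed as a function call holding a double constant;
# the conversion only happens when the call is evaluated.
parsed_call <- quote(as.integer(10))
is.call(parsed_call)                # TRUE
typeof(parsed_call[[2]])            # "double"
identical(eval(parsed_call), 10L)   # TRUE
```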

What does the L mean at the end of an integer literal?

It is a long integer literal.

Integer literals have a type of int by default; the L suffix gives it a type of long (Note that if the value cannot be represented by an int, then the literal will have a type of long even without the suffix).

What's the difference between `1L` and `1`?

So, @James and @Brian explained what 3L means. But why would you use it?

Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).

Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:

x <- 1:100
typeof(x) # integer

y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)

z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)

...but also note that working excessively with integers can be dangerous:

1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Oops, overflow: the result is NA, with a warning

...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.

A caveat though is that this applies to the current R version (2.13). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
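A small sketch of that advice follows. The names int_max, int_min and overflowed are illustrative, and the behaviour shown assumes integer overflow yields NA with a warning, as in the examples above.

```r
int_max <- .Machine$integer.max
int_min <- -int_max                  # smallest non-NA integer

# Integer arithmetic past the limit gives NA (with a warning).
overflowed <- suppressWarnings(int_max + 1L)
is.na(overflowed)                    # TRUE

# Adding a double constant promotes the result to double instead.
typeof(int_max + 1)                  # "double"
int_max + 1 > int_max                # TRUE
```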

Benefits of using integer values for constants rather than numeric values (e.g. 1L vs 1) in R

These are some of the use cases in which I explicitly use the L suffix when declaring constants. Of course these are not strictly "canonical" (or the only ones), but they may give you an idea of the rationale behind them. For each case I added a "necessary" flag; you will see that the necessary ones arise only if you interface with other languages (like C).

  • Logical type conversion (not necessary)

Instead of using the classic as.integer, I add 0L to a logical vector to make it an integer. Of course you could just add 0, but the result would be a double, which requires more memory (typically 8 bytes per element instead of 4) and a conversion.
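For illustration, here is a minimal sketch of that conversion on a made-up logical vector (flags, as_int and as_dbl are illustrative names):

```r
flags <- c(TRUE, FALSE, NA, TRUE)

as_int <- flags + 0L    # integer: 1 0 NA 1
as_dbl <- flags + 0     # double:  1 0 NA 1

typeof(as_int)                         # "integer"
typeof(as_dbl)                         # "double"
identical(as_int, as.integer(flags))   # TRUE
```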

  • Manipulating the result of a function that returns integer (not necessary)

Say, for instance, that you want to retrieve the elements of a vector that come right after an NA. You could use:

which(is.na(vec)) + 1L

Since which returns an integer, adding 1L will preserve the type and avoid an implicit conversion. Nothing bad will happen if you omit the L, since it's just a small optimization. The same applies to match, for instance: if you want to post-process the result of such a function, it's a good habit to preserve the type if possible.
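The difference is easy to check on made-up data. In this sketch vec, after_na_int and after_na_dbl are illustrative names:

```r
vec <- c(3, NA, 7, 9, NA, 2)   # made-up data

after_na_int <- which(is.na(vec)) + 1L
after_na_dbl <- which(is.na(vec)) + 1

typeof(after_na_int)   # "integer"
typeof(after_na_dbl)   # "double"

# Both index vectors select the same elements: 7 and 2.
vec[after_na_int]
identical(vec[after_na_int], vec[after_na_dbl])   # TRUE

# match() also returns integers, so + 1L keeps the type here too.
typeof(match("b", c("a", "b")) + 1L)              # "integer"
```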

  • Interfacing C (necessary)

From ?integer:

Integer vectors exist so that data can be passed to C or Fortran
code which expects them, and so that (small) integer data can be
represented exactly and compactly.

C is much stricter regarding data types. This implies that, if you pass a vector to a C function, you cannot rely on C to do the conversions. Say that you want to replace the elements after an NA with some value, say 42. You find the positions of the NA values at the R level (as we did before with which) and then pass the original vector and the vector of indices to C. The C function will look like:

SEXP replaceAfterNA (SEXP X, SEXP IND) {
    ...
    int *ind = INTEGER(IND);
    ...
    for (i=0; i<l; i++) {
        //make here the replacement
    }
}

and then, from the R side:

...
ind <- which(is.na(x)) + 1L
.Call("replaceAfterNA", x, ind)
...

If you omit the L in the first line above, you will receive an error like:

INTEGER() cannot be applied to double vectors

since C is expecting an integer type.

  • Interfacing Java (necessary)

Same as before. If you use the rJava package and want R to call your own custom Java classes and methods, you have to be sure that an integer is passed when the Java method requires an integer. Not adding a specific example here, but it should be clear why you may want to use the L suffix in constants in these cases.

Addendum

The previous cases were about when you may want to use L. Even though I guess it is much less common, it might be useful to add a case in which you don't want the L. This may arise if there is a danger of integer overflow. The *, + and - operators preserve the type if both operands are integers. For example:

#this overflows
31381938L*3231L
#[1] NA
#Warning message:
#In 31381938L * 3231L : NAs produced by integer overflow

#this does not
31381938L*3231
#[1] 1.01395e+11

So, if you are doing operations on an integer variable which might produce overflow, it's important to cast it to double to avoid any risk. Adding or subtracting a constant without the L to that variable might be as good an occasion as any to make the cast.

R : Int vs Num Anomaly in Vector

There are two distinct issues at play:

  1. In c(2, 1, 1, 5) you are explicitly creating numeric types. For integer, you would have to use c(2L, 1L, 1L, 5L) as only the suffix L ensures creation of an integer type (or casting via as.integer() etc). But read on ...

  2. In c(1:5) a historical override for the : operator comes into play. Because its usage almost always involves integer sequences, this is what you get: integers.
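Both points can be verified with typeof. A short sketch:

```r
typeof(c(2, 1, 1, 5))               # "double"
typeof(c(2L, 1L, 1L, 5L))           # "integer"
typeof(as.integer(c(2, 1, 1, 5)))   # "integer"
typeof(1:5)                         # "integer"
typeof(c(1:5))                      # "integer"

# Same values, different storage type:
identical(c(1, 2, 3), 1:3)          # FALSE
all(c(1, 2, 3) == 1:3)              # TRUE
```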

Both forms are documented, so it is not an anomaly as your question title implies.

When should integers be explicitly specified?

Using 1L etc is programmatically safe, as in it is explicit as to what is meant, and does not rely on any conversions etc.

When writing code interactively, it can be easy to notice errors and fix them along the way; however, if you are writing a package (even base R), it will be safer to be explicit.

When you are considering equality, using floating point numbers can cause precision issues; see this FAQ.

Explicitly specifying integers avoids this, as nrow and length, and the index arguments to apply return or require integers.
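The equality point can be sketched as follows. The matrix m is made up for the example, and all.equal is one common way to compare doubles with a tolerance rather than something the answer above prescribes.

```r
# Floating point arithmetic is not exact, so == can surprise you.
0.1 + 0.2 == 0.3                     # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE (comparison with a tolerance)

# Integer arithmetic within range is exact.
1L + 2L == 3L                        # TRUE

# nrow() and length() return integers, so compare against integer constants
# when an exact type match matters.
m <- matrix(0, nrow = 3, ncol = 2)
typeof(nrow(m))                      # "integer"
typeof(length(letters))              # "integer"
identical(nrow(m), 3L)               # TRUE
identical(nrow(m), 3)                # FALSE: 3 is a double
```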

Java's L number (long) specification

There are specific suffixes for long (e.g. 39832L), float (e.g. 2.4f) and double (e.g. -7.832d).

If there is no suffix, and it is an integral type (e.g. 5623), it is assumed to be an int. If it is not an integral type (e.g. 3.14159), it is assumed to be a double.

In all other cases (byte, short, char), you need the cast as there is no specific suffix.

The Java spec allows both upper and lower case suffixes, but the upper case version for longs is preferred, as the upper case L is less easy to confuse with a numeral 1 than the lower case l.

See the JLS section 3.10 for the gory details (see the definition of IntegerTypeSuffix).


