Why would R use the L suffix to denote an integer?
Why is "L" used as a suffix?
I've never seen it written down, but I theorise in short for two reasons:
Because R handles complex numbers, which may be specified using the suffix "i", and this would be too similar to "I".
Because R's integers are 32-bit long integers and "L" therefore appears to be sensible shorthand for referring to this data type.
The value a long integer can take depends on the word size. R does not natively support integers with a word length of 64 bits. Integers in R have a word length of 32 bits and are signed, and therefore have a range of −2,147,483,648 to 2,147,483,647. Larger values are stored as double.
This wiki page has more information on common data types, their conventional names and ranges.
And also from ?integer
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
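The range described above can be checked directly. This is a minimal sketch using only base R; the variable names are illustrative and not taken from the original answer.

```r
# Largest value that fits in R's 32-bit signed integer type
big_int <- 2147483647L
typeof(big_int)            # "integer"

# One past the limit, written without L, is simply stored as a double
past_limit <- 2147483648
typeof(past_limit)         # "double"

# The double holds the larger whole number exactly
past_limit - big_int == 1  # TRUE
```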
Why do 1.0L and 1.1L return different types?
The reason that 1.0L and 1.1L return different data types is that returning an integer for 1.1 would result in loss of information, whilst for 1.0 it would not (but you might want to know you no longer have a floating point numeric). Buried deep within the lexical analyser (/src/main/gram.c:4463-4485) is this code (part of the function NumericValue()) which actually creates an int data type from a double input that is suffixed by an ASCII "L":
    /* Make certain that things are okay. */
    if(c == 'L') {
        double a = R_atof(yytext);
        int b = (int) a;
        /* We are asked to create an integer via the L, so we check that the
           double and int values are the same. If not, this is a problem and we
           will not lose information and so use the numeric value.
        */
        if(a != (double) b) {
            if(GenerateCode) {
                if(seendot == 1 && seenexp == 0)
                    warning(_("integer literal %s contains decimal; using numeric value"), yytext);
                else {
                    /* hide the L for the warning message */
                    *(yyp-2) = '\0';
                    warning(_("non-integer value %s qualified with L; using numeric value"), yytext);
                    *(yyp-2) = (char)c;
                }
            }
            asNumeric = 1;
            seenexp = 1;
        }
    }
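The effect of that check can be observed from R itself. The sketch below parses each literal from text so that the parse-time warnings can be silenced; `literal_type` is a hypothetical helper written for this illustration, not part of R.

```r
# Parse a literal from text and report the type the parser gave it.
# suppressWarnings() hides the parser's warnings about the L suffix.
literal_type <- function(src) {
  suppressWarnings(typeof(eval(parse(text = src))))
}

literal_type("1L")    # "integer"
literal_type("1.0L")  # "integer": the value is whole, so the L is honoured
literal_type("1.1L")  # "double": not whole, so the numeric value is kept
literal_type("1e3L")  # "integer": exponent form with a whole value
```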
Clarification of L in R
This answer is a summary of the comments above. It is basically just pointers to various help texts, but as evident from OP's attempt with ?L, it is not always easy to find the relevant help text. I was expecting to find something about L in ?as.integer, but no. Hopefully this answer is more useful than a pile of comments.
- In the R language definition you will find: "We can use the L suffix to qualify any number with the intent of making it an explicit integer"
- From ?NumericConstants: "[...] All other numeric constants start with a digit or period and are either a decimal or hexadecimal constant optionally followed by L". "An numeric constant immediately followed by L is regarded as an integer number when possible (and with a warning if it contains a ".")." "You can combine the "0x" prefix with the "L" suffix".
- You may also find it useful to check the examples on floating point vs. integers in the section "Two Kinds Revisited" here. "Put capital L (as in “long”) after a number to make R create it as an integer".
- Not specifically about L, but always relevant in the floating point vs. integers context is FAQ 7.31: "Why doesn’t R think these numbers are equal?".
Threads with discussions about the efficiency of L:
Threads on R-help where others have struggled to find documentation about L, with a possible explanation of why the letter L, and why L vs as.integer in terms of efficiency.
- Difference between 10 and 10L
First William Dunlap:
Why not 10I for integer? Perhaps because "I" and "l" look too similar, perhaps because "i" and "I" sound too similar. The "L" does not mean "long": integers are 4 bytes long.
Then Brian Ripley:
Actually it does: this notation dates from the C language on 16-bit
computers where integers were 16-bits and longs were 32-bit (and R has
no 'long' type).
The author of this in R never explained why he chose the notation, but
it is shorter than as.integer(10), and more efficient as the coercion is
done at parse time.
- The L Word
Discussion about the efficiency in different situations, with some benchmarkings.
- R history: Why 'L; in suffix character ‘L’ for integer constants?
More discussions here.
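The "coercion is done at parse time" point from the quoted thread can be made visible by inspecting the parsed expressions. This is only a sketch; `lit` and `cl` are illustrative names.

```r
# The literal is already an integer constant in the parsed expression ...
lit <- quote(10L)
typeof(lit)   # "integer"
is.call(lit)  # FALSE

# ... whereas as.integer(10) is parsed as a function call that still has
# to be evaluated at run time to perform the coercion.
cl <- quote(as.integer(10))
is.call(cl)   # TRUE

# Both forms evaluate to the same value.
identical(eval(lit), eval(cl))  # TRUE
```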
What does the L mean at the end of an integer literal?
It is a long integer literal.
Integer literals have a type of int by default; the L suffix gives it a type of long (Note that if the value cannot be represented by an int, then the literal will have a type of long even without the suffix).
What's the difference between `1L` and `1`?
So, @James and @Brian explained what 3L means. But why would you use it?
Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).
Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:
x <- 1:100
typeof(x) # integer
y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)
z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)
...but also note that working excessively with integers can be dangerous:
1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!
...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.
A caveat though is that this applies to the current R version (2.13). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
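A minimal sketch of using .Machine$integer.max instead of a hard-coded limit; the variable names are illustrative.

```r
int_max <- .Machine$integer.max
int_min <- -int_max

typeof(int_max)   # "integer"

# Integer arithmetic past the limit gives NA (with a warning) ...
over_int <- suppressWarnings(int_max + 1L)
is.na(over_int)   # TRUE

# ... while a double constant promotes the result to double instead.
over_dbl <- int_max + 1
typeof(over_dbl)  # "double"
```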
Benefits of using integer values for constants rather than numeric values (e.g. 1L vs 1) in R
These are some of the use cases in which I explicitly use the L suffix in declaring the constants. Of course these are not strictly "canonical" (or the only ones), but maybe you can get an idea of the rationale behind them. I added, for each case, a "necessary" flag; you will see that these arise only if you interface with other languages (like C).
- Logical type conversion (not necessary)
Instead of using a classic as.integer, I add 0L to a logical vector to make it integer. Of course you could just use 0, but this would require more memory (typically 8 bytes instead of four) and a conversion.
- Manipulating the result of a function that returns integer (not necessary)
Say for instance that you want to retrieve the elements of the vector that come after an NA. You could:
which(is.na(vec)) + 1L
Since which returns an integer, adding 1L will preserve the type and avoid an implicit conversion. Nothing will happen if you omit the L, since it's just a small optimization. This happens also with match for instance: if you want to post-process the result of such a function, it's good habit to preserve the type if possible.
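The two "not necessary" cases above can be sketched as follows. The vector `vec` is hypothetical example data, not taken from the original answer.

```r
vec <- c(3, NA, 7, NA, 9)   # hypothetical example data

# Logical -> integer by adding 0L
flags <- is.na(vec) + 0L
typeof(flags)               # "integer"

# which() returns integers; + 1L keeps them integer, + 1 does not
after_na_int <- which(is.na(vec)) + 1L
after_na_dbl <- which(is.na(vec)) + 1
typeof(after_na_int)        # "integer"
typeof(after_na_dbl)        # "double"

vec[after_na_int]           # the elements following each NA: 7 9
```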
- Interfacing C (necessary)
From ?integer:
Integer vectors exist so that data can be passed to C or Fortran
code which expects them, and so that (small) integer data can be
represented exactly and compactly.
C is much stricter regarding data types. This implies that, if you pass a vector to a C function, you can not rely on C to do the conversions. Say that you want to replace the elements after a NA with some value, say 42. You find the positions of the NA values at the R level (as we did before with which) and then pass the original vector and the vector of indices to C. The C function will look like:
SEXP replaceAfterNA (SEXP X, SEXP IND) {
    ...
    int *ind = INTEGER(IND);
    ...
    for (i=0; i<l; i++) {
        //make here the replacement
    }
}
and then from the R side:
...
ind <- which(is.na(x)) + 1L
.Call("replaceAfterNA", x, ind)
...
If you omit the L in the first line above, you will receive an error like:
INTEGER() cannot be applied to double vectors
since C is expecting an integer type.
- Interfacing Java (necessary)
Same as before. If you use the rJava package and want R to call your own custom Java classes and methods, you have to be sure that an integer is passed when the Java method requires an integer. Not adding a specific example here, but it should be clear why you may want to use the L suffix in constants in these cases.
Addendum
The previous cases were about when you may want to use L. Even if I guess it is much less common, it might be useful to add a case in which you don't want the L. This may arise if there is danger of integer overflow. The *, + and - operators preserve the type if both operands are integer. For example:
#this overflows
31381938L*3231L
#[1] NA
#Warning message:
#In 31381938L * 3231L : NAs produced by integer overflow
#this not
31381938L*3231
#[1] 1.01395e+11
So, if you are doing operations on an integer variable which might produce overflow, it's important to cast it to double to avoid any risk. Adding or subtracting a constant without the L to that variable might be as good an occasion as any to make the cast.
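A short sketch of the cast described above, reusing the numbers from the overflow example; `x`, `y`, `prod_int` and `prod_dbl` are illustrative names.

```r
x <- 31381938L
y <- 3231L

# Both operands integer: the product overflows and becomes NA
prod_int <- suppressWarnings(x * y)
is.na(prod_int)           # TRUE

# Casting one operand to double first avoids the overflow
prod_dbl <- as.numeric(x) * y
typeof(prod_dbl)          # "double"
prod_dbl == 101395041678  # TRUE
```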
R : Int vs Num Anomaly in Vector
There are two distinct issues at play:
- In c(2, 1, 1, 5) you are explicitly creating numeric types. For integer, you would have to use c(2L, 1L, 1L, 5L) as only the suffix L ensures creation of an integer type (or casting via as.integer() etc). But read on ...
- In c(1:5) a historical override for the : comes into play. Because the usage almost always involves integer sequences, this is what you get: integers.
Both forms are documented, so it is not an anomaly as your question title implies.
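The two cases can be compared directly with typeof(); the variable names below are illustrative.

```r
num_vec  <- c(2, 1, 1, 5)       # plain constants: double
int_vec  <- c(2L, 1L, 1L, 5L)   # L-suffixed constants: integer
seq_vec  <- c(1:5)              # the : operator yields integers here
cast_vec <- as.integer(num_vec) # explicit cast

typeof(num_vec)   # "double"
typeof(int_vec)   # "integer"
typeof(seq_vec)   # "integer"
typeof(cast_vec)  # "integer"
```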
When should integers be explicitly specified?
Using 1L etc is programmatically safe, as in it is explicit as to what is meant, and does not rely on any conversions etc.
When writing code interactively, it can be easy to notice errors and fix along the way, however if you are writing a package (even base R), it will be safer to be explicit.
When you are considering equality, using floating point numbers will cause precision issues. See this FAQ.
Explicitly specifying integers avoids this, as nrow and length, and the index arguments to apply return or require integers.
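A minimal sketch of the equality issue and of the integer return types mentioned above; the matrix `m` is hypothetical example data.

```r
# Floating point: mathematically equal, but not bit-for-bit equal
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE (comparison with a tolerance)

# Integer arithmetic is exact within the integer range
1L + 2L == 3L                       # TRUE

m <- matrix(0, nrow = 2, ncol = 3)  # hypothetical example data
typeof(nrow(m))                     # "integer"
typeof(length(1:10))                # "integer"
identical(nrow(m), 2L)              # TRUE
```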
Java's L number (long) specification
There are specific suffixes for long (e.g. 39832L), float (e.g. 2.4f) and double (e.g. -7.832d).
If there is no suffix, and it is an integral type (e.g. 5623), it is assumed to be an int. If it is not an integral type (e.g. 3.14159), it is assumed to be a double.
In all other cases (byte, short, char), you need the cast as there is no specific suffix.
The Java spec allows both upper and lower case suffixes, but the upper case version for longs is preferred, as the upper case L is less easy to confuse with a numeral 1 than the lower case l.
See the JLS section 3.10 for the gory details (see the definition of IntegerTypeSuffix).