What's the difference between `1L` and `1`?
So, @James and @Brian explained what 3L means. But why would you use it?
Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).
Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:
x <- 1:100
typeof(x) # integer
y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)
z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)
...but also note that working excessively with integers can be dangerous:
1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!
...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.
A caveat though is that this applies to the current R version (2.13). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
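A minimal sketch of the overflow behaviour described above, assuming a standard R build with 32-bit integers. The variable names are only for illustration.

```r
# Largest representable integer on this build of R
max_int <- .Machine$integer.max

# Integer + integer beyond the range yields NA (R also emits a warning,
# suppressed here to keep the output clean)
overflowed <- suppressWarnings(max_int + 1L)
print(is.na(overflowed))

# Mixing in a double promotes the result to double, so nothing overflows
promoted <- max_int + 1
print(typeof(promoted))
```

So the L suffix keeps the arithmetic in integer space, and dropping it is the simplest way out when the values may exceed the integer range.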
Order of operator precedence when using : (the colon)
Because the operator : has precedence over +, 1+1:3 is really 1+(1:3) (i.e. 2:4) and not 2:3. Thus, to change the order of execution defined by operator precedence, use parentheses ().
You can see the order of precedence of operators in the help file ?Syntax. Here is the relevant part:
The following unary and binary operators are defined. They are listed in precedence groups, from highest to lowest.
:: :::     access variables in a namespace
$ @        component / slot extraction
[ [[       indexing
^          exponentiation (right to left)
- +        unary minus and plus
:          sequence operator
%any%      special operators (including %% and %/%)
* /        multiply, divide
+ -        (binary) add, subtract
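A few quick checks of the precedence rules listed above. This is only a sketch; the variable names are arbitrary.

```r
# ':' binds tighter than binary '+', so the sequence is built first
x <- 1 + 1:3        # same as 1 + (1:3)
y <- (1 + 1):3      # parentheses force the addition first

# Unary minus binds tighter than ':', so this is (-1):3, not -(1:3)
z <- -1:3

# ':' also binds tighter than '*'
w <- 1:3 * 2        # same as (1:3) * 2

print(x)
print(y)
print(z)
print(w)
```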
What's the difference between a double and a numeric?
I guess this has to do with converting your data.frame into a tibble, which by default prints numbers with about three significant digits. Replicating your code on the mtcars dataset, we get:
mtcars %>%
as_tibble() %>%
mutate(year = as.double(seq(1956, 2009, 0.25)[1:nrow(mtcars)])) %>%
dplyr::select(year) %>%
head
# year
# <dbl>
# 1 1956
# 2 1956.
# 3 1956.
# 4 1957.
# 5 1957
# 6 1957.
Here's the difference if we comment out as_tibble:
# year
# 1 1956.00
# 2 1956.25
# 3 1956.50
# 4 1956.75
# 5 1957.00
# 6 1957.25
Swapping as.double with as.numeric does not change anything.
From ?double:
as.double is a generic function. It is identical to as.numeric.
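A small base R sketch of how the two names relate, assuming nothing beyond a default R session. "double" is the storage type, while "numeric" is the mode that covers both integer and double.

```r
x <- c("1956", "1956.25")

# Both conversion functions give the same result
same <- identical(as.double(x), as.numeric(x))
print(same)

print(typeof(1956.25))  # storage type of a decimal number
print(mode(1956.25))    # mode of the same value

# An integer counts as numeric, but it is not stored as a double
print(is.numeric(1L))
print(is.double(1L))
```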
In R programming, what's the difference between & vs &&, and | vs ||
&& and || can only handle a single logical test on each side of the operator:
a <- c(T, F, F, F)
b <- c(T, F, F, F)
a && b
Returns
[1] TRUE
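Because only the first elements of a and b are tested! (This is the behaviour of older R versions. Since R 4.3.0, passing a vector of length greater than one to && or || is an error.)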
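For comparing whole vectors, the elementwise operators & and | are the ones to use. A minimal sketch, reusing the vectors from the example, with all() and any() to collapse the result to a single value:

```r
a <- c(TRUE, FALSE, FALSE, FALSE)
b <- c(TRUE, FALSE, FALSE, FALSE)

# '&' compares position by position and returns a vector
elementwise <- a & b
print(elementwise)

# Collapse explicitly when a single TRUE/FALSE is needed, e.g. inside if()
all_true <- all(a & b)
any_true <- any(a & b)
print(all_true)
print(any_true)
```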
Edit:
Consider the following, where we 'rotate' a
and b
after each &&
test:
a <- c(T, F, T, F)
b <- c(T, F, F, T)
for (i in seq_along(a)){
cat(paste0("'a' is: ", paste0(a, collapse=", "), " and\n'b' is: ", paste0(b, collapse=", "),"\n"))
print(paste0("'a && b' is: ", a && b))
a <- c(a[2:length(a)], a[1])
b <- c(b[2:length(b)], b[1])
}
Gives us:
'a' is: TRUE, FALSE, TRUE, FALSE and
'b' is: TRUE, FALSE, FALSE, TRUE
[1] "'a && b' is: TRUE"
'a' is: FALSE, TRUE, FALSE, TRUE and
'b' is: FALSE, FALSE, TRUE, TRUE
[1] "'a && b' is: FALSE"
'a' is: TRUE, FALSE, TRUE, FALSE and
'b' is: FALSE, TRUE, TRUE, FALSE
[1] "'a && b' is: FALSE"
'a' is: FALSE, TRUE, FALSE, TRUE and
'b' is: TRUE, TRUE, FALSE, FALSE
[1] "'a && b' is: FALSE"
Additionally, && and || stop as soon as the result of the expression is clear, whereas & and | always evaluate both sides:
FALSE & a_not_existing_object
TRUE | a_not_existing_object
Returns:
Error: object 'a_not_existing_object' not found
Error: object 'a_not_existing_object' not found
But:
FALSE && a_not_existing_object
TRUE || a_not_existing_object
Returns:
[1] FALSE
[1] TRUE
Because FALSE AND anything is FALSE, and TRUE OR anything is TRUE, so the right-hand side is never evaluated.
This last behavior of && and || is especially useful if you want to check in your control-flow for an element that may not exist:
if (exists("a_not_existing_object") && a_not_existing_object > 42) {...}
This way the evaluation stops after the first expression evaluates to FALSE and the a_not_existing_object > 42 part is not even attempted!
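A runnable sketch of that guard. The name maybe_missing is hypothetical and is assumed not to exist in the session when the first line runs. Note that exists() takes the name as a character string.

```r
# maybe_missing is not defined yet: exists() returns FALSE and the
# right-hand side is never evaluated, so no "object not found" error
before <- exists("maybe_missing") && maybe_missing > 42
print(before)

# Once the object exists, both sides are evaluated
maybe_missing <- 100
after <- exists("maybe_missing") && maybe_missing > 42
print(after)
```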
What's the difference between as.integer() and +0L used on booleans?
x + 0L is an element-wise operation on x; as such, it often preserves the shape of the data. as.integer isn't: it takes the whole structure (here, a matrix) and converts it into a one-dimensional integer vector.
That said, in the general case I'd strongly suggest using as.integer and discourage + 0L as a clever hack (remember: often, clever ≠ good). If you want to preserve the shape of data I suggest using David's method from the comments, rather than the + 0L hack:
a[] = as.integer(a)
This uses the normal meaning of as.integer, but the result is assigned to the individual elements of a, rather than a itself. In other words, a's shape remains untouched.
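The three variants side by side, as a sketch on a small logical matrix. The matrix m is made up for illustration; it is not the data from the question.

```r
# Hypothetical 2 x 2 logical matrix
m <- matrix(c(TRUE, FALSE, TRUE, TRUE), nrow = 2)

# Element-wise addition keeps the dim attribute
added <- m + 0L
print(dim(added))

# as.integer() drops attributes and returns a plain vector
flat <- as.integer(m)
print(dim(flat))

# Assigning into a[] replaces the elements but keeps the shape
a <- m
a[] <- as.integer(a)
print(dim(a))
print(typeof(a))
```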
Compute the difference between two values for two dates?
The solution has two parts:
- wrangle the dataset so that it is tidy for your purposes
- plot the graph
Wrangling the data is straightforward. (The code here could be shortened, but I've written it as it is to make the various steps clear.) Also note the "obvious" correction to the typo mentioned by @user2974951.
library(dplyr)
library(tidyr)
library(ggplot2)

# Extract the baseline values and convert to long format
baseline <- myd %>%
filter(year == 1990) %>%
select(-year) %>%
pivot_longer(everything(), names_to="variable", values_to="baseline")
# Extract the endpoint values and convert to long format
endpoint <- myd %>%
filter(year == 1999) %>%
select(-year) %>%
pivot_longer(everything(), names_to="variable", values_to="endpoint")
# Merge by variable and calculate difference
difference <- baseline %>%
full_join(endpoint, by="variable") %>%
mutate(diff=endpoint-baseline)
At this point, difference looks like this:
> difference
# A tibble: 2 × 4
  variable baseline endpoint  diff
  <chr>       <dbl>    <dbl> <dbl>
1 ud           137.     128. -9.75
2 ax            67       68   1
Now create the bar chart.
# Create the bar chart
difference %>%
ggplot() +
geom_col(aes(x=variable, y=diff))
Note that this solution is robust with respect to the number of variables in the original dataset, and their names. It will also handle missing values without error. It could easily be generalised to calculate and plot the difference between any two years (eg earliest year as baseline and most recent as endpoint).
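One way the generalisation to any two years might look. This sketch is written in base R rather than dplyr so that it has no package dependencies. The function name year_difference and the sample values in myd are hypothetical; the original dataset is not shown here. It assumes exactly one row per year.

```r
# Hypothetical stand-in for the original data: one row per year
myd <- data.frame(
  year = c(1990, 1995, 1999),
  ud   = c(137.25, 130, 127.5),
  ax   = c(67, 66, 68)
)

# Difference between two years for every non-year column.
# Defaults: earliest year as baseline, most recent as endpoint.
year_difference <- function(data, from = min(data$year), to = max(data$year)) {
  vars <- setdiff(names(data), "year")
  baseline <- unlist(data[data$year == from, vars])
  endpoint <- unlist(data[data$year == to, vars])
  data.frame(
    variable = vars,
    baseline = baseline,
    endpoint = endpoint,
    diff = endpoint - baseline,
    row.names = NULL
  )
}

result <- year_difference(myd)
print(result)
```

The resulting data frame has the same columns as difference above, so it can be passed to the same ggplot() call.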
What is the difference between y ~ 1, y ~ 0 and y ~ -1 in R formulas?
From the ?formula documentation:
The ‘-’ operator removes the specified terms, so that ‘(a+b+c)^2 -
a:b’ is identical to ‘a + b + c + b:c + a:c’. It can also used to
remove the intercept term: when fitting a linear model ‘y ~ x - 1’
specifies a line through the origin. A model with no intercept
can be also specified as ‘y ~ x + 0’ or ‘y ~ 0 + x’.
So:
- y ~ 1 includes an intercept
- y ~ 0 does not include an intercept
- y ~ -1 does not include an intercept
The last two are functionally equivalent.
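This can be checked without fitting a model, by inspecting the "intercept" attribute of the terms object. A small sketch; has_intercept is a hypothetical helper name.

```r
# TRUE if the formula keeps the intercept term
has_intercept <- function(f) attr(terms(f), "intercept") == 1

print(has_intercept(y ~ 1))
print(has_intercept(y ~ 0))
print(has_intercept(y ~ -1))

# The same holds when a predictor is present
print(has_intercept(y ~ x))
print(has_intercept(y ~ x - 1))
print(has_intercept(y ~ 0 + x))
```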