R self reference
Try the data.table package and its := operator. It's very fast and very concise.
DT[col1==something, col2:=col3+1]
The first part, col1==something, is the subset. You can put any expression here and use the column names as if they were variables; i.e., there is no need to use $. The second part, col2:=col3+1, assigns the RHS to the LHS within that subset, where the column names can be assigned to as if they were variables. := is assignment by reference. No copies of any object are taken, so it is faster than <-, =, within and transform.
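A minimal sketch of the subset-and-assign form described above, assuming the data.table package is installed. The table DT, its contents, and the name alias are made up for illustration.

```r
library(data.table)

# Hypothetical example data; the column names mirror the answer's placeholders.
DT <- data.table(col1 = c("a", "b", "a"),
                 col2 = c(0, 0, 0),
                 col3 = c(10, 20, 30))

# A plain assignment does not copy a data.table: both names refer to one table.
alias <- DT

# Update col2 in place, only for the rows where col1 == "a".
DT[col1 == "a", col2 := col3 + 1]

print(DT)

# Because := works by reference, the change is visible through the alias too.
print(alias$col2)
```

If an independent copy is needed before modifying, data.table provides copy(DT) for that purpose.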
Also, soon to be implemented in v1.8.1: one end goal of j's syntax allowing := in j like that is combining it with by; see the question "When should I use the := operator in data.table?".
UPDATE: That was indeed released (:= by group) in July 2012.
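As a rough illustration of := combined with by, here is a small sketch on made-up data (the table and the column names grp, val and grp_total are assumptions, not from the question):

```r
library(data.table)

# Hypothetical data: two groups with two rows each.
DT <- data.table(grp = c("x", "x", "y", "y"),
                 val = c(1, 2, 3, 4))

# Add a per-group total by reference; every row in a group gets that group's sum.
DT[, grp_total := sum(val), by = grp]

print(DT)
```

The result keeps one row per original row, unlike an aggregation such as DT[, sum(val), by = grp], which returns one row per group.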
Self reference when indexing into a vector
You can use pipes, which allow self-referencing with the dot (.) placeholder:
library(pipeR)
my.vector.with.a.long.name %>>% `[`(.>5)
[1] 6 7 8 9 10
my.vector.with.a.long.name %>>% `[`(.%%2==0)
[1] 2 4 6 8 10
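If pipeR is not available, the same idea can be sketched with magrittr's %>% and a braced expression, where . stands for the left-hand side. The vector definition below is an assumption inferred from the printed output above (the values 1 to 10).

```r
library(magrittr)

# Assumed definition; the original answer does not show how the vector was built.
my.vector.with.a.long.name <- 1:10

# Inside { }, the dot refers to the piped-in vector, so it can index itself.
greater_than_five <- my.vector.with.a.long.name %>% { .[. > 5] }
even_values       <- my.vector.with.a.long.name %>% { .[. %% 2 == 0] }

print(greater_than_five)
print(even_values)
```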
Self referencing calculation in group by in r
You can't reference the variable you're creating in mutate. Luckily, the variable being created in this case can be created with cumsum instead.
df %>% group_by(group,level) %>% mutate(v2 = cumsum(v1))
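To make the cumsum approach concrete, here is a small self-contained sketch. The data frame below is invented for illustration; only the column names group, level, v1 and v2 come from the answer.

```r
library(dplyr)

# Hypothetical input data.
df <- data.frame(group = c("a", "a", "a", "b", "b"),
                 level = c(1, 1, 2, 1, 1),
                 v1    = c(1, 2, 3, 4, 5))

# The running total restarts within each (group, level) combination.
res <- df %>%
  group_by(group, level) %>%
  mutate(v2 = cumsum(v1)) %>%
  ungroup()

print(res)
```

Each v2 value is the previous v2 in the same group plus the current v1, which is exactly the self-referencing recurrence that cannot be written directly inside mutate.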
How to produce a self-referencing variable in R (e.g., index levels given returns)?
As a workaround, you can use the following trick for the edited scenario. Note that you can adapt this to any number of simultaneous series: I just added an extra group_by statement based on a modulo sequence over the required number of series, using seq(n()) %% 2.
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- ifelse(dt$time == 1, 120, ifelse(dt$time == 2, 125, as.numeric(NA)))
library(dplyr, warn.conflicts = FALSE)
dt %>% group_by(id) %>%
  group_by(d = seq(n()) %% 2, .add = TRUE) %>%
  mutate(ind = cumprod(1 + duplicated(id) * ret) * ind[1])
#> # A tibble: 10 x 5
#> # Groups:   id, d [4]
#>    id     time      ret   ind     d
#>    <chr> <int>    <dbl> <dbl> <dbl>
#>  1 a         1  0.00554  120      1
#>  2 a         2 -0.00280  125      0
#>  3 a         3  0.0178   122.     1
#>  4 a         4  0.00187  125.     0
#>  5 a         5  0.0114   124.     1
#>  6 b         1  0.00416  120      0
#>  7 b         2  0.0123   125      1
#>  8 b         3  0.00237  120.     0
#>  9 b         4 -0.00365  125.     1
#> 10 b         5  0.0111   122.     0
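The modulo grouping used above can be looked at in isolation. This base R sketch (the names n_series, row_id and series are illustrative) shows how row positions are dealt out to interleaved series:

```r
# Number of simultaneous series interleaved in the rows (2 in the answer above).
n_series <- 2

# Row positions within one id, as seq(n()) would produce inside group_by().
row_id <- seq_len(6)

# Rows 1, 3, 5 fall in series 1; rows 2, 4, 6 fall in series 0.
series <- row_id %% n_series
print(series)

# Splitting by the modulo label recovers each series' row positions.
rows_by_series <- split(row_id, series)
print(rows_by_series)
```

Changing n_series to 3 would deal the rows out to three series in the same way, provided the rows really are interleaved in a fixed repeating order.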
OLD answer: Without using purrr
library(tidyverse)
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 4), time = rep(1:4, 2), ret = rnorm(8)/100)
dt$ind <- if_else(dt$time == 1, 100, as.numeric(NA))
dt
#>   id time          ret ind
#> 1  a    1  0.005543269 100
#> 2  a    2 -0.002802719  NA
#> 3  a    3  0.017751634  NA
#> 4  a    4  0.001873201  NA
#> 5  b    1  0.011425261 100
#> 6  b    2  0.004155261  NA
#> 7  b    3  0.012295066  NA
#> 8  b    4  0.002366797  NA
dt %>% group_by(id) %>%
  mutate(ind = cumprod(1 + duplicated(id) * ret) * ind[1])
#> # A tibble: 8 x 4
#> # Groups:   id [2]
#>   id     time      ret   ind
#>   <chr> <int>    <dbl> <dbl>
#> 1 a         1  0.00554 100
#> 2 a         2 -0.00280  99.7
#> 3 a         3  0.0178  101.
#> 4 a         4  0.00187 102.
#> 5 b         1  0.0114  100
#> 6 b         2  0.00416 100.
#> 7 b         3  0.0123  102.
#> 8 b         4  0.00237 102.
Created on 2021-07-27 by the reprex package (v2.0.0)
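Why cumprod works here: the recurrence ind[t] = ind[t-1] * (1 + ret[t]) unrolls to ind[1] times the running product of (1 + ret), with the first period's return zeroed out (that is the role of duplicated(id) in the answer). A small base R check on made-up numbers, comparing an explicit loop against the cumprod form:

```r
# Hypothetical returns and starting index level.
ret   <- c(0.005, 0.01, -0.02, 0.03)
start <- 100

# Explicit self-referencing loop: each level depends on the previous one.
ind_loop <- numeric(length(ret))
ind_loop[1] <- start
for (t in 2:length(ret)) {
  ind_loop[t] <- ind_loop[t - 1] * (1 + ret[t])
}

# Vectorised form: ignore the first return, then take the running product.
not_first   <- seq_along(ret) > 1
ind_cumprod <- start * cumprod(1 + not_first * ret)

# The two approaches agree up to floating-point tolerance.
print(all.equal(ind_loop, ind_cumprod))
```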
Generate self reference key within the table using R mutate in a dataframe
The Person_Id fields in your examples don't match. I'm not sure if this is what you're after, but from your dput() I have created a data frame that removes the last column:
df_input <- df_output %>%
  select(-Preceding_visit_id)
Then I did this:
df_input %>%
  group_by(Person_Id) %>%
  mutate(Preceding_visit_id = lag(Visit_Id))
And the output is this:
# A tibble: 14 x 4
# Groups:   Person_Id [3]
   Person_Id Visit_Id Purpose Preceding_visit_id
       <dbl>    <dbl> <chr>                <dbl>
 1         1        1 checkup                 NA
 2         1        2 checkup                  1
 3         1        3 checkup                  2
 4         1        4 checkup                  3
 5         1        5 checkup                  4
 6         2        6 checkup                 NA
 7         2        7 checkup                  6
 8         2        8 checkup                  7
 9         2        9 checkup                  8
10         2       10 checkup                  9
11         2       11 checkup                 10
12         3       12 checkup                 NA
13         3       13 checkup                 12
14         3       14 checkup                 13
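Since the original dput() data is not shown here, the following is a self-contained sketch of the same grouped lag on a tiny invented data frame (the name visits and its values are assumptions). It also assumes the rows are already ordered by visit within each person; if not, an arrange(Person_Id, Visit_Id) step before the mutate would be needed.

```r
library(dplyr)

# Hypothetical visits: person 1 has three visits, person 2 has two.
visits <- data.frame(Person_Id = c(1, 1, 1, 2, 2),
                     Visit_Id  = 1:5)

# lag() shifts Visit_Id down by one row within each person,
# so each person's first visit has no predecessor (NA).
res <- visits %>%
  group_by(Person_Id) %>%
  mutate(Preceding_visit_id = lag(Visit_Id)) %>%
  ungroup()

print(res)
```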