R Self Reference

R self reference

Try package data.table and its := operator. It's very fast and very short.

DT[col1==something, col2:=col3+1]

The first part, col1==something, is the subset. You can put any expression here and use the column names as if they were variables, i.e., there is no need to use $. The second part, col2:=col3+1, assigns the RHS to the LHS within that subset, and the column names can be assigned to as if they were variables. := is assignment by reference. No copies of any object are taken, so it is faster than <-, =, within and transform.

Also, one end goal of allowing := in j like this is combining it with by, which is planned for v1.8.1. See the question "When should I use the := operator in data.table?".

UPDATE: That (:= by group) was indeed released in July 2012.
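
To make the two forms above concrete, here is a minimal sketch. The table and its values are made up for illustration; only the column names col1, col2 and col3 come from the answer, and "total" is a hypothetical extra column.

```r
library(data.table)

# Hypothetical toy table; column names mirror the placeholders used above
DT <- data.table(col1 = c("a", "b", "a", "b"),
                 col2 = 0,
                 col3 = c(10, 20, 30, 40))

# Update col2 by reference, only in the rows where col1 == "a"
DT[col1 == "a", col2 := col3 + 1]

# := combined with by: add a per-group sum of col3 as a new column
DT[, total := sum(col3), by = col1]

print(DT)
```

Neither call needs DT <- ... on the left, because := modifies DT in place.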

Self reference when indexing into a vector

You can use pipes which allow self-referencing with .:

library(pipeR)
my.vector.with.a.long.name <- 1:10  # example data, consistent with the output shown
my.vector.with.a.long.name %>>% `[`(.>5)
[1] 6 7 8 9 10
my.vector.with.a.long.name %>>% `[`(.%%2==0)
[1] 2 4 6 8 10
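
If you would rather not add a package, one possible alternative (not part of the answer above) is an anonymous function called immediately, so the long name is typed once and referred to by a short argument name inside. A minimal sketch, assuming the vector is 1:10 and using the hypothetical names big and evens for the results:

```r
# Assumed example data
my.vector.with.a.long.name <- 1:10

# The argument v plays the role of the "." placeholder
big   <- (function(v) v[v > 5])(my.vector.with.a.long.name)
evens <- (function(v) v[v %% 2 == 0])(my.vector.with.a.long.name)

print(big)
print(evens)
```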

Self referencing calculation in group by in r

You can't reference the variable you're creating in mutate. Luckily, the variable being created in this case can be created with cumsum instead.

df %>% group_by(group,level) %>% mutate(v2 = cumsum(v1))
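
Here is a small runnable sketch of that line. The data is invented; it assumes the recursion you want is v2[i] = v2[i-1] + v1[i] within each group, which is exactly what cumsum computes.

```r
library(dplyr)

# Hypothetical data; group, level and v1 are the column names used above
df <- data.frame(group = c("x", "x", "x", "y", "y"),
                 level = c(1, 1, 2, 1, 1),
                 v1    = c(1, 2, 3, 4, 5))

res <- df %>%
  group_by(group, level) %>%
  mutate(v2 = cumsum(v1)) %>%   # running total restarts in every group
  ungroup()

print(res)
```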

How to produce a self-referencing variable in R (e.g., index levels given returns)?

As a workaround for the edited case, you can use the following trick. Note that you can adapt it to any number of simultaneous series.

  • I just added an extra group_by step based on a modulo sequence over the required number of series, using seq(n()) %% 2
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- ifelse(dt$time == 1, 120, ifelse(dt$time == 2, 125, as.numeric(NA)))
library(dplyr, warn.conflicts = FALSE)

dt %>% group_by(id) %>%
  group_by(d = seq(n()) %% 2, .add = TRUE) %>%
  mutate(ind = cumprod(1 + duplicated(id) * ret) * ind[1])
#> # A tibble: 10 x 5
#> # Groups: id, d [4]
#> id time ret ind d
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 a 1 0.00554 120 1
#> 2 a 2 -0.00280 125 0
#> 3 a 3 0.0178 122. 1
#> 4 a 4 0.00187 125. 0
#> 5 a 5 0.0114 124. 1
#> 6 b 1 0.00416 120 0
#> 7 b 2 0.0123 125 1
#> 8 b 3 0.00237 120. 0
#> 9 b 4 -0.00365 125. 1
#> 10 b 5 0.0111 122. 0


OLD answer: Without using purrr

library(tidyverse)

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 4), time = rep(1:4, 2), ret = rnorm(8)/100)
dt$ind <- if_else(dt$time == 1, 100, as.numeric(NA))
dt
#> id time ret ind
#> 1 a 1 0.005543269 100
#> 2 a 2 -0.002802719 NA
#> 3 a 3 0.017751634 NA
#> 4 a 4 0.001873201 NA
#> 5 b 1 0.011425261 100
#> 6 b 2 0.004155261 NA
#> 7 b 3 0.012295066 NA
#> 8 b 4 0.002366797 NA

dt %>% group_by(id) %>%
  mutate(ind = cumprod(1 + duplicated(id) * ret) * ind[1])
#> # A tibble: 8 x 4
#> # Groups: id [2]
#> id time ret ind
#> <chr> <int> <dbl> <dbl>
#> 1 a 1 0.00554 100
#> 2 a 2 -0.00280 99.7
#> 3 a 3 0.0178 101.
#> 4 a 4 0.00187 102.
#> 5 b 1 0.0114 100
#> 6 b 2 0.00416 100.
#> 7 b 3 0.0123 102.
#> 8 b 4 0.00237 102.

Created on 2021-07-27 by the reprex package (v2.0.0)
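
Why the cumprod trick works: the recursion ind[t] = ind[t-1] * (1 + ret[t]) with a fixed starting value is just the starting value times a running product, and duplicated(id) is FALSE on the first row of each group, which zeroes out the first return. The base-R sketch below checks this for a single series against an explicit loop. The names start, ind_loop and ind_vec are only for illustration.

```r
set.seed(13)
ret   <- rnorm(4) / 100
start <- 100

# Explicit recursion: ind[t] = ind[t - 1] * (1 + ret[t]), with ind[1] = start
ind_loop <- numeric(length(ret))
ind_loop[1] <- start
for (t in 2:length(ret)) {
  ind_loop[t] <- ind_loop[t - 1] * (1 + ret[t])
}

# Vectorised form: ignore the first return, then take the running product
ind_vec <- start * cumprod(1 + c(0, ret[-1]))

print(all.equal(ind_loop, ind_vec))
```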

Generate self reference key within the table using R mutate in a dataframe

The Person_Id fields in your examples don't match.

I'm not sure if this is what you're after, but starting from your dput() I created a data frame that drops the last column:

df_input <- df_output %>%
  select(-Preceding_visit_id)

Then I did this:

df_input %>%
  group_by(Person_Id) %>%
  mutate(Preceding_visit_id = lag(Visit_Id))

And the output is this:

# A tibble: 14 x 4
# Groups: Person_Id [3]
Person_Id Visit_Id Purpose Preceding_visit_id
<dbl> <dbl> <chr> <dbl>
1 1 1 checkup NA
2 1 2 checkup 1
3 1 3 checkup 2
4 1 4 checkup 3
5 1 5 checkup 4
6 2 6 checkup NA
7 2 7 checkup 6
8 2 8 checkup 7
9 2 9 checkup 8
10 2 10 checkup 9
11 2 11 checkup 10
12 3 12 checkup NA
13 3 13 checkup 12
14 3 14 checkup 13
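
For anyone without the original dput(), here is a self-contained sketch of the same approach on made-up data. The values are hypothetical; only the column names come from the question. The arrange() step is my addition, on the assumption that visits should be ordered by Visit_Id within each person, since lag() depends on row order.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data
df_input <- data.frame(Person_Id = c(1, 1, 1, 2, 2),
                       Visit_Id  = c(1, 2, 3, 6, 7),
                       Purpose   = "checkup")

res <- df_input %>%
  arrange(Person_Id, Visit_Id) %>%            # make the visit order explicit
  group_by(Person_Id) %>%
  mutate(Preceding_visit_id = lag(Visit_Id)) %>%
  ungroup()

print(res)
```

The first visit of each person gets NA because there is nothing before it within the group.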

