How to Subset Column Variables in Df1 Based on the Important Variables I Got in Df2

How do I subset column variables in DF1 based on the important variables I got in DF2?

I couldn't find a duplicate, so here goes: simply subset df2 by the values of as.character(df1$ID), as in

df2[as.character(df1$ID)] ## Or just `df2[df1$ID]` if it's already a character vector
# x1 x2 x5
# 1 1 11 41
# 2 2 12 42
# 3 3 13 43
# 4 4 14 44
# 5 5 15 45

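as.character() is needed because df1$ID is a factor. Indexing with a factor uses its underlying integer codes rather than its levels, so without the conversion you would subset df2 by column position instead of by column name.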
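To see why the conversion matters, here is a small sketch with made-up data (these df1/df2 values are hypothetical, not the asker's). As documented in R's ?Extract, indexing with a factor is equivalent to indexing with its integer codes, not with the labels it prints:

```r
# Hypothetical data: df1$ID is a factor naming the df2 columns to keep
df2 <- data.frame(x1 = 1:5, x2 = 11:15, x3 = 21:25, x4 = 31:35, x5 = 41:45)
df1 <- data.frame(ID = factor(c("x1", "x2", "x5")))

# The factor's integer codes are 1, 2, 3 ...
codes <- as.integer(df1$ID)

# ... so indexing with the factor itself selects columns by position
by_code <- df2[df1$ID]
names(by_code)  # "x1" "x2" "x3" (x3 instead of x5)

# Converting to character selects columns by name, as intended
by_name <- df2[as.character(df1$ID)]
names(by_name)  # "x1" "x2" "x5"
```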


Since this question is also tagged data.table, we could instead do this by reference (once df2 is a data.table), with no need to convert to character:

setDT(df2)[, setdiff(names(df2), df1$ID) := NULL]
df2
# x1 x2 x5
# 1: 1 11 41
# 2: 2 12 42
# 3: 3 13 43
# 4: 4 14 44
# 5: 5 15 45
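Without data.table, roughly the same "drop everything not listed" idea works in base R by assigning NULL to the unwanted columns. This is a sketch with hypothetical data; it is an ordinary assignment rather than an update by reference:

```r
# Hypothetical data: df1$ID is a factor naming the df2 columns to keep
df2 <- data.frame(x1 = 1:5, x2 = 11:15, x3 = 21:25, x4 = 31:35, x5 = 41:45)
df1 <- data.frame(ID = factor(c("x1", "x2", "x5")))

# Columns of df2 that are not listed in df1$ID
# (setdiff() compares the factor by its labels, so no as.character() is needed)
drop <- setdiff(names(df2), df1$ID)

# Assigning NULL to a set of columns removes them
df2[drop] <- NULL
names(df2)  # "x1" "x2" "x5"
```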

How to create a dummy in dataframe according to value in another dataframe with a different length of observations in R?

Does this work?

library(dplyr)
df2 %>%
  rename(df2_year = year) %>%
  left_join(df1, by = 'id') %>%
  group_by(id) %>%
  mutate(dummy = if_else(year >= df2_year, 1, 0)) %>%
  select(-df2_year)
# A tibble: 6 x 4
# Groups: id [2]
id year x1 dummy
<int> <int> <dbl> <dbl>
1 1 2017 0.3 0
2 1 2018 0.5 0
3 1 2019 0.45 1
4 1 2020 0.5 1
5 1 2021 0.6 1
6 2 NA NA NA

Data used:

df1
id year x1
1 1 2017 0.30
2 1 2018 0.50
3 1 2019 0.45
4 1 2020 0.50
5 1 2021 0.60
df2
id year
1 1 2019
2 2 2020
Note that id = 2 is missing from df1 in your sample data, which is why its row in the output is all NA.
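If you only need the dummy on df1's own rows, a base R sketch with match() can look up each row's threshold year from df2 by id. It assumes df2 holds one threshold year per id; because the lookup starts from df1, id = 2 (absent from df1) simply does not appear:

```r
# The sample data from the question
df1 <- data.frame(id = 1L, year = 2017:2021, x1 = c(0.30, 0.50, 0.45, 0.50, 0.60))
df2 <- data.frame(id = 1:2, year = c(2019L, 2020L))

# Threshold year for each df1 row, looked up from df2 by id (NA if the id is absent)
threshold <- df2$year[match(df1$id, df2$id)]

# 1 from the threshold year onwards, 0 before it
df1$dummy <- as.numeric(df1$year >= threshold)
df1$dummy  # 0 0 1 1 1
```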

Creating new variable in dataframe based on matching values from other dataframe

I think this does what you want:

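# look up each x in df2$a and take the matching b (NA when there is no match)
df1$z <- df2$b[match(df1$x, df2$a)]
# then force z to NA wherever x is 'G'
df1$z[df1$x == 'G'] <- NA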

Output:

> df1
x z
1 A 1
2 <NA> NA
3 L NA
4 G 7
5 C 3
6 F 6
7 <NA> NA
8 J 10
9 G 7
10 K NA

Hope this helps!
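match() does the real work here: for each element of df1$x it returns the position of the first match in df2$a, or NA when there is none, and indexing df2$b with an NA position gives NA. A small sketch with a hypothetical lookup table (the question's df2 is not shown here):

```r
# Hypothetical lookup table: a = "A".."J", b = 1..10
df2 <- data.frame(a = LETTERS[1:10], b = 1:10)
x <- c("A", NA, "L", "G", "C")

# Position of each x in df2$a; NA for the missing value (df2$a has no NA)
# and for "L" (not in the table)
idx <- match(x, df2$a)
idx  # 1 NA NA 7 3

# Indexing with NA positions yields NA, so unmatched rows get z = NA
z <- df2$b[idx]
z    # 1 NA NA 7 3
```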

Replacing column names with another data frame if matches

If you are open to a tidyverse solution, you could use

library(dplyr)
library(tibble)

df %>%
  rename_with(~deframe(df2)[.x], .cols = df2$Name) %>%
  select(Name, Reference, any_of(df2$Adjusted_Name))

This returns

# A tibble: 3 x 6
Name Reference good_run very_great_work bad_run fair_run_decent
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 George Hill 34 21 33 21
2 Frank Stairs 29 30 29 28
3 Bertha Trail 25 21 24 25

Data

df <- tibble::tibble(
  Name = c("George", "Frank", "Bertha"),
  Reference = c("Hill", "Stairs", "Trail"),
  Good = c(34, 29, 25), Fair = c(21, 28, 25),
  Bad = c(33, 29, 24), Great = c(21, 30, 21),
  Poor = c(32, 29, 26)
)

df2 <- tibble::tibble(
  Name = c("Good", "Great", "Bad", "Fair"),
  Adjusted_Name = c("good_run", "very_great_work", "bad_run", "fair_run_decent")
)
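For comparison, here is a base R sketch of the same rename-then-select, using match() to find where each old name sits. It is an alternative to the answer's tidyverse approach, not part of it, and unlike any_of() it assumes every df2$Name is actually a column of df:

```r
# Same columns and values as the data above
df <- data.frame(Name = c("George", "Frank", "Bertha"),
                 Reference = c("Hill", "Stairs", "Trail"),
                 Good = c(34, 29, 25), Fair = c(21, 28, 25),
                 Bad = c(33, 29, 24), Great = c(21, 30, 21),
                 Poor = c(32, 29, 26))
df2 <- data.frame(Name = c("Good", "Great", "Bad", "Fair"),
                  Adjusted_Name = c("good_run", "very_great_work",
                                    "bad_run", "fair_run_decent"))

# Position of each old name among df's columns (assumes all are present)
pos <- match(df2$Name, names(df))
names(df)[pos] <- df2$Adjusted_Name

# Keep the identifier columns plus the renamed ones, in df2's order (drops Poor)
result <- df[c("Name", "Reference", df2$Adjusted_Name)]
```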

Perform division within single column where classes are identical

Here is an approach that combines dplyr with data.table::rleid():

library(dplyr)
library(data.table)

df %>%
  mutate(gp = class %in% c('A', 'B')) %>%
  arrange(class2, class) %>%
  group_by(id = rleid(class2, gp)) %>%
  mutate(result = value / value[class %in% c('A', 'C')]) %>%
  select(-gp, -id)
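rleid() hands out a new integer id every time any of its arguments changes from one row to the next, which is why the rows are sorted with arrange() first. The base R stand-in below is only a sketch of that behaviour: run_id() is a hypothetical helper, and it assumes the separator "\r" never occurs inside the values:

```r
# Hypothetical helper mimicking data.table::rleid() for one or more vectors
run_id <- function(...) {
  # Combine the vectors into one key per row
  keys <- do.call(paste, c(list(...), sep = "\r"))
  if (length(keys) == 0) return(integer(0))
  # TRUE on the first row and wherever the key differs from the previous row
  starts <- c(TRUE, keys[-1] != keys[-length(keys)])
  # Running count of run starts gives 1, 1, 2, 2, ...
  cumsum(starts)
}

class2 <- c("W", "W", "W", "W", "X", "X")
gp     <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
run_id(class2, gp)  # 1 1 2 2 3 3
```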

A data.table-only approach would be:

setDT(df)[,gp:=class %chin% c('A','B')][
order(class2,class),result:=value/value[class %chin% c('A','C')],by=.(rleid(class2,gp))][
,gp:=NULL][]

Output:

      id value class class2 desired.operation result
<int> <dbl> <chr> <chr> <chr> <dbl>
1 1 1 A W 1/1 1
2 1 5 B W 5/1 5
3 2 9 C W 9/9 1
4 2 13 D W 13/9 1.44
5 3 2 A X 2/2 1
6 3 6 B X 6/2 3
7 4 10 C X 10/10 1
8 4 14 D X 14/10 1.4
9 5 3 A Y 3/3 1
10 5 7 B Y 7/3 2.33
11 6 11 C Y 11/11 1
12 6 15 D Y 15/11 1.36
13 7 4 A Z 4/4 1
14 7 8 B Z 8/4 2
15 8 12 C Z 12/12 1
16 8 16 D Z 16/12 1.33
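For readers without dplyr or data.table, the same ratio can be sketched in base R with ave(). The data frame below is an assumption reconstructed from the printed output (value, class and class2 only); as in the code above, the reference row within each class2 group is A for the A/B pair and C for the C/D pair:

```r
# Data reconstructed from the printed output (assumed, not the asker's dput)
df <- data.frame(
  value  = as.numeric(1:16),
  class  = rep(c("A", "B", "C", "D"), each = 4),
  class2 = rep(c("W", "X", "Y", "Z"), times = 4)
)

# Pair A with B and C with D, playing the role of gp
pair <- ifelse(df$class %in% c("A", "B"), "AB", "CD")

# Keep only the reference values (A or C rows); everything else becomes NA
ref <- ifelse(df$class %in% c("A", "C"), df$value, NA)

# Within each class2/pair group, spread the group's single reference value to every row
ref <- ave(ref, df$class2, pair, FUN = function(v) v[!is.na(v)][1])

df$result <- df$value / ref
```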

