R: Combining dataset and lookup-table to extract value to new colume
We can use a join in data.table
library(data.table)
df1$col3 <- NULL
setDT(df1)[df2, col3 := i.dummy, on = .(col1 = c1, col2 = c2)]
df1
# col1 col2 col3
#1: 25382701 65352617 NA
#2: 22363658 45363783 NA
#3: 20019696 23274747 0
data
df1 <- structure(list(col1 = c(25382701L, 22363658L, 20019696L), col2 = c(65352617L,
45363783L, 23274747L), col3 = c("-", "-", "-")), class = "data.frame", row.names = c("1",
"2", "3"))
df2 <- structure(list(c1 = c(17472802L, 20383829L, 20019696L, 1382947L,
22123425L), c2 = c(65548585L, 24747473L, 23274747L, 21930283L,
65382920L), dummy = c(1L, 0L, 0L, 1L, 0L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5"))
How to look up values from a table and insert name of the lookup-list?
For the sake of completeness (and performance with large tables, perhaps), here is a data.table
approach:
library(data.table)
rbindlist(list(genelist1, genelist2), idcol = "glid")[, -"Gene"][
setDT(df), on = "SYMBOL"][, .(glid = toString(glid)), by = .(Gene, SYMBOL, Values)][]
Gene SYMBOL Values glid
1: TP53 2 3.55 2
2: XBP1 5 4.06 2
3: TP27 1 2.53 1, 2
4: REDD1 4 3.99 1
5: ERO1L 6 5.02 NA
6: STK11 9 3.64 2
7: HIF2A 8 2.96 NA
rbindlist()
creates a data.table
from all genelists and adds a column glid
to identify the origin of each row. The Gene
column is ignored as the subsequent join is only on SYMBOL
. Before joining, df
is coerced to class data.table
using setDT()
. The joined result is then aggregated by SYMBOL
to exhibit cases where a symbol appears in both genelists which is the case for SYMBOL == 1
.
Edit
In case there are many genelists or the full name of the genelist is required instead of just a number, we can try this:
rbindlist(mget(ls(pattern = "^genelist")), idcol = "glid")[, -"Gene"][
setDT(df), on = "SYMBOL"][, .(glid = toString(glid)), by = .(Gene, SYMBOL, Values)][]
Gene SYMBOL Values glid
1: TP53 2 3.55 genelist2
2: XBP1 5 4.06 genelist2
3: TP27 1 2.53 genelist1, genelist2
4: REDD1 4 3.99 NA
5: ERO1L 6 5.02 genelist1
6: STK11 9 3.64 genelist2
7: HIF2A 8 2.96 NA
ls()
is looking for objects in the environment the name of which is starting with genelist...
. mget()
returns a named list of those objects which is passed to rbindlist()
.
Data
As provided by the OP
df <- structure(list(Gene = c("TP53", "XBP1", "TP27", "REDD1", "ERO1L",
"STK11", "HIF2A"), SYMBOL = c(2L, 5L, 1L, 4L, 6L, 9L, 8L), Values = c(3.55,
4.06, 2.53, 3.99, 5.02, 3.64, 2.96)), .Names = c("Gene", "SYMBOL",
"Values"), class = "data.frame", row.names = c(NA, -7L))
genelist1 <- structure(list(Gene = c("P4H", "PLK", "TP27", "KTD", "ERO1L"),
SYMBOL = c(10L, 7L, 1L, 11L, 4L)), .Names = c("Gene", "SYMBOL"
), class = "data.frame", row.names = c(NA, -5L))
genelist2 <- structure(list(Gene = c("TP53", "XBP1", "BHLHB", "STK11", "TP27",
"UPK"), SYMBOL = c(2L, 5L, 12L, 9L, 1L, 18L)), .Names = c("Gene",
"SYMBOL"), class = "data.frame", row.names = c(NA, -6L))
Adding values from lookup-table based on condition to data frame in R
You need to make a few manipulations on your data to join
them together.
- Pivot
lookup_data
longer withtidyr::pivot_longer()
so thegender
info is in a column to help merge on. - Use
dplyr::rename()
to make sure the column names are the same between the two tables. - Transform the
gender
column so it is just1
letter to match the other table. Here I usestringr::str_sub(x, 1,1)
which just takes the first character of a string. - Then I use
left_join()
to merge. Because the joining column names are already the same I don't need to specify. - Finally I just reorder and sort the data to match your expected output.
library(tidyverse)
participants_data <- data.frame(
ID = c(1,2,3,4),
gender = c('f','m','d','f'),
image_index = c(19,2,2,19)
)
lookup_data <- data.frame(
index = c(2,19),
male = c(100,110),
female = c(150,125),
diverse = c(130, 90)
)
lookup_data %>%
pivot_longer(-index, names_to = "gender", values_to = "external_value") %>%
rename(image_index = index) %>%
mutate(gender = str_sub(gender, 1, 1)) %>%
left_join(., participants_data) %>%
drop_na(ID) %>%
select(ID, gender, image_index, external_value) %>%
arrange(ID)
#> Joining, by = c("image_index", "gender")
#> # A tibble: 4 x 4
#> ID gender image_index external_value
#> <dbl> <chr> <dbl> <dbl>
#> 1 1 f 19 125
#> 2 2 m 2 100
#> 3 3 d 2 130
#> 4 4 f 19 125
Created on 2022-02-18 by the reprex package (v2.0.1)
How to Use reference table to insert rows into a data frame in R?
One way is to use ifelse
:
if data frame = test and activity number column = activitynum,
test$activitylabel <- ifelse(test$activitynum == 1, "walking, ifelse(test$activitynum == 2, "walking_upstairs", ifelse(test$activitynum == 3, "walking_downstairs", ifelse(test$activitynum == 4, "sitting", ifelse(test$activitynum == 5, "standing", ifelse(test$activitynum == 6, "laying", NA))))))
another way is to create a look-up table and then do a merge
as suggested by @Jaehyeon:
lookup <- data.frame(activitynum = c(1,2,3,4,5,6), activity = c("walking", "walking_upstairs", "walking_downstairs", "standing", "sitting", "laying"))
survey <- data.frame(id = c(seq(1:10)), activitynum = floor(runif(10, 1, 7)), var1 = runif(10, 1, 100))
merge(survey, lookup, by = "activitynum", all.x = TRUE)
> str(lookup)
'data.frame': 6 obs. of 2 variables:
$ activitynum: num 1 2 3 4 5 6
$ activity : Factor w/ 6 levels "laying","sitting",..: 4 6 5 3 2 1
> str(survey)
'data.frame': 10 obs. of 3 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10
$ activitynum: num 1 2 4 1 4 6 2 4 2 2
$ var1 : num 52.3 60.5 53.3 49.8 73.1 ...
Inserting new values into a data frame using mutate and case_when in dplyr
This is a typical case that rows_*
from dplyr
can treat:
library(dplyr)
letters_df %>%
rows_patch(new_letters, by = "caps")
caps lows
1 A a
2 B b
3 C c
4 D d
5 E <NA>
6 F f
7 G g
8 H h
9 I <NA>
10 J j
Replace values in a dataframe based on lookup table
You posted an approach in your question which was not bad. Here's a smiliar approach:
new <- df # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])
An alternative approach which will be faster is:
new <- df
new[] <- look$class[match(unlist(df), look$pet)]
Note that I use empty brackets ([]
) in both cases to keep the structure of new
as it was (a data.frame).
(I'm using df
instead of table
and look
instead of lookup
in my answer)
Related Topics
Shading Confidence Intervals Manually with Ggplot2
Want Only the Time Portion of a Date-Time Object in R
How to Ddply() Without Sorting
R How to Extract First Row of Each Matrix Within a List
Make R Studio Plots Only Show Up in New Window
Ggplot2 Error "No Layers in Plot"
Reshaping a Data Frame with More Than One Measure Variable
Shade (Fill or Color) Area Under Density Curve by Quantile
What Are the Caveats of Using Source Versus Parse & Eval
R: Data.Table Count !Na Per Row
Making a Zip Code Choropleth in R Using Ggplot2 and Ggmap
Saving a Data Frame as a Binary File
Generating a Very Large Matrix of String Combinations Using Combn() and Bigmemory Package
How to Jitter Two Ggplot Geoms in the Same Way
Import Multiple Text Files in R and Assign Them Names from a Predetermined List