Simple Lookup to Insert Values in an R Data Frame

R: Combining dataset and lookup-table to extract value to new colume

We can use a join in data.table

library(data.table)
df1$col3 <- NULL
setDT(df1)[df2, col3 := i.dummy, on = .(col1 = c1, col2 = c2)]
df1
# col1 col2 col3
#1: 25382701 65352617 NA
#2: 22363658 45363783 NA
#3: 20019696 23274747 0

data

df1 <- structure(list(col1 = c(25382701L, 22363658L, 20019696L), col2 = c(65352617L, 
45363783L, 23274747L), col3 = c("-", "-", "-")), class = "data.frame", row.names = c("1",
"2", "3"))

df2 <- structure(list(c1 = c(17472802L, 20383829L, 20019696L, 1382947L,
22123425L), c2 = c(65548585L, 24747473L, 23274747L, 21930283L,
65382920L), dummy = c(1L, 0L, 0L, 1L, 0L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5"))

How to look up values from a table and insert name of the lookup-list?

For the sake of completeness (and performance with large tables, perhaps), here is a data.table approach:

library(data.table)
rbindlist(list(genelist1, genelist2), idcol = "glid")[, -"Gene"][
setDT(df), on = "SYMBOL"][, .(glid = toString(glid)), by = .(Gene, SYMBOL, Values)][]
    Gene SYMBOL Values glid
1: TP53 2 3.55 2
2: XBP1 5 4.06 2
3: TP27 1 2.53 1, 2
4: REDD1 4 3.99 1
5: ERO1L 6 5.02 NA
6: STK11 9 3.64 2
7: HIF2A 8 2.96 NA

rbindlist() creates a data.table from all genelists and adds a column glid to identify the origin of each row. The Gene column is ignored as the subsequent join is only on SYMBOL. Before joining, df is coerced to class data.table using setDT(). The joined result is then aggregated by SYMBOL to exhibit cases where a symbol appears in both genelists which is the case for SYMBOL == 1.



Edit

In case there are many genelists or the full name of the genelist is required instead of just a number, we can try this:

rbindlist(mget(ls(pattern = "^genelist")), idcol = "glid")[, -"Gene"][
setDT(df), on = "SYMBOL"][, .(glid = toString(glid)), by = .(Gene, SYMBOL, Values)][]
    Gene SYMBOL Values                 glid
1: TP53 2 3.55 genelist2
2: XBP1 5 4.06 genelist2
3: TP27 1 2.53 genelist1, genelist2
4: REDD1 4 3.99 NA
5: ERO1L 6 5.02 genelist1
6: STK11 9 3.64 genelist2
7: HIF2A 8 2.96 NA

ls()is looking for objects in the environment the name of which is starting with genelist.... mget() returns a named list of those objects which is passed to rbindlist().

Data

As provided by the OP

df <- structure(list(Gene = c("TP53", "XBP1", "TP27", "REDD1", "ERO1L", 
"STK11", "HIF2A"), SYMBOL = c(2L, 5L, 1L, 4L, 6L, 9L, 8L), Values = c(3.55,
4.06, 2.53, 3.99, 5.02, 3.64, 2.96)), .Names = c("Gene", "SYMBOL",
"Values"), class = "data.frame", row.names = c(NA, -7L))
genelist1 <- structure(list(Gene = c("P4H", "PLK", "TP27", "KTD", "ERO1L"),
SYMBOL = c(10L, 7L, 1L, 11L, 4L)), .Names = c("Gene", "SYMBOL"
), class = "data.frame", row.names = c(NA, -5L))
genelist2 <- structure(list(Gene = c("TP53", "XBP1", "BHLHB", "STK11", "TP27",
"UPK"), SYMBOL = c(2L, 5L, 12L, 9L, 1L, 18L)), .Names = c("Gene",
"SYMBOL"), class = "data.frame", row.names = c(NA, -6L))

Adding values from lookup-table based on condition to data frame in R

You need to make a few manipulations on your data to join them together.

  1. Pivot lookup_data longer with tidyr::pivot_longer() so the gender info is in a column to help merge on.
  2. Use dplyr::rename() to make sure the column names are the same between the two tables.
  3. Transform the gender column so it is just 1 letter to match the other table. Here I use stringr::str_sub(x, 1,1) which just takes the first character of a string.
  4. Then I use left_join() to merge. Because the joining column names are already the same I don't need to specify.
  5. Finally I just reorder and sort the data to match your expected output.
library(tidyverse)

participants_data <- data.frame(
ID = c(1,2,3,4),
gender = c('f','m','d','f'),
image_index = c(19,2,2,19)
)

lookup_data <- data.frame(
index = c(2,19),
male = c(100,110),
female = c(150,125),
diverse = c(130, 90)
)

lookup_data %>%
pivot_longer(-index, names_to = "gender", values_to = "external_value") %>%
rename(image_index = index) %>%
mutate(gender = str_sub(gender, 1, 1)) %>%
left_join(., participants_data) %>%
drop_na(ID) %>%
select(ID, gender, image_index, external_value) %>%
arrange(ID)
#> Joining, by = c("image_index", "gender")
#> # A tibble: 4 x 4
#> ID gender image_index external_value
#> <dbl> <chr> <dbl> <dbl>
#> 1 1 f 19 125
#> 2 2 m 2 100
#> 3 3 d 2 130
#> 4 4 f 19 125

Created on 2022-02-18 by the reprex package (v2.0.1)

How to Use reference table to insert rows into a data frame in R?

One way is to use ifelse :

if data frame = test and activity number column = activitynum,

test$activitylabel <- ifelse(test$activitynum == 1, "walking, ifelse(test$activitynum == 2, "walking_upstairs", ifelse(test$activitynum == 3, "walking_downstairs", ifelse(test$activitynum == 4, "sitting", ifelse(test$activitynum == 5, "standing", ifelse(test$activitynum == 6, "laying", NA))))))

another way is to create a look-up table and then do a merge as suggested by @Jaehyeon:

lookup <- data.frame(activitynum = c(1,2,3,4,5,6), activity = c("walking", "walking_upstairs", "walking_downstairs", "standing", "sitting", "laying"))

survey <- data.frame(id = c(seq(1:10)), activitynum = floor(runif(10, 1, 7)), var1 = runif(10, 1, 100))

merge(survey, lookup, by = "activitynum", all.x = TRUE)

> str(lookup)
'data.frame': 6 obs. of 2 variables:
$ activitynum: num 1 2 3 4 5 6
$ activity : Factor w/ 6 levels "laying","sitting",..: 4 6 5 3 2 1
> str(survey)
'data.frame': 10 obs. of 3 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10
$ activitynum: num 1 2 4 1 4 6 2 4 2 2
$ var1 : num 52.3 60.5 53.3 49.8 73.1 ...

Inserting new values into a data frame using mutate and case_when in dplyr

This is a typical case that rows_* from dplyr can treat:

library(dplyr)

letters_df %>%
rows_patch(new_letters, by = "caps")

caps lows
1 A a
2 B b
3 C c
4 D d
5 E <NA>
6 F f
7 G g
8 H h
9 I <NA>
10 J j

Replace values in a dataframe based on lookup table

You posted an approach in your question which was not bad. Here's a smiliar approach:

new <- df  # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])

An alternative approach which will be faster is:

new <- df
new[] <- look$class[match(unlist(df), look$pet)]

Note that I use empty brackets ([]) in both cases to keep the structure of new as it was (a data.frame).

(I'm using df instead of table and look instead of lookup in my answer)



Related Topics



Leave a reply



Submit