Data.Table Join (Multiple) Selected Columns with New Names

data.table join (multiple) selected columns with new names

Updated answer based on Arun's recommendation:

cols_old <- c('i.a', 'i.b')
DT1[DT2, (cols_new) := mget(cols_old), on = c(id = "id")]

you could also generate the cols_old by doing:

paste0('i.', gsub('_new', '', cols_new, fixed = TRUE))

See history for the old answer.

How to merge two data.table by different column names?

OUTDATED

Use this operation:

X[Y]
#    area   id value price sales
# 1:   US c001   100   500    20
# 2:   UK c002   200   200    30
# 3:   EU c003   300   400    15

or this operation:

Y[X]
#      ID price sales area value
# 1: c001   500    20   US   100
# 2: c002   200    30   UK   200
# 3: c003   400    15   EU   300

Edit after you edited your question, I read Section 1.12 of the FAQ: "What is the didifference between X[Y] and merge(X,Y)?", which led me to checkout ?merge and I discovered there are two different merge functions depending upon which package you are using. The default is merge.data.frame but data.table uses merge.data.table. Compare

merge(X, Y, by.x = "id", by.y = "ID") # which is merge.data.table
# Error in merge.data.table(X, Y, by.x = "id", by.y = "ID") : 
# A non-empty vector of column names for `by` is required.

with

merge.data.frame(X, Y, by.x = "id", by.y = "ID")
#     id area value price sales
# 1 c001   US   100   500    20
# 2 c002   UK   200   200    30
# 3 c003   EU   300   400    15

Edit for completeness based upon a comment by @Michael Bernsteiner, it looks like the data.table team is planning on implementing by.x and by.y into the merge.data.table function, but hasn't done so yet.

R merging tables, with different column names and retaining all columns

Yes, that's possible:

second[first, on=c(i2="index", t2="type"), nomatch=0L, .(i2, t2, index, type, value, i.value)]

   i2 t2 index type value i.value
1:  a  1     a    1     5       3
2:  a  2     a    2     6       4
3:  b  3     b    3     7       5
4:  c  5     c    5     9       7

Data Table R: Merge selected columns from multiple data.table

Just change the by = "ID" to by = c("ID", "FDR", "logFC") and the argument allow.cartesian should be inside the merge

DT.comb <- Reduce(function(...) merge.data.table(...,
   by= c("ID", "FDR", "LogFC"),  all = TRUE,  allow.cartesian=TRUE), dt.list)

Column name labelling in data.table joins

I ended up answering my own question.

data_table_tidy_join <- function(x,y, join_by){

    x <- data.table(x)
    y <- data.table(y)

    # Determine single join names
    single_join_names <- purrr::keep((stringr::str_split(join_by, "==|>=|<=")), ~length(.) == 1) %>% unlist()

    # cols from x that won't require as matching in i
    remove_from_x_names <- c(trimws(na.omit(stringr::str_extract(join_by, ".*(?=[=]{2})"))), single_join_names)

    # names need to keep
    x_names_keep_raw <- names(x)[!names(x) %in% remove_from_x_names]
    y_names_keep_raw <- names(y)

    # cols that exist in both x and y, but not being equi joined on
    cols_rename_index <- x_names_keep_raw[x_names_keep_raw %in% y_names_keep_raw]

    #rename so indexing works
    x_names_keep <- x_names_keep_raw
    y_names_keep <- y_names_keep_raw

    # give prefix to necessary vars
    x_names_keep[x_names_keep %in% cols_rename_index] <- paste("x.",cols_rename_index, sep ="")
    y_names_keep[y_names_keep %in% cols_rename_index] <- paste("i.",cols_rename_index, sep ="")

    # implement data.table call, keeping required cols
    joined_data <-
        x[y, on = join_by,
          mget(c(paste0("i.", y_names_keep_raw),paste0("x.", x_names_keep_raw))) %>% set_names(c(y_names_keep,x_names_keep)),
          mult = "all", allow.cartesian = TRUE, nomatch = NA] %>%
        as_tibble()

    return(joined_data)

}

> x <- data.table(Id  = c("A", "B", "C", "C"),
+                  X1  = c(1L, 3L, 5L, 7L),
+                  X2 = c(8L,12L,9L,18L),
+                  XY  = c("x2", "x4", "x6", "x8"))
>  
> z <- data.table(ID = "C", Z1 = 5:9, Z2 = paste0("z", 5:9))
>   
> data_table_tidy_join(x, z, join_by = c("Id == ID","X1 <= Z1", "X2 >= Z1"))
# A tibble: 8 x 6
  ID       Z1 Z2       X1    X2 XY   
  <chr> <int> <chr> <int> <int> <chr>
1 C         5 z5        5     9 x6   
2 C         6 z6        5     9 x6   
3 C         7 z7        5     9 x6   
4 C         7 z7        7    18 x8   
5 C         8 z8        5     9 x6   
6 C         8 z8        7    18 x8   
7 C         9 z9        5     9 x6   
8 C         9 z9        7    18 x8

Select/Join multiple fields from different tables with same column name

After joining, you can use COALESCE to get the non-null value from the table with a matching row.

$sql = "SELECT k.uaID, k.userID, k.key, k.appName, 
                COALESCE(d1.sector, d3.sector, '') AS sector, 
                COALESCE(d1.subSector, d3.subSector, '') AS subSector, 
                COALESCE(d1.topic, d2.topic, d3.topic, '') AS topic,
                users.title, users.firstName, users.lastName, users.email 
        FROM keyTable AS k
        LEFT OUTER JOIN dataTable01 AS d1 ON k.uaID = d1.uaID 
        LEFT OUTER JOIN dataTable02 AS d2 ON k.uaID = d2.uaID 
        LEFT OUTER JOIN dataTable03 AS d3 ON k.uaID = d3.uaID 
        LEFT OUTER JOIN users ON k.userID = users.userID
        ORDER BY k.uaID";

Another way to merge the data from the datatablesNN tables into the same column os tp use UNION.

SELECT k.uaID, k.userID, k.key, k.appName, IFNULL(d.sector, '') AS sector, IFNULL(d.subSector, '') AS subSector, IFNULL(d.topic, '') AS topic,
        u.title, u.firstName, u.lastName, u.email
FROM keyTable AS k
LEFT OUTER JOIN (
    SELECT uaID, sector, subSector, topic
    FROM dataTable01
    UNION
    SELECT uaID, NULL, NULL, topic
    FROM datatable02
    UNION
    SELECT uaID, sector, subSector, topic
    FROM datatable03) AS d
ON k.uaID = d.uaID
LEFT JOIN users AS u ON u.userID = k.userID
ORDER BY k.uaID

DEMO

Joining and selecting multiple tables and creating new column names

You simply JOIN the two tables, and do a group by on the extensionID. Also, add formulas to summarize and gather the info.

SELECT 
    `extensionID` AS `Extension`,
    SUM(`Duration`) AS `Total Talk Time`,
    COUNT(DISTINCT `UniqueCallID`) as `Total Calls`,
    SUM(IF(`Answered` = 1,0,1)) AS `Missed Calls`
FROM `callSession` a
JOIN `callSum` b
    ON a.`UniqueCallID` = b.`UniqueCallID`
GROUP BY a.`extensionID`
ORDER BY a.`extensionID`

Join in place multiple columns data.table

A possible solution with mget :

cols <- colnames(b)[-1]
data[b,(cols) := mget(cols), on = "ref"][]

     ref  names    radius     height      weight     volume      money
   <int> <char>     <num>      <num>       <num>      <num>      <num>
1:     1      a 0.9757437 -1.2441942 -0.19825867  0.7934811  0.6642228
2:     2      b 0.9752424  1.0112435 -1.27333411  0.9296093 -1.1942749
3:     3      c 0.6989610  0.8868158 -1.54361300 -0.4824725 -0.8368856
4:     4      d 0.4336850  0.8949459  1.45199656 -0.3262636 -0.5158351
5:     5      e 0.9989011  0.4237771  0.04365383 -1.4086310  0.7379102
6:     6      f 0.2640407 -2.0085267 -0.15621784  1.3278945  0.2616385

data.table joins - Select all columns in the i argument

How about constructing the j-expression and just eval'ing it?

nc = names(current)[-1L]
nn = paste0("i.", nc)
expr = lapply(nn, as.name)
setattr(expr, 'names', nc)
expr = as.call(c(quote(`:=`), expr))

> current[new[c(1,3)], eval(expr)]
> current
##    id var var2
## 1:  1  11   11
## 2:  2   2    2
## 3:  3  13   13
## 4:  4   4    4

Joining tables based on different column names

Update: All the features listed below are implemented and is available in the current stable version of data.table v1.9.6 on CRAN.

There are at least these improvements possible for joins in data.tables.

merge.data.table gaining by.x and by.y arguments
Using secondary keys to join using both forms discussed above without need to set keys, but rather by specifying columns on x and i.

The simplest reason is that we've not managed to get to it yet.

Data.Table Join (Multiple) Selected Columns with New Names