How to Merge Two Data.Table by Different Column Names

How to merge two data.table by different column names?

OUTDATED


Assuming the keys are set on the join columns (setkey(X, id); setkey(Y, ID)), use this join:

X[Y]
# area id value price sales
# 1: US c001 100 500 20
# 2: UK c002 200 200 30
# 3: EU c003 300 400 15

or this operation:

Y[X]
# ID price sales area value
# 1: c001 500 20 US 100
# 2: c002 200 30 UK 200
# 3: c003 400 15 EU 300
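For comparison, the same differently-named-key join can be sketched in pandas, with X and Y reconstructed from the outputs above:

```python
import pandas as pd

# X and Y reconstructed from the join outputs shown above
X = pd.DataFrame({"area": ["US", "UK", "EU"],
                  "id": ["c001", "c002", "c003"],
                  "value": [100, 200, 300]})
Y = pd.DataFrame({"ID": ["c001", "c002", "c003"],
                  "price": [500, 200, 400],
                  "sales": [20, 30, 15]})

# The differently named key columns are given via left_on / right_on
merged = pd.merge(X, Y, left_on="id", right_on="ID")
```

Unlike X[Y], pd.merge keeps both key columns (id and ID); drop one of them if you don't need it.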

Edit: after you edited your question, I read Section 1.12 of the FAQ ("What is the difference between X[Y] and merge(X, Y)?"), which led me to check out ?merge, where I discovered that there are two different merge methods depending on the class of the objects involved. The default is merge.data.frame, but data.table ships merge.data.table. Compare

merge(X, Y, by.x = "id", by.y = "ID") # which is merge.data.table
# Error in merge.data.table(X, Y, by.x = "id", by.y = "ID") :
# A non-empty vector of column names for `by` is required.

with

merge.data.frame(X, Y, by.x = "id", by.y = "ID")
# id area value price sales
# 1 c001 US 100 500 20
# 2 c002 UK 200 200 30
# 3 c003 EU 300 400 15

Edit: for completeness, based on a comment by @Michael Bernsteiner, it looks like the data.table team is planning to implement by.x and by.y in merge.data.table, but hasn't done so yet. (Later data.table versions do support by.x and by.y in merge.)

R merging tables, with different column names and retaining all columns

Yes, that's possible:

second[first, on=c(i2="index", t2="type"), nomatch=0L, .(i2, t2, index, type, value, i.value)]

i2 t2 index type value i.value
1: a 1 a 1 5 3
2: a 2 a 2 6 4
3: b 3 b 3 7 5
4: c 5 c 5 9 7
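A pandas equivalent for comparison: first and second below are reconstructed from the result above (any unmatched rows dropped by nomatch=0L are omitted), and the suffix _first plays the role of data.table's i. prefix:

```python
import pandas as pd

# Reconstructed from the join result shown above
first = pd.DataFrame({"index": ["a", "a", "b", "c"],
                      "type": [1, 2, 3, 5],
                      "value": [3, 4, 5, 7]})
second = pd.DataFrame({"i2": ["a", "a", "b", "c"],
                       "t2": [1, 2, 3, 5],
                       "value": [5, 6, 7, 9]})

# Inner join on the differently named columns; suffixes disambiguate
# the two value columns (value = second's, value_first = first's)
res = pd.merge(second, first,
               left_on=["i2", "t2"], right_on=["index", "type"],
               suffixes=("", "_first"))
```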

merging tables with different column names

Using data.table's subset-based joins along with the on= argument and nomatch=0L, this is simply:

DT2[DT1, on=c(col5="col2", col4="col3"), nomatch=0L]

See the secondary indices vignette for more.


Alternatively, if you have the data.tables keyed, you can skip the on= argument. But the solution above is the idiomatic one, as it retains the order of the original data.tables and makes it clear from the code alone which columns are being looked up.

setkey(DT1, col2, col3)
setkey(DT2, col5, col4)
DT2[DT1, nomatch=0L]
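The keyed variant has a rough pandas analogue: index both frames on the join columns and join on the index. DT1 and DT2 below are hypothetical; only the column names come from the answer above:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above
DT1 = pd.DataFrame({"col1": [1, 2, 3],
                    "col2": ["a", "b", "c"],
                    "col3": [10, 20, 30]})
DT2 = pd.DataFrame({"col4": [10, 20, 40],
                    "col5": ["a", "b", "d"],
                    "col6": ["x", "y", "z"]})

# Index both frames on the join columns (the analogue of setkey),
# align the index names, and inner-join (the analogue of nomatch=0L)
left = DT2.set_index(["col5", "col4"])
right = DT1.set_index(["col2", "col3"]).rename_axis(["col5", "col4"])
out = left.join(right, how="inner")
```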


Merging two Datatables with different column names

The problem is probably caused by the fact that Merge uses the table's PrimaryKey to find an existing record to update; only if it can't find one does it add a new record. If that is the case, you should clear the PrimaryKey information retrieved when you filled the table through the data adapter:

dataTable1.PrimaryKey = Nothing
dataTable2.PrimaryKey = Nothing
dataTable1.Merge(dataTable2, false, MissingSchemaAction.Add)
....

Now Merge cannot find any matches, and thus every record in dataTable2 is added to dataTable1. However, I should warn you to keep an eye on the performance and correctness of other operations on dataTable1.

With no PrimaryKey set, updating and deleting rows could become a source of problems (if you perform those operations, of course).

Merge two large data.tables based on column name of one table and column value of the other without melting

Using set():

setkey(DT1, "ID")
setkey(DT2, "ID")

# for each value column of DT1, find the rows of DT2 that reference it
# and fill col_value with the values looked up from DT1 by ID
for (k in names(DT1)[-1]) {
  rows <- which(DT2[["col"]] == k)
  set(DT2, i = rows, j = "col_value", DT1[DT2[rows], ..k])
}

ID col col_value
1: A col1 1
2: A col4 13
3: B col2 6
4: B col3 10
5: C col1 3

Note: Setting the key up front speeds up the process but reorders the rows.
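The same melt-free lookup can be sketched in pandas by stacking DT1 into a Series keyed by (ID, column name) and reindexing it with DT2's (ID, col) pairs. The data below is hypothetical: the cells that appear in the output above are taken from it, the rest are filler.

```python
import pandas as pd

# Hypothetical wide DT1; the looked-up cells match the output above,
# the remaining cells are filler
DT1 = pd.DataFrame({"ID": ["A", "B", "C"],
                    "col1": [1, 5, 3],
                    "col2": [2, 6, 4],
                    "col3": [9, 10, 11],
                    "col4": [13, 14, 15]}).set_index("ID")
DT2 = pd.DataFrame({"ID": ["A", "A", "B", "B", "C"],
                    "col": ["col1", "col4", "col2", "col3", "col1"]})

# Stack DT1 into a Series keyed by (ID, column name), then look up
# every (ID, col) pair of DT2 in one vectorised reindex
lookup = DT1.stack()
keys = pd.MultiIndex.from_frame(DT2[["ID", "col"]])
DT2["col_value"] = lookup.reindex(keys).to_numpy()
```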

Merge two different dataframes on different column names

Well, if you declare column A as index, it works:

Both_DFs = pd.merge(df1.set_index('A'), df2.set_index('A'), how='left', left_index=True, right_index=True).dropna().reset_index()

This results in:

    A    B   C  BB   CC  DD
0  A1  123  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
2  A3  146  K1  B3  345  D1

EDIT

You just needed:

Both_DFs = pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC']).dropna()

Which gives:

    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
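A self-contained version of that fix, with hypothetical df1/df2 chosen to be consistent with the outputs shown:

```python
import pandas as pd

# Hypothetical inputs consistent with the outputs shown above
df1 = pd.DataFrame({"A": ["A1", "A1", "A3"],
                    "B": [121, 345, 146],
                    "C": ["K0", "K1", "K1"]})
df2 = pd.DataFrame({"A": ["A1", "A3"],
                    "BB": ["B0", "B3"],
                    "CC": [121, 345],
                    "DD": ["D0", "D1"]})

# Merge on A and on B == CC; unmatched rows get NaN and are dropped
both = pd.merge(df1, df2, how="left",
                left_on=["A", "B"], right_on=["A", "CC"]).dropna()
```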

Join data table with different number of rows and column names

(Edited after discussion in the comments)

A dplyr solution would be something like

library(dplyr)

bind_rows(a, b) %>%
  mutate(Fz = coalesce(FzR, FzL)) %>%
  select(Fz, limb, time) %>%
  group_by(limb) %>%
  mutate(time = (seq_along(Fz) - 1) * 0.001)

In this way the newly created variable time will be a sequence from 0 up to the number of rows for each limb, multiplied by 0.001, so consecutive values are one millisecond apart. For both limbs, L and R, time starts at 0.

Output

# A tibble: 18 x 3
# Groups: limb [2]
Fz limb time
<dbl> <chr> <dbl>
1 131. L 0
2 131. L 0.001
3 131. L 0.002
4 131. L 0.003
5 132. L 0.004
6 132. L 0.005
7 132. L 0.006
8 132. L 0.007
9 133. L 0.008
10 133. L 0.009
11 135. R 0
12 131. R 0.001
13 134. R 0.002
14 135. R 0.003
15 136. R 0.004
16 136. R 0.005
17 135. R 0.006
18 135. R 0.007
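For completeness, a pandas sketch of the same approach, with hypothetical inputs a and b mirroring the FzR/FzL structure: combine_first plays the role of coalesce, and groupby().cumcount() builds the per-limb time.

```python
import pandas as pd

# Hypothetical per-limb inputs with differently named force columns
a = pd.DataFrame({"FzR": [135.0, 131.0, 134.0], "limb": "R"})
b = pd.DataFrame({"FzL": [131.0, 131.0, 132.0], "limb": "L"})

df = pd.concat([a, b], ignore_index=True)       # bind_rows(a, b)
df["Fz"] = df["FzR"].combine_first(df["FzL"])   # coalesce(FzR, FzL)
df = df[["Fz", "limb"]]
# 0, 0.001, 0.002, ... seconds within each limb
df["time"] = df.groupby("limb").cumcount() * 0.001
```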

