Data.Table Joins - Select All Columns in the I Argument

data.table joins - Select all columns in the i argument

How about constructing the j-expression and just eval'ing it?
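For reference, here is a hypothetical setup that reproduces the output shown below (the original post's data isn't included here, so these exact values are assumptions):

library(data.table)
current <- data.table(id = 1:4, var = 1:4, var2 = 1:4, key = "id")
new     <- data.table(id = 1:4, var = 11:14, var2 = 11:14)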

nc = names(current)[-1L]                 # columns to update: all but the key 'id'
nn = paste0("i.", nc)                    # matching columns of i: "i.var", "i.var2"
expr = lapply(nn, as.name)               # list of symbols i.var, i.var2
setattr(expr, 'names', nc)               # name them after the target columns
expr = as.call(c(quote(`:=`), expr))     # the call `:=`(var = i.var, var2 = i.var2)

> current[new[c(1,3)], eval(expr)]
> current
##    id var var2
## 1:  1  11   11
## 2:  2   2    2
## 3:  3  13   13
## 4:  4   4    4

MySQL Select all columns from one table and some from another table

Just use the table name:

SELECT myTable.*, otherTable.foo, otherTable.bar...

That would select all columns from myTable and columns foo and bar from otherTable.

r - data.table join and then add all columns from one table to another

Just create a function that takes column names as arguments and constructs the expression for you. Then eval it each time, passing the names of the columns you need. Here's an illustration:

get_expr <- function(x) {
  # 'x' is the names vector
  expr = paste0("i.", x)
  expr = lapply(expr, as.name)
  setattr(expr, 'names', x)
  as.call(c(quote(`:=`), expr))
}

> get_expr('value') ## generates the required expression
# `:=`(value = i.value)
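For completeness, here is a hypothetical setup of the right shape (the original data isn't shown, so the exact values will differ from the output below):

library(data.table)
template <- CJ(id1 = c("a", "b"), id2 = 1:5)   # keyed cross join of all id combinations
x <- data.table(id1 = "a", id2 = 2:4, value = rnorm(3))
y <- data.table(id1 = "b", id2 = 3:5, value = rnorm(3))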

template[x, eval(get_expr("value"))]
template[y, eval(get_expr("value"))]

#     id1 id2       value
#  1:   a   1          NA
#  2:   a   2  0.01649728
#  3:   a   3 -0.27918482
#  4:   a   4 -1.16343900
#  5:   a   5          NA
#  6:   b   1          NA
#  7:   b   2          NA
#  8:   b   3  0.86933718
#  9:   b   4  2.26787200
# 10:   b   5  1.08325800

R data.table - simple way to join on all columns without specifying column names

I think the way to do this with data.table is the following:

require(data.table)
dt1 <- data.table(A1 = c(1,2,3), A2 = c("A", "B", "D"))
dt2 <- data.table(A1 = c(3,2,3), A2 = c("A", "B", "C"))

setkeyv(dt1, names(dt1))
setkeyv(dt2, names(dt2))

and the inner join on all common columns is:

dt1[dt2, nomatch = 0]

Other options include the following (credits to Frank in the comments):

dt1[dt2, on=names(dt2), nomatch = 0]

This has the benefit of not requiring the data.tables to be keyed. (More info can be found here: What is the purpose of setting a key in data.table?)


Another option is to use the set operations on data.tables (available in version 1.9.7 or later):

fintersect(dt1, dt2)
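For the dt1 and dt2 defined above, all three approaches return the single row the two tables have in common:

#    A1 A2
# 1:  2  B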

Join two data frames, select all columns from one and some columns from the other

Not sure if it's the most efficient way, but this worked for me:

from pyspark.sql.functions import col

df1.alias('a').join(df2.alias('b'), col('b.id') == col('a.id')) \
   .select([col('a.' + xx) for xx in df1.columns] +
           [col('b.other1'), col('b.other2')])

The trick is in:

[col('a.' + xx) for xx in df1.columns] : all columns of df1

[col('b.other1'), col('b.other2')] : some columns of df2

Select all columns from table which is a result of joining two tables

Based on your comment, you could create TABLE_AB like this:

CREATE TABLE TABLE_AB 
AS (SELECT TABLE_A.* FROM TABLE_A NATURAL JOIN TABLE_B);

Now it is a copy of TABLE_A but containing only the rows you want to delete. You can reinstate those rows later using:

insert into table_a select * from table_ab;

What does x. stand for in data.table joins with on=

When doing a non-equi join like X[Y, on = .(A < A)], data.table returns the A column from Y (the i data.table).
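A small hypothetical example (the question's data isn't reproduced here) that shows the behaviour and matches the output below:

library(data.table)
X <- data.table(A = 1:3)
Y <- data.table(A = 4, B = 1)

X[Y, on = .(A < A)]   # the A column shows Y's value (4), not X's 1, 2, 3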

To get the desired result, you could do:

X[Y, on = .(A < A), .(A = x.A, B)]

which gives:

   A B
1: 1 1
2: 2 1
3: 3 1

In the next release, data.table will return both A columns. See here for the discussion.

non-equi joins adding all columns of range table in data.table in one step

Since you want results for every row of a, you should do a join like b[a, ...]:
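For concreteness, here are hypothetical tables consistent with the output below (the question's data isn't reproduced here):

library(data.table)
a <- data.table(Company_ID = 1, salary = c(2000, 3000, 4000))
b <- data.table(LB = c(0, 3000), UB = c(3000, 5000),
                cat = 1:2, rep = c("Bob", "Alice"))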

b[a, on=.(LB <= salary, UB > salary), nomatch=0,
  .(Company_ID, salary, cat, LB = x.LB, UB = x.UB, rep)]

   Company_ID salary cat   LB   UB   rep
1:          1   2000   1    0 3000   Bob
2:          1   3000   2 3000 5000 Alice
3:          1   4000   2 3000 5000 Alice
  • nomatch=0 means we'll drop rows of a that are unmatched in b.
  • We need to explicitly refer to the LB and UB columns from b using the x.* prefix; the name comes from ?data.table, where the join is written x[i], so x. refers to the outer table.

Regarding the strange default cols, there is an open issue to change that behavior: #1615.


(Issue #1989, referenced below, is fixed now -- See Uwe's answer.)

Alternatively... One way that should work and avoids explicitly listing all the columns: add a's columns to b, then subset b:

# update join: add a's columns to b by reference
b[a, on=.(LB <= salary, UB > salary), names(a) := mget(paste0("i.", names(a)))]
# then keep only the rows of b that matched a row of a
b[b[a, on=.(LB <= salary, UB > salary), which=TRUE, nomatch=0]]

There are two problems with this. First, there's a bug causing non-equi join to break when confronted with mget (#1989). The temporary workaround is to enumerate a's columns:

b[a, on=.(LB <= salary, UB > salary), `:=`(Company_ID = i.Company_ID, salary = i.salary)] 
b[b[a, on=.(LB <= salary, UB > salary), which=TRUE, nomatch=0]]

Second, it's inefficient to do this join twice (once for := and a second time for which), but I can't see any way around that... maybe justifying a feature request to allow both j and which?

select all columns from both tables postgresql function

One problem is that neither functions nor views can return two columns with the same name (in your example, CustomerID is present in both tables). The other is the syntax:

RETURNS TABLE ( column_name column_type [, ...] )

as given in the official docs; there is no table_name.* option there.

Aside from the obvious solution of specifying the complete list of columns, there is one trick with composite (row/record) types:

CREATE FUNCTION get_data (p_pattern VARCHAR, p_year INT)
RETURNS TABLE ("order" orders, customer customers)
AS $$

Note that you can use table/view names as types in declarations.

And in that case the query inside the function could look like:

SELECT a, b
FROM Orders a
JOIN Customers b ON a.CustomerID=b.CustomerID

After that, the usage of the function would be:

select
  *,                     -- two composite columns
  ("order").*,           -- all columns from table orders
  (customer).*,          -- all columns from table customers
  ("order").CustomerID   -- a specific column from a specific table
from
  get_data(<parameters here>);

dbfiddle


