data.table joins - Select all columns in the i argument
How about constructing the j-expression and just eval'ing it?
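library(data.table)
nc = names(current)[-1L]              # every column except the first (id, the key)
nn = paste0("i.", nc)                 # the same columns, prefixed to refer to i
expr = lapply(nn, as.name)
setattr(expr, 'names', nc)
expr = as.call(c(quote(`:=`), expr))  # `:=`(var = i.var, var2 = i.var2)
current[new[c(1,3)], eval(expr)]
current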
## id var var2
## 1: 1 11 11
## 2: 2 2 2
## 3: 3 13 13
## 4: 4 4 4
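For readers less familiar with data.table, the effect of the constructed `:=` call can be sketched in plain Python. This is only an illustration of the update-join semantics, not data.table itself. The contents of `current` and `new` and the helper `update_join` are assumptions, chosen to reproduce the output shown above.

```python
# Hypothetical stand-ins for the two tables: lists of row dicts keyed by "id".
current = [{"id": i, "var": i, "var2": i} for i in range(1, 5)]
new = [{"id": i, "var": 10 + i, "var2": 10 + i} for i in range(1, 4)]


def update_join(x, i, key):
    """Overwrite every non-key column of x with the value from the matching row of i."""
    value_cols = [c for c in x[0] if c != key]   # plays the role of names(current)[-1L]
    lookup = {row[key]: row for row in i}
    for row in x:
        match = lookup.get(row[key])
        if match is not None:
            for c in value_cols:                 # plays the role of var = i.var, var2 = i.var2
                row[c] = match[c]
    return x


# Use rows 1 and 3 of `new`, mirroring new[c(1,3)].
update_join(current, [new[0], new[2]], key="id")
result = [(r["id"], r["var"], r["var2"]) for r in current]
print(result)
```

The list of columns to update is derived from the table itself, which is the point of building the expression from `names(current)` rather than typing each assignment by hand.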
MySQL Select all columns from one table and some from another table
Just use the table name:
SELECT myTable.*, otherTable.foo, otherTable.bar...
That would select all columns from myTable and columns foo and bar from otherTable.
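A quick way to try the `table.*` form is an in-memory SQLite database driven from Python. This is a sketch under assumptions: SQLite stands in for MySQL, and the column layout, sample rows, and join condition on `id` are made up for illustration, since the original query leaves the rest elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas; only the table names come from the answer.
    CREATE TABLE myTable (id INTEGER, a TEXT, b TEXT);
    CREATE TABLE otherTable (id INTEGER, foo TEXT, bar TEXT, baz TEXT);
    INSERT INTO myTable VALUES (1, 'a1', 'b1');
    INSERT INTO otherTable VALUES (1, 'foo1', 'bar1', 'baz1');
""")

# All columns of myTable, but only foo and bar from otherTable (baz is left out).
cur = conn.execute("""
    SELECT myTable.*, otherTable.foo, otherTable.bar
    FROM myTable
    JOIN otherTable ON otherTable.id = myTable.id
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```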
r - data.table join and then add all columns from one table to another
Just create a function that takes names as arguments and constructs the expression for you. Then eval it each time, passing the names of each data.table you require. Here's an illustration:
get_expr <- function(x) {
    # 'x' is the names vector
    expr = paste0("i.", x)
    expr = lapply(expr, as.name)
    setattr(expr, 'names', x)
    as.call(c(quote(`:=`), expr))
}
> get_expr('value') ## generates the required expression
# `:=`(value = i.value)
template[x, eval(get_expr("value"))]
template[y, eval(get_expr("value"))]
# id1 id2 value
# 1: a 1 NA
# 2: a 2 0.01649728
# 3: a 3 -0.27918482
# 4: a 4 -1.16343900
# 5: a 5 NA
# 6: b 1 NA
# 7: b 2 NA
# 8: b 3 0.86933718
# 9: b 4 2.26787200
# 10: b 5 1.08325800
R data.table - simple way to join on all columns without specifying column names
I think the way to do this in data.table is the following:
require(data.table)
dt1 <- data.table(A1 = c(1,2,3), A2 = c("A", "B", "D"))
dt2 <- data.table(A1 = c(3,2,3), A2 = c("A", "B", "C"))
setkeyv(dt1, names(dt1))
setkeyv(dt2, names(dt2))
and the inner join on all common columns is:
dt1[dt2, nomatch = 0]
Other options include the following (credits to Frank in the comments):
dt1[dt2, on=names(dt2), nomatch = 0]
This has the benefit of not requiring the data.tables to be keyed. (More info can be found here: What is the purpose of setting a key in data.table?)
Another option is to use the set operations (available in version 1.9.7 or later):
fintersect(dt1, dt2)
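The idea behind joining on all columns can be sketched in plain Python by treating each row as a tuple and intersecting the two sets of rows. This only illustrates the semantics on the sample data above. It is not data.table code, and, like `fintersect` with its default settings, it drops duplicate rows.

```python
# Rows of dt1 and dt2 from the example above, as (A1, A2) tuples.
dt1 = [(1, "A"), (2, "B"), (3, "D")]
dt2 = [(3, "A"), (2, "B"), (3, "C")]

# A row survives only if it matches on every column in both tables.
common = sorted(set(dt1) & set(dt2))
print(common)  # [(2, 'B')]
```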
Join two data frames, select all columns from one and some columns from the other
Not sure if this is the most efficient way, but it worked for me:
from pyspark.sql.functions import col
df1.alias('a').join(df2.alias('b'), col('b.id') == col('a.id')).select([col('a.' + xx) for xx in df1.columns] + [col('b.other1'), col('b.other2')])
The trick is in:
[col('a.' + xx) for xx in df1.columns] : all columns in a
[col('b.other1'), col('b.other2')] : some columns of b
Select all columns from table which is a result of joining two tables
Based on your comment, you could create TABLE_AB like this:
CREATE TABLE TABLE_AB
AS (SELECT TABLE_A.* FROM TABLE_A NATURAL JOIN TABLE_B);
Now it is a copy of TABLE_A but containing only the rows you want to delete. You can reinstate those rows later using:
insert into table_a select * from table_ab;
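The copy, delete, and reinstate cycle can be tried end to end with an in-memory SQLite database. This is a sketch under assumptions: SQLite stands in for the original database (so the parentheses around the SELECT are omitted), and the column layout, sample rows, and the DELETE statement are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas; the shared column "id" drives the NATURAL JOIN.
    CREATE TABLE TABLE_A (id INTEGER, name TEXT);
    CREATE TABLE TABLE_B (id INTEGER);
    INSERT INTO TABLE_A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO TABLE_B VALUES (2), (3);

    -- Copy the rows of TABLE_A that have a match in TABLE_B.
    CREATE TABLE TABLE_AB AS
        SELECT TABLE_A.* FROM TABLE_A NATURAL JOIN TABLE_B;

    -- Remove those rows from TABLE_A.
    DELETE FROM TABLE_A WHERE id IN (SELECT id FROM TABLE_AB);
""")
after_delete = conn.execute("SELECT * FROM TABLE_A ORDER BY id").fetchall()

# Reinstate the saved rows.
conn.execute("INSERT INTO TABLE_A SELECT * FROM TABLE_AB")
restored = conn.execute("SELECT * FROM TABLE_A ORDER BY id").fetchall()
print(after_delete)
print(restored)
```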
What does < stand for in data.table joins with on=
When doing a non-equi join like X[Y, on = .(A < A)], data.table returns the A-column from Y (the i-data.table).
To get the desired result, you could do:
X[Y, on = .(A < A), .(A = x.A, B)]
which gives:
A B
1: 1 1
2: 2 1
3: 3 1
In the next release, data.table will return both A columns. See here for the discussion.
non-equi joins adding all columns of range table in data.table in one step
Since you want results for every row of a, you should do a join like b[a, ...]:
b[a, on=.(LB <= salary, UB > salary), nomatch=0,
.(Company_ID, salary, cat, LB = x.LB, UB = x.UB, rep)]
Company_ID salary cat LB UB rep
1: 1 2000 1 0 3000 Bob
2: 1 3000 2 3000 5000 Alice
3: 1 4000 2 3000 5000 Alice
- nomatch=0 means we'll drop rows of a that are unmatched in b.
- We need to explicitly refer to the UB and LB columns from b using the x.* prefix (coming from the ?data.table docs, where the arguments are named like x[i]).
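The matching rule of this non-equi join, LB <= salary and UB > salary, can be sketched in plain Python. This only illustrates which rows pair up. It is not data.table code; the rows of `a` and `b` are reconstructed from the output shown above, and the helper `range_join` is hypothetical.

```python
# Rows reconstructed from the join output above (an assumption about the inputs).
a = [
    {"Company_ID": 1, "salary": 2000},
    {"Company_ID": 1, "salary": 3000},
    {"Company_ID": 1, "salary": 4000},
]
b = [
    {"cat": 1, "LB": 0, "UB": 3000, "rep": "Bob"},
    {"cat": 2, "LB": 3000, "UB": 5000, "rep": "Alice"},
]


def range_join(a_rows, b_rows):
    """Pair each row of a with every row of b whose [LB, UB) range contains the salary."""
    out = []
    for ra in a_rows:
        for rb in b_rows:
            if rb["LB"] <= ra["salary"] < rb["UB"]:
                out.append({**ra, **rb})
    # Rows of a with no matching range contribute nothing, as with nomatch=0.
    return out


joined = range_join(a, b)
for row in joined:
    print(row)
```

Because the merged row keeps LB and UB from `b` and salary from `a` side by side, it mirrors what the `x.LB` and `x.UB` references recover in the data.table version.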
Regarding the strange default cols, there is an open issue to change that behavior: #1615.
(Issue #1989, referenced below, is fixed now -- See Uwe's answer.)
Alternately... One way that should work and avoids explicitly listing all columns: add a's columns to b, then subset b:
b[a, on=.(LB <= salary, UB > salary), names(a) := mget(paste0("i.", names(a)))]
b[b[a, on=.(LB <= salary, UB > salary), which=TRUE, nomatch=0]]
There are two problems with this. First, there's a bug causing non-equi join to break when confronted with mget (#1989). The temporary workaround is to enumerate a's columns:
b[a, on=.(LB <= salary, UB > salary), `:=`(Company_ID = i.Company_ID, salary = i.salary)]
b[b[a, on=.(LB <= salary, UB > salary), which=TRUE, nomatch=0]]
Second, it's inefficient to do this join twice (once for := and a second time for which), but I can't see any way around that... maybe justifying a feature request to allow both j and which?
select all columns from both tables postgresql function
One problem is that neither functions nor views can return columns with the same name (in your example, the column CustomerID is present in both tables). The other is the syntax:
RETURNS TABLE ( column_name column_type [, ...] )
from the official docs, which say nothing about table_name.*.
Aside from the obvious solution of specifying the complete list of columns, there is one trick with composite (row, record) types:
CREATE FUNCTION get_data (p_pattern VARCHAR, p_year INT)
RETURNS TABLE ("order" orders, customer customers) -- order is a reserved word, so it is quoted
AS $$
Note that you can use table/view names as types in declarations.
In that case your query could look like:
SELECT a, b
FROM Orders a
JOIN Customers b ON a.CustomerID=b.CustomerID
After that the usage of the function would be:
select
    *, -- two composite columns
    ("order").*, -- all columns from table orders
    (customer).*, -- all columns from table customers
    ("order").CustomerID -- specific column from specific table
from
    get_data(<parameters here>);