Adding values in two data.tables
I prefer Richard's way, but here's an alternative that looks more like the OP's initial idea:
vs = setdiff(names(DT1),"rn")
DT2[DT1, (vs) := {
x.SD = mget(vs)
i.SD = mget(paste0("i.",vs))
Map("+", x.SD, i.SD)
}, on="rn", by=.EACHI]
# rn a b c
# 1: a 0 4 1
# 2: b 1 5 0
# 3: c 1 1 3
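For a self-contained run of the same pattern, here is a minimal sketch. The two tables are hypothetical stand-ins (the OP's DT1 and DT2 are not reproduced here), and the braces are collapsed into a single Map() call:

```r
library(data.table)

# Hypothetical inputs (not the OP's data): key "c" exists only in DT2.
DT1 <- data.table(rn = c("a", "b"), x = c(1, 2), y = c(10, 20))
DT2 <- data.table(rn = c("a", "b", "c"), x = c(5, 6, 7), y = c(50, 60, 70))

vs <- setdiff(names(DT1), "rn")  # value columns shared by both tables

# Update join: for every row of DT1, add its values (seen as i.x, i.y)
# to the matching row of DT2, by reference.
DT2[DT1, (vs) := Map("+", mget(vs), mget(paste0("i.", vs))),
    on = "rn", by = .EACHI]
DT2
```

Rows of DT2 with no match in DT1 (here "c") should be left untouched, since the update only visits rows matched by the join.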
Sum values from two tables with a condition in R
With base R, you can try:
data <- rbind(a[a$Task %in% b$Task, ], b)
aggregate(. ~ Task, sum, na.action = "na.pass", data = data)
Task FC FH
1 A 110 NA
2 B NA 110
3 C 330 230
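To try this end to end, here is a small sketch with made-up inputs. The data frames a and b below are assumptions chosen to be consistent with the output shown, not the question's actual data, and na.pass is passed as a function rather than as a string:

```r
# Hypothetical inputs: task "D" exists only in a, and a contains some NAs.
a <- data.frame(Task = c("A", "B", "C", "D"),
                FC = c(100, NA, 300, 400),
                FH = c(NA, 100, 200, 300))
b <- data.frame(Task = c("A", "B", "C"),
                FC = c(10, 20, 30),
                FH = c(5, 10, 30))

# Keep only the rows of a whose Task also appears in b, then stack with b.
data <- rbind(a[a$Task %in% b$Task, ], b)

# na.pass keeps rows containing NA, so sum() returns NA for those groups.
res <- aggregate(. ~ Task, data = data, FUN = sum, na.action = na.pass)
res
```

Without na.action = na.pass, aggregate() would drop every row containing an NA before summing, which changes the totals instead of flagging them as NA.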
Or the same with dplyr:
library(dplyr)
bind_rows(a[a$Task %in% b$Task, ], b) %>%
group_by(Task) %>%
summarise_all(sum)
Task FC FH
<chr> <dbl> <dbl>
1 A 110 NA
2 B NA 110
3 C 330 230
Or, to make it even more dplyr-like:
bind_rows(a, b, .id = "ID") %>%
group_by(Task) %>%
filter(n_distinct(ID) != 1) %>%
select(-ID) %>%
summarise_all(sum)
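In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A possible equivalent is sketched below, again with hypothetical a and b; naming the columns explicitly in across() also makes the select(-ID) step unnecessary:

```r
library(dplyr)

# Hypothetical inputs (assumed, not from the question).
a <- data.frame(Task = c("A", "B", "C", "D"),
                FC = c(100, NA, 300, 400),
                FH = c(NA, 100, 200, 300))
b <- data.frame(Task = c("A", "B", "C"),
                FC = c(10, 20, 30),
                FH = c(5, 10, 30))

res <- bind_rows(a, b, .id = "ID") %>%   # ID records the source table ("1" or "2")
  group_by(Task) %>%
  filter(n_distinct(ID) != 1) %>%        # keep tasks that occur in both tables
  summarise(across(c(FC, FH), sum))      # sum only the numeric value columns
res
```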
Combine data.tables and sum the shared column
merge is unlikely to be very efficient for the end result you are after. Since both of your data.tables have the same structure, I would suggest rbinding them together and taking the sum by their key. In other words:
rbindlist(list(a, a2))[, sum(c), b]
I've used rbindlist because it is generally more efficient at rbinding data.tables (even though you first have to put your data.tables in a list).
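On a toy scale, the idea looks like this. The two small tables are hypothetical, and the .(c = sum(c)) form is just one way to give the result column a name instead of the default V1:

```r
library(data.table)

# Hypothetical inputs: key "y" appears in both tables.
a  <- data.table(b = c("x", "y"), c = c(1L, 2L))
a2 <- data.table(b = c("y", "z"), c = c(10L, 20L))

# Stack the tables, then sum c within each value of b.
res <- rbindlist(list(a, a2))[, .(c = sum(c)), by = b]
res
```

Keys found in only one table pass through unchanged, so no all = TRUE style option is needed to keep them.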
Compare some timings on larger datasets:
library(data.table)
library(stringi)
set.seed(1)
n <- 1e7; n2 <- 1e6
x <- stri_rand_strings(n, 4)
a2 <- data.table(b = sample(x, n2), c = sample(100, n2, TRUE))
a <- data.table(b = sample(x, n2), c = sample(10, n2, TRUE))
system.time(rbindlist(list(a, a2))[, sum(c), b])
# user system elapsed
# 0.83 0.05 0.87
system.time(merge(a2, a, by = "b", all = TRUE)[, rowSums(.SD, na.rm = TRUE), b]) # Get some coffee
# user system elapsed
# 159.58 0.48 162.95
## Do we have all the rows we expect to have?
length(unique(c(a$b, a2$b)))
# [1] 1782166
nrow(rbindlist(list(a, a2))[, sum(c), b])
# [1] 1782166
Merge two large data.tables based on column name of one table and column value of the other without melting
Using set():
setkey(DT1, "ID")
setkey(DT2, "ID")
for (k in names(DT1)[-1]) {
rows <- which(DT2[["col"]] == k)
set(DT2, i = rows, j = "col_value", DT1[DT2[rows], ..k])
}
ID col col_value
1: A col1 1
2: A col4 13
3: B col2 6
4: B col3 10
5: C col1 3
Note: Setting the key up front speeds up the process but reorders the rows.
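If the original row order of DT2 matters, one possible variation is to skip setkey() and give the join an explicit on = argument instead. This is a sketch under assumptions, not part of the answer above: the tables are hypothetical (chosen to be consistent with the output shown) and the target column is created up front.

```r
library(data.table)

# Hypothetical inputs (assumed).
DT1 <- data.table(ID = c("A", "B", "C"),
                  col1 = 1:3, col2 = 5:7, col3 = 9:11, col4 = 13:15)
DT2 <- data.table(ID  = c("A", "A", "B", "B", "C"),
                  col = c("col1", "col4", "col2", "col3", "col1"))

DT2[, col_value := NA_integer_]            # pre-create the column to fill
for (k in names(DT1)[-1]) {
  rows <- which(DT2[["col"]] == k)         # rows of DT2 that refer to column k
  if (length(rows) == 0L) next             # nothing to fill for this column
  vals <- DT1[DT2[rows], on = "ID"][[k]]   # DT1's value of k for each of those IDs
  set(DT2, i = rows, j = "col_value", value = vals)
}
DT2
```

This trades the keyed-join speedup for an unchanged row order, so it is worth timing both versions on data of realistic size.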
Compare two data.tables by row and add new column
A data.table option:
DT1[DT2, on=.(num > numA, num < numB, value > valueA, value < valueB), Mark := i.Mark]
DT1
ID num value Mark
1: F 59 90 Abner
2: A 3 47 <NA>
3: E 108 189 Abner
4: B 11 72 Norman
5: C 22 42 Abner
6: D 54 86 Abner
7: C 241 280 Trista
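The line above is a non-equi update join: each row of DT2 defines an open interval for num and for value, and matching rows of DT1 receive that row's Mark. A minimal sketch with hypothetical tables (not the question's data):

```r
library(data.table)

# Hypothetical inputs (assumed): DT2 holds one num/value window per Mark.
DT1 <- data.table(ID = c("A", "B", "C"),
                  num = c(3, 11, 22),
                  value = c(47, 72, 42))
DT2 <- data.table(numA = c(10, 20), numB = c(15, 300),
                  valueA = c(70, 40), valueB = c(80, 300),
                  Mark = c("Norman", "Abner"))

# Rows of DT1 that fall strictly inside a window get that window's Mark;
# rows that match no window keep NA.
DT1[DT2, on = .(num > numA, num < numB, value > valueA, value < valueB),
    Mark := i.Mark]
DT1
```

If windows overlap, a row of DT1 can match several rows of DT2, and the Mark assigned last is the one that remains, so non-overlapping windows give the most predictable result.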
Combine two data tables with null values on both tables - C#
You can combine the two tables into a single DataTable in code and remove any extra column once the for loop has finished. Check my code below; it should work.
// Fetch both driver_id columns in one query; DriverID starts as an empty placeholder.
string Qry = "select tab1.table_id,'' as DriverID,vehicle,tab1.driver_id Tab1DrvrID,tab2.driver_id Tab2DrvrID,exit_time from tab1 " +
    "full join tab2 on tab2.driver_id=tab1.driver_id";
cmd = new SqlCommand(Qry, con);
da = new SqlDataAdapter(cmd);
dt = new DataTable();
da.Fill(dt);
for (int i = 0; i < dt.Rows.Count; i++)
{
    // A NULL driver_id comes back as DBNull, whose ToString() is "".
    string s = dt.Rows[i]["Tab1DrvrID"].ToString();
    // Prefer tab1's driver_id; fall back to tab2's when tab1 has none.
    dt.Rows[i]["DriverID"] = (s == "") ? dt.Rows[i]["Tab2DrvrID"].ToString() : s;
}
dt.AcceptChanges();
// Drop the helper columns now that DriverID is filled in.
dt.Columns.Remove("Tab1DrvrID");
dt.Columns.Remove("Tab2DrvrID");