Removing One Table from Another in R

Removing one table from another in R

We can use anti_join from dplyr, which keeps the rows of A that have no match in B on the columns given in by:

library(dplyr)
anti_join(A, B, by = c('Col1', 'Col2'))
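Without dplyr, the same multi-column anti-join can be sketched in base R by pasting the key columns into one string per row and testing membership with %in%. This is a minimal sketch, not dplyr's implementation: the helper name anti_join_base and the example tables A and B are hypothetical, and it assumes the separator "\r" never occurs in the key values and that the key columns have the same types in both tables.

```r
# Hypothetical example tables sharing the key columns Col1 and Col2
A <- data.frame(Col1 = c(1, 1, 2, 3), Col2 = c("a", "b", "a", "c"), val = 1:4)
B <- data.frame(Col1 = c(1, 3), Col2 = c("b", "c"))

# Hypothetical helper: base R anti-join on the columns named in 'by'
anti_join_base <- function(x, y, by) {
  # One key string per row; assumes "\r" never appears inside the key values
  key <- function(d) do.call(paste, c(d[by], sep = "\r"))
  # Keep the rows of x whose key is not found among the keys of y
  x[!(key(x) %in% key(y)), , drop = FALSE]
}

res <- anti_join_base(A, B, by = c("Col1", "Col2"))
print(res)
```

Rows (1, "b") and (3, "c") of A are dropped because both key columns match a row of B.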

How to delete rows in one table, based on the values of another table

A possible approach with base R:

tab1[tab1$UserID %in% tab2$UserID[!tab2$Admin],]

which gives:

  UserID AssigID Score           TimeStamp TimeOnTask
2  12254   23956    22 2017-11-18 13:16:00        256
3  12644   23956    74 2012-12-17 13:18:00        365
4  11257   23957    45 2012-10-10 13:29:00        102
5  12667   23958    25 2012-11-10 13:40:00        109

What this does:

  • tab2$UserID[!tab2$Admin] gives a vector of the user IDs that are not admins: !tab2$Admin is TRUE exactly for the non-admin rows, so only their IDs are selected.
  • tab1$UserID %in% ... then returns a logical vector that is TRUE for each row of tab1 whose user ID is in that vector; this logical vector is used to subset tab1.

Used data:

tab1 <- structure(list(UserID = c(14532L, 12254L, 12644L, 11257L, 12667L),
AssigID = c(23956L, 23956L, 23956L, 23957L, 23958L),
Score = c(52L, 22L, 74L, 45L, 25L),
TimeStamp = structure(c(1510402260, 1511007360, 1355746680, 1349868540, 1352551200), class = c("POSIXct", "POSIXt"), tzone = ""),
TimeOnTask = c(401L, 256L, 365L, 102L, 109L)),
.Names = c("UserID", "AssigID", "Score", "TimeStamp", "TimeOnTask"), row.names = c(NA, -5L), class = "data.frame")
tab2 <- structure(list(UserID = c(14532L, 12254L, 12644L, 11257L, 12667L),
Admin = c(TRUE, FALSE, FALSE, FALSE, FALSE)),
.Names = c("UserID", "Admin"), class = "data.frame", row.names = c(NA, -5L))
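The two steps described above can be run one at a time to inspect the intermediate vectors. This sketch uses a trimmed copy of the data above (only UserID and Score from tab1); the names non_admin_ids, keep and result are introduced here for illustration.

```r
# Trimmed copies of tab1 and tab2 (AssigID, TimeStamp and TimeOnTask omitted)
tab1 <- data.frame(UserID = c(14532L, 12254L, 12644L, 11257L, 12667L),
                   Score  = c(52L, 22L, 74L, 45L, 25L))
tab2 <- data.frame(UserID = c(14532L, 12254L, 12644L, 11257L, 12667L),
                   Admin  = c(TRUE, FALSE, FALSE, FALSE, FALSE))

# Step 1: IDs of the users who are not admins
non_admin_ids <- tab2$UserID[!tab2$Admin]
print(non_admin_ids)   # 12254 12644 11257 12667

# Step 2: one TRUE/FALSE per row of tab1
keep <- tab1$UserID %in% non_admin_ids
print(keep)            # FALSE TRUE TRUE TRUE TRUE

# Step 3: subset the rows of tab1 with the logical vector
result <- tab1[keep, ]
print(result)
```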

R: How to remove values from a table that appear in another table?

There are several ways to do this.

Base R subset solution (as noted by Balter above):

M4M3.new <- M4M3[!(M4M3$gene_id %in% M4F4$gene_id),]

Set difference solution (dplyr's setdiff(), which compares entire rows, so both data frames need the same columns):

M4M3.new <- dplyr::setdiff(M4M3, M4F4)
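A set difference of data frames works on entire rows: a row of M4M3 is dropped only when all of its columns match a row of M4F4, unlike the gene_id-only comparison above. A row-wise set difference can also be sketched in base R with duplicated() on the two tables stacked together. The helper setdiff_rows and the small tables are hypothetical, and both tables are assumed to have identical column names.

```r
# Hypothetical small tables with the same columns
M4M3 <- data.frame(gene_id = c(1, 2, 3, 4), expr = c(5, 3, 8, 1))
M4F4 <- data.frame(gene_id = c(2, 4),       expr = c(3, 9))

# Hypothetical helper: rows of x that do not appear (as whole rows) in y
setdiff_rows <- function(x, y) {
  stacked <- rbind(y, x)        # rows of y first, then rows of x
  dup <- duplicated(stacked)    # TRUE if an identical row appeared earlier
  # Keep only the flags for the x part; this also drops repeated rows within x
  x[!dup[nrow(y) + seq_len(nrow(x))], , drop = FALSE]
}

res <- setdiff_rows(M4M3, M4F4)
print(res)   # gene_id 1, 3 and 4 remain; gene 4 stays because expr differs
```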

dplyr solution:

M4M3.new <- dplyr::anti_join(M4M3, M4F4, by = "gene_id")

Edit: all three appeared to work when tested on the following dataset:

tst1 <- data.frame(gene_id = 1:10,
                   sample_1 = rep("M4", 10),
                   sample_2 = c(rep("M3", 6), rep("F4", 4)),
                   other_values = sample(1:10, 10, replace = TRUE),
                   other_values2 = rep("OK", 10))

M4M3 <- tst1[tst1$sample_1 == "M4" & tst1$sample_2 == "M3",]
M4F4 <- tst1[tst1$sample_1 == "M4" & tst1$sample_2 == "F4",]
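In the test dataset above, M4M3 receives gene_id 1 to 6 and M4F4 receives gene_id 7 to 10, so the two tables share no gene_id and the gene_id filters have nothing to remove. A quick check with overlapping IDs makes the effect visible; the values below are made up for illustration.

```r
# Hypothetical tables whose gene_id values partly overlap
M4M3 <- data.frame(gene_id = 1:6, expr = c(5, 3, 8, 1, 9, 2))
M4F4 <- data.frame(gene_id = c(2L, 4L, 7L), expr = c(3, 1, 4))

# Base R subset: drop rows of M4M3 whose gene_id occurs in M4F4
M4M3.new <- M4M3[!(M4M3$gene_id %in% M4F4$gene_id), ]
print(M4M3.new$gene_id)   # 1 3 5 6
```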

Remove rows in data.table according to another data.table

Use an anti-join:

library(data.table)
dtA[!dtB, on = .(date, company, value)]

This selects all rows of dtA that have no matching row in dtB, comparing the columns listed in on.
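A small illustration of this not-join with made-up dtA and dtB, assuming the data.table package is installed. A row of dtA is dropped only when date, company and value all match a row of dtB.

```r
library(data.table)

# Made-up example tables; dates kept as character strings for simplicity
dtA <- data.table(date    = c("2020-01-01", "2020-01-01", "2020-01-02"),
                  company = c("X", "Y", "X"),
                  value   = c(1, 2, 3))
dtB <- data.table(date    = c("2020-01-01", "2020-01-02"),
                  company = c("Y", "X"),
                  value   = c(2, 99))

# Not-join: rows of dtA with no match in dtB on all three columns
res <- dtA[!dtB, on = .(date, company, value)]
print(res)
```

The row (2020-01-01, Y, 2) is removed; (2020-01-02, X, 3) survives because its value differs from the dtB row with the same date and company.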

Delete rows that exist in another data frame?

You need the %in% operator. So,

df1[!(df1$name %in% df2$name),]

should give you what you want.

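  • df1$name %in% df2$name tests, for each value in df1$name, whether it occurs in df2$name, and returns a logical vector.
  • The ! operator reverses the result, so the rows whose name does appear in df2$name are dropped.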
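One property of %in% worth knowing here: it is built on match(), so it always returns TRUE or FALSE (never NA), and an NA in df1$name counts as present if df2$name also contains NA. A small made-up example:

```r
df1 <- data.frame(name = c("ann", "bob", NA, "dan"), score = 1:4,
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("bob", NA), stringsAsFactors = FALSE)

in_df2 <- df1$name %in% df2$name
print(in_df2)            # FALSE TRUE TRUE FALSE

# Negate to keep only the rows whose name is absent from df2
res <- df1[!in_df2, ]
print(res$name)          # "ann" "dan"
```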

Removing specific groups in R in data.table

Just use your group_vector with the %in% operator (negate it with ! if group_vector lists the groups to remove rather than the ones to keep):

data[group %in% group_vector]

   group values
1:  1001     10
2:  2800     23
3:  3230     32
4:  4600     34
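The same membership test works on a plain data.frame. In this made-up example group_vector lists the groups to drop, so the test is negated with !; without the ! only the listed groups would be kept.

```r
# Made-up data and groups to remove
data <- data.frame(group  = c(1001, 1001, 2800, 3230, 3999, 4600),
                   values = c(10, 11, 23, 32, 40, 34))
group_vector <- c(1001, 3999)

only_listed    <- data[data$group %in% group_vector, ]     # groups 1001 and 3999 only
without_listed <- data[!(data$group %in% group_vector), ]  # all other groups
print(without_listed$group)   # 2800 3230 4600
```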

Removing data from one dataframe that exists in another dataframe R

Base R Solution

list_one[!list_one$letters %in% list_two$letters2,]

gives you:

  letters numbers
2       b       2
5       e       5

Explanation:

> list_one$letters %in% list_two$letters2
[1] TRUE FALSE TRUE TRUE FALSE

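This gives a logical vector with one value per element of list_one$letters (so its length is length(list_one$letters)), TRUE where the letter occurs in list_two$letters2. The ! operator negates it, so the letters present in list_two$letters2 become FALSE and those rows are dropped when the vector is used to subset list_one.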
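The data for this answer are not shown; the sketch below assumes list_one holds the letters a to e with numbers 1 to 5 and that list_two$letters2 holds a, c and d, which reproduces the output above. It prints the logical vector before and after negation.

```r
# Assumed data, reconstructed to match the output shown above
list_one <- data.frame(letters = c("a", "b", "c", "d", "e"), numbers = 1:5,
                       stringsAsFactors = FALSE)
list_two <- data.frame(letters2 = c("a", "c", "d"), stringsAsFactors = FALSE)

present <- list_one$letters %in% list_two$letters2
print(present)    # TRUE FALSE TRUE TRUE FALSE
print(!present)   # FALSE TRUE FALSE FALSE TRUE

# Keep the rows where !present is TRUE, i.e. letters absent from list_two
res <- list_one[!present, ]
print(res)
```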

If you have questions about how to select rows from a data.frame, enter

?`[.data.frame`

into the console and read the help page.

Remove rows in one dataframe if they are present in another dataframe

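In base R:

df[-match(df2$ASV, df$ASV),]

This assumes every df2$ASV value occurs in df$ASV: for a missing value match() returns NA, and the negative index then raises an error.

Or, with dplyr:

dplyr::anti_join(df, df2, by = "ASV")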
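The match() version also behaves differently from a %in% filter when df contains repeated ASV values: match() returns only the first matching position, so later duplicates survive. A made-up example comparing the two:

```r
df  <- data.frame(ASV = c("asv1", "asv2", "asv2", "asv3"), count = c(5, 7, 8, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ASV = "asv2", stringsAsFactors = FALSE)

idx <- match(df2$ASV, df$ASV)
print(idx)                     # 2 (position of the first "asv2" only)

via_match <- df[-idx, ]
print(via_match$ASV)           # "asv1" "asv2" "asv3"

via_in <- df[!(df$ASV %in% df2$ASV), ]
print(via_in$ASV)              # "asv1" "asv3"
```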
