# Removing Duplicate Combinations (Irrespective of Order)

## Removing duplicate combinations (irrespective of order)

Sort within the rows first, then use `duplicated`; see below:

```r
# example data
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items
dat[!duplicated(apply(dat, 1, sort), MARGIN = 2), ]
#       [,1] [,2] [,3]
#  [1,]    1    2    3
#  [2,]    1    2    4
#  [3,]    1    2    5
#  [4,]    1    3    4
#  [5,]    1    3    5
#  [6,]    1    4    5
#  [7,]    2    3    4
#  [8,]    2    3    5
#  [9,]    2    4    5
# [10,]    3    4    5
```
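The sort-then-deduplicate idea is language-agnostic: the sorted tuple of a row is an order-independent key. As an illustrative sketch outside R (data and names here are made up), plain Python can express the same thing:

```python
# Deduplicate rows that contain the same values in any order:
# the sorted tuple of each row is its canonical, order-independent key.
rows = [(1, 2, 3), (3, 2, 1), (1, 2, 4), (2, 4, 1), (3, 4, 5)]

seen = set()
unique_rows = []
for row in rows:
    key = tuple(sorted(row))   # canonical form of the row
    if key not in seen:        # keep only the first occurrence
        seen.add(key)
        unique_rows.append(row)

print(unique_rows)  # [(1, 2, 3), (1, 2, 4), (3, 4, 5)]
```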

## Removing duplicate all-way-combinations while retaining all columns

Here's a base solution, using the `complete.cases` function, and also creating a sorted `feedID` column:

```r
# remove any rows with NA values
test <- test[complete.cases(test[, c('ID', 'feedID', 'feedID2')]), ]
# remove any rows with feedID == feedID2
test <- test[!(test$feedID == test$feedID2), ]
# add new feedID3 column
test$feedID3 <- apply(test, 1, function(x) paste(sort(c(x[2], x[3])), collapse = '-'))
# remove any duplicates, and remove last column
test[!duplicated(test[, c('feedID3', 'ID')]), -4]
#    ID feedID feedID2
# 2 49V     A1      G2
# 6 52V     B1      D1
# 7 52V     D1      D2
```

### data

Note that we have converted `"NA"` to `NA`, and we have also set `stringsAsFactors = FALSE`:

```r
test <- data.frame(ID = c("49V", "49V", "49V", "49V", "49V", "52V", "52V", "52V"),
                   feedID = c("A1", "A1", "G2", "A1", "G2", "B1", "D1", "D2"),
                   feedID2 = c("A1", "G2", "A1", "G2", NA, "D1", "D2", NA),
                   stringsAsFactors = FALSE)
```

## Remove duplicate combinations in R

```r
df[!duplicated(t(apply(df[c("a", "b")], 1, sort))), ]
#   a b c
# 1 1 4 A
# 2 2 3 B
# 3 1 5 C
```

Where:

```r
df <- data.frame(
  a = c(1L, 2L, 1L, 4L, 5L, 3L, 3L),
  b = c(4L, 3L, 5L, 1L, 1L, 2L, 2L),
  c = c("A", "B", "C", "A", "C", "B", "E")
)
```

## How to find duplicated combinations where order does not matter in Excel

For exactly 4 columns and up to 1000 rows:

```
{=IF(SUM(IF(MMULT({1,1,1,1},TRANSPOSE(COUNTIF($A1:$D1,$A$1:$D$1000)))=4,1))>1,"duplicate","unique")}
```

This is an array formula. Input it into `E1` without the curly brackets. Then press [Ctrl]+[Shift]+[Enter] to confirm.

Copy downwards as needed.

If it does not work, check the language version of your Excel and the locale of your Windows: the array constant `{1,1,1,1}` may need to be written as `{1\1\1\1}` or `{1.1.1.1}` if the comma conflicts with your decimal separator or list delimiter.

## Remove duplicates across columns

We can `sort` the elements in each row with `apply`, transpose the output with `t`, apply `duplicated` to get a logical vector, and use that to subset the rows:

```r
df[!duplicated(t(apply(df[, 1:2], 1, sort))), ]
#     [,1] [,2]
#[1,] "a"  "b"
#[2,] "a"  "c"
#[3,] "a"  "d"
#[4,] "b"  "c"
#[5,] "b"  "d"
#[6,] "c"  "d"
```

or another option is `pmin/pmax`

```r
df[!duplicated(cbind(pmin(df[,1], df[,2]), pmax(df[,1], df[,2]))), ]
```

### data

```r
df <- structure(c("a", "a", "a", "b", "b", "b", "c", "c", "c",
                  "b", "c", "d", "a", "c", "d", "a", "b", "d"),
                .Dim = c(9L, 2L))
```

## SQL Remove duplicate combination

If you have other columns and the pairs only appear once (in either direction):

```sql
select t.*
from t
where t.x1 <= t.x2
union all
select t.*
from t
where t.x1 > t.x2 and
      not exists (select 1 from t t2 where t2.x1 = t.x2 and t2.x2 = t.x1);
```
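In procedural terms, the query keeps a row when its pair is already in order (`x1 <= x2`), or when it is out of order but the reversed pair does not exist elsewhere. An illustrative sketch in plain Python (the sample rows are made up):

```python
# Mirror the two UNION ALL branches: keep (x1, x2, ...) rows where
# x1 <= x2, or where x1 > x2 but the reversed pair is absent.
rows = [(1, 2, 'a'), (2, 1, 'b'), (3, 1, 'c')]

pairs = {(r[0], r[1]) for r in rows}
result = [r for r in rows
          if r[0] <= r[1] or (r[1], r[0]) not in pairs]

print(result)  # [(1, 2, 'a'), (3, 1, 'c')]
```

Note that, like the SQL, this keeps the ordered copy of a pair when both directions are present, and the out-of-order copy when it is the only one.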

## Delete duplicated rows with same values but in different column in R

One option would be to use a least/greatest trick, and then remove duplicates:

```r
library(SparkR)
df <- unique(cbind(least(df$A, df$B), greatest(df$A, df$B)))
```

Here is a base R version of the above:

```r
df <- unique(cbind(ifelse(df$A < df$B, df$A, df$B),
                   ifelse(df$A >= df$B, df$A, df$B)))
```
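The least/greatest trick translates directly to other languages: `(min, max)` of a two-column pair is a canonical key regardless of order. An illustrative sketch in plain Python (sample pairs are made up):

```python
# Canonicalise each (A, B) pair as (min, max) so that (2, 5) and (5, 2)
# collapse to the same key, then keep the first occurrence of each key.
pairs = [(2, 5), (5, 2), (1, 3), (3, 1), (4, 4)]

seen = set()
deduped = []
for a, b in pairs:
    key = (min(a, b), max(a, b))
    if key not in seen:
        seen.add(key)
        deduped.append((a, b))

print(deduped)  # [(2, 5), (1, 3), (4, 4)]
```

Unlike the `unique(cbind(...))` version above, this sketch keeps each pair's original orientation rather than replacing it with the sorted one.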

## Unique case of finding duplicate values flexibly across columns in R

### tidyverse

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))

library(tidyverse)
df %>%
  rowwise() %>%
  mutate(duplicates = str_c(sort(c_across(c(1, 3))), collapse = "")) %>%
  group_by(duplicates) %>%
  mutate(duplicates = n() > 1) %>%
  ungroup()
#> # A tibble: 4 x 4
#>   animal_1 predation_type animal_2 duplicates
#>   <chr>    <chr>          <chr>    <lgl>
#> 1 cat      eats           mouse    TRUE
#> 2 dog      eats           squirrel FALSE
#> 3 mouse    eaten by       cat      TRUE
#> 4 squirrel eats           nuts     FALSE
```

Created on 2022-01-17 by the reprex package (v2.0.1)

### removing duplicates

```r
library(tidyverse)
df %>%
  filter(!duplicated(map2(animal_1, animal_2, ~str_c(sort(c(.x, .y)), collapse = ""))))
#>   animal_1 predation_type animal_2
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts
```


## Remove Duplicates Based on Combined Sets

One idea is to treat each long/lat pair as a string with `toString(...)`, sort the two resulting strings within each row, and then use the sorted 2-element string vector to check for duplicates:

```r
ans <- C[!duplicated(lapply(1:nrow(C), function(i)
  sort(c(toString(C[i, 1:2]), toString(C[i, 3:4]))))), ]
#   A_Latitude A_Longitude B_Latitude B_Longitude
# 1    48.4459      9.9890    49.0275      8.7539
# 2    48.7000      8.1500    48.4734      9.2270
# 4    49.0275      8.7539    48.9602      9.2058
```

Here's a breakdown for row 1:

```r
toString(C[1, 1:2])
# [1] "48.4459, 9.989"
toString(C[1, 3:4])
# [1] "49.0275, 8.7539"
sort(c(toString(C[1, 1:2]), toString(C[1, 3:4])))
# [1] "48.4459, 9.989"  "49.0275, 8.7539"
```
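The same grouping idea works outside R without the string detour: treat each coordinate pair as one unit, sort the two units within a row, and use that as the duplicate key. An illustrative sketch in plain Python (the coordinates below reuse values from the example above, but the rows are made up):

```python
# Each row holds two (lat, lon) pairs; a row is a duplicate of another
# if it has the same two pairs in either order. Sorting the pair-of-pairs
# within the row yields an order-independent key.
rows = [
    ((48.4459, 9.9890), (49.0275, 8.7539)),
    ((48.7000, 8.1500), (48.4734, 9.2270)),
    ((49.0275, 8.7539), (48.4459, 9.9890)),  # row 1 with the pairs swapped
]

seen = set()
kept = []
for a, b in rows:
    key = tuple(sorted([a, b]))  # tuples sort lexicographically
    if key not in seen:
        seen.add(key)
        kept.append((a, b))

print(len(kept))  # 2
```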

## Finding unique combinations irrespective of position

Maybe something like this:

```r
indx <- !duplicated(t(apply(df, 1, sort))) # finds non-duplicates among sorted rows
df[indx, ] # selects only the non-duplicates according to that index
#   a b c
# 1 1 2 3
# 3 3 1 4
```