Removing Duplicate Combinations (Irrespective of Order)

Sort within each row first, then use duplicated:

# example data    
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items

dat[ !duplicated(apply(dat, 1, sort), MARGIN = 2), ]
#       [,1] [,2] [,3]
#  [1,]    1    2    3
#  [2,]    1    2    4
#  [3,]    1    2    5
#  [4,]    1    3    4
#  [5,]    1    3    5
#  [6,]    1    4    5
#  [7,]    2    3    4
#  [8,]    2    3    5
#  [9,]    2    4    5
# [10,]    3    4    5
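Why MARGIN = 2? apply() over rows returns each sorted row as a *column* of its result, so duplicated() has to compare columns rather than rows. A minimal sketch:

```r
# apply() with MARGIN = 1 returns the sorted rows as columns,
# so duplicated() must compare columns via MARGIN = 2
m <- matrix(c(3, 1, 2,
              1, 2, 3), ncol = 3, byrow = TRUE)
sorted <- apply(m, 1, sort)      # a 3 x 2 matrix: one column per row of m
duplicated(sorted, MARGIN = 2)   # FALSE TRUE: the second row repeats the first
m[!duplicated(sorted, MARGIN = 2), , drop = FALSE]
```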

Removing duplicate all-way-combinations while retaining all columns

Here's a base R solution, using the complete.cases function and also creating a sorted feedID3 column:

# remove any rows with NA values
test <- test[complete.cases(test[,c('ID', 'feedID','feedID2')]),]
#remove any rows with feedID == feedID2
test <- test[!(test$feedID == test$feedID2),]
# add new feedID3 column
test$feedID3 <- apply(test, 1, function(x) paste(sort(c(x[2], x[3])), collapse = '-'))
# remove any duplicates, and remove last column
test[!duplicated(test[,c('feedID3', 'ID')]), -4]


   ID feedID feedID2
2 49V     A1      G2
6 52V     B1      D1
7 52V     D1      D2
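The feedID3 key works because sorting before pasting makes the order of the two IDs irrelevant; a quick check:

```r
# both orderings of the pair collapse to the same key
paste(sort(c("A1", "G2")), collapse = "-")  # "A1-G2"
paste(sort(c("G2", "A1")), collapse = "-")  # "A1-G2"
```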

data

Note that we have converted "NA" to NA, and we have also set stringsAsFactors = FALSE:

test <- data.frame(ID = c("49V", "49V", "49V", "49V", "49V", "52V", "52V", "52V"),
                   feedID = c("A1", "A1", "G2", "A1", "G2", "B1", "D1", "D2"),
                   feedID2 = c("A1", "G2", "A1", "G2", NA, "D1", "D2", NA),
                   stringsAsFactors = FALSE)

Remove duplicate combinations in R

df[!duplicated(t(apply(df[c("a", "b")], 1, sort))), ]
  a b c
1 1 4 A
2 2 3 B
3 1 5 C

Where:

df <- data.frame(
  a = c(1L, 2L, 1L, 4L, 5L, 3L, 3L),
  b = c(4L, 3L, 5L, 1L, 1L, 2L, 2L),
  c = c("A", "B", "C", "A", "C", "B", "E")
)

How to find duplicate combinations where order does not matter in Excel

For exactly 4 columns and up to 1000 rows:

{=IF(SUM(IF(MMULT({1,1,1,1},TRANSPOSE(COUNTIF($A1:$D1,$A$1:$D$1000)))=4,1))>1,"duplicate","unique")}

This is an array formula. Input it into E1 without the curly brackets. Then press [Ctrl]+[Shift]+[Enter] to confirm.

Copy downwards as needed.

If it does not work, check the language version of your Excel and the locale of your Windows. The array constant {1,1,1,1} in the formula may need to be written as {1\1\1\1} or {1.1.1.1}, because the comma can conflict with the decimal separator or list delimiter.

Remove duplicates across columns

We can sort the elements in each row with apply, transpose the output, apply duplicated to get a logical vector, and use that to subset the rows:

df[!duplicated(t(apply(df[, 1:2], 1, sort))),]
#      [,1] [,2]
# [1,] "a"  "b"
# [2,] "a"  "c"
# [3,] "a"  "d"
# [4,] "b"  "c"
# [5,] "b"  "d"
# [6,] "c"  "d"

Another option is pmin/pmax:

df[!duplicated(cbind(pmin(df[,1], df[,2]), pmax(df[,1], df[,2]))),]
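pmin/pmax put each pair into a canonical (smaller, larger) form row by row, so a reversed pair produces the same key; a small sketch:

```r
a <- c("a", "b", "c")
b <- c("b", "a", "d")
# rows 1 and 2 both become ("a", "b"), so duplicated() flags row 2
cbind(pmin(a, b), pmax(a, b))
duplicated(cbind(pmin(a, b), pmax(a, b)))  # FALSE TRUE FALSE
```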

data

df <- structure(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "b",
                  "c", "d", "a", "c", "d", "a", "b", "d"), .Dim = c(9L, 2L))

SQL Remove duplicate combination

If you have other columns and each pair appears at most once (in either direction):

select t.*
from t
where t.x1 <= t.x2
union all
select t.*
from t
where t.x1 > t.x2 and
not exists (select 1 from t t2 where t2.x1 = t.x2 and t2.x2 = t.x1);

Delete duplicated rows with same values but in different column in R

One option would be to use a least/greatest trick, and then remove duplicates:

library(SparkR)

df <- unique(cbind(least(df$A, df$B), greatest(df$A, df$B)))

Here is a base R version of the above:

df <- unique(cbind(ifelse(df$A < df$B, df$A, df$B),
                   ifelse(df$A >= df$B, df$A, df$B)))

Unique case of finding duplicate values flexibly across columns in R

tidyverse

df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))
library(tidyverse)

df %>%
  rowwise() %>%
  mutate(duplicates = str_c(sort(c_across(c(1, 3))), collapse = "")) %>%
  group_by(duplicates) %>%
  mutate(duplicates = n() > 1) %>%
  ungroup()
#> # A tibble: 4 x 4
#>   animal_1 predation_type animal_2 duplicates
#>   <chr>    <chr>          <chr>    <lgl>
#> 1 cat      eats           mouse    TRUE
#> 2 dog      eats           squirrel FALSE
#> 3 mouse    eaten by       cat      TRUE
#> 4 squirrel eats           nuts     FALSE

Created on 2022-01-17 by the reprex package (v2.0.1)

Removing duplicates


library(tidyverse)
df %>%
  filter(!duplicated(map2(animal_1, animal_2, ~ str_c(sort(c(.x, .y)), collapse = ""))))
#>   animal_1 predation_type animal_2
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts

Created on 2022-01-17 by the reprex package (v2.0.1)

Remove Duplicates Based on Combined Sets

One idea is to convert each long/lat pair to a string with toString(), sort the resulting 2-element string vector within each row, and use the sorted vectors to check for duplicates:

ans <- C[!duplicated(lapply(1:nrow(C), function(i)
  sort(c(toString(C[i, 1:2]), toString(C[i, 3:4]))))), ]
#   A_Latitude A_Longitude B_Latitude B_Longitude
# 1    48.4459      9.9890    49.0275      8.7539
# 2    48.7000      8.1500    48.4734      9.2270
# 4    49.0275      8.7539    48.9602      9.2058

Here's a breakdown for row 1:

toString(C[1,1:2])
# [1] "48.4459, 9.989"
toString(C[1,3:4])
# [1] "49.0275, 8.7539"
sort(c(toString(C[1,1:2]), toString(C[1,3:4])))
# [1] "48.4459, 9.989" "49.0275, 8.7539"

Finding unique combinations irrespective of position

Maybe something like this:

indx <- !duplicated(t(apply(df, 1, sort)))  # flags the non-duplicated sorted rows
df[indx, ]                                  # keep only those rows
#   a b c
# 1 1 2 3
# 3 3 1 4
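The answer does not show its input data; here is a small df consistent with the printed result (an assumption, for reproducibility only):

```r
# hypothetical data matching the output above: row 2 is row 1 reordered
df <- data.frame(a = c(1, 2, 3),
                 b = c(2, 1, 1),
                 c = c(3, 3, 4))
indx <- !duplicated(t(apply(df, 1, sort)))
df[indx, ]  # keeps rows 1 and 3
```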

