## Removing duplicate combinations (irrespective of order)

Sort within each row first, then use `duplicated`:

```r
# example data
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items

dat[!duplicated(apply(dat, 1, sort), MARGIN = 2), ]
#       [,1] [,2] [,3]
#  [1,]    1    2    3
#  [2,]    1    2    4
#  [3,]    1    2    5
#  [4,]    1    3    4
#  [5,]    1    3    5
#  [6,]    1    4    5
#  [7,]    2    3    4
#  [8,]    2    3    5
#  [9,]    2    4    5
# [10,]    3    4    5
```
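Since `data.txt` is not shown, here is a minimal self-contained sketch of the same idea. The matrix values are made up for illustration. It also shows why `MARGIN = 2` is needed: `apply(dat, 1, sort)` returns each sorted row as a column.

```r
# Hypothetical data: rows 1, 3 and 4 hold the same values in different orders,
# as do rows 2 and 5
dat <- matrix(c(1, 2, 3,
                1, 2, 4,
                3, 1, 2,
                2, 3, 1,
                4, 2, 1),
              ncol = 3, byrow = TRUE)

# apply() over rows returns the sorted rows as COLUMNS (a 3 x 5 matrix),
# so duplicated() has to compare columns, hence MARGIN = 2
sorted_cols <- apply(dat, 1, sort)
keep <- !duplicated(sorted_cols, MARGIN = 2)

# Keep only the first occurrence of each combination
result <- dat[keep, , drop = FALSE]
result
```

An equivalent form is to transpose first, `!duplicated(t(sorted_cols))`, and leave `MARGIN` at its default of 1.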

## Removing duplicate all-way-combinations while retaining all columns

Here's a base R solution that uses the `complete.cases` function and creates a sorted key column, `feedID3`:

```r
# remove any rows with NA values
test <- test[complete.cases(test[, c('ID', 'feedID', 'feedID2')]), ]

# remove any rows with feedID == feedID2
test <- test[!(test$feedID == test$feedID2), ]

# add new feedID3 column
test$feedID3 <- apply(test, 1, function(x) paste(sort(c(x[2], x[3])), collapse = '-'))

# remove any duplicates, and remove last column
test[!duplicated(test[, c('feedID3', 'ID')]), -4]
#    ID feedID feedID2
# 2 49V     A1      G2
# 6 52V     B1      D1
# 7 52V     D1      D2
```

### data

Note that we have converted `"NA"` to `NA`, and we have also set `stringsAsFactors = FALSE`:

```r
test <- data.frame(ID = c("49V", "49V", "49V", "49V", "49V", "52V", "52V", "52V"),
                   feedID = c("A1", "A1", "G2", "A1", "G2", "B1", "D1", "D2"),
                   feedID2 = c("A1", "G2", "A1", "G2", NA, "D1", "D2", NA),
                   stringsAsFactors = FALSE)
```

## Remove duplicate combinations in R

```r
df[!duplicated(t(apply(df[c("a", "b")], 1, sort))), ]
#   a b c
# 1 1 4 A
# 2 2 3 B
# 3 1 5 C
```

Where:

```r
df <- data.frame(
  a = c(1L, 2L, 1L, 4L, 5L, 3L, 3L),
  b = c(4L, 3L, 5L, 1L, 1L, 2L, 2L),
  c = c("A", "B", "C", "A", "C", "B", "E")
)
```

## How to find duplicated combinations where order does not matter in Excel

For exactly 4 columns and up to 1000 rows:

`{=IF(SUM(IF(MMULT({1,1,1,1},TRANSPOSE(COUNTIF($A1:$D1,$A$1:$D$1000)))=4,1))>1,"duplicate","unique")}`

This is an array formula. Enter it into `E1` without the curly brackets, then press [Ctrl]+[Shift]+[Enter] to confirm. Copy it down as needed.

If it does not work, check the language version of your Excel and the locale of your Windows installation. The array constant `{1,1,1,1}` in the formula may need to be written as `{1\1\1\1}` or `{1.1.1.1}`, because the comma can conflict with the decimal separator or list delimiter.

## Remove duplicates across columns

We can `sort` the elements in each row with `apply`, transpose the output with `t`, apply `duplicated` to return a logical vector, and use that for subsetting the rows:

```r
df[!duplicated(t(apply(df[, 1:2], 1, sort))), ]
#     [,1] [,2]
#[1,] "a"  "b"
#[2,] "a"  "c"
#[3,] "a"  "d"
#[4,] "b"  "c"
#[5,] "b"  "d"
#[6,] "c"  "d"
```

Another option is `pmin`/`pmax`:

```r
df[!duplicated(cbind(pmin(df[, 1], df[, 2]), pmax(df[, 1], df[, 2]))), ]
```
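To see why the `pmin`/`pmax` variant works, here is a small sketch on made-up pairs. The `pairs` matrix and the names `lo`, `hi` and `key` are illustrative only. Taking the element-wise minimum and maximum puts every pair into a canonical order, so reversed pairs produce the same key.

```r
# Hypothetical two-column character matrix of unordered pairs
pairs <- cbind(c("a", "b", "c", "a"),
               c("b", "a", "a", "c"))

lo <- pmin(pairs[, 1], pairs[, 2])  # element-wise smaller value of each pair
hi <- pmax(pairs[, 1], pairs[, 2])  # element-wise larger value of each pair

# Canonical (ordered) form of each pair; reversed pairs now look identical
key <- cbind(lo, hi)

# Keep the first occurrence of each unordered pair, in its original order
result <- pairs[!duplicated(key), , drop = FALSE]
result
```

This avoids the row-wise `apply` call, which can matter on large inputs, but it only generalizes naturally to two columns.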

### data

```r
df <- structure(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "b",
"c", "d", "a", "c", "d", "a", "b", "d"), .Dim = c(9L, 2L))
```

## SQL Remove duplicate combination

If you have other columns and the pairs only appear once (in either direction):

```sql
select t.*
from t
where t.x1 <= t.x2
union all
select t.*
from t
where t.x1 > t.x2 and
      not exists (select 1 from t t2 where t2.x1 = t.x2 and t2.x2 = t.x1);
```

## Delete duplicated rows with same values but in different column in R

One option would be to use a least/greatest trick, and then remove duplicates:

```r
library(SparkR)
df <- unique(cbind(least(df$A, df$B), greatest(df$A, df$B)))
```

Here is a base R version of the above:

```r
df <- unique(cbind(ifelse(df$A < df$B, df$A, df$B),
                   ifelse(df$A >= df$B, df$A, df$B)))
```

## Unique case of finding duplicate values flexibly across columns in R

**tidyverse**

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(duplicates = str_c(sort(c_across(c(1, 3))), collapse = "")) %>%
  group_by(duplicates) %>%
  mutate(duplicates = n() > 1) %>%
  ungroup()
#> # A tibble: 4 x 4
#>   animal_1 predation_type animal_2 duplicates
#>   <chr>    <chr>          <chr>    <lgl>
#> 1 cat      eats           mouse    TRUE
#> 2 dog      eats           squirrel FALSE
#> 3 mouse    eaten by       cat      TRUE
#> 4 squirrel eats           nuts     FALSE
```

<sup>Created on 2022-01-17 by the reprex package (v2.0.1)</sup>

**removing duplicates**

```r
library(tidyverse)

df %>%
  filter(!duplicated(map2(animal_1, animal_2, ~str_c(sort(c(.x, .y)), collapse = ""))))
#>   animal_1 predation_type animal_2
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts
```
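One caveat with building keys via `collapse = ""`: two different pairs can collapse to the same string. The sketch below uses base `paste` in place of `str_c`, with made-up values, to show the collision and one possible fix. It assumes a separator such as `"|"` that never occurs in the data.

```r
# Two different pairs that collapse to the same key when no separator is used
key_a <- paste(sort(c("ab", "c")), collapse = "")
key_b <- paste(sort(c("a", "bc")), collapse = "")
collides <- key_a == key_b   # both keys are "abc"

# A separator that cannot occur in the values keeps the keys distinct
sep_a <- paste(sort(c("ab", "c")), collapse = "|")
sep_b <- paste(sort(c("a", "bc")), collapse = "|")
still_collides <- sep_a == sep_b

c(collides = collides, still_collides = still_collides)
```

With the animal names in the example this collision does not occur, so the outputs shown are unaffected.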


## Remove Duplicates Based on Combined Sets

One idea is to convert each lat/long pair to a string with `toString(...)`, then sort the resulting two-element character vector for each row. Use the sorted vectors of strings to check for duplicates:

```r
ans <- C[!duplicated(lapply(1:nrow(C), function(i) sort(c(toString(C[i,1:2]), toString(C[i,3:4]))))), ]
#   A_Latitude A_Longitude B_Latitude B_Longitude
# 1    48.4459      9.9890    49.0275      8.7539
# 2    48.7000      8.1500    48.4734      9.2270
# 4    49.0275      8.7539    48.9602      9.2058
```

Here's a breakdown for row 1:

```r
toString(C[1,1:2])
# [1] "48.4459, 9.989"
toString(C[1,3:4])
# [1] "49.0275, 8.7539"
sort(c(toString(C[1,1:2]), toString(C[1,3:4])))
# [1] "48.4459, 9.989"  "49.0275, 8.7539"
```
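The input `C` is not shown, so here is a runnable sketch with an assumed data frame. Rows 1, 2 and 4 are taken from the output above, and row 3 is a hypothetical row invented as row 1 with the A and B points swapped.

```r
# Hypothetical input: row 3 is row 1 with the A and B points swapped
C <- data.frame(
  A_Latitude  = c(48.4459, 48.7000, 49.0275, 49.0275),
  A_Longitude = c(9.9890,  8.1500,  8.7539,  8.7539),
  B_Latitude  = c(49.0275, 48.4734, 48.4459, 48.9602),
  B_Longitude = c(8.7539,  9.2270,  9.9890,  9.2058)
)

# Build one order-independent key per row: each point becomes a string,
# and the two strings are sorted so that A/B order no longer matters
keys <- lapply(seq_len(nrow(C)), function(i) {
  sort(c(toString(C[i, 1:2]), toString(C[i, 3:4])))
})

# duplicated() also works on lists, comparing the character vectors
ans <- C[!duplicated(keys), ]
rownames(ans)
```

Keeping latitude and longitude together in one string is what separates this from plain row sorting: the two points can swap places, but a latitude is never matched against a longitude.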

## Finding unique combinations irrespective of position

Maybe something like this:

```r
indx <- !duplicated(t(apply(df, 1, sort))) # finds non-duplicates in sorted rows
df[indx, ] # selects only the non-duplicates according to that index
#   a b c
# 1 1 2 3
# 3 3 1 4
```
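The original `df` is not shown. A minimal reproducible sketch follows, assuming a three-row data frame in which row 2 (a hypothetical row) contains the same values as row 1 in a different order.

```r
# Hypothetical input: row 2 holds the same values as row 1 in a different order
df <- data.frame(a = c(1, 2, 3),
                 b = c(2, 3, 1),
                 c = c(3, 1, 4))

# apply() returns sorted rows as columns; t() turns them back into rows
sorted_rows <- t(apply(df, 1, sort))

# TRUE for the first occurrence of each combination of values
indx <- !duplicated(sorted_rows)

kept <- df[indx, ]
kept
```

Note that `apply` converts the data frame to a matrix, so this works best when all columns share one type.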
