R, Find Duplicated Rows, Regardless of Order

R, find duplicated rows, regardless of order

Perhaps something like this would work for you. It is not clear what your desired output is though.

x <- structure(c("a", "#", "0", "I am", "#", "a", "I am", "0", "3", 
"3", "2", "2"), .Dim = c(4L, 3L))
x
# [,1] [,2] [,3]
# [1,] "a" "#" "3"
# [2,] "#" "a" "3"
# [3,] "0" "I am" "2"
# [4,] "I am" "0" "2"

duplicated(
  lapply(1:nrow(x), function(y) {
    A <- x[y, ]
    A[order(A)]
  }))
# [1] FALSE  TRUE FALSE  TRUE

This basically splits the matrix up by row, then sorts each row. duplicated works on lists too, so you just wrap the whole thing in duplicated to find which items (rows) are duplicated.
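
For example, the resulting logical vector can be used to drop the repeated rows. A minimal sketch reusing the x defined above (the name is_dup is just for illustration):

# keep only the first occurrence of each row, ignoring column order
is_dup <- duplicated(lapply(1:nrow(x), function(y) sort(x[y, ])))
x[!is_dup, ]
#      [,1] [,2]   [,3]
# [1,] "a"  "#"    "3"
# [2,] "0"  "I am" "2"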

Removing duplicate combinations (irrespective of order)

Sort within the rows first, then use duplicated; see below:

# example data    
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items

dat[!duplicated(apply(dat, 1, sort), MARGIN = 2), ]
#       [,1] [,2] [,3]
#  [1,]    1    2    3
#  [2,]    1    2    4
#  [3,]    1    2    5
#  [4,]    1    3    4
#  [5,]    1    3    5
#  [6,]    1    4    5
#  [7,]    2    3    4
#  [8,]    2    3    5
#  [9,]    2    4    5
# [10,]    3    4    5
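
The MARGIN = 2 is needed because apply(dat, 1, sort) returns each sorted row as a column of its result, so duplicates have to be checked column-wise. A small sketch with a made-up 3 x 3 matrix (the name m is just for illustration):

m <- rbind(c(2, 1, 3), c(3, 2, 1), c(1, 4, 5))
apply(m, 1, sort)              # sorted rows come back as columns
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    2    4
# [3,]    3    3    5
duplicated(apply(m, 1, sort), MARGIN = 2)
# [1] FALSE  TRUE FALSE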

Find unique pairs of words ignoring their order in two columns in R

dat[!duplicated(t(apply(dat, 1, sort))),]

apply with sort loops through each row of dat and sorts it; apply returns the sorted rows as columns, so we transpose the result with t. duplicated then returns a logical vector marking the repeats, which we use to subset dat to the rows where it is FALSE.
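
As an illustration, here is a hedged sketch with a made-up two-column data frame of word pairs (the object and column names pairs, w1 and w2 are assumptions, not from the question):

pairs <- data.frame(w1 = c("cat", "dog", "dog"),
                    w2 = c("dog", "cat", "fox"),
                    stringsAsFactors = FALSE)
pairs[!duplicated(t(apply(pairs, 1, sort))), ]
#    w1  w2
# 1 cat dog
# 3 dog fox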

Unique rows, considering two columns, in R, without order

There are lots of ways to do this; here is one:

unique(t(apply(df, 1, sort)))
duplicated(t(apply(df, 1, sort)))

The first returns the unique (sorted) rows themselves; the second returns a logical mask marking the duplicated rows.
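
A minimal sketch with made-up example data (df here is a small numeric matrix, purely for illustration):

df <- rbind(c(1, 2), c(2, 1), c(2, 3))
unique(t(apply(df, 1, sort)))        # the unique rows, ignoring order
#      [,1] [,2]
# [1,]    1    2
# [2,]    2    3
duplicated(t(apply(df, 1, sort)))    # logical mask of duplicated rows
# [1] FALSE  TRUE FALSE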


