R, Find Duplicated Rows, Regardless of Order

R, find duplicated rows, regardless of order

Perhaps something like this would work for you. It is not clear what your desired output is though.

x <- structure(c("a", "#", "0", "I am", "#", "a", "I am", "0", "3", 
"3", "2", "2"), .Dim = c(4L, 3L))
x
# [,1] [,2] [,3]
# [1,] "a" "#" "3"
# [2,] "#" "a" "3"
# [3,] "0" "I am" "2"
# [4,] "I am" "0" "2"

duplicated(
  lapply(1:nrow(x), function(y) {
    A <- x[y, ]
    A[order(A)]
  }))
# [1] FALSE  TRUE FALSE  TRUE

This basically splits the matrix up by row, then sorts each row. duplicated works on lists too, so you just wrap the whole thing in duplicated to find which items (rows) are duplicated.
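
For example, the resulting logical vector can be used to drop the repeated rows. A minimal sketch reusing the x defined above (the name is_dup is just for illustration):

# keep only the first occurrence of each row, ignoring column order
is_dup <- duplicated(lapply(1:nrow(x), function(y) sort(x[y, ])))
x[!is_dup, ]
#      [,1] [,2]   [,3]
# [1,] "a"  "#"    "3"
# [2,] "0"  "I am" "2"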

Removing duplicate combinations (irrespective of order)

Sort within the rows first, then use duplicated; see below:

# example data    
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items

dat[!duplicated(apply(dat, 1, sort), MARGIN = 2), ]
#       [,1] [,2] [,3]
#  [1,]    1    2    3
#  [2,]    1    2    4
#  [3,]    1    2    5
#  [4,]    1    3    4
#  [5,]    1    3    5
#  [6,]    1    4    5
#  [7,]    2    3    4
#  [8,]    2    3    5
#  [9,]    2    4    5
# [10,]    3    4    5
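
The MARGIN = 2 is needed because apply(dat, 1, sort) returns each sorted row as a column of its result, so duplicates have to be checked column-wise. A small sketch with a made-up 3 x 3 matrix (the name m is just for illustration):

m <- rbind(c(2, 1, 3), c(3, 2, 1), c(1, 4, 5))
apply(m, 1, sort)              # sorted rows come back as columns
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    2    4
# [3,]    3    3    5
duplicated(apply(m, 1, sort), MARGIN = 2)
# [1] FALSE  TRUE FALSE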

Find unique pairs of words ignoring their order in two columns in R

dat[!duplicated(t(apply(dat, 1, sort))),]

apply with sort loops through each row of dat and sorts it; apply returns the sorted rows as columns, so we transpose the result with t. duplicated then returns a logical vector marking the repeats, which we use to subset dat to the rows where it is FALSE.
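
As an illustration, here is a hedged sketch with a made-up two-column data frame of word pairs (the object and column names pairs, w1 and w2 are assumptions, not from the question):

pairs <- data.frame(w1 = c("cat", "dog", "dog"),
                    w2 = c("dog", "cat", "fox"),
                    stringsAsFactors = FALSE)
pairs[!duplicated(t(apply(pairs, 1, sort))), ]
#    w1  w2
# 1 cat dog
# 3 dog fox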

Unique rows, considering two columns, in R, without order

There are lots of ways to do this; here is one:

unique(t(apply(df, 1, sort)))
duplicated(t(apply(df, 1, sort)))

The first returns the unique (sorted) rows themselves; the second returns a logical mask marking the duplicated rows.
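
A minimal sketch with made-up example data (df here is a small numeric matrix, purely for illustration):

df <- rbind(c(1, 2), c(2, 1), c(2, 3))
unique(t(apply(df, 1, sort)))        # the unique rows, ignoring order
#      [,1] [,2]
# [1,]    1    2
# [2,]    2    3
duplicated(t(apply(df, 1, sort)))    # logical mask of duplicated rows
# [1] FALSE  TRUE FALSE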


