Generating Non-Duplicate Combination Pairs in R

> x<-c('1','2','3','4')
> combn(x,2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1"  "1"  "1"  "2"  "2"  "3" 
[2,] "2"  "3"  "4"  "3"  "4"  "4"

Generating unique pairs from R vector

combn(v, 2)
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    2    2    2    3    3    4
#[2,]    3    4    5    4    5    5

Or use combn(unique(v), 2) if v may contain repeated values.
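
To see why unique() can matter, here is a small sketch. The vector v below is made up for illustration and is not the one from the original question:

```r
# Hypothetical input with a repeated value (not from the original post)
v <- c(2, 3, 3, 4)

# combn treats positions as distinct, so the repeated 3 produces
# a (3, 3) pair plus duplicate (2, 3) and (3, 4) pairs
n_raw <- ncol(combn(v, 2))      # choose(4, 2) = 6 columns

# Deduplicating first leaves only pairs of distinct values
pairs <- combn(unique(v), 2)
n_unique <- ncol(pairs)         # choose(3, 2) = 3 columns
```

If v is already free of duplicates, both calls return the same matrix.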

Combination without repetition in R

Try something like:

x <- c("a","b","c","d","e")
d1 <- combn(x,3) # All combinations

d1 

#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "a"  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "c"  
# [2,] "b"  "b"  "b"  "c"  "c"  "d"  "c"  "c"  "d"  "d"  
# [3,] "c"  "d"  "e"  "d"  "e"  "e"  "d"  "e"  "e"  "e"

nrow(unique(t(d1))) == nrow(t(d1))
# [1] TRUE

d2 <- expand.grid(x,x,x) # All ordered triples, repetition allowed (Cartesian product)

d2

#     Var1 Var2 Var3
# 1      a    a    a
# 2      b    a    a
# 3      c    a    a
# 4      d    a    a
# 5      e    a    a
# 6      a    b    a
# 7      b    b    a
# 8      c    b    a
# 9      d    b    a
# ...

nrow(unique(d2)) == nrow(d2)
# [1] TRUE
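
As a quick sanity check on the sizes of the two results, the counts can be compared against the closed-form values. This is only a sketch using base R's choose():

```r
x <- c("a", "b", "c", "d", "e")
d1 <- combn(x, 3)            # unordered triples, no repetition
d2 <- expand.grid(x, x, x)   # ordered triples, repetition allowed

n_comb <- ncol(d1)           # equals choose(5, 3) = 10
n_grid <- nrow(d2)           # equals 5^3 = 125
```

The gap between the two counts grows quickly with the length of x, which is why combn is the better fit when order and repetition do not matter.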

Cartesian product without duplicate pairs in R

For a Cartesian join with merge, pass NULL to the by argument:

merge(SaleItems, SaleItems2, by=NULL)

Then, to remove reverse duplicates, extend it with subset. The <= comparison keeps self-pairs such as Radio/Radio; use < instead to drop those as well:

subset(merge(SaleItems, SaleItems2, by=NULL),
       Appliance <= Appliance2)

And if fields are factors:

subset(merge(SaleItems, SaleItems2, by=NULL),
       as.character(Appliance) <= as.character(Appliance2))

#    Appliance Appliance2
# 1      Radio      Radio
# 2     Laptop      Radio
# 4     Fridge      Radio
# 6     Laptop     Laptop
# 8     Fridge     Laptop
# 9      Radio         TV
# 10    Laptop         TV
# 11        TV         TV
# 12    Fridge         TV
# 16    Fridge     Fridge
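
The input data frames are not shown in the answer. The sketch below assumes they are single-column data frames whose values and order are reconstructed from the printed result, so treat SaleItems and SaleItems2 here as stand-ins:

```r
# Assumed inputs, reconstructed from the printed result (not from the original post)
SaleItems  <- data.frame(Appliance  = c("Radio", "Laptop", "TV", "Fridge"),
                         stringsAsFactors = FALSE)
SaleItems2 <- data.frame(Appliance2 = c("Radio", "Laptop", "TV", "Fridge"),
                         stringsAsFactors = FALSE)

all_pairs <- merge(SaleItems, SaleItems2, by = NULL)   # cross join: 4 x 4 rows

kept   <- subset(all_pairs, Appliance <= Appliance2)   # drops reverse duplicates only
strict <- subset(all_pairs, Appliance <  Appliance2)   # also drops self-pairs
```

With four distinct values, kept should have choose(4, 2) + 4 rows and strict should have choose(4, 2) rows.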

Unique combinations of two vectors without pair repetition

We can sort by row using apply and get the logical index using duplicated to remove the duplicate rows.

 df1[!duplicated(t(apply(df1, 1, sort))),]
 #   x1 x2
 #1  1  2
 #2  1  3
 #3  1  1
 #4  2  4
 #6  2  2
 #7  3  4
 #8  3  2
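
For two columns, the row-wise apply can probably be avoided. The sketch below is an alternative, not the answer's method: it orders each pair with pmin and pmax and should select the same rows. The names lo, hi, keep, and res are introduced here for illustration, and df1 is rebuilt so the block runs on its own:

```r
df1 <- data.frame(x1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
                  x2 = c(2L, 3L, 1L, 4L, 1L, 2L, 4L, 2L, 1L, 2L, 2L))

# Order each pair as (smaller, larger) with vectorized calls
lo <- pmin(df1$x1, df1$x2)
hi <- pmax(df1$x1, df1$x2)

# Keep the first occurrence of each unordered pair
keep <- !duplicated(data.frame(lo, hi))
res <- df1[keep, ]
```

This works only for the two-column case; for more columns the apply and sort approach generalizes more directly.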

data

df1 <- structure(list(x1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L), x2 = c(2L, 3L, 1L, 4L, 1L, 2L, 4L, 2L, 1L, 2L, 2L)), .Names = c("x1", 
"x2"), class = "data.frame", row.names = c(NA, -11L))

Generating Random Pairs of Integers without Replacement in R

The key here is not to generate all the permutations, as that is very expensive in both memory and time. Since you only care about two numbers, we can do this very easily so long as (number_of_possible_values) ^ 2 is less than 2 ^ 53, the point up to which double precision floating point represents every integer exactly:

size <- 1e5
samples <- 100
vals <- sample.int(size ^ 2, samples) - 1      # zero-based pair index
cbind(vals %/% size + 1, vals %% size + 1)     # both columns in 1..size

Basically, we use integers to represent every possible combination of values. In our example, we sample from all the numbers up to 1e5 ^ 2, since we have 1e5 ^ 2 possible ordered pairs of 1e5 numbers. Each of those 1e10 integers represents one of the pairs. We shift the sampled integer to start at zero and then decompose it into the two component values, taking the integer division as the first number and the modulo as the second, and adding 1 to each so that both fall in 1..size. Without the shift, a sampled value that is a multiple of size would yield a 0 in the second column.
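
The index-to-pair mapping is easier to verify on a tiny case. The following is a minimal sketch with size set to 3 and a zero-based index, enumerating every index instead of sampling:

```r
size <- 3
idx <- 0:(size^2 - 1)              # every pair index, zero-based

pairs <- cbind(idx %/% size + 1,   # first component: integer division
               idx %% size + 1)    # second component: remainder

n_distinct <- nrow(unique(pairs))  # one distinct pair per index
```

Because each index maps to a different pair, sampling indices without replacement yields pairs without replacement.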

Benchmarks:

Unit: microseconds
                   expr        min         lq       mean
  funBrodie(10000, 100)     16.457     17.188     22.052
 funRichard(10000, 100) 542513.717 640647.919 638045.215

Also, the limit on size should be around 3e7, and the function remains relatively fast there:

Unit: microseconds
                  expr    min      lq     mean median      uq    max neval
 funBrodie(1e+07, 100) 18.285 20.6625 22.88209 21.211 22.4905 77.893   100

Functions for benchmarking:

library(data.table)  # provides CJ()

funRichard <- function(size, samples) {
  nums <- 1:size
  dt <- CJ(nums, nums)
  dt[sample(1:nrow(dt), size = samples), ]
}
funBrodie <- function(size, samples) {
  vals <- sample.int(size ^ 2, samples) - 1    # zero-based pair index
  cbind(vals %/% size + 1, vals %% size + 1)   # both columns in 1..size
}

And confirm we're doing similar things (note it's not a given these should be exactly the same, but it turns out they are):

set.seed(1)
resB <- funBrodie(1e4, 100)
set.seed(1)
resR <- unname(as.matrix(funRichard(1e4, 100)))
all.equal(resB, resR)
# TRUE

Non-redundant version of expand.grid

How about using outer? Note that with paste as the function, each pair is concatenated into a single character string.

outer( c("aa", "ab", "cc"), c("aa", "ab", "cc") , "paste" )
#     [,1]    [,2]    [,3]   
#[1,] "aa aa" "aa ab" "aa cc"
#[2,] "ab aa" "ab ab" "ab cc"
#[3,] "cc aa" "cc ab" "cc cc"

You can also use combn on the unique elements of the two vectors if you don't want the repeated elements (e.g. aa aa):

vals <- c( c("aa", "ab", "cc"), c("aa", "ab", "cc") )
vals <- unique( vals )
combn( vals , 2 )
#     [,1] [,2] [,3]
#[1,] "aa" "aa" "ab"
#[2,] "ab" "cc" "cc"

Generate all possible n choose 2 pairs from a vector in R, efficient and fast

As pointed out by @Arun, you can use combn:

> t(combn(x, 2))
     [,1] [,2]
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4
