Generating non-duplicate combination pairs in R
> x<-c('1','2','3','4')
> combn(x,2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1" "1" "1" "2" "2" "3"
[2,] "2" "3" "4" "3" "4" "4"
Generating unique pairs from R vector
combn(v, 2)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 2 2 2 3 3 4
#[2,] 3 4 5 4 5 5
or, if v contains repeated values, combn(unique(v), 2).
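To see why unique() matters, here is a minimal sketch. The vector below is a made-up input with one repeated value, not the data from the question.

```r
# Hypothetical input with one repeated value (3 appears twice)
v <- c(2, 3, 3, 4, 5)

# Without unique(), the repeated 3 produces repeated pairs and a (3, 3) column
with_dups <- combn(v, 2)

# With unique(), every unordered pair of distinct values appears exactly once
pairs <- combn(unique(v), 2)

ncol(with_dups)  # choose(5, 2) columns
ncol(pairs)      # choose(4, 2) columns
```

Each column of the result is one pair; use t() if one pair per row is more convenient.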
Combination without repetition in R
Try something like:
x <- c("a","b","c","d","e")
d1 <- combn(x,3) # All combinations
d1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "a" "a" "a" "a" "a" "a" "b" "b" "b" "c"
# [2,] "b" "b" "b" "c" "c" "d" "c" "c" "d" "d"
# [3,] "c" "d" "e" "d" "e" "e" "d" "e" "e" "e"
nrow(unique(t(d1))) == nrow(t(d1))
# [1] TRUE
d2 <- expand.grid(x,x,x) # All ordered triples, repetition allowed (Cartesian product)
d2
# Var1 Var2 Var3
# 1 a a a
# 2 b a a
# 3 c a a
# 4 d a a
# 5 e a a
# 6 a b a
# 7 b b a
# 8 c b a
# 9 d b a
# ...
nrow(unique(d2)) == nrow(d2)
# [1] TRUE
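The two results differ in size: combn returns choose(5, 3) unordered selections, while expand.grid returns 5^3 ordered triples. As a rough sketch, the second can be reduced to the first by keeping only strictly increasing rows. The stringsAsFactors = FALSE argument and the keep variable are additions made here for illustration.

```r
x <- c("a", "b", "c", "d", "e")

d1 <- combn(x, 3)                                     # unordered, no repetition
d2 <- expand.grid(x, x, x, stringsAsFactors = FALSE)  # ordered, with repetition

ncol(d1) == choose(length(x), 3)  # one column per combination
nrow(d2) == length(x)^3           # one row per ordered triple

# Keep only rows whose entries are strictly increasing:
# this drops repeats such as (a, a, b) and reorderings such as (b, a, c)
keep <- d2$Var1 < d2$Var2 & d2$Var2 < d2$Var3
d3 <- d2[keep, ]
nrow(d3) == ncol(d1)
```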
Cartesian product without duplicate pairs in R
For Cartesian joins with merge, pass NULL into the by argument:
merge(SaleItems, SaleItems2, by=NULL)
Then, to drop reverse duplicates so that each unordered pair is kept once (self-pairs included), extend it with subset:
subset(merge(SaleItems, SaleItems2, by=NULL),
Appliance <= Appliance2)
And if fields are factors:
subset(merge(SaleItems, SaleItems2, by=NULL),
as.character(Appliance) <= as.character(Appliance2))
# Appliance Appliance2
# 1 Radio Radio
# 2 Laptop Radio
# 4 Fridge Radio
# 6 Laptop Laptop
# 8 Fridge Laptop
# 9 Radio TV
# 10 Laptop TV
# 11 TV TV
# 12 Fridge TV
# 16 Fridge Fridge
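The answer does not show how SaleItems and SaleItems2 were built, so the sketch below assumes two single-column data frames holding the same four appliance names. With that assumption, the cross join has 16 rows and the filtered result has 10.

```r
# Assumed inputs: the original data frames are not shown in the answer
items <- c("Radio", "Laptop", "TV", "Fridge")
SaleItems  <- data.frame(Appliance  = items, stringsAsFactors = FALSE)
SaleItems2 <- data.frame(Appliance2 = items, stringsAsFactors = FALSE)

# by = NULL makes merge() return the Cartesian product of the two inputs
all_pairs <- merge(SaleItems, SaleItems2, by = NULL)

# Keep one ordering of each pair; <= keeps self-pairs, < would drop them
unique_pairs   <- subset(all_pairs, Appliance <= Appliance2)
distinct_pairs <- subset(all_pairs, Appliance <  Appliance2)

nrow(all_pairs)       # 4 * 4
nrow(unique_pairs)    # 4 * 5 / 2
nrow(distinct_pairs)  # 4 * 3 / 2
```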
Unique combinations of two vectors without pair repetition
We can sort by row using apply and get the logical index using duplicated to remove the duplicate rows.
df1[!duplicated(t(apply(df1, 1, sort))),]
# x1 x2
#1 1 2
#2 1 3
#3 1 1
#4 2 4
#6 2 2
#7 3 4
#8 3 2
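For exactly two columns, a vectorised variant of the same idea is possible with pmin and pmax. This is a sketch on a small made-up data frame (df, lo and hi are names introduced here), and it checks that both approaches keep the same rows.

```r
# Small illustrative data: rows 3 and 5 are reversed copies of rows 1 and 2
df <- data.frame(x1 = c(1L, 1L, 2L, 2L, 3L),
                 x2 = c(2L, 3L, 1L, 2L, 1L))

# Put the smaller value first and the larger second, without a row-wise apply
lo <- pmin(df$x1, df$x2)
hi <- pmax(df$x1, df$x2)
res_pm <- df[!duplicated(data.frame(lo, hi)), ]

# The apply/sort approach from the answer, for comparison
res_apply <- df[!duplicated(t(apply(df, 1, sort))), ]

rownames(res_pm)
rownames(res_apply)
```

The pmin/pmax form avoids calling sort once per row, which may matter on large inputs; it does not generalise beyond two columns.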
data
df1 <- structure(list(x1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L), x2 = c(2L, 3L, 1L, 4L, 1L, 2L, 4L, 2L, 1L, 2L, 2L)), .Names = c("x1",
"x2"), class = "data.frame", row.names = c(NA, -11L))
Generating Random Pairs of Integers without Replacement in R
The key here is not to generate all the permutations, as that is very expensive in both memory and time. Since you only care about two numbers, we can do this very easily so long as (number_of_possible_values) ^ 2 is less than the largest integer that double-precision floating point can represent exactly:
size <- 1e5
samples <- 100
vals <- sample.int(size ^ 2, samples) - 1 # zero-based codes, so both parts land in 1..size
cbind(vals %/% size + 1, vals %% size + 1)
Basically, we use integers to represent every possible combination of values. In our example, we sample from all the numbers up to 1e5 ^ 2, since we have 1e5 ^ 2 possible combinations of 1e5 numbers. Each of those 1e10 integers represents one of the combinations. We then decompose that integer into the two component values by taking the integer division as the first number and the modulo as the second.
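The decoding step can be checked exhaustively on a tiny case. The sketch below uses zero-based codes (an adjustment made here so that both components stay within 1..size) and confirms that every code maps to a different pair.

```r
size <- 3

# Every possible code, zero-based: 0, 1, ..., size^2 - 1
codes <- 0:(size^2 - 1)

# Integer division gives the first component, the remainder gives the second
pairs <- cbind(first = codes %/% size + 1, second = codes %% size + 1)

pairs
nrow(unique(pairs)) == size^2  # each code decodes to a distinct pair
range(pairs)                   # both components stay within 1..size
```

Because the mapping is one-to-one, sampling codes without replacement is the same as sampling pairs without replacement.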
Benchmarks:
Unit: microseconds
expr min lq mean
funBrodie(10000, 100) 16.457 17.188 22.052
funRichard(10000, 100) 542513.717 640647.919 638045.215
Also, the limit on size should be about 3 x 1e7, and the function remains relatively fast at large sizes:
Unit: microseconds
expr min lq mean median uq max neval
funBrodie(1e+07, 100) 18.285 20.6625 22.88209 21.211 22.4905 77.893 100
Functions for benchmarking:
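library(data.table) # provides CJ()

# Builds the full cross join, then samples rows from it
funRichard <- function(size, samples) {
  nums <- 1:size
  dt = CJ(nums, nums)
  dt[sample(1:dim(dt)[1], size = samples), ]
}
# Samples integer codes and decodes each one into a pair
funBrodie <- function(size, samples) {
  vals <- sample.int(size ^ 2, samples) - 1 # zero-based codes
  cbind(vals %/% size + 1, vals %% size + 1)
}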
And confirm we're doing similar things (note it's not a given these should be exactly the same, but it turns out they are):
set.seed(1)
resB <- funBrodie(1e4, 100)
set.seed(1)
resR <- unname(as.matrix(funRichard(1e4, 100)))
all.equal(resB, resR)
# TRUE
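As a sketch only, the same idea can be wrapped with a guard on the size limit and a check that no pair repeats. The name sample_pairs is hypothetical, the guard simply mirrors the exact-integer condition stated above, and sample.int may enforce its own, stricter bound.

```r
# Hypothetical helper: draws `samples` distinct (i, j) pairs with i, j in 1..size
sample_pairs <- function(size, samples) {
  # Assumed guard: codes must be exactly representable as doubles,
  # and we cannot draw more pairs than exist
  stopifnot(size^2 < 2^53, samples <= size^2)
  codes <- sample.int(size^2, samples) - 1   # zero-based codes
  cbind(codes %/% size + 1, codes %% size + 1)
}

set.seed(1)
res <- sample_pairs(1000, 50)

nrow(res)                 # number of pairs drawn
anyDuplicated(res) == 0   # no pair (row) appears twice
range(res)                # components stay within 1..size
```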
Non-redundant version of expand.grid
How about using outer? But this particular function concatenates them into one character string.
outer( c("aa", "ab", "cc"), c("aa", "ab", "cc") , "paste" )
# [,1] [,2] [,3]
#[1,] "aa aa" "aa ab" "aa cc"
#[2,] "ab aa" "ab ab" "ab cc"
#[3,] "cc aa" "cc ab" "cc cc"
You can also use combn on the unique elements of the two vectors if you don't want the repeating elements (e.g. aa aa):
vals <- c( c("aa", "ab", "cc"), c("aa", "ab", "cc") )
vals <- unique( vals )
combn( vals , 2 )
# [,1] [,2] [,3]
#[1,] "aa" "aa" "ab"
#[2,] "ab" "cc" "cc"
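If a data frame in expand.grid form is preferred, one possible sketch is to build the full grid and keep a single ordering of each pair. The column names a and b and the use of stringsAsFactors = FALSE are choices made here for illustration.

```r
v <- c("aa", "ab", "cc")

# Full grid: every ordered pair, including self-pairs such as (aa, aa)
g <- expand.grid(a = v, b = v, stringsAsFactors = FALSE)

# Keep one ordering of each pair and drop self-pairs
g_distinct <- g[g$a < g$b, ]

# Keep one ordering of each pair but retain self-pairs
g_with_self <- g[g$a <= g$b, ]

nrow(g)            # 3 * 3
nrow(g_distinct)   # choose(3, 2)
nrow(g_with_self)  # choose(3, 2) + 3
```

This still materialises the full grid first, so for long vectors combn is likely the lighter option.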
Generate all possible n choose 2 pairs from a vector in R, efficient and fast
As pointed out by @Arun, you can use combn:
> t(combn(x, 2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 3
[5,] 2 4
[6,] 3 4
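A short sketch of two related points, assuming x is 1:4 as the output above suggests: the number of rows equals choose(n, 2), and combn can apply a function to each pair directly through its FUN argument.

```r
x <- 1:4  # assumed input, inferred from the printed output

# One pair per row
pairs <- t(combn(x, 2))
nrow(pairs) == choose(length(x), 2)

# Apply a function to each pair without building the matrix first
pair_sums <- combn(x, 2, FUN = sum)
pair_sums
```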