R expand.grid() Function in Python

R expand.grid() function in Python

Here's an example that gives output similar to what you need:

import itertools

def expandgrid(*itrs):
    product = list(itertools.product(*itrs))
    return {'Var{}'.format(i+1): [x[i] for x in product] for i in range(len(itrs))}

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}

The row order differs from R because itertools.product advances the rightmost element on every iteration, whereas R's expand.grid varies the first factor fastest. If the order matters, you can tweak the function by reordering the product list accordingly.



EDIT (by S. Laurent)

To have the same as R:

import itertools

def expandgrid(*itrs):  # https://stackoverflow.com/a/12131385/1100107
    """
    Cartesian product. The arguments are reversed for compatibility with R.

    Returns a list of columns, one per input, in input order.
    """
    product = list(itertools.product(*reversed(itrs)))
    return [[x[i] for x in product] for i in range(len(itrs))][::-1]
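
As an illustration, here is a minimal sketch that combines the two versions above: it keeps R's ordering (the first argument varies fastest) but returns the named-column dictionary of the first answer. The function name expand_grid_r and the Var1, Var2, ... keys are assumptions chosen to mimic R's default column names; they are not part of either original answer.

```python
import itertools

def expand_grid_r(*itrs):
    """Cartesian product as a dict of columns, ordered like R's expand.grid.

    Hypothetical helper: the name and the 'VarN' keys are illustrative only.
    """
    # Reverse the inputs so the first argument varies fastest,
    # then flip each row back into the original argument order.
    rows = [row[::-1] for row in itertools.product(*reversed(itrs))]
    return {'Var{}'.format(i + 1): [row[i] for row in rows]
            for i in range(len(itrs))}

a = [1, 2, 3]
b = [5, 7, 9]
print(expand_grid_r(a, b))
# {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
```

Here Var1 cycles through 1, 2, 3 while Var2 changes only every third row, which is the order expand.grid(a, b) produces in R.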

expand.grid equivalent to get pandas data frame for prediction in Python

In pandas we have MultiIndex

import pandas as pd

d = {'a': [1, 2, 3], 'b': [4, 5]}
out = pd.MultiIndex.from_product(d.values(), names=d.keys()).to_frame().reset_index(drop=True)
Out[58]:
   a  b
0  1  4
1  1  5
2  2  4
3  2  5
4  3  4
5  3  5

Or, more simply, with itertools:

import itertools
out = pd.DataFrame(itertools.product(*d.values()),columns=d.keys())
Out[62]:
   a  b
0  1  4
1  1  5
2  2  4
3  2  5
4  3  4
5  3  5

Using expand.grid and mapply to evaluate functions

I think there is a bug in grid_function. It raises an error when I call it manually:

> unlist(DF_1[1,])
random_1 random_2 random_3 random_4  split_1  split_2  split_3
      80       85       85       90        0        0        0
> grid_function(80,85,85,90,0,0,0)
`summarise()` ungrouping output (override with `.groups` argument)
Error in grid_function(80, 85, 85, 90, 0, 0, 0) :
object 'i' not found

Or maybe I am using it wrong. For testing purposes I wrote my own simple grid function, which returns the median of its arguments.

grid_function2 <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {
  return(median(c(random_1, random_2, random_3, random_4, split_1, split_2, split_3)))
}

To apply the function over the grid, name the grid's columns after the function's parameters:

# DF_1 is the expanded grid from your question
colnames(DF_1) <- c("random_1", "random_2", "random_3",
                    "random_4", "split_1", "split_2", "split_3")

Then you can apply a function with named arguments over it:

resultdf1 <- apply(DF_1, 1,  # 1 means rows
  FUN = function(x) {
    do.call(
      # Call grid_function2 with the arguments in
      # a list
      grid_function2,
      # force list type for the arguments
      as.list(
        # turn the row into a named vector
        unlist(x)
      )
    )
  }
)
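
The same pattern carries over to Python, in line with the itertools-based answers earlier on this page: build each grid row as a dictionary keyed by parameter name and unpack it into the function. This is only a rough sketch, not a translation of the original grid_function. The helper name apply_over_grid, the reduced three-parameter grid_function2, and the value lists are all made up for illustration.

```python
import itertools
from statistics import median

def grid_function2(random_1, random_2, split_1):
    # Toy stand-in with fewer parameters than the R version:
    # it returns the median of its arguments.
    return median([random_1, random_2, split_1])

def apply_over_grid(func, **params):
    """Call func once per grid row, passing the row as keyword arguments."""
    names = list(params)
    results = []
    for values in itertools.product(*params.values()):
        row = dict(zip(names, values))   # named row, like a named vector in R
        results.append(func(**row))      # analogue of do.call(f, as.list(row))
    return results

# Hypothetical grid values, chosen only for the demonstration.
result = apply_over_grid(grid_function2,
                         random_1=[80, 85], random_2=[85, 90], split_1=[0, 1])
print(result)  # [80, 80, 80, 80, 85, 85, 85, 85]
```

Because the arguments are matched by name, the order of the columns does not matter as long as the names agree with the function's parameters. Note that itertools.product varies the last parameter fastest, so the rows come out in a different order than in R.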

How to speed up `expand.grid()` in R?

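You may try the data.table::CJ function. In the benchmark below it is faster than base expand.grid and allocates the least memory, while tidyr::expand_grid has the lowest median time.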

bench::mark(base = expand.grid(year, names),
            jc = expand.grid.jc(year, names),
            tidyr1 = tidyr::expand_grid(year, names),
            tidyr2 = tidyr::crossing(year, names),
            dt = data.table::CJ(year, names),
            check = FALSE, iterations = 10)

#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory  time   gc
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>  <list> <lis>
# 1 base       635.48ms 715.02ms     1.25      699MB    2.00     10    16      8.02s <NULL> <Rprof… <benc… <tib…
# 2 jc            5.66s    5.76s     0.172     820MB    0.275    10    16     58.13s <NULL> <Rprof… <benc… <tib…
# 3 tidyr1     195.03ms 268.97ms     4.01      308MB    2.00     10     5       2.5s <NULL> <Rprof… <benc… <tib…
# 4 tidyr2     590.91ms 748.35ms     1.31      312MB    0.656    10     5      7.62s <NULL> <Rprof… <benc… <tib…
# 5 dt          318.1ms 384.21ms     2.47      206MB    0.986    10     4      4.06s <NULL> <Rprof… <benc… <tib…

PS: I also included tidyr::crossing for comparison, as it does the same thing.

Combinations using expand.grid in vector

If we need a single string for each combination of sizes 2 up to the length of 'myvec', use combn:

grid <- data.frame(Var1 = unlist(lapply(2:length(myvec), \(i)
  combn(myvec, i, FUN = paste, collapse = "_"))))

-output

> head(grid)
     Var1
1   B2_B3
2   B2_B4
3   B2_B8
4 B2_NDVI
5 B2_SAVI
6 B2_SIPI
> tail(grid)
                                                              Var1
32747      B2_B3_B4_B8_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32748    B2_B3_B4_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32749    B2_B3_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32750    B2_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32751    B3_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32752 B2_B3_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
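
For readers coming from the Python side of this page, the same result can be sketched with itertools.combinations. The contents of myvec below are inferred from the last row of the output above (the original vector is not shown), and the function name joined_combinations is a made-up helper.

```python
from itertools import combinations

# Band/index names inferred from the last row of the R output; an assumption.
myvec = ["B2", "B3", "B4", "B8", "NDVI", "SAVI", "SIPI", "SR",
         "RGI", "TVI", "MSR", "PRI", "GNDVI", "PSRI", "GCI"]

def joined_combinations(items, min_size=2, sep="_"):
    """Join every combination of min_size..len(items) elements with sep."""
    return [sep.join(combo)
            for size in range(min_size, len(items) + 1)
            for combo in combinations(items, size)]

grid = joined_combinations(myvec)
print(len(grid))   # 32752, matching the row count shown above
print(grid[:3])    # ['B2_B3', 'B2_B4', 'B2_B8']
```

With 15 names there are 2^15 - 1 - 15 = 32752 combinations of size two or more, and itertools.combinations emits them in the same positional order as combn.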

How to use expand.grid with conditions?

Here is another base R way. It uses a logical index to modify columns d and e; the rest of the code is as in the question. In the benchmark below it has the lowest median time, although the difference from the question's code is small.

f1 <- function(a, b, c, d, e){
  X <- expand.grid(a, b, c, d, e)
  names(X) <- c("a","b","c","d","e")
  X$d <- ifelse(X$c == 0, X$d[1], X$d)
  X$e <- ifelse(X$c == 0, X$e[1], X$e)
  unique(X)
}

f2 <- function(a, b, c, d, e){
  X <- expand.grid(a, b, c, d, e)
  names(X) <- c("a","b","c","d","e")
  i <- X$c == 0
  X$d[i] <- X$d[1]
  X$e[i] <- X$e[1]
  unique(X)
}

library(tidyr)
library(dplyr)

f3 <- function(a, b, c, d, e){
  crossing(a, b, c, d, e) %>%
    mutate_at(vars(d, e), ~ replace(., c == 0, first(.))) %>%
    distinct()
}

a = 1:5
b = 1:5
c = 0:3
d = 1:5
e = 1:3

library(microbenchmark)

mb <- microbenchmark(
  op = f1(a,b,c,d,e),
  rui = f2(a,b,c,d,e),
  akrun = f3(a,b,c,d,e)
)

print(mb, unit = "relative", order = "median")
#Unit: relative
#  expr       min       lq     mean   median       uq      max neval cld
#   rui 1.0000000 1.000000 1.000000 1.000000 1.000000 1.000000   100  a
#    op 0.8147996 1.035322 1.018649 1.026295 1.038269 1.096384   100  a
# akrun 1.7580304 1.815582 1.836061 1.827887 1.872767 1.107545   100   b
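
The logical-index idea can also be sketched in Python with itertools, in keeping with the Python answers at the top of this page. This is an illustrative sketch rather than a port of any of the three R functions; the name conditional_grid is hypothetical, and the inputs mirror the a to e vectors used in the benchmark.

```python
import itertools

def conditional_grid(a, b, c, d, e):
    """Full grid, but d and e are pinned to their first values when c == 0."""
    seen = set()
    rows = []
    for row in itertools.product(a, b, c, d, e):
        va, vb, vc, vd, ve = row
        if vc == 0:
            row = (va, vb, vc, d[0], e[0])  # collapse d and e for c == 0
        if row not in seen:                 # same effect as unique()/distinct()
            seen.add(row)
            rows.append(row)
    return rows

rows = conditional_grid(range(1, 6), range(1, 6), range(0, 4),
                        range(1, 6), range(1, 4))
print(len(rows))  # 1150
```

The count is 5 * 5 = 25 rows for c == 0 plus 5 * 5 * 3 * 5 * 3 = 1125 rows for the other three values of c.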

A more generalized expand.grid function?

Assuming a, b and c each have length 3 (and if there are 4 variables then they each have length 4 and so on) try this. It works by using 1:3 in place of each of a, b and c and then counting how many 3's are in each row. If there are four variables then it uses 1:4 and counts how many 4's are in each row, etc. It uses this for the index to select out the appropriate rows from expand.grid(a, b, c) :

slice.expand <- function(..., dimension = 1) {
  L <- lapply(list(...), seq_along)
  n <- length(L)
  ix <- rowSums(do.call(expand.grid, L) == n) >= (n-dimension)
  expand.grid(...)[ix, ]
}

# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
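
The counting trick described above can be sketched in Python as follows. It relies on the same assumption stated in the answer, namely that each of the n vectors has length n. The function name slice_expand and the tuple-based return value are choices made for this illustration.

```python
import itertools

def slice_expand(*vectors, dimension=1):
    """Keep grid rows in which at least n - dimension entries are the last element.

    Assumes, as in the text, that each of the n vectors has length n.
    """
    n = len(vectors)
    index_ranges = [range(1, len(v) + 1) for v in vectors]  # like seq_along()
    kept = []
    # Reverse the inputs so the first vector varies fastest, as in expand.grid.
    for idx in itertools.product(*reversed(index_ranges)):
        idx = idx[::-1]
        # Count how many positions hold the index n, then apply the threshold.
        if sum(i == n for i in idx) >= n - dimension:
            kept.append(tuple(v[i - 1] for v, i in zip(vectors, idx)))
    return kept

a = b = c = ["A", "B", "C"]
for dim in (1, 2, 3):
    print(dim, len(slice_expand(a, b, c, dimension=dim)))
```

For three vectors of length three this keeps 7, 19 and 27 rows for dimension 1, 2 and 3: rows with at least two, at least one, and any number of entries equal to the last element.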

