R Expand.Grid() Function in Python

Here's an example that gives output similar to what you need:

import itertools
def expandgrid(*itrs):
   product = list(itertools.product(*itrs))
   return {'Var{}'.format(i+1):[x[i] for x in product] for i in range(len(itrs))}

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}

The row order differs from R's expand.grid: in itertools.product the rightmost element advances on every iteration, whereas expand.grid varies its first argument fastest. If the order matters, you can tweak the function by reordering the product list.

EDIT (by S. Laurent)

To get the same row order as R (note that this version returns a list of columns rather than a dict):

def expandgrid(*itrs): # https://stackoverflow.com/a/12131385/1100107
    """
    Cartesian product. The inputs are reversed for compatibility with R,
    where the first argument varies fastest.
    """
    product = list(itertools.product(*reversed(itrs)))
    return [[x[i] for x in product] for i in range(len(itrs))][::-1]
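
As a quick check of the ordering, here is a minimal sketch that combines the two versions above: it keeps the reversed product but returns a dict with R-style Var1, Var2, ... keys. The name expandgrid_r is hypothetical and only used for this illustration.

```python
import itertools

def expandgrid_r(*itrs):
    """Cartesian product in R's expand.grid order (first argument varies fastest).

    expandgrid_r is a hypothetical name used for this sketch only.
    """
    # Reverse the inputs so that, after the product, the original first
    # argument is the one that advances on every row.
    product = list(itertools.product(*reversed(itrs)))
    n = len(itrs)
    # Column i of the original inputs sits at position n - 1 - i in each tuple.
    return {'Var{}'.format(i + 1): [row[n - 1 - i] for row in product]
            for i in range(n)}

grid = expandgrid_r([1, 2, 3], [5, 7, 9])
print(grid)
# {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
```

Here Var1 cycles through 1, 2, 3 on every row, which is the order expand.grid(a, b) produces in R.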

expand.grid equivalent to get pandas data frame for prediction in Python

In pandas, you can use MultiIndex.from_product:

import pandas as pd
d = {'a': [1, 2, 3], 'b': [4, 5]}
out = pd.MultiIndex.from_product(d.values(), names=d.keys()).to_frame().reset_index(drop=True)
Out[58]: 
   a  b
0  1  4
1  1  5
2  2  4
3  2  5
4  3  4
5  3  5

Or, more simply, with itertools:

import itertools
out = pd.DataFrame(itertools.product(*d.values()),columns=d.keys())
Out[62]: 
   a  b
0  1  4
1  1  5
2  2  4
3  2  5
4  3  4
5  3  5
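
If pandas is not available, or you want to inspect the grid before building a frame, the same product can be kept as a list of row dicts. This is only a sketch; expand_grid_records is a hypothetical helper name. A list of dicts like this should also be accepted by pd.DataFrame if you want the frame afterwards.

```python
import itertools

def expand_grid_records(d):
    """Return the cartesian product of the dict's value lists as a list of row dicts.

    expand_grid_records is a hypothetical name used for this sketch only.
    """
    keys = list(d)
    # itertools.product advances the rightmost list fastest, which matches
    # the row order of the pandas output shown above.
    return [dict(zip(keys, combo)) for combo in itertools.product(*d.values())]

d = {'a': [1, 2, 3], 'b': [4, 5]}
rows = expand_grid_records(d)
print(len(rows))   # 6
print(rows[:2])    # [{'a': 1, 'b': 4}, {'a': 1, 'b': 5}]
```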

Using Expand.Grid and mapply to evaluate functions

I think there is a bug in grid_function. It raises an error when I call it manually with the values from the first row of the grid:

> unlist(DF_1[1,])
random_1 random_2 random_3 random_4  split_1  split_2  split_3 
      80       85       85       90        0        0        0 
> grid_function(80,85,85,90,0,0,0)
`summarise()` ungrouping output (override with `.groups` argument)
Error in grid_function(80, 85, 85, 90, 0, 0, 0) : 
  object 'i' not found

Or maybe I am using it wrong. For testing purposes I wrote my own simple grid function, which returns the median of its arguments.

grid_function2 <- function(random_1 , random_2, random_3, random_4, split_1, split_2, split_3){
 return(median(c(random_1 , random_2, random_3, random_4, split_1, split_2, split_3))) 
}

To apply the function over the grid, set the column names of the grid to the function's parameter names:

# DF_1 is the expanded grid from your question
colnames(DF_1) <- c("random_1" , "random_2", "random_3",
                    "random_4", "split_1", "split_2", "split_3")

Then you can apply a function with named arguments over it:

resultdf1 <- apply(DF_1, 1, # 1 means iterate over rows
                   FUN = function(x) {
                     do.call(
                       # call grid_function2 with the arguments
                       # supplied as a named list
                       grid_function2,
                       # convert the row to a list so that do.call
                       # can match arguments by name
                       as.list(
                         # the row as a named vector
                         unlist(x)
                       )
                     )
                   }
)
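
For readers following the Python side of this page, the same pattern (one function call per grid row, with arguments matched by name) can be sketched as below. The grid values and the three-parameter version of grid_function2 are made up for the illustration and are not the data from the question.

```python
import itertools
from statistics import median

def grid_function2(random_1, random_2, split_1):
    """Toy stand-in for the R function above: returns the median of its arguments."""
    return median([random_1, random_2, split_1])

# Hypothetical parameter grid; the keys must match the function's parameter names.
grid = {'random_1': [80, 85], 'random_2': [85, 90], 'split_1': [0, 1]}
names = list(grid)

# One call per row of the grid, passing the row as keyword arguments,
# which plays the role of do.call with a named list.
results = [grid_function2(**dict(zip(names, combo)))
           for combo in itertools.product(*grid.values())]
print(results)  # [80, 80, 80, 80, 85, 85, 85, 85]
```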

How to speed up `expand.grid()` in R?

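You may try the data.table::CJ function. In the benchmark below (year, names and expand.grid.jc are taken from the question), tidyr::expand_grid has the lowest median time and data.table::CJ allocates the least memory; both are faster than base expand.grid.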

bench::mark(base = expand.grid(year, names),
            jc = expand.grid.jc(year, names),
            tidyr1 = tidyr::expand_grid(year, names), 
            tidyr2 = tidyr::crossing(year, names), 
            dt = data.table::CJ(year, names),
            check = FALSE, iterations = 10)

#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory  time   gc   
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>  <list> <lis>
#1 base       635.48ms 715.02ms     1.25      699MB    2.00     10    16      8.02s <NULL> <Rprof… <benc… <tib…
#2 jc            5.66s    5.76s     0.172     820MB    0.275    10    16     58.13s <NULL> <Rprof… <benc… <tib…
#3 tidyr1     195.03ms 268.97ms     4.01      308MB    2.00     10     5       2.5s <NULL> <Rprof… <benc… <tib…
#4 tidyr2     590.91ms 748.35ms     1.31      312MB    0.656    10     5      7.62s <NULL> <Rprof… <benc… <tib…
#5 dt          318.1ms 384.21ms     2.47      206MB    0.986    10     4      4.06s <NULL> <Rprof… <benc… <tib…

PS: I also included tidyr::crossing for comparison, as it does the same thing.

Combinations using expand.grid in vector

If we need a single string for each combination of sizes 2 up to the length of 'myvec', use combn:

grid <- data.frame(Var1 = unlist(lapply(2:length(myvec), \(i) 
     combn(myvec, i, FUN = paste, collapse = "_"))))

-output

> head(grid)
     Var1
1   B2_B3
2   B2_B4
3   B2_B8
4 B2_NDVI
5 B2_SAVI
6 B2_SIPI
> tail(grid)
                                                              Var1
32747      B2_B3_B4_B8_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32748    B2_B3_B4_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32749    B2_B3_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32750    B2_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32751    B3_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32752 B2_B3_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
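
The same idea can be sketched in Python with itertools.combinations. The short myvec below is a made-up example and joined_combinations is a hypothetical helper name. For a vector of n distinct items the result has 2^n - n - 1 entries, which for n = 15 gives the 32752 rows seen above.

```python
from itertools import combinations

def joined_combinations(items, sep="_"):
    """Join every combination of size 2..len(items) into one string each.

    joined_combinations is a hypothetical name used for this sketch only.
    """
    out = []
    for k in range(2, len(items) + 1):
        # combinations() keeps the input order inside each combination,
        # like combn() in R.
        out.extend(sep.join(c) for c in combinations(items, k))
    return out

myvec = ["B2", "B3", "B4", "B8"]  # hypothetical short vector
combos = joined_combinations(myvec)
print(len(combos))   # 11, i.e. 2**4 - 4 - 1
print(combos[0])     # B2_B3
print(combos[-1])    # B2_B3_B4_B8
```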

How to use expand.grid with conditions?

Here is another base R way. It uses a logical index to modify columns d and e; the rest of the code is like in the question. In the benchmark below it has the lowest median time, although the difference from the question's code (op) is small.

f1 <- function(a, b, c, d, e){
  X <- expand.grid(a, b, c, d, e)
  names(X) <- c("a","b","c","d","e")
  X$d <- ifelse(X$c == 0, X$d[1], X$d)
  X$e <- ifelse(X$c == 0, X$e[1], X$e)
  unique(X)
}

f2 <- function(a, b, c, d, e){
  X <- expand.grid(a, b, c, d, e)
  names(X) <- c("a","b","c","d","e")
  i <- X$c == 0
  X$d[i] <- X$d[1]
  X$e[i] <- X$e[1]
  unique(X)
}

library(tidyr)
library(dplyr)

f3 <- function(a, b, c, d, e){
  crossing(a, b, c, d, e) %>% 
    mutate_at(vars(d, e), ~ replace(., c == 0, first(.))) %>%
    distinct
}

a = 1:5
b = 1:5
c = 0:3
d = 1:5
e = 1:3

library(microbenchmark)

mb <- microbenchmark(
  op = f1(a,b,c,d,e),
  rui = f2(a,b,c,d,e),
  akrun = f3(a,b,c,d,e)
)

print(mb, unit = "relative", order = "median")
#Unit: relative
#  expr       min       lq     mean   median       uq      max neval cld
#   rui 1.0000000 1.000000 1.000000 1.000000 1.000000 1.000000   100  a 
#    op 0.8147996 1.035322 1.018649 1.026295 1.038269 1.096384   100  a 
# akrun 1.7580304 1.815582 1.836061 1.827887 1.872767 1.107545   100   b
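
To make the condition itself explicit, here is a rough Python sketch of the same logic: build the full product, collapse d and e to their first values whenever c is 0, and drop duplicate rows. The function name conditional_grid is hypothetical, and the row order follows itertools.product (rightmost varies fastest) rather than expand.grid.

```python
import itertools

def conditional_grid(a, b, c, d, e):
    """Cartesian product of a..e where d and e are fixed to their first
    values on rows with c == 0, with duplicate rows removed.

    conditional_grid is a hypothetical name used for this sketch only.
    """
    seen = set()
    rows = []
    for ai, bi, ci, di, ei in itertools.product(a, b, c, d, e):
        if ci == 0:
            # d and e carry no information when c == 0, so collapse them.
            di, ei = d[0], e[0]
        row = (ai, bi, ci, di, ei)
        if row not in seen:  # keep first occurrence only, like unique()
            seen.add(row)
            rows.append(row)
    return rows

a = list(range(1, 6))
b = list(range(1, 6))
c = list(range(0, 4))
d = list(range(1, 6))
e = list(range(1, 4))

rows = conditional_grid(a, b, c, d, e)
# 5*5 rows with c == 0, plus 5*5*3*5*3 rows with c != 0
print(len(rows))  # 1150
```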

A more generalized expand.grid function?

Assuming a, b and c each have length 3 (with 4 variables, each would have length 4, and so on), try this. It substitutes 1:3 for each of a, b and c and counts how many 3's appear in each row of the resulting index grid. With four variables it uses 1:4 and counts the 4's in each row, etc. Rows with at least n - dimension such entries, where n is the number of variables, are then selected from expand.grid(a, b, c):

slice.expand <- function(..., dimension = 1) {
    L <- lapply(list(...), seq_along)
    n <- length(L)
    ix <- rowSums(do.call(expand.grid, L) == n) >= (n-dimension)
    expand.grid(...)[ix, ]
}

# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
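
The selection rule can be sketched in Python as follows. This is an illustration of the counting idea only, under the same assumption that every input has length n (the number of inputs); slice_expand is a hypothetical name, and the row order follows itertools.product rather than expand.grid.

```python
import itertools

def slice_expand(*seqs, dimension=1):
    """Keep the rows of the cartesian product whose index tuple contains
    at least n - dimension entries equal to n, where n = len(seqs).

    slice_expand is a hypothetical name used for this sketch only; it
    assumes every sequence has length n, as in the R version above.
    """
    n = len(seqs)
    keep = []
    # Work on 1-based positions, mirroring seq_along() in R.
    for idx in itertools.product(*(range(1, len(s) + 1) for s in seqs)):
        if sum(i == n for i in idx) >= n - dimension:
            keep.append(tuple(s[i - 1] for s, i in zip(seqs, idx)))
    return keep

letters = ["A", "B", "C"]
for dim in (1, 2, 3):
    print(dim, len(slice_expand(letters, letters, letters, dimension=dim)))
# 1 7
# 2 19
# 3 27
```

With three inputs of length 3, dimension = 3 keeps the full 27-row grid, while smaller values keep only rows where most positions point at the last element.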
