R expand.grid() function in Python
Here's an example that gives output similar to what you need:
import itertools
def expandgrid(*itrs):
    product = list(itertools.product(*itrs))
    return {'Var{}'.format(i+1): [x[i] for x in product] for i in range(len(itrs))}
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}
The difference from R is the row order: itertools.product advances the rightmost element on every iteration, whereas R's expand.grid varies the first factor fastest. If the order matters, you can tweak the function by re-sorting the product list.
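One way to do that re-sorting is sketched below, assuming Python 3; the helper name expandgrid_sorted is hypothetical. It sorts index tuples rather than the values themselves, so the inputs do not have to be sortable or pre-sorted, and sorting by the reversed index tuple makes the first input change fastest, as in R.

```python
import itertools


def expandgrid_sorted(*itrs):
    """Cartesian product in R's expand.grid order, obtained by re-sorting.

    itertools.product varies the rightmost input fastest; R varies the
    leftmost fastest. Sorting index tuples by their reversed key puts the
    leftmost index in the fastest-changing position.
    """
    pools = [list(it) for it in itrs]
    # Sort positions, not values, so the inputs need not be sortable.
    index_rows = sorted(
        itertools.product(*(range(len(p)) for p in pools)),
        key=lambda idx: idx[::-1],
    )
    # Map each index tuple back to its values, one column per input.
    return {
        "Var{}".format(j + 1): [pools[j][idx[j]] for idx in index_rows]
        for j in range(len(pools))
    }
```

For example, expandgrid_sorted([1, 2, 3], [5, 7, 9]) returns {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}, the order R's expand.grid(1:3, c(5, 7, 9)) uses. The sort adds an O(N log N) step; the reversal trick avoids it.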
EDIT (by S. Laurent)
To get the same row order as R:
def expandgrid(*itrs): # https://stackoverflow.com/a/12131385/1100107
    """
    Cartesian product. Reversal is for compatibility with R.
    """
    product = list(itertools.product(*reversed(itrs)))
    return [[x[i] for x in product] for i in range(len(itrs))][::-1]
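R's expand.grid also accepts named arguments: expand.grid(a = 1:3, b = c(5, 7, 9)) gives columns a and b. Below is a minimal sketch of a dict-returning variant that keeps the reversal trick and supports keyword names. expandgrid_named is a hypothetical name, and the sketch assumes Python 3.7+ so that keyword-argument and dict order are preserved.

```python
import itertools


def expandgrid_named(*itrs, **named):
    """R-compatible expand.grid returning a dict of columns.

    Positional inputs become Var1, Var2, ...; keyword inputs keep their
    names. Inputs are reversed before itertools.product so that the first
    input varies fastest, as in R.
    """
    names = ["Var{}".format(i + 1) for i in range(len(itrs))] + list(named)
    pools = [list(it) for it in itrs] + [list(v) for v in named.values()]
    # product(*reversed(pools)) yields each row as (last, ..., first).
    rows = list(itertools.product(*reversed(pools)))
    n = len(pools)
    # Column i of the original order sits at position n - 1 - i in each row.
    return {name: [row[n - 1 - i] for row in rows] for i, name in enumerate(names)}
```

For example, expandgrid_named(a=[1, 2, 3], b=[5, 7, 9]) returns {'a': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'b': [5, 5, 5, 7, 7, 7, 9, 9, 9]}.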
expand.grid equivalent to get pandas data frame for prediction in Python
In pandas you can use MultiIndex.from_product:
import pandas as pd
d = {'a': [1, 2, 3], 'b': [4, 5]}
out = pd.MultiIndex.from_product(list(d.values()), names=list(d.keys())).to_frame().reset_index(drop=True)
Out[58]:
   a  b
0  1  4
1  1  5
2  2  4
3  2  5
4  3  4
5  3  5
Or, more simply, with itertools:
import itertools
out = pd.DataFrame(itertools.product(*d.values()), columns=list(d))
Out[62]:
   a  b
0  1  4
1  1  5
2  2  4
3  2  5
4  3  4
5  3  5
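If you would rather not depend on MultiIndex, a prediction grid can also be built as a list of row dicts. The sketch below is pure Python; grid_records is a hypothetical helper name. Its row order follows itertools.product (last key varies fastest), the same order as both outputs above, and passing the result to pd.DataFrame should give an equivalent frame.

```python
import itertools


def grid_records(grid):
    """Expand a dict of value lists into one dict per combination.

    Keys are kept in the dict's insertion order; the last key varies
    fastest, as with itertools.product and MultiIndex.from_product.
    """
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]
```

With d = {'a': [1, 2, 3], 'b': [4, 5]}, grid_records(d) returns six dicts, starting {'a': 1, 'b': 4}, {'a': 1, 'b': 5}, {'a': 2, 'b': 4}.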
Using Expand.Grid and mapply to evaluate functions
I think there is a bug in grid_function. It raises an error when I call it manually:
> unlist(DF_1[1,])
random_1 random_2 random_3 random_4  split_1  split_2  split_3 
      80       85       85       90        0        0        0 
> grid_function(80,85,85,90,0,0,0)
`summarise()` ungrouping output (override with `.groups` argument)
Error in grid_function(80, 85, 85, 90, 0, 0, 0) :
object 'i' not found
Or maybe I am using it wrong. For testing, I wrote my own simple grid function, which returns the median of its arguments:
grid_function2 <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3){
  return(median(c(random_1, random_2, random_3, random_4, split_1, split_2, split_3)))
}
To apply the function over the grid, set the grid's column names to the function's parameter names:
# DF_1 is the expanded grid from your question
colnames(DF_1) <- c("random_1", "random_2", "random_3",
                    "random_4", "split_1", "split_2", "split_3")
Then you can apply a function with named arguments over it:
resultdf1 <- apply(DF_1, 1, # 1 means rows
                   FUN = function(x){
                     do.call(
                       # Call grid_function2 with the arguments
                       # given as a list
                       grid_function2,
                       # force list type for the arguments
                       as.list(
                         # turn the row into a named vector
                         unlist(x)
                       )
                     )
                   }
)
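The same idea in Python is to unpack each row as keyword arguments, which plays the role of do.call() with a named list. In the sketch below, grid_function2 is a Python stand-in for the R test function and apply_over_grid is a hypothetical helper. It assumes the grid is already expanded into a dict of equal-length columns whose keys match the function's parameter names.

```python
import statistics


def grid_function2(random_1, random_2, random_3, random_4,
                   split_1, split_2, split_3):
    # Python stand-in for the R test function: the median of all arguments.
    return statistics.median(
        [random_1, random_2, random_3, random_4, split_1, split_2, split_3]
    )


def apply_over_grid(func, columns):
    """Call func once per grid row, passing each column as a keyword argument.

    `columns` maps parameter names to equal-length lists (an expanded grid
    whose column names match func's parameters).
    """
    names = list(columns)
    return [func(**dict(zip(names, row))) for row in zip(*columns.values())]
```

As in R, a column name that does not match a parameter makes the call fail (here with a TypeError), which is why the names matter.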
How to speed up `expand.grid()` in R?
You could try the data.table::CJ function:
bench::mark(base = expand.grid(year, names),
            jc = expand.grid.jc(year, names),
            tidyr1 = tidyr::expand_grid(year, names),
            tidyr2 = tidyr::crossing(year, names),
            dt = data.table::CJ(year, names),
            check = FALSE, iterations = 10)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <lis>
#1 base 635.48ms 715.02ms 1.25 699MB 2.00 10 16 8.02s <NULL> <Rprof… <benc… <tib…
#2 jc 5.66s 5.76s 0.172 820MB 0.275 10 16 58.13s <NULL> <Rprof… <benc… <tib…
#3 tidyr1 195.03ms 268.97ms 4.01 308MB 2.00 10 5 2.5s <NULL> <Rprof… <benc… <tib…
#4 tidyr2 590.91ms 748.35ms 1.31 312MB 0.656 10 5 7.62s <NULL> <Rprof… <benc… <tib…
#5 dt 318.1ms 384.21ms 2.47 206MB 0.986 10 4 4.06s <NULL> <Rprof… <benc… <tib…
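PS: I also included tidyr::crossing for comparison, since it does essentially the same thing (it additionally de-duplicates and sorts its inputs). In these timings tidyr::expand_grid had the lowest median time and CJ the lowest memory allocation; both beat base expand.grid.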
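Part of what makes a fast cross join fast is building each column directly, instead of materialising one tuple per row and splitting it afterwards. A pure-Python sketch of that column-wise idea follows; cross_join_columns is a hypothetical name, and it needs Python 3.8+ for math.prod. Rows come out with the last input varying fastest, the same order as itertools.product. Unlike CJ's default, it does not sort the inputs.

```python
from math import prod


def cross_join_columns(*vectors):
    """Cartesian product built column by column, last input varying fastest.

    Column i repeats each element of vectors[i] `each` times (the product of
    the lengths after i) and tiles that block `times` times (the product of
    the lengths before i), so no per-row tuples are created.
    """
    pools = [list(v) for v in vectors]
    lengths = [len(p) for p in pools]
    columns = []
    for i, pool in enumerate(pools):
        each = prod(lengths[i + 1:])
        times = prod(lengths[:i])
        block = [x for x in pool for _ in range(each)]
        columns.append(block * times)
    return columns
```

For example, cross_join_columns([1, 2], [3, 4, 5]) returns [[1, 1, 1, 2, 2, 2], [3, 4, 5, 3, 4, 5]]. Whether this beats a comprehension over itertools.product depends on the sizes involved, so time it on your own data.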
Combinations using expand.grid in vector
If you need a single string for each combination of 2 up to length(myvec) elements of 'myvec', use combn:
grid <- data.frame(Var1 = unlist(lapply(2:length(myvec), \(i)
  combn(myvec, i, FUN = paste, collapse = "_"))))
Output:
> head(grid)
Var1
1 B2_B3
2 B2_B4
3 B2_B8
4 B2_NDVI
5 B2_SAVI
6 B2_SIPI
> tail(grid)
Var1
32747 B2_B3_B4_B8_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32748 B2_B3_B4_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32749 B2_B3_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32750 B2_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32751 B3_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
32752 B2_B3_B4_B8_NDVI_SAVI_SIPI_SR_RGI_TVI_MSR_PRI_GNDVI_PSRI_GCI
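In Python, itertools.combinations enumerates combinations in the same positional order as combn, so the same list can be sketched as below. joined_combinations is a hypothetical name, and it assumes the items are strings. With the 15 names visible in the output above, it yields 2^15 - 1 - 15 = 32752 strings, matching the row count shown.

```python
from itertools import combinations


def joined_combinations(items, sep="_", min_size=2):
    """All combinations of `items` of size min_size..len(items), joined by sep.

    Combinations are listed in order of increasing size and, within each
    size, in the positional order itertools.combinations produces.
    """
    items = list(items)
    return [
        sep.join(combo)
        for size in range(min_size, len(items) + 1)
        for combo in combinations(items, size)
    ]
```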
How to use expand.grid with conditions?
Here is another base R way. It uses a logical index to modify columns d and e; the rest of the code is as in the question. In the benchmark below it has the lowest median time of the three alternatives.
f1 <- function(a, b, c, d, e){
  X <- expand.grid(a, b, c, d, e)
  names(X) <- c("a","b","c","d","e")
  X$d <- ifelse(X$c == 0, X$d[1], X$d)
  X$e <- ifelse(X$c == 0, X$e[1], X$e)
  unique(X)
}
f2 <- function(a, b, c, d, e){
  X <- expand.grid(a, b, c, d, e)
  names(X) <- c("a","b","c","d","e")
  i <- X$c == 0
  X$d[i] <- X$d[1]
  X$e[i] <- X$e[1]
  unique(X)
}
library(tidyr)
library(dplyr)
f3 <- function(a, b, c, d, e){
  crossing(a, b, c, d, e) %>%
    mutate_at(vars(d, e), ~ replace(., c == 0, first(.))) %>%
    distinct
}
a = 1:5
b = 1:5
c = 0:3
d = 1:5
e = 1:3
library(microbenchmark)
mb <- microbenchmark(
  op = f1(a,b,c,d,e),
  rui = f2(a,b,c,d,e),
  akrun = f3(a,b,c,d,e)
)
print(mb, unit = "relative", order = "median")
#Unit: relative
# expr min lq mean median uq max neval cld
# rui 1.0000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
# op 0.8147996 1.035322 1.018649 1.026295 1.038269 1.096384 100 a
# akrun 1.7580304 1.815582 1.836061 1.827887 1.872767 1.107545 100 b
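For comparison, here is a pure-Python sketch of the same conditional grid, following the logic of f2: pin d and e to their first values wherever c == 0, then drop duplicate rows. conditional_grid is a hypothetical name. The sketch assumes d and e are indexable sequences and keeps itertools.product's row order (e varies fastest), not R's expand.grid order.

```python
import itertools


def conditional_grid(a, b, c, d, e):
    """Cartesian product of a..e with d and e pinned to their first value
    wherever c == 0, keeping only the first occurrence of each row."""
    d0, e0 = d[0], e[0]
    seen = set()
    rows = []
    for row in itertools.product(a, b, c, d, e):
        if row[2] == 0:
            row = row[:3] + (d0, e0)
        if row not in seen:  # keep first occurrence, like R's unique()
            seen.add(row)
            rows.append(row)
    return rows
```

With the benchmark inputs (a = b = d = 1..5, c = 0..3, e = 1..3) it returns 1150 rows: 25 where c == 0 plus 5*5*3*5*3 = 1125 where it is not.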
A more generalized expand.grid function?
Assuming a, b and c each have length 3 (and, with 4 variables, that each has length 4, and so on), try this. It works by using 1:3 in place of each of a, b and c and then counting how many 3's are in each row; with four variables it uses 1:4 and counts the 4's, etc. It then uses that count as a logical index to select the appropriate rows of expand.grid(a, b, c):
slice.expand <- function(..., dimension = 1) {
  L <- lapply(list(...), seq_along)
  n <- length(L)
  ix <- rowSums(do.call(expand.grid, L) == n) >= (n-dimension)
  expand.grid(...)[ix, ]
}
# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
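A Python sketch of the same slicing rule follows; slice_expand is a hypothetical name. Like the R version, it works on 1-based positions and keeps a row when at least n - dimension of its positions equal n, where n is the number of inputs, so it makes the same assumption that each input has length n. It reverses the inputs before itertools.product so the rows come out in expand.grid order (first input varies fastest).

```python
import itertools


def slice_expand(*itrs, dimension=1):
    """Rows of the Cartesian product whose 1-based index tuple contains at
    least n - dimension entries equal to n (n = number of inputs)."""
    pools = [list(it) for it in itrs]
    n = len(pools)
    result = []
    # Product over the reversed inputs, then flip each index tuple back,
    # so that the first input varies fastest as in R.
    for rev_idx in itertools.product(*(range(1, len(p) + 1) for p in reversed(pools))):
        idx = rev_idx[::-1]
        if sum(i == n for i in idx) >= n - dimension:
            result.append(tuple(p[i - 1] for p, i in zip(pools, idx)))
    return result
```

With a = b = c = ['A', 'B', 'C'], dimension=1 keeps 7 rows, starting ('C', 'C', 'A'), ('C', 'C', 'B'); dimension=2 keeps the 19 rows containing at least one 'C'; and dimension=3 keeps all 27.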