Speeding up Julia's poorly written R examples
Hmm, in the Mandelbrot example the matrix M has its dimensions transposed; it should be
M = matrix(0.0,nrow=length(im), ncol=length(re))
because it's filled by incrementing count in the inner loop (successive values of im). My implementation creates a vector of complex numbers in mandelperf.1 and operates on all elements at once. It uses an index and subsetting to keep track of which elements of the vector still satisfy the condition Mod(z) <= 2:
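mandel.1 = function(z, maxiter=80L) {
    c <- z                        # constant term for each grid point
    result <- integer(length(z))  # iteration count at escape
    i <- seq_along(z)             # original positions of points still iterating
    n <- 0L
    while (n < maxiter && length(z)) {
        j <- Mod(z) <= 2          # points that have not escaped yet
        if (!all(j)) {
            # record the escape iteration, then drop the escaped points
            result[i[!j]] <- n
            i <- i[j]
            z <- z[j]
            c <- c[j]
        }
        z <- z^2 + c
        n <- n + 1L
    }
    result[i] <- maxiter          # points that never escaped
    result
}
mandelperf.1 = function() {
    re = seq(-2,0.5,.1)
    im = seq(-1,1,.1)
    # re varies slowest and im fastest, matching the original loop order
    mandel.1(complex(real=rep(re, each=length(im)),
                     imaginary=im))
}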
for a 13-fold speed-up (the results are equal but not identical because the original returns numeric rather than integer values).
> library(rbenchmark)
> benchmark(mandelperf(), mandelperf.1(),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 mandelperf.1() 0.412 1.00000
1 mandelperf() 5.705 13.84709
> all.equal(sum(mandelperf()), sum(mandelperf.1()))
[1] TRUE
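Here is why nrow has to be length(im). The sketch below fills a matrix with the same running index as the original loop. Instead of a mandel() result, it stores each grid point itself, so the block runs on its own. R stores matrices column by column, so each column ends up holding one value of re and each row one value of im. That is the same order as the vector built in mandelperf.1:

```r
re <- seq(-2, 0.5, 0.1)
im <- seq(-1, 1, 0.1)

# Fill with a single running index, re in the outer loop and im in the inner
# loop, storing the grid point itself instead of the mandel() result
M <- matrix(0 + 0i, nrow = length(im), ncol = length(re))
count <- 1
for (r in re) {
  for (i in im) {
    M[count] <- complex(real = r, imaginary = i)
    count <- count + 1
  }
}

# The vector passed to mandel.1 in mandelperf.1 has the same column-major order
z <- complex(real = rep(re, each = length(im)), imaginary = im)
identical(as.vector(M), z)   # TRUE
```

So mandelperf.1's result can be reshaped with matrix(mandelperf.1(), nrow = length(im)) if a matrix is wanted.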
The quicksort example doesn't actually sort:
> set.seed(123L); qsort(sample(5))
[1] 2 4 1 3 5
but my main speed-up was to vectorize the partition around the pivot:
qsort_kernel.1 = function(a) {
    if (length(a) < 2L)
        return(a)
    pivot <- a[floor(length(a) / 2)]
    # three-way partition by logical subsetting; only the outer parts recurse
    c(qsort_kernel.1(a[a < pivot]), a[a == pivot], qsort_kernel.1(a[a > pivot]))
}
qsort.1 = function(a) {
    qsort_kernel.1(a)
}
sortperf.1 = function(n) {
    v = runif(n)
    return(qsort.1(v))
}
for a 7-fold speed-up (compared with the uncorrected original):
> benchmark(sortperf(5000), sortperf.1(5000),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 sortperf.1(5000) 6.60 1.000000
1 sortperf(5000) 47.73 7.231818
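The partition step in qsort_kernel.1 is easier to see on a small vector. The values of a below are made up for illustration:

```r
# Hypothetical small input with a repeated value
a <- c(5, 2, 3, 8, 3, 1, 9)
pivot <- a[floor(length(a) / 2)]   # a[3], i.e. 3

# Vectorized three-way partition: each piece is a single logical subset
lower <- a[a < pivot]    # 2 1
equal <- a[a == pivot]   # 3 3
upper <- a[a > pivot]    # 5 8 9

# Sorting only the outer pieces and concatenating gives the sorted vector,
# which is why qsort_kernel.1 recurses on lower and upper but not on equal
identical(c(sort(lower), equal, sort(upper)), sort(a))   # TRUE
```

Keeping every copy of the pivot in the middle piece also means each recursive call gets strictly fewer elements, so repeated values cannot cause endless recursion.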
Since in the original comparison Julia is about 30 times faster than R for mandel, and 500 times faster for quicksort, the implementations above are still not really competitive.
Speeding up a function
data.table is optimized for many rows, not for many columns. Since you have many columns, you could try melting the data.table:
DFm <- melt(DF[, cols, with = FALSE][, !"uniqueID"], id.vars = "panelID")
# coerces all numbers to double (the common type);
# you could split the data.table into integer and double columns to avoid this
DFm[, value := c(NA, diff(value)), by = .(panelID, variable)]
dcast(DFm, panelID + rowidv(DFm, cols = c("panelID", "variable")) ~ variable, value.var = "value")
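Here is a minimal sketch of the melt / diff / dcast round trip on made-up data. It assumes the data.table package is installed. The table DF, its columns x and y, and the helper column period are hypothetical. Here period is added explicitly with rowidv() instead of inside the dcast formula, which keeps the column names of the cast result short:

```r
library(data.table)

# Hypothetical toy panel data: 2 panels x 3 periods, two numeric columns
DF <- data.table(
  uniqueID = 1:6,
  panelID  = rep(c("a", "b"), each = 3),
  x        = c(1, 4, 9, 2, 2, 5),
  y        = c(10, 20, 40, 1, 3, 6)
)
cols <- names(DF)

# Long format: one row per panelID, variable and period
DFm <- melt(DF[, cols, with = FALSE][, !"uniqueID"], id.vars = "panelID")

# First differences within each panel/variable; the first period becomes NA
DFm[, value := c(NA, diff(value)), by = .(panelID, variable)]

# Number the periods within each panel/variable, then cast back to wide
DFm[, period := rowidv(DFm, cols = c("panelID", "variable"))]
res <- dcast(DFm, panelID + period ~ variable, value.var = "value")
res
```

Each panel's first row is NA and the remaining rows hold the differences, e.g. x for panel "a" becomes NA, 3, 5.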
Speeding up Zygote.jl AD
Given your main function, you might be executing this as a script. In Julia, you are far better off starting a session (in the REPL, VS Code, a Jupyter notebook, or another environment) and running multiple workloads from the same session. As Antonello suggests in a comment, your first call will be dominated by compile time. Later calls with the same argument types simply reuse the compiled code and can be a completely different experience from the first one.
Some workflow tips can be found in https://docs.julialang.org/en/v1/manual/workflow-tips/.
What's Julia's equivalent of R's seq(..., length.out = n)
As of Julia 1.0, linspace has been removed (it was deprecated in 0.7). Use range instead:
julia> range(0, stop = 5, length = 3)
0.0:2.5:5.0
As @TasosPapastylianou noted, if you want this to be a vector of values, you can use collect:
julia> collect( range(0, stop = 5, length = 3) )
3-element Array{Float64,1}:
0.0
2.5
5.0
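For comparison, the R call from the question title returns an ordinary numeric vector right away, so there is no separate collect step. The spacing is (to - from) / (length.out - 1), the same 2.5 step as in the Julia range above:

```r
# Evenly spaced values with both endpoints included
x <- seq(0, 5, length.out = 3)
x
# [1] 0.0 2.5 5.0

diff(x)
# [1] 2.5 2.5
```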
Speeding up identification of subsequences
Computation time is strongly linked to:
- Number of events per sequence. The algorithm was designed for a small number of events per sequence (typically fewer than 6) and many sequences. You can try removing events that are not your main interest, or analysing groups of events. I suspect the relationship between the number of events and computation time is at least exponential. With more than 10 events per sequence, it can be really slow.
- Minimum support. With a low minimum support, the number of possible subsequences gets very large. Try setting it to a higher value.
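To get a feel for both effects, here is a hypothetical brute-force sketch in R. It is not the algorithm used by the package in question. It treats a subsequence simply as an order-preserving subset of a sequence's events, with no time constraints. The helper names subsequences_of() and support() and the four toy sequences are made up for illustration:

```r
# Hypothetical brute-force illustration, not the package's algorithm:
# every distinct order-preserving subsequence of one event sequence,
# encoded as strings such as "A>C"
subsequences_of <- function(events) {
  k <- length(events)
  subs <- unlist(lapply(seq_len(k), function(m) {
    apply(combn(k, m), 2, function(pos) paste(events[pos], collapse = ">"))
  }))
  unique(subs)  # count each subsequence once per sequence
}

# Support = share of sequences that contain each subsequence at least once
support <- function(seqs) {
  all_subs <- unlist(lapply(seqs, subsequences_of))
  table(all_subs) / length(seqs)
}

# 1) Growth with the number of events: 2^k - 1 candidates for k distinct events
sapply(c(4, 6, 8, 10), function(k) length(subsequences_of(LETTERS[seq_len(k)])))
# [1]   15   63  255 1023

# 2) Effect of the minimum support on four toy sequences
seqs <- list(c("A", "B", "C"), c("A", "C"), c("B", "C", "D"), c("A", "B", "C", "D"))
sup <- support(seqs)
sapply(c(0.25, 0.5, 0.75), function(s) sum(sup >= s))
# [1] 15 11  5
```

For k distinct events there are 2^k - 1 candidate subsequences, which fits the guess that computation time grows at least exponentially. In the toy data, raising the minimum support from 0.25 to 0.75 cuts the 15 candidates down to 5.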
Hope this helps.