Why is seq(x) so much slower than 1:length(x)?
seq
is a generic S3 method, so probably some time is lost dispatching.seq.default
is almost 100 lines long!
You're probably already aware of seq_along
, which calls a .Primitive
directly and is bit better than 1:length(x)
and the best method I have found for long loops:
f3 <- function(){
x <- 1:5; y <- numeric(length(x))
for(i in seq_along(x)) y[i] <- x[i]^2
y
}
> microbenchmark(f1(), f3())
Unit: microseconds
expr min lq median uq max neval
f1() 27.095 27.916 28.327 29.148 89.495 100
f3() 26.684 27.505 27.916 28.327 36.538 100
Is seq_len(x) always faster than 1:x in R? Is one more preferred than the other?
First note that seq_len(x)
and :
are somewhat more limited than seq()
:
seq_len(b)
generates aninteger
sequence from 1 tob
a:b
generates anumeric
sequence froma
tob
with spacing 1 (a
andb
need not beinteger
(and so the result need not be aninteger
sequence)seq(a, b, c)
generates anumeric
sequence froma
tob
with step lengthc
.a
,b
,c
can benumeric
orinteger
or mixed.
If seq_len
and :
are appropriate for you, they should be favored over seq()
since they are/call primitives (fast C functions).
As proposed in the comments to your question this can be seen from a benchmark:
bench::mark(
":" = 1:1e5,
"seq" = seq(1, 1e5),
"seq_len" = seq_len(1e5)
)
# A tibble: 3 × 4
expression min median `itr/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl>
1 : 132ns 227ns 2677132.
2 seq 4.15µs 4.71µs 197204.
3 seq_len 113ns 170ns 2236432.
It seems that seq_len()
is somewhat faster than :
.
N.B.
Some of these functions have some quirks to be aware of:
E.g.
x <- numeric(0)
y <- length(x)
you get
1:y
[1] 1 0
but
seq_len(y)
integer(0)
so the former can be an issue when dealing with arrays.
Why are F# list ranges so much slower than for loops?
Using ILSpy, listIter
looks like this:
public static void listIter(int n)
{
ListModule.Iterate<int>(
new listIter@17(n),
SeqModule.ToList<int>(
Operators.CreateSequence<int>(
Operators.OperatorIntrinsics.RangeInt32(1, 1, 10000000)
)
)
);
}
Here are the basic steps involved:
RangeInt32
creates anIEnumerable
(which is inexplicably wrapped byCreateSequence
)SeqModule.ToList
builds a list from that sequence- An instance of
listIter@17
(your lambda) is new'd up ListModule.Iterate
traverses the list calling the lambda for each element
vs forLoop
, which doesn't look much different from what you've written:
public static void forLoop(int n)
{
for (int x = 1; x < 10000001; x++)
{
int num = x + n;
double num2 = Math.Pow((double)num, 0.1);
}
}
...no IEnumerable
, lambda (it's automatically inlined), or list creation. There's a potentially significant difference in the amount of work being done.
EDIT
For curiosity's sake, here are FSI timings for list
, seq
, and for
loop versions:
listIter - Real: 00:00:03.889, CPU: 00:00:04.680, GC gen0: 57, gen1: 51, gen2: 6
seqIter - Real: 00:00:01.340, CPU: 00:00:01.341, GC gen0: 0, gen1: 0, gen2: 0
forLoop - Real: 00:00:00.565, CPU: 00:00:00.561, GC gen0: 0, gen1: 0, gen2: 0
and the seq
version for reference:
let seqIter n =
{1..length} |> Seq.iter (fun x -> doSomething (x+n))
Why is apply() method slower than a for loop in R?
As Chase said: Use the power of vectorization. You're comparing two bad solutions here.
To clarify why your apply solution is slower:
Within the for loop, you actually use the vectorized indices of the matrix, meaning there is no conversion of type going on. I'm going a bit rough over it here, but basically the internal calculation kind of ignores the dimensions. They're just kept as an attribute and returned with the vector representing the matrix. To illustrate :
> x <- 1:10
> attr(x,"dim") <- c(5,2)
> y <- matrix(1:10,ncol=2)
> all.equal(x,y)
[1] TRUE
Now, when you use the apply, the matrix is split up internally in 100,000 row vectors, every row vector (i.e. a single number) is put through the function, and in the end the result is combined into an appropriate form. The apply function reckons a vector is best in this case, and thus has to concatenate the results of all rows. This takes time.
Also the sapply function first uses as.vector(unlist(...))
to convert anything to a vector, and in the end tries to simplify the answer into a suitable form. Also this takes time, hence also the sapply might be slower here. Yet, it's not on my machine.
IF apply would be a solution here (and it isn't), you could compare :
> system.time(loop_million <- mash(million))
user system elapsed
0.75 0.00 0.75
> system.time(sapply_million <- matrix(unlist(sapply(million,squish,simplify=F))))
user system elapsed
0.25 0.00 0.25
> system.time(sapply2_million <- matrix(sapply(million,squish)))
user system elapsed
0.34 0.00 0.34
> all.equal(loop_million,sapply_million)
[1] TRUE
> all.equal(loop_million,sapply2_million)
[1] TRUE
Why sequence expressions for arrays are so slow in F#?
In cases like this I always check the generated code using one of the many good decompilers in .NET.
let explicitArray () =
let a = Array.zeroCreate count
for v in 1..count do
a.[v-1] <- v
a
This is compiled into the equivalent C#:
public static int[] explicitArray()
{
int[] a = ArrayModule.ZeroCreate<int>(10000000);
for (int v = 1; v < 10000001; v++)
{
a[v - 1] = v;
}
return a;
}
This is about as efficient it gets.
let arrayExpression () =
[| for v in 1..count -> v |]
This on the other hand becomes:
public static int[] arrayExpression()
{
return SeqModule.ToArray<int>(new Program.arrayExpression@7(0, null, 0, 0));
}
This equivalent to:
let arrayExpression () =
let e = seq { for v in 1..count -> v }
let a = List() // do not set capacity
for v in e do
a.Add(v)
a.ToArray()
When iterating over a seq
(alias for IEnumerable
) first MoveNext
is called, then Current
. These are virtual calls which the JIT:er seldomly can inline. Checking the JIT:ed assembly code we see this:
mov rax,qword ptr [rbp+10h]
cmp byte ptr [rax],0
mov rcx,qword ptr [rbp+10h]
lea r11,[7FFC07830030h]
# virtual call .MoveNext
call qword ptr [7FFC07830030h]
movzx ecx,al
# if .MoveNext returns false then exit
test ecx,ecx
je 00007FFC079408A0
mov rcx,qword ptr [rbp+10h]
lea r11,[7FFC07830038h]
# virtual call .Current
call qword ptr [7FFC07830038h]
mov edx,eax
mov rcx,rdi
# call .Add
call 00007FFC65C8B300
# loop
jmp 00007FFC07940863
If we compare this with the JIT:ed code for the code that used ResizeArray
(List
)
lea edx,[rdi-1]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
mov edx,edi
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
lea edx,[rdi+1]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
lea edx,[rdi+2]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
add edi,4
# loop
cmp edi,989682h
jl 00007FFC07910384
Here the JIT:er has unrolled the loop 4 times here and we have only non-virtual calls to List.Add
.
This explains why F# array expressions are slower than your other two examples.
In order to fix this one would have to fix the optimizer in F# to recognize the shape of expressions such as this:
seq { for v in 1..count -> v } |> Seq.toArray
And optimize them into:
let a = Array.zeroCreate count
for v in 1..count do
a.[v-1] <- v
a
The challenge is to find an optimization that is general enough to be useful but also doesn't break the semantics of F#.
Why is this F# code slower than the C# equivalent?
You can use List.forall
instead of converting to a seq and then doing Seq.length
:
let divisors = [3..20] |> List.rev
let isDivOneToTwenty n = divisors |> List.forall (fun d -> n % d = 0)
Seq.length
will need to traverse the entire sequence to determine the number of elements, while forall
can return as soon as an element fails the predicate.
you can also write findNum
as:
let rec findNum n = if isDivOneToTwenty n then n else findNum (n + 2)
Related Topics
Cv.Glmnet' Works in Rstudio But Not Rscript
Fill Area Between Multiple Lines in Plot
Convert String Back into Object in R
R: Arranging Multiple Plots Together Using Gridextra
Extract Nested List Elements Using Bracketed Numbers and Names
Regarding SQLdf Package/Regexp Function
Unique Elements of Two Vectors
Evaluate (I.E., Predict) a Smoothing Spline Outside R
How to Make a Post Request with Header and Data Options in R Using Httr::Post
R: Import CSV with Column Names That Contain Spaces
In R Plot Arima Fitted Model with the Original Series
Side by Side Histograms in the Same Graph in R
R Graphs: Creating Tufte's Horizontal Bar Lines