Why Is This F# Code So Slow

Why is this F# code so slow?

The problem is that the min3 function is compiled as a generic function that uses generic comparison (I thought this uses just IComparable, but it is actually more complicated - it would use structural comparison for F# types and it's fairly complex logic).

> let min3(a, b, c) = min a (min b c);;
val min3 : 'a * 'a * 'a -> 'a when 'a : comparison

In the C# version, the function is not generic (it just takes int). You can improve the F# version by adding type annotations (to get the same thing as in C#):

let min3(a:int, b, c) = min a (min b c)

...or by making min3 as inline (in which case, it will be specialized to int when used):

let inline min3(a, b, c) = min a (min b c);;

For a random string str of length 300, I get the following numbers:

> levenshtein str ("foo" + str);;
Real: 00:00:03.938, CPU: 00:00:03.900, GC gen0: 275, gen1: 1, gen2: 0
val it : int = 3

> levenshtein_inlined str ("foo" + str);;
Real: 00:00:00.068, CPU: 00:00:00.078, GC gen0: 0, gen1: 0, gen2: 0
val it : int = 3

Why is this simple F# code 36 times slower than C#/C++ versions?

This is very definitely happening directly as a consequence of using the expression:

for i in seq{1..N} do

On my machine, this gives the result:

100000000
00:00:09.1500924

If I change the loop to:

for i in 1..N do

The result changes dramatically:

100000000
00:00:00.1001864

Why?

The IL generated by these two approaches is quite different. The second case, using the 1..N syntax simply gets compiled the same way as a C# for(int i=1; i<N+1; ++i) loop.

The first case is quite different, this version produces a full sequence which is then enumerated by a foreach loop.

The C# and F# versions making use of IEnumerables differ in that they use different range functions to generate them.

The C# version uses System.Linq.Enumerable.RangeIterator to generate the value range, while the F# version uses Microsoft.FSharp.Core.Operators.OperatorIntrinsics.RangeInt32. I think it's safe to assume that the performance difference we're seeing between the C# and F# versions in this particular case are a result of the performance characteristics of these two functions.

svick is correct to point out in his comment that the + operator is actually being passed as an argument to the integralRangeStep function.

For the non-trivial case where n <> m this results in the F# compiler using a ProperIntegralRangeEnumerator with the implementation found here: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/FSharp.Core/prim-types.fs#L6463

let inline integralRangeStepEnumerator (zero,add,n,step,m,f) : IEnumerator<_> =
    // Generates sequence z_i where z_i = f (n + i.step) while n + i.step is in region (n,m)
    if n = m then
        new SingletonEnumerator<_> (f n) |> enumerator 
    else
        let up = (n < m)
        let canStart = not (if up then step < zero else step > zero) // check for interval increasing, step decreasing 
        // generate proper increasing sequence
        { new ProperIntegralRangeEnumerator<_,_>(n,m) with 
                member x.CanStart = canStart
                member x.Before a b = if up then (a < b) else (a > b)
                member x.Equal a b = (a = b)
                member x.Step a = add a step
                member x.Result a = f a } |> enumerator

We can see that stepping through the Enumerator results in calls to the supplied add function rather than a more straightforward, direct addition.

Note: All timings run in Release mode (Tail Calls: On, Optimisation: On).

F# Performance: What is making this code so slow?

There is no Seq.tail function in the core F# library (UPDATE: Yes there is, see comments), so I assume you're using the Seq.tail function from FSharpx.Collections. If you're using a different implementation of Seq.tail, it's probably similar -- and it's almost certainly the cause of your problems, because it's not O(1) like you think it is. Getting the tail of a List is O(1) because of how List is implemented (as a series of cons cells). But getting the tail of a Seq ends up creating a brand new Seq from the original enumerable, discarding one item from it, and returning the rest of its items. When you go through your accum loop a second time, you call Seq.tail on that "skip 1 then return" seq. So now you have a Seq which I'll call S2, which asks S1 for an IEnumerable, skips the first item of S1, and returns the rest of it. S1, when asked for its first item, asks S0 (the original Seq) for an enumerable, skips its first item, then returns the rest of it. So for S2 to skip two items, it had to create two seqs. Now on your next run through when you ask for the Seq.tail of S2, you create S3 that asks S2 for an IEnumerable, which asks S1 for an IEnumerable, which asks S0 for an IEnumerable... and so on. This is effectively O(N^2), when you thought you were writing an O(N) operation.

I'm afraid I don't have time right now to figure out a solution for you; using List.tail won't help since you need an infinite sequence. But perhaps just knowing about the Seq.tail gotcha is enough to get you started, so I'll post this answer now even though it's not complete.

If you need more help, comment on this answer and I'll come back to it when I have time -- but that might not be for several days, so hopefully others will also answer your question.

F# seems slower than other languages... what can I do to speed it up?

Unless you can give a reasonably sized code sample, it's difficult to tell. Anyway, the imperative F# version should be as efficient as the imperative C# version. I think one approach is to benchmark the two to see what is causing the difference (then someone can help with making that bit faster).

I briefly looked at your code and here are some assorted (untested) suggestions.

You can replace discriminated union Cell with an enum (this means you'll use value types and integer comparison instead of reference types and runtime type tests):
```
type Cell =    
  | Orange = 1
  | Yellow = 2
  | Barren = 3
```

You can mark some trivial functions as inline. For example:

let inline myincr (arr:int array) idx =
  arr.[idx] <- arr.[idx] + 1

Don't use exceptions for control-flow. This is often done in OCaml, but .NET exceptions are slow and should be only used for exceptions. You can replace the for loop in your sample with a while loop and a mutable flag or with a tail-recursive function (a tail-recursive function is compiled into a loop, so it will be efficient, even in imperative solution).

Why is printf in F# so slow?

I am not sure how much it matters, but...

Inspecting the code for printf:

https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/printf.fs

I see

// The general technique used this file is to interpret
// a format string and use reflection to construct a function value that matches
// the specification of the format string.

and I think the word 'reflection' probably answers the question.

printf is great for writing simple type-safe output, but if you want good perf in an inner loop, you might want to use a lower-level .NET API to write output. I haven't done my own benchmarking to see.

f# compiling too slow

I have no idea how fast a Pentium 4 should compile that "hello world" program, but 15 seconds strikes me as pretty slow. I once had similar speed problems with a VS 2010 Beta and the problem turned out to be that Visual Studio and the F# compiler weren't yet properly NGENed.

Normally, the Visual Studio install should make sure that everything gets NGENed, but maybe something went wrong. You can check if the F# compiler was NGENed with the following command in a console window with admin rights:

cd "C:\Program Files\FSharp-2.0.0.0\bin"
c:\windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe display fsc.exe

If the result of that shows that the native image of fsc.exe is still pending, you could force the compilation with:

c:\windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe executeQueuedItems

Note: I'm not sure which version of the F# compiler you're using exactly. The one used by the full install of VS2010 is the one in C:\Program Files\Microsoft F#\v4.0 (or C:\Program Files (x86)\Microsoft F#\v4.0 on 64-bit machines). So, if you use that one, you have to cd into that folder instead of the C:\Program Files\FSharp-2.0.0.0\bin folder.

Why is this F# code slower than the C# equivalent?

You can use List.forall instead of converting to a seq and then doing Seq.length:

let divisors = [3..20] |> List.rev
let isDivOneToTwenty n = divisors |> List.forall (fun d -> n % d = 0)

Seq.length will need to traverse the entire sequence to determine the number of elements, while forall can return as soon as an element fails the predicate.

you can also write findNum as:

let rec findNum n = if isDivOneToTwenty n then n else findNum (n + 2)

Why is calling my C code from F# very slow (compared to native)?

It looks like part of the problem is that you are using float on both the F# and C side of the PInvoke signature. In F# float is really System.Double and hence is 8 bytes. In C a float is generally 4 bytes.

If this were running under the CLR I would expect you to see a PInvoke stack unbalanced error during debugging. I'm not sure if Mono has similar checks or not. But it's possible this is related to the problem you're seeing.

FSharp.Data.SqlProvider is slow

The types returned by SqlDataProvider implement IQueryable which means you can write a query expression or use Queryable.Count

open System.Linq

dbm.Dbo.Results |> Queryable.Count

query { for it in dbm.Dbo.Results do
        count
      }

Why Is This F# Code So Slow