Speed Comparison with Project Euler: C VS Python VS Erlang VS Haskell

Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell

Using GHC 7.0.3, gcc 4.4.6, Linux 2.6.29 on an x86_64 Core2 Duo (2.5GHz) machine, compiling using ghc -O2 -fllvm -fforce-recomp for Haskell and gcc -O3 -lm for C.

Your C routine runs in 8.4 seconds (faster than your run probably because of -O3)
The Haskell solution runs in 36 seconds (due to the -O2 flag)
Your factorCount' code isn't explicitly typed and defaulting to Integer (thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) using Int and the time changes to 11.1 seconds
in factorCount' you have needlessly called fromIntegral. A fix results in no change though (the compiler is smart, lucky for you).
You used mod where rem is faster and sufficient. This changes the time to 8.5 seconds.
factorCount' is constantly applying two extra arguments that never change (number, sqrt). A worker/wrapper transformation gives us:

 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s

That's right, 7.95 seconds. Consistently half a second faster than the C solution. Without the -fllvm flag I'm still getting 8.182 seconds, so the NCG backend is doing well in this case too.

Conclusion: Haskell is awesome.

Resulting Code

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

EDIT: So now that we've explored that, lets address the questions

Question 1: Do erlang, python and haskell lose speed due to using
arbitrary length integers or don't they as long as the values are less
than MAXINT?

In Haskell, using Integer is slower than Int but how much slower depends on the computations performed. Luckily (for 64 bit machines) Int is sufficient. For portability sake you should probably rewrite my code to use Int64 or Word64 (C isn't the only language with a long).

Question 2: Why is haskell so slow? Is there a compiler flag that
turns off the brakes or is it my implementation? (The latter is quite
probable as haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these
implementations without changing the way I determine the factors?
Optimization in any way: nicer, faster, more "native" to the language.

That was what I answered above. The answer was

0) Use optimization via -O2
1) Use fast (notably: unbox-able) types when possible
2) rem not mod (a frequently forgotten optimization) and
3) worker/wrapper transformation (perhaps the most common optimization).

Question 4: Do my functional implementations permit LCO and hence
avoid adding unnecessary frames onto the call stack?

Yes, that wasn't the issue. Good work and glad you considered this.

C vs Haskell Collatz conjecture speed comparison

The big problem with your Haskell code is that you are dividing, which you don't in the C version.

Yes, you wrote n % 2 and n / 2, but the compiler replaces that with shifts and bitwise and. GHC has unfortunately not yet been taught to do that.

If you do the substitution yourself

module Main where

import System.Environment (getArgs)
import Data.Int (Int32, Int64)
import Data.Bits

main :: IO ()
main = do
    arg <- getArgs
    print $ maxCol 0 (read (head arg) :: Int64)

col :: Int64 -> Int32
col x = col' x 0

col' :: Int64 -> Int32 -> Int32
col' 1 n            = n
col' x n
    | x .&. 1 == 0  = col' (x `shiftR` 1) (n + 1)
    | otherwise     = col' (3 * x + 1) (n + 1)

maxCol :: Int32 -> Int64 -> Int32
maxCol maxS 2   = maxS
maxCol maxS n
    | s > maxS  = maxCol s (n - 1)
    | otherwise = maxCol maxS (n - 1)
    where s = col n

with a 64-bit GHC you get comparable speed (0.35s vs C's 0.32s on my box for a limit of 1000000). If you compile using the LLVM backend, you don't even need to replace the % 2 and / 2 with bitwise operations, LLVM does that for you (but it produces slower code, 0.4s, for your original Haskell source, surprisingly - normally, LLVM is not worse than the native code generator at loop optimisation).

With a 32-bit GHC, you won't get comparable speed, since with those, the primitive operations on 64-bit integers are implemented through C calls - there never was enough demand for fast 64-bit operations on 32-bit systems for them to be implemented as primops; the few people working on GHC spent their time doing other, more important, things.

TL;DR: Is Haskell code quick to write and simple to maintain only for computationally simple tasks and loses this characteristic when performance is crucial?

That depends. You must have some idea of what sort of code GHC generates from what sort of input, and you must avoid some performance traps. With a bit of practice, it is quite easy to get within say 2× the speed of gcc -O3 for such tasks.

Haskell slower than Python in naïve integer factorization?

If you compute limit = ceiling . sqrt . fromIntegral $ n only once, instead of once per iteration, then I see the Haskell version being faster:

limit = ceiling . sqrt . fromIntegral $ n
smallestFactor = getSmallestFactor x

getSmallestFactor x
    | n `rem` x == 0 = x
    | x > limit      = n
    | otherwise      = getSmallestFactor (x+1)

using this version, I see:

$ time ./factorizePy.py 89273487253497
[3, 723721, 41117819]

real    0m0.236s
user    0m0.171s
sys     0m0.062s

$ time ./factorizeHs  89273487253497
[3,723721,41117819]

real    0m0.190s
user    0m0.000s
sys     0m0.031s

Project Euler 14: performance compared to C and memoization

After having compiled it with optimisations, there are still several differences to the C programme

you use div, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would be quot in Haskell; that reduced the run time by some 15% here.
the C programme uses fixed-width 64-bit (or even 32-bit, but then it's just luck that it gets the correct answer, since some intermediate values exceed 32-bit range) integers, the Haskell programme uses arbitrary precision Integers. If you have 64-bit Ints in your GHC (64-bit OS other than Windows), replace Integer with Int. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.

For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State monad,

import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)

type Memo = Map Integer Int

syr :: Integer -> State Memo Int
syr n = do
    mb <- gets (Map.lookup n)
    case mb of
      Just l -> return l
      Nothing -> do
          let m = if even n then n `quot` 2 else 3*n+1
          l <- syr m
          let l' = l+1
          modify (Map.insert n l')
          return l'

solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
    | len > 1000000 = return (maxi,len)
    | otherwise = do
         l <- syr start
         if len < l
             then solve start l (start+1)
             else solve maxi len (start+1)

p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)

but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map is not too cheap and an insertion is relatively expensive.

Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Ints, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.

module Main (main) where

import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int

main :: IO ()
main = do
    args <- getArgs
    let bd = case args of
               a:_ -> read a
               _   -> 100000
    print $ collMax bd

next :: Int -> Int
next n
    | n .&. 1 == 0  = n `unsafeShiftR` 1
    | otherwise     = 3*n + 1

collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
    arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
    let go l m
            | upper < m = go (l+1) $ next m
            | otherwise = do
                l' <- unsafeRead arr m
                case l' of
                  0 -> do
                      l'' <- go 1 $ next m
                      unsafeWrite arr m (l'' + 1)
                      return (l+l'')
                  _ -> return (l+l'-1)
        collect mi ml i
            | upper < i = return (mi, ml)
            | otherwise = do
                l <- go 1 i
                if l > ml
                  then collect i l (i+1)
                  else collect mi ml (i+1)
    unsafeWrite arr 1 1
    collect 1 1 2

Project Euler - How is this haskell code so fast?

Prelude> sigma2big 1000
401382971
(0.48 secs, 28491864 bytes)

Prelude> sigma2big 10^3
103161709
(0.02 secs, 1035252 bytes)

Prelude> (sigma2big 10)^3
103161709

function precedence (shh...)

Why did you decide against using Erlang?

I returned to Haskell for my personal projects from Erlang for the simple virtue of Haskell's amazing type system. Erlang gives you a ton of tools to handle when things go wrong. Haskell gives you tools to keep you from going wrong in the first place.

When working in a language with a strong type system you are effectively proving free theorems about your code every time you compile.

You also get a bunch of overloading sugar from Haskell's typeclass machinery, but that is largely secondary to me -- even if it does allow me to express a number of abstractions that would be terribly verbose or non-idiomatic and unusable in Erlang (e.g. Haskell's category-extras).

I love Erlang, I love its channels and its effortless scalability. I turn to it when these are the things I need. Haskell isn't a panacea. I give up a better operational understanding of space consumption. I give up the magical one pass garbage collector. I give up OTP patterns and all that effortless scalability.

But its hard for me to give up the security blanket that, as is commonly said, in Haskell, if it typechecks, it is probably correct.

Speed Comparison with Project Euler: C VS Python VS Erlang VS Haskell