Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell
Using GHC 7.0.3
, gcc 4.4.6
, Linux 2.6.29
on an x86_64 Core2 Duo (2.5GHz) machine, compiling using ghc -O2 -fllvm -fforce-recomp
for Haskell and gcc -O3 -lm
for C.
- Your C routine runs in 8.4 seconds (faster than your run probably because of
-O3
) - The Haskell solution runs in 36 seconds (due to the
-O2
flag) - Your
factorCount'
code isn't explicitly typed and defaulting toInteger
(thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) usingInt
and the time changes to 11.1 seconds - in
factorCount'
you have needlessly calledfromIntegral
. A fix results in no change though (the compiler is smart, lucky for you). - You used
mod
whererem
is faster and sufficient. This changes the time to 8.5 seconds. factorCount'
is constantly applying two extra arguments that never change (number
,sqrt
). A worker/wrapper transformation gives us:
$ time ./so
842161320
real 0m7.954s
user 0m7.944s
sys 0m0.004s
That's right, 7.95 seconds. Consistently half a second faster than the C solution. Without the -fllvm
flag I'm still getting 8.182 seconds
, so the NCG backend is doing well in this case too.
Conclusion: Haskell is awesome.
Resulting Code
factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
where square = sqrt $ fromIntegral number
isquare = floor square
factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
where
go candidate count
| candidate > sqrt = count
| number `rem` candidate == 0 = go (candidate + 1) (count + 2)
| otherwise = go (candidate + 1) count
nextTriangle index triangle
| factorCount triangle > 1000 = triangle
| otherwise = nextTriangle (index + 1) (triangle + index + 1)
main = print $ nextTriangle 1 1
EDIT: So now that we've explored that, lets address the questions
Question 1: Do erlang, python and haskell lose speed due to using
arbitrary length integers or don't they as long as the values are less
than MAXINT?
In Haskell, using Integer
is slower than Int
but how much slower depends on the computations performed. Luckily (for 64 bit machines) Int
is sufficient. For portability sake you should probably rewrite my code to use Int64
or Word64
(C isn't the only language with a long
).
Question 2: Why is haskell so slow? Is there a compiler flag that
turns off the brakes or is it my implementation? (The latter is quite
probable as haskell is a book with seven seals to me.)Question 3: Can you offer me some hints how to optimize these
implementations without changing the way I determine the factors?
Optimization in any way: nicer, faster, more "native" to the language.
That was what I answered above. The answer was
- 0) Use optimization via
-O2
- 1) Use fast (notably: unbox-able) types when possible
- 2)
rem
notmod
(a frequently forgotten optimization) and - 3) worker/wrapper transformation (perhaps the most common optimization).
Question 4: Do my functional implementations permit LCO and hence
avoid adding unnecessary frames onto the call stack?
Yes, that wasn't the issue. Good work and glad you considered this.
C vs Haskell Collatz conjecture speed comparison
The big problem with your Haskell code is that you are dividing, which you don't in the C version.
Yes, you wrote n % 2
and n / 2
, but the compiler replaces that with shifts and bitwise and. GHC has unfortunately not yet been taught to do that.
If you do the substitution yourself
module Main where
import System.Environment (getArgs)
import Data.Int (Int32, Int64)
import Data.Bits
main :: IO ()
main = do
arg <- getArgs
print $ maxCol 0 (read (head arg) :: Int64)
col :: Int64 -> Int32
col x = col' x 0
col' :: Int64 -> Int32 -> Int32
col' 1 n = n
col' x n
| x .&. 1 == 0 = col' (x `shiftR` 1) (n + 1)
| otherwise = col' (3 * x + 1) (n + 1)
maxCol :: Int32 -> Int64 -> Int32
maxCol maxS 2 = maxS
maxCol maxS n
| s > maxS = maxCol s (n - 1)
| otherwise = maxCol maxS (n - 1)
where s = col n
with a 64-bit GHC you get comparable speed (0.35s vs C's 0.32s on my box for a limit of 1000000). If you compile using the LLVM backend, you don't even need to replace the % 2
and / 2
with bitwise operations, LLVM does that for you (but it produces slower code, 0.4s, for your original Haskell source, surprisingly - normally, LLVM is not worse than the native code generator at loop optimisation).
With a 32-bit GHC, you won't get comparable speed, since with those, the primitive operations on 64-bit integers are implemented through C calls - there never was enough demand for fast 64-bit operations on 32-bit systems for them to be implemented as primops; the few people working on GHC spent their time doing other, more important, things.
TL;DR: Is Haskell code quick to write and simple to maintain only for computationally simple tasks and loses this characteristic when performance is crucial?
That depends. You must have some idea of what sort of code GHC generates from what sort of input, and you must avoid some performance traps. With a bit of practice, it is quite easy to get within say 2× the speed of gcc -O3 for such tasks.
Haskell slower than Python in naïve integer factorization?
If you compute limit = ceiling . sqrt . fromIntegral $ n
only once, instead of once per iteration, then I see the Haskell version being faster:
limit = ceiling . sqrt . fromIntegral $ n
smallestFactor = getSmallestFactor x
getSmallestFactor x
| n `rem` x == 0 = x
| x > limit = n
| otherwise = getSmallestFactor (x+1)
using this version, I see:
$ time ./factorizePy.py 89273487253497
[3, 723721, 41117819]
real 0m0.236s
user 0m0.171s
sys 0m0.062s
$ time ./factorizeHs 89273487253497
[3,723721,41117819]
real 0m0.190s
user 0m0.000s
sys 0m0.031s
Project Euler 14: performance compared to C and memoization
After having compiled it with optimisations, there are still several differences to the C programme
- you use
div
, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would bequot
in Haskell; that reduced the run time by some 15% here. - the C programme uses fixed-width 64-bit (or even 32-bit, but then it's just luck that it gets the correct answer, since some intermediate values exceed 32-bit range) integers, the Haskell programme uses arbitrary precision
Integer
s. If you have 64-bitInt
s in your GHC (64-bit OS other than Windows), replaceInteger
withInt
. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.
For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State
monad,
import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)
type Memo = Map Integer Int
syr :: Integer -> State Memo Int
syr n = do
mb <- gets (Map.lookup n)
case mb of
Just l -> return l
Nothing -> do
let m = if even n then n `quot` 2 else 3*n+1
l <- syr m
let l' = l+1
modify (Map.insert n l')
return l'
solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
| len > 1000000 = return (maxi,len)
| otherwise = do
l <- syr start
if len < l
then solve start l (start+1)
else solve maxi len (start+1)
p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)
but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map
is not too cheap and an insertion is relatively expensive.
Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Int
s, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.
module Main (main) where
import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int
main :: IO ()
main = do
args <- getArgs
let bd = case args of
a:_ -> read a
_ -> 100000
print $ collMax bd
next :: Int -> Int
next n
| n .&. 1 == 0 = n `unsafeShiftR` 1
| otherwise = 3*n + 1
collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
let go l m
| upper < m = go (l+1) $ next m
| otherwise = do
l' <- unsafeRead arr m
case l' of
0 -> do
l'' <- go 1 $ next m
unsafeWrite arr m (l'' + 1)
return (l+l'')
_ -> return (l+l'-1)
collect mi ml i
| upper < i = return (mi, ml)
| otherwise = do
l <- go 1 i
if l > ml
then collect i l (i+1)
else collect mi ml (i+1)
unsafeWrite arr 1 1
collect 1 1 2
Project Euler - How is this haskell code so fast?
Prelude> sigma2big 1000
401382971
(0.48 secs, 28491864 bytes)
Prelude> sigma2big 10^3
103161709
(0.02 secs, 1035252 bytes)
Prelude> (sigma2big 10)^3
103161709
function precedence (shh...)
Why did you decide against using Erlang?
I returned to Haskell for my personal projects from Erlang for the simple virtue of Haskell's amazing type system. Erlang gives you a ton of tools to handle when things go wrong. Haskell gives you tools to keep you from going wrong in the first place.
When working in a language with a strong type system you are effectively proving free theorems about your code every time you compile.
You also get a bunch of overloading sugar from Haskell's typeclass machinery, but that is largely secondary to me -- even if it does allow me to express a number of abstractions that would be terribly verbose or non-idiomatic and unusable in Erlang (e.g. Haskell's category-extras).
I love Erlang, I love its channels and its effortless scalability. I turn to it when these are the things I need. Haskell isn't a panacea. I give up a better operational understanding of space consumption. I give up the magical one pass garbage collector. I give up OTP patterns and all that effortless scalability.
But its hard for me to give up the security blanket that, as is commonly said, in Haskell, if it typechecks, it is probably correct.
Related Topics
Subprocess Readline Hangs Waiting for Eof
Selecting from Multi-Index Pandas
How Does _Contains_ Work for Ndarrays
Multiprocessing in Python - Sharing Large Object (E.G. Pandas Dataframe) Between Multiple Processes
Simple Python Challenge: Fastest Bitwise Xor on Data Buffers
Programmatically Searching Google in Python Using Custom Search
What Are Dictionary View Objects
How to Read a Function's Signature Including Default Argument Values
Best Way to Create a "Reversed" List in Python
Python Regex Escape Operator \ in Substitutions & Raw Strings
How to Make Two Plots Side-By-Side Using Python
Find First Sequence Item That Matches a Criterion
How to Set Default Python Version to Python3 in Ubuntu
Which Is Faster in Python: X**.5 or Math.Sqrt(X)
Removing Duplicates from Dictionary
How to Make Scipy.Interpolate Give an Extrapolated Result Beyond the Input Range