Is There Something Wrong with This Python Code, Why Does It Run So Slow Compared to Ruby

Is there something wrong with this python code, why does it run so slow compared to ruby?

The cost of recursion in Python is the cause of this overhead. See this article for much more detail. The solutions above that compute the result iteratively are better for Python, since they do not incur the function call overhead that recursion does. My assumption about Ruby would be that it is optimizing this code in some way while Python is not. Again, that article goes into a lot of detail about this using a nearly identical fib function.
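For reference, a minimal sketch (not the original poster's code; the input 25 and the repeat count are arbitrary) of the kind of comparison that article makes. The recursive version performs an exponential number of Python-level calls, each paying the interpreter's call overhead, while the iterative version pays almost none:

import timeit

def fib_recursive(n):
    # exponentially many Python-level calls, each with interpreter call overhead
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    # a single call, with a plain loop inside
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(timeit.timeit(lambda: fib_recursive(25), number=10))
print(timeit.timeit(lambda: fib_iterative(25), number=10))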

Why is Python slower compared to Ruby even with this very simple test?

First off, note that the Python version you state is wrong: you're running this code in Python 2.7, not 3.1 (it isn't even valid Python 3 code). (FYI, Python 3 is usually slower than Python 2.)

That said, there's a critical problem in the Python test: you're writing it as global code. You need to write it as a function. It runs about twice as fast when written correctly, in both Python 2 and 3:

def main():
    i = 0
    a = 0
    while i < 6553500:
        i += 1
        if i != 6553500:
            a = i
        else:
            print("o")
    print(a)

if __name__ == "__main__":
    main()

When you write code globally, you have no locals; all of your variables are global variables. Locals are much faster than globals in Python, because globals are stored in a dict. Locals can be referenced directly by the VM by index, so no hash table lookups are needed.
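To see this concretely, here is a small CPython-specific sketch (the function names are made up for illustration) that disassembles a global-variable update and a local-variable update; the global version compiles to LOAD_GLOBAL/STORE_GLOBAL dict lookups by name, while the local version compiles to LOAD_FAST/STORE_FAST index accesses:

import dis

x = 0

def bump_global():
    global x
    x = x + 1          # LOAD_GLOBAL / STORE_GLOBAL: dict lookups by name

def bump_local():
    y = 0
    y = y + 1          # LOAD_FAST / STORE_FAST: direct index into the frame's locals
    return y

dis.dis(bump_global)
dis.dis(bump_local)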

Also, note that this is such a simple test that what you're really doing is benchmarking a few arbitrary bytecode operations.

Why is equivalent Python code so much slower?

Summary

"Because the function call overhead in Python is much larger than in Ruby."

Details

Being a microbenchmark, this really doesn't say much about the performance of either language in proper use. Likely you would want to rewrite the program to take advantage of the strengths of Python and Ruby, but it does illustrate one of the weak points of Python at the moment. The root cause of the speed difference is function call overhead. I made a few tests to illustrate this. See below for the code and more details. For the Python tests, I used 2000 for both gcd parameters.

Interpreter: Python 2.6.6
Program type: gcd using function call
Total CPU time: 29.336 seconds

Interpreter: Python 2.6.6
Program type: gcd using inline code
Total CPU time: 13.194 seconds

Interpreter: Python 2.6.6
Program type: gcd using inline code, with dummy function call
Total CPU time: 30.672 seconds

This tells us that it's not the calculation made by the gcd function that contributes most to the time difference; it's the function call itself. With Python 3.1, the difference is similar:

Interpreter: Python 3.1.3rc1
Program type: gcd using function call
Total CPU time: 30.920 seconds

Interpreter: Python 3.1.3rc1
Program type: gcd using inline code
Total CPU time: 15.185 seconds

Interpreter: Python 3.1.3rc1
Program type: gcd using inline code, with dummy function call
Total CPU time: 33.739 seconds

Again, the actual calculation is not the biggest contributor; it's the function call itself. In Ruby, the function call overhead is much smaller. (Note: I had to use smaller parameters (200) for the Ruby version of the programs because the Ruby profiler really slows down real-time performance. That doesn't affect CPU time performance, though.)

Interpreter: ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]
Program type: gcd using function call
Total CPU time: 21.66 seconds

Interpreter: ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]
Program type: gcd using inline code
Total CPU time: 21.31 seconds

Interpreter: ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
Program type: gcd using function call
Total CPU time: 27.00 seconds

Interpreter: ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
Program type: gcd using inline code
Total CPU time: 24.83 seconds

Notice how neither Ruby 1.8 nor 1.9 suffers greatly from the gcd function call: the function-call and inline versions are more or less equal. Ruby 1.9 seems to be a little better, with less of a difference between the function-call and inline versions.

So the answer to the question is: "because the function call overhead in Python is much larger than in Ruby".
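If you only want to reproduce the call-overhead effect without running the full gcd programs listed below, a minimal sketch along these lines (the function names and loop sizes are arbitrary) shows the same pattern: the same arithmetic done inline versus behind a function call.

import timeit

def add(a, b):
    return a + b

def inline_sum(n):
    total = 0
    for i in range(n):
        total += i + 1          # arithmetic done inline
    return total

def call_sum(n):
    total = 0
    for i in range(n):
        total += add(i, 1)      # the same arithmetic behind a function call
    return total

print(timeit.timeit(lambda: inline_sum(10**5), number=100))
print(timeit.timeit(lambda: call_sum(10**5), number=100))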

Code

# iter_gcd -- Python 2.x version, with gcd function call
# Python 3.x version uses range instead of xrange
from sys import argv, stderr

def gcd(m, n):
    if n > m:
        m, n = n, m
    while n != 0:
        rem = m % n
        m = n
        n = rem
    return m

def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            comp += gcd(i, j)
    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Python 2.x version, inline calculation
# Python 3.x version uses range instead of xrange
from sys import argv, stderr

def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            if i < j:
                m, n = j, i
            else:
                m, n = i, j
            while n != 0:
                rem = m % n
                m = n
                n = rem
            comp += m
    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Python 2.x version, inline calculation, dummy function call
# Python 3.x version uses range instead of xrange
from sys import argv, stderr

def dummyfunc(n, m):
    a = n + m

def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            if i < j:
                m, n = j, i
            else:
                m, n = i, j
            while n != 0:
                rem = m % n
                m = n
                n = rem
            comp += m
            dummyfunc(i, j)
    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Ruby version, with gcd function call

def gcd(m, n)
  if n > m
    m, n = n, m
  end
  while n != 0
    rem = m % n
    m = n
    n = rem
  end
  return m
end

def main(a1, a2)
  comp = 0
  a1.downto 2 do |j|
    1.upto a2-1 do |i|
      comp += gcd(i, j)
    end
  end
  puts comp
end

if __FILE__ == $0
  if ARGV.length != 2
    $stderr.puts('usage: %s num1 num2' % $0)
    exit(1)
  else
    main(ARGV[0].to_i, ARGV[1].to_i)
  end
end

# iter_gcd -- Ruby version, with inline gcd

def main(a1, a2)
  comp = 0
  a1.downto 2 do |j|
    1.upto a2-1 do |i|
      m, n = i, j
      if n > m
        m, n = n, m
      end
      while n != 0
        rem = m % n
        m = n
        n = rem
      end
      comp += m
    end
  end
  puts comp
end

if __FILE__ == $0
  if ARGV.length != 2
    $stderr.puts('usage: %s num1 num2' % $0)
    exit(1)
  else
    main(ARGV[0].to_i, ARGV[1].to_i)
  end
end

Test runs

Finally, the commands used to run Python and Ruby with profiling to get the numbers for comparison were pythonX.X -m cProfile iter_gcdX.py 2000 2000 for Python and rubyX.X -rprofile iter_gcdX.rb 200 200 for Ruby. The reason for the different parameters is that the Ruby profiler adds a lot of overhead. The results are still valid because I'm comparing the difference between a function call and inline code, not the difference between Python and Ruby as such.

See also

Why is python slower compared to Ruby even with this very simple “test”?

Is there something wrong with this python code, why does it run so slow compared to ruby?

The Computer Language Benchmarks Game

Google Search: ruby python function call faster

Why are Python and Ruby so slow, while Lisp implementations are fast?

Natively compiled Lisp systems are usually quite a bit faster than non-natively compiled Lisp, Ruby or Python implementations.

Definitions:

  • natively compiled -> compiles to machine code
  • compiled -> compiles to machine code or some other target (like byte code, JVM instructions, C code, ...)
  • interpreted Lisp -> runs s-expressions directly without compilation
  • interpreted Python -> runs compiled Python in a byte-code interpreter. The default Python implementation is not really interpreted: it compiles to a byte-code instruction set, and that byte code is then interpreted. Typically byte-code interpreters are slower than execution of native code.

But keep in mind the following:

  • SBCL uses a native code compiler. It does not use a byte code machine or something like a JIT compiler from byte code to native code. SBCL compiles all code from source code to native code, before runtime. The compiler is incremental and can compile individual expressions. Thus it is used also by the EVAL function and from the Read-Eval-Print-Loop.
  • SBCL uses an optimizing compiler which makes use of type declarations and type inference. The compiler generates native code.
  • Common Lisp allows various optimizations which make the code less dynamic or not dynamic (inlining, early binding, no type checks, code specialized for declared types, tail-call optimizations, ...). Code which makes use of these advanced features can look complicated - especially when the compiler needs to be told about these things.
  • Without these optimizations compiled Lisp code is still faster than interpreted code, but slower than optimized compiled code.
  • Common Lisp provides CLOS, the Common Lisp Object System. CLOS code usually is slower than non-CLOS - where this comparison makes sense. A dynamic functional language tends to be faster than a dynamic object-oriented language.
  • If a language implementation uses a highly optimized runtime, for example for bignum arithmetic operations, a slow language implementation can be faster than an optimizing compiler. Some languages have many complex primitives implemented in C. Those tend to be fast, while the rest of the language can be very slow.
  • There are also implementations of Python that generate and run machine code, like the JIT compiler in PyPy. Ruby also has a JIT compiler as of Ruby 2.6.

Also some operations may look similar, but could be different. Is a for loop iterating over an integer variable really the same as a for loop which iterates over a range?
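As a concrete Python-flavoured example of that last point, these two loops compute the same sum but do not execute the same work per iteration: one manipulates an integer variable directly, the other goes through range's iterator protocol on every pass. A minimal sketch, with arbitrary sizes:

import timeit

def count_with_while(n):
    # the loop condition and increment manipulate an integer variable directly
    i = 0
    total = 0
    while i < n:
        total += i
        i += 1
    return total

def count_with_range(n):
    # each pass asks the range iterator for the next value
    total = 0
    for i in range(n):
        total += i
    return total

print(timeit.timeit(lambda: count_with_while(10**5), number=100))
print(timeit.timeit(lambda: count_with_range(10**5), number=100))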

Why do people say that Ruby is slow?

Why is Ruby considered slow?

Because if you run typical benchmarks between Ruby and other languages, Ruby loses.

I do not find Ruby to be slow, but then again, I'm just using it to make simple CRUD apps and company blogs. What sort of projects would I need to be doing before I find Ruby becoming slow? Or is this slowness just something that affects all programming languages?

Ruby probably wouldn't serve you well in writing a real-time digital signal processing application, or any kind of real-time control system. Ruby (with today's VMs) would probably choke on a resource-constrained computer such as smartphones.

Remember that a lot of the processing in your web applications is actually done by software written in C: Apache, Thin, Nginx, SQLite, MySQL, PostgreSQL, many parsing libraries, RMagick, TCP/IP, etc. are C programs used by Ruby. Ruby provides the glue and the business logic.

What are your options as a Ruby programmer if you want to deal with this "slowness"?

Switch to a faster language. But that carries a cost. It is a cost that may be worth it. But for most web applications, language choice is not a relevant factor, because there is just not enough traffic to justify using a faster language that costs much more to develop for.

Which version of Ruby would best suit an application like Stack Overflow, where speed is critical and traffic is intense?

Other folks have answered this: JRuby, IronRuby, and REE will make the Ruby part of your application run faster on platforms that can afford those VMs. And since it is often not Ruby that causes slowness but your system architecture and application architecture, you can do things like database replication, multiple application servers, load balancing with reverse proxies, HTTP caching, memcache, Ajax, client-side caching, etc. None of this stuff is Ruby.

Finally, I can't find much news on Ruby 2.0 - I take it we're a good few years away from that then?

Most folks are waiting for Ruby 1.9.1. I myself am waiting for Rails 3.1 on Ruby 1.9.1 on JRuby.

Finally, please remember that a lot of developers choose Ruby because it makes programming a more joyful experience compared to other languages, and because Ruby with Rails enables skilled web developers to develop applications very quickly.

What makes Ruby slow?

Ruby is slow. But what parts of it are the most problematic?

It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember variable names per context to allow for eval, so its frames and method calls are slower. Also it lacks a good JIT compiler currently, though MRI 1.9 has a bytecode compiler (which is better), and jruby compiles it down to java bytecode, which then (can) compile via the HotSpot JVM's JIT compiler, but it ends up being about the same speed as 1.9.

How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.

From some of the graphs at http://www.igvita.com/2009/06/13/profiling-ruby-with-googles-perftools/ I'd say it takes about 10%, which is quite a bit; you can decrease that hit by increasing the malloc_limit in gc.c and recompiling.

I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?

Ruby 1.8 "didn't" implement basic math it implemented Numeric classes and you'd call things like Fixnum#+ Fixnum#/ once per call--which was slow. Ruby 1.9 cheats a bit by inlining some of the basic math ops.

Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?

Things like eval are hard to implement efficiently, though much work can be done, I'm sure. The kicker for Ruby is that it has to accommodate somebody in another thread changing the definition of a class spontaneously, so it has to be very conservative.

Has there been recent work that has significantly improved performance?

1.9 is roughly a 2x speedup. It's also more space efficient. JRuby is constantly trying to improve speed-wise (and probably spends less time in the GC than KRI). Besides that, I'm not aware of much except little hobby things I've been working on. Note also that 1.9's strings are at times slower because of encoding friendliness.

Why is Python so slow for a simple for loop?

Why is Java faster than Python in these example loops?

Novice explanation: Think of a program like a freight train that lays its own track as it moves forward. Track must be laid before the train can move. The Java freight train can send thousands of track-layers ahead of the train, all working in parallel and laying track many miles in advance, whereas Python can only send one laborer at a time and can only lay track 10 feet in front of where the train is.

Java has static types, and that lets the compiler use JIT features (https://en.wikipedia.org/wiki/Just-in-time_compilation) which enable the CPU to fetch memory and execute instructions ahead of time, in parallel, before they are needed. Java can 'sort of' run the instructions in your for loop in parallel with itself. Python has no declared types, so the nature of the work to be done has to be decided at every instruction, and the interpreter has to re-check the types of your variables on every operation. As a result, loops in Python carry a large constant overhead per iteration, whereas a Java loop can be JIT-compiled down to a few tight machine instructions per iteration, thanks to the static types.

I think I am making mistakes because I know Python is used by lots of scientific projects.

They're heavily using SciPy (NumPy being the most prominent component, though I've heard the ecosystem that developed around NumPy's API is even more important), which vastly speeds up all kinds of operations these projects need. That's what you are doing wrong: you aren't writing your critical code in C. Python is great for developing in general, but well-placed extension modules are a vital optimization in their own right (at least when you're crunching numbers). Python is a really crappy language to implement tight inner loops in.
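A minimal sketch of what "writing the critical code in C" means in practice, assuming NumPy is installed (the array size is arbitrary): the same reduction is done once with a pure-Python loop and once with NumPy's sum, whose inner loop runs in C.

import numpy as np

data = list(range(10**6))

def python_sum(values):
    total = 0
    for v in values:       # every iteration is interpreted bytecode
        total += v
    return total

arr = np.array(data)
print(python_sum(data))    # pure-Python inner loop
print(arr.sum())           # the same reduction, with the loop inside NumPy's C code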

The default (and for the time being most popular and widely supported) implementation is a simple bytecode interpreter. Even the simplest operations, like an integer division, can take hundreds of CPU cycles, multiple memory accesses (type checks being a popular example), several C function calls, etc., instead of a few instructions (or even a single one, in the case of integer division). Moreover, the language is designed with many abstractions which add overhead. Your loop allocates 9999 objects on the heap if you use xrange, and far more if you use range (99999999 integer objects, minus the roughly 256 small integers that are cached). Also, the xrange version calls a method on each iteration to advance; the range version would too if iteration over sequences hadn't been optimized specifically. It still takes a whole bytecode dispatch, though, which is itself vastly complex (compared to an integer division, of course).
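To make the "objects on the heap" point concrete, here is a CPython-specific sketch (the exact cache range and object sizes are implementation details): small integers come from a cache, while larger ones are freshly allocated objects with full object overhead.

import sys

a = 256
b = 256
print(a is b)              # True on CPython: small integers (roughly -5..256) are cached

c = int("1000000")
d = int("1000000")
print(c is d)              # False on CPython: larger ints are separate heap objects
print(sys.getsizeof(c))    # each one carries full object overhead, not just 4 or 8 bytes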

It would be interesting to see what a JIT could do here (I'd recommend PyPy over Psyco; the latter isn't actively developed anymore and is very limited in scope anyway, though it might work well for this simple example). After a tiny fraction of the iterations, it should produce a nigh-optimal machine code loop augmented with a few guards (simple integer comparisons, jumping if they fail) to maintain correctness in case you got a string in that list. Java can do the same thing, only sooner (it doesn't have to trace first) and with fewer guards (at least if you use ints). That's why it's so much faster.

Why does my Ruby script slow down over time?

This is the infamous garbage collector, Ruby's memory management mechanism, at work.

Note: It's worth mentioning that Ruby, at least MRI, isn't a high performance language.

The garbage collector runs whenever memory starts to run out: it pauses the execution of the program to deallocate any objects that can no longer be accessed. Because it only runs when memory is getting short, you see it kick in periodically rather than constantly.

There's nothing you can do to avoid this, except write more memory-efficient code or rewrite in a language that has better/manual memory management.

Also, your OS may be paging. Do you have enough physical memory for this kind of task?

Why is my Ruby code for Project Euler #10 so slow?

Your isPrime() test is very slow on primes, and you don't even need it. The key to the sieve is that initially all the numbers are marked as prime; then, for each prime, we mark off all its multiples. So when we get to a certain entry in the sieve, we already know whether it is a prime or not: either it is still marked true for being prime, or it has been marked false for being composite (a multiple of some smaller prime).

There is no need to test it being prime, at all.

And to find the multiples, we just count: for 5, it's every 5th entry after it; for 7, every 7th. There is no need to test them with the % operator, just set them to false right away. And there is no need to set any of them to true, because all the numbers were set to true at the start. A sketch of the whole idea follows below.
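The original question is about Ruby, but the idea is language-independent; here is a minimal Python sketch of the sieve as described above (the helper name is made up, and 2000000 is the Project Euler #10 limit):

def sum_primes_below(limit):
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:                          # n was never marked off, so it is prime
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False       # mark off every nth entry by counting
    return sum(n for n in range(limit) if is_prime[n])

print(sum_primes_below(2000000))                 # Project Euler #10: primes below two million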


