What Makes Ruby Slow

Why do people say that Ruby is slow?

Because if you run typical benchmarks between Ruby and other languages, Ruby loses.

I do not find Ruby to be slow, but then again, I'm just using it to make simple CRUD apps and company blogs. What sort of projects would I need to be doing before I find Ruby becoming slow? Or is this slowness just something that affects all programming languages?

Ruby probably wouldn't serve you well in writing a real-time digital signal processing application, or any kind of real-time control system. Ruby (with today's VMs) would probably choke on a resource-constrained computer such as a smartphone.

Remember that a lot of the processing in your web applications is actually done by software developed in C: Apache, Thin, Nginx, SQLite, MySQL, PostgreSQL, many parsing libraries, RMagick, the TCP/IP stack, etc. are C programs that Ruby drives. Ruby provides the glue and the business logic.

What are your options as a Ruby programmer if you want to deal with this "slowness"?

Switch to a faster language. But that carries a cost. It is a cost that may be worth it. But for most web applications, language choice is not a relevant factor, because there is just not enough traffic to justify using a faster language that costs much more to develop for.

Which version of Ruby would best suit an application like Stack Overflow where speed is critical and traffic is intense?

Other folks have answered this - JRuby, IronRuby, and REE will make the Ruby part of your application run faster on platforms that can afford those VMs. And since it is often not Ruby that causes slowness, but your system and application architecture, you can do things like database replication, multiple application servers, load balancing with reverse proxies, HTTP caching, memcached, Ajax, client-side caching, and so on. None of this stuff is Ruby.
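
As a small illustration of one of those non-Ruby levers, here is a minimal sketch of HTTP caching at the Rails layer; PostsController and Post are hypothetical names, while fresh_when and expires_in are standard Rails controller helpers.

class PostsController < ApplicationController
  def show
    @post = Post.find(params[:id])

    # Conditional GET: Rails replies 304 Not Modified when the client's
    # cached copy is still current, skipping view rendering entirely.
    fresh_when etag: @post, last_modified: @post.updated_at, public: true
  end

  def index
    # Let browsers and reverse proxies cache the whole response for a while.
    expires_in 10.minutes, public: true
    @posts = Post.order(created_at: :desc).limit(20)
  end
end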

Finally, I can't find much news on Ruby 2.0 - I take it we're a good few years away from that then?

Most folks are waiting for Ruby 1.9.1. I myself am waiting for Rails 3.1 on Ruby 1.9.1 on JRuby.

Finally, please remember that a lot of developers choose Ruby because it makes programming a more joyful experience compared to other languages, and because Ruby with Rails enables skilled web developers to develop applications very quickly.

What makes Ruby slow?

Ruby is slow. But what parts of it are the most problematic?

It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember variable names per context to allow for eval, so its frames and method calls are slower. It also currently lacks a good JIT compiler, though MRI 1.9 has a bytecode compiler (which is better), and JRuby compiles down to Java bytecode, which can then be compiled by the HotSpot JVM's JIT compiler, but it ends up being about the same speed as 1.9.
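
A minimal sketch of why the lookup has to stay late: any class can be reopened and its methods redefined at run time, so a call site can never be bound once and for all to a single implementation (the Greeter class is purely illustrative).

class Greeter
  def hello
    "hello"
  end
end

g = Greeter.new
puts g.hello   # => "hello"

# Reopen the class and redefine the method while the program is running.
class Greeter
  def hello
    "bonjour"
  end
end

puts g.hello   # => "bonjour" - the same call site now resolves to a different method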

How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.

From some of the graphs at http://www.igvita.com/2009/06/13/profiling-ruby-with-googles-perftools/ I'd say it takes about 10%, which is quite a bit--you can decrease that hit by increasing the malloc_limit in gc.c and recompiling.
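
If you want to measure the GC share of your own workload instead of reading it off a graph, a rough sketch using the standard GC::Profiler (available since Ruby 1.9.2) looks like this; the allocation-heavy loop is just a placeholder workload.

GC::Profiler.enable

start = Time.now
100_000.times { "some string" * 50 }   # placeholder allocation-heavy workload
elapsed = Time.now - start

gc_time = GC::Profiler.total_time      # seconds spent in the GC while profiling
puts "total: #{elapsed.round(3)}s, GC: #{gc_time.round(3)}s (#{(100 * gc_time / elapsed).round(1)}%)"

GC::Profiler.disable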

I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?

Ruby 1.8 "didn't" implement basic math: it implemented Numeric classes, and you'd call things like Fixnum#+ and Fixnum#/ once per operation--which was slow. Ruby 1.9 cheats a bit by inlining some of the basic math ops.
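
To make that concrete, a minimal sketch showing that arithmetic operators in Ruby are ordinary method calls:

a = 40
b = 2

# All three lines do the same thing: dispatch the + method on a.
puts a + b           # => 42
puts a.+(b)          # => 42
puts a.send(:+, b)   # => 42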

Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?

Things like eval are hard to implement efficiently, though much work can be done, I'm sure. The kicker for Ruby is that it has to accommodate somebody in another thread changing the definition of a class spontaneously, so it has to be very conservative.
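
This is also why frames have to keep their local variable names around: an eval can reach into a live binding and read or change locals by name, so the VM cannot simply throw the names away. A minimal sketch:

def make_counter
  count = 0
  binding          # capture the local scope, variable names included
end

b = make_counter
eval("count += 5", b)   # the string refers to the local variable by name
puts eval("count", b)   # => 5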

Has there been recent work that has significantly improved performance?

1.9 is like a 2x speedup. It's also more space-efficient. JRuby is constantly trying to improve speed-wise [and probably spends less time in the GC than KRI]. Besides that, I'm not aware of much except little hobby things I've been working on. Note also that 1.9's strings are at times slower because of encoding friendliness.

Why is Ruby so much slower on Windows?

I would guess there are a few possible options, and they probably all add up:

  1. Ruby is mainly developed on Linux, so it ends up, almost mechanically, optimised for it. The code is regularly tested on Windows and everything works, but the result is still that developers spend more time optimising for Linux than for Windows.
  2. In my experience, recent versions of gcc (4.3 and greater) produce more efficient code than recent versions of Visual Studio (at least 2005). My tests included, in both cases, spending about a day finding the best options for code optimisation.
  3. Related to point 1, if you compile the same project using gcc for Windows or Linux, I usually observe a drop in performance of about 20% on Windows compared to Linux. Here again, I suppose this is because Linux (or Unices in general) is a primary target for gcc, while Windows is a port. Less time is spent optimising for Windows than for Linux.

In the end, if one wanted to optimise Ruby for Windows, a significant amount of time (and money--as far as I know, profilers on Windows don't come for free) would have to be spent using a profiler and optimising bottlenecks. And everything would have to be tested on Linux to make sure there is no loss of performance there.

Of course, all that should be tested again with the new interpreter, YARV.

Why are Ruby method calls particularly slow (in comparison to other languages)?

Compiled languages often have fast method dispatch because the calling code knows an index into the class's vtable, which is an array of method pointers. After just a few pointer dereferences, the calling code can jump right into the method. The compiler creates the vtable and replaces every method name in the source code with the numerical index of that method in the vtable.

Dynamic languages such as Ruby often have slow method dispatch because the calling code has a name for the method, not a pointer (nor an index into an array containing the pointers). The calling code has to ask the object for its class, then has to ask the class if it has a method by that name, and if not, go on up the chain of ancestors asking each ancestor if it has a method by that name (this is what the compiler does in a compiled language, which is why the compiling is slow and the method dispatch is fast). Rather than a few pointer dereferences costing just a few machine instructions to invoke a method, a dynamic language must execute dozens to hundreds of machine instructions to search the object's class and all the object's ancestor classes for the method. Each class has a HashTable of names -> methods, but HashTables with string keys are an order of magnitude slower than arrays with integer indexes.
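
That lookup walk is observable from Ruby itself; here is a minimal sketch that mimics, by hand, what the VM conceptually does on an uncached call (the Base and Child classes are purely illustrative).

class Base
  def greet
    "hi from Base"
  end
end

class Child < Base
end

obj = Child.new

# Walk the ancestor chain looking for the first module that defines the method.
owner = obj.class.ancestors.find { |mod| mod.instance_methods(false).include?(:greet) }
puts owner                                           # => Base
puts owner.instance_method(:greet).bind(obj).call    # => "hi from Base"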

There are ways to optimize method dispatch in dynamic languages, of course. In Ruby, that's what JRuby, Rubinius, and IronRuby are working on. But that's a subject for another question.

Why is Ruby irb iteration so slow?

I tried your benchmark with a couple of different Ruby implementations, and I got wildly differing results. This seems to confirm my suspicions that your benchmark is not measuring what you think it does. As I mentioned in my comment above: when writing benchmarks, you should always read the generated native machine code to verify that it actually measures what you think it does.

For example, there is a benchmark in the YARV benchmark suite that is supposed to measure message dispatch performance, however, on Rubinius, the message dispatch gets optimized away completely, so the only thing that is actually executed is incrementing the counter variable for the benchmark loop. Essentially, it tells you the frequency of your CPU, nothing more.

ruby 2.3.0dev (2015-08-08 trunk 51510) [x86_64-darwin14]

Here's a current snapshot of YARV:

Test 0: 0.720945
Test 1: 0.733733
Test 2: 0.722778
Test 3: 0.734074
Test 4: 0.774355
Test 5: 0.773379
Test 6: 0.751547
Test 7: 0.708566
Test 8: 0.724959
Test 9: 0.730899
Test 10: 0.725978
Test 11: 0.712902
Test 12: 0.747069
Test 13: 0.737792
Test 14: 0.736885
Test 15: 0.751422
Test 16: 0.718943
Test 17: 0.760094
Test 18: 0.746343
Test 19: 0.764731
Average: 0.738870

As you can see, the performance is very consistent across runs, and it seems to be in line with the other results posted in the comments.

rubinius 2.5.8 (2.1.0 bef51ae3 2015-08-09 3.5.1 JI) [x86_64-darwin14.4.0]

Here's the current release of Rubinius:

Test 0: 1.159465
Test 1: 1.063721
Test 2: 0.516513
Test 3: 0.515016
Test 4: 0.553987
Test 5: 0.544286
Test 6: 0.567737
Test 7: 0.563350
Test 8: 0.517581
Test 9: 0.501865
Test 10: 0.503399
Test 11: 0.512046
Test 12: 0.487296
Test 13: 0.533193
Test 14: 0.533217
Test 15: 0.511648
Test 16: 0.535847
Test 17: 0.490049
Test 18: 0.539681
Test 19: 0.551324
Average: 0.585061

As you can see, the compiler kicks in sometime during the second run, after which it gets twice as fast, significantly faster than YARV, whereas during the first two runs, it is significantly slower than YARV.

jruby 9.0.0.0-SNAPSHOT (2.2.2) 2015-07-23 89c1348 Java HotSpot(TM) 64-Bit Server VM 25.5-b02 on 1.8.0_05-b13 +jit [darwin-x86_64]

This is a current snapshot of JRuby running on a slightly old release (a couple of months) of HotSpot:

Test 0: 1.169000
Test 1: 0.805000
Test 2: 0.772000
Test 3: 0.755000
Test 4: 0.777000
Test 5: 0.749000
Test 6: 0.751000
Test 7: 0.694000
Test 8: 0.696000
Test 9: 0.708000
Test 10: 0.691000
Test 11: 0.745000
Test 12: 0.752000
Test 13: 0.755000
Test 14: 0.707000
Test 15: 0.744000
Test 16: 0.674000
Test 17: 0.710000
Test 18: 0.733000
Test 19: 0.706000
Average: 0.754650

Again, the compiler seems to kick in somewhere between runs 1 and 2, after which it performs comparably with YARV.

jruby 9.0.1.0-SNAPSHOT (2.2.2) 2015-08-09 2939c73 OpenJDK 64-Bit Server VM 25.40-b25-internal-graal-0.7 on 1.8.0-internal-b128 +jit [darwin-x86_64]

This is a slightly newer snapshot of JRuby running on a future version of HotSpot:

Test 0: 0.815000
Test 1: 0.693000
Test 2: 0.634000
Test 3: 0.615000
Test 4: 0.599000
Test 5: 0.616000
Test 6: 0.623000
Test 7: 0.611000
Test 8: 0.604000
Test 9: 0.598000
Test 10: 0.628000
Test 11: 0.627000
Test 12: 0.601000
Test 13: 0.646000
Test 14: 0.675000
Test 15: 0.611000
Test 16: 0.684000
Test 17: 0.689000
Test 18: 0.626000
Test 19: 0.639000
Average: 0.641700

Again, we see the pattern of it getting faster during the first two runs, after which it settles in between: slightly faster than YARV and the other JRuby, and slightly slower than Rubinius.

jruby 9.0.1.0-SNAPSHOT (2.2.2) 2015-08-09 2939c73 OpenJDK 64-Bit Server VM 25.40-b25-internal-graal-0.7 on 1.8.0-internal-b128 +jit [darwin-x86_64]

This is my favorite: JRuby+Truffle with Truffle enabled and running on a Graal-enabled JVM:

Test 0: 6.226000
Test 1: 5.696000
Test 2: 1.836000
Test 3: 0.057000
Test 4: 0.111000
Test 5: 0.103000
Test 6: 0.082000
Test 7: 0.146000
Test 8: 0.089000
Test 9: 0.077000
Test 10: 0.076000
Test 11: 0.082000
Test 12: 0.072000
Test 13: 0.104000
Test 14: 0.124000
Test 15: 0.084000
Test 16: 0.080000
Test 17: 0.118000
Test 18: 0.087000
Test 19: 0.070000
Average: 0.766000

Truffle seems to need a significant amount of ramp-up time, with the first three runs being abysmally slow, but then it dramatically picks up speed, leaving everything else in the dust by a factor of 5-10.

Note: this is not 100% fair, since JRuby+Truffle does not yet support the full Ruby language.

Also note: this shows that simply taking the average over all runs is grossly misleading, since JRuby+Truffle comes out to about the same average as YARV and JRuby, but actually has 7 times faster steady-state performance. The difference between the slowest run (run 1 of JRuby+Truffle) and the fastest run (run 20, also of JRuby+Truffle) is a whopping 100x.

Note #3: notice how the JRuby numbers all end with 000? That's because JRuby does not get easy access to the underlying OS's microsecond timer through the JVM and thus has to be content with milliseconds. It doesn't matter too much in this particular benchmark, but for faster benchmarks it can skew the results significantly. That's just another thing you have to consider when designing benchmarks.
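
If you need finer-grained timing from Ruby itself, a small sketch using the monotonic clock (available since Ruby 2.1) at least avoids wall-clock jumps; whether it actually delivers sub-millisecond resolution still depends on the implementation and platform.

start  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
1_000_000.times { }                      # placeholder work to time
finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)

puts "elapsed: #{((finish - start) * 1000).round(3)} ms"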

Why is there such a huge difference? Is it because Ruby's operators are method calls and methods calls are slow or something?

I don't think so. On YARV, Fixnum#+ isn't even a method call, it is optimized to a static built-in operator. It essentially performs an in-register primitive integer add operation in the CPU. As fast as it gets.

YARV only falls back to treating it as a method call when you monkey-patch Fixnum.
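
A minimal sketch of what such a monkey-patch looks like; once this runs, YARV can no longer use the inlined fast path and has to dispatch + like any other method (the class is Integer here; it was Fixnum on Rubies before 2.4).

class Integer
  alias_method :original_plus, :+

  def +(other)
    # Still performs the addition, but now every a + b in the program
    # goes through full method dispatch instead of the built-in operator.
    original_plus(other)
  end
end

puts 1 + 2   # => 3, just slower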

Rubinius probably can optimize the method calls away, although I didn't check.

I feel like I'm doing something really wrong.

Probably, your benchmark doesn't measure what you think it does. In particular, I believe that on implementations with sophisticated optimizing compilers, the iteration part of your iteration benchmark may get optimized away.

Actually, I noticed a significant difference between your JavaScript and Ruby benchmarks: in JavaScript, you are using a primitive for loop, in Ruby, you are using Range#each (for … in just gets translated to each). If I switch both the Ruby and the JavaScript benchmarks over to an identical while loop, I get for the Ruby version: 223ms for YARV, 56ms for Rubinius, 28ms for JRuby, and 33ms for JRuby+Truffle. For the JS version: 30ms for Squirrelfish Extreme / Nitro (Safari), and 16ms for V8/Crankshaft (Chrome).

Or, in other words: if you measure the same thing, they end up equally fast ;-) (Well, except for YARV, which however is well-known to be slow anyways.)

So, as it turns out, the difference between Ruby and JavaScript was that in JS you weren't iterating anything, you were just incrementing a number, whereas in Ruby, you were actually iterating a data structure (namely, a Range). Remove the iteration from Ruby, and it is as fast as JavaScript.

I have created two benchmark scripts that now hopefully roughly measure the same thing:

#!/usr/bin/env ruby

ITERATIONS = 10_000_000
TESTS      = 20
WARMUP     = 3
TOTALRUNS  = TESTS + WARMUP
RESULTS    = []

run = -1

while (run += 1) < TOTALRUNS
  i = -1
  starttime = Time.now

  # The loop body is empty: only the cost of the iteration itself is measured.
  while (i += 1) < ITERATIONS do end

  endtime = Time.now
  RESULTS[run] = (endtime - starttime) * 1000   # milliseconds
end

# Average over the measured runs, discarding the warmup runs.
puts RESULTS.drop(WARMUP).reduce(:+) / TESTS

"use strict";

const ITERATIONS = 10000000;
const TESTS     = 20;
const WARMUP    = 3;
const TOTALRUNS = TESTS + WARMUP;
const RESULTS   = [];

let run = -1;

while (++run < TOTALRUNS) {
  let i = -1;
  const STARTTIME = Date.now();

  // Empty loop body: only the cost of the iteration itself is measured.
  while (++i < ITERATIONS);

  const ENDTIME = Date.now();
  RESULTS[run] = ENDTIME - STARTTIME;   // milliseconds
}

// Average over the measured runs, discarding the warmup runs.
alert(RESULTS.slice(WARMUP).reduce((acc, el) => acc + el) / TESTS);

Why is Ruby string handling very slow?

The C++ and Ruby programs are not the same. The C++ program uses a char buffer for input - it "recycles" memory - while the Ruby program uses gets, which allocates a new string each time, and that takes a toll on performance.

Also, the Ruby program uses an array to store the answers - and that array is being resized all the time! The C++ program uses a single answer variable which it prints on each iteration - which is much faster.

Even if you change the programs to eliminate these two differences, the Ruby program will still be slower - but probably not by that much.
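
A rough sketch of what closing the array-resizing gap on the Ruby side might look like; the doubling is a placeholder for whatever the original program computed, and the per-call allocation of gets is harder to avoid in plain Ruby.

# Stream each answer as soon as it is computed instead of accumulating
# everything in a growing array.
while (line = $stdin.gets)
  puts line.to_i * 2   # placeholder for the real computation
end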

Why are Python and Ruby so slow, while Lisp implementations are fast?

Natively compiled Lisp systems are usually quite a bit faster than non-natively compiled Lisp, Ruby or Python implementations.

Definitions:

  • natively compiled -> compiles to machine code
  • compiled -> compiles to machine code or some other target (like byte code, JVM instructions, C code, ...)
  • interpreted Lisp -> runs s-expressions directly without compilation
  • interpreted Python -> runs compiled Python in a byte-code interpreter. The default Python implementation is not really interpreted, but uses a compiler targeting a byte-code instruction set. The byte code then gets interpreted. Typically byte-code interpreters are slower than execution of native code.

But keep in mind the following:

  • SBCL uses a native code compiler. It does not use a byte code machine or something like a JIT compiler from byte code to native code. SBCL compiles all code from source code to native code, before runtime. The compiler is incremental and can compile individual expressions. Thus it is used also by the EVAL function and from the Read-Eval-Print-Loop.
  • SBCL uses an optimizing compiler which makes use of type declarations and type inference. The compiler generates native code.
  • Common Lisp allows various optimizations which make the code less dynamic or not dynamic (inlining, early binding, no type checks, code specialized for declared types, tail-call optimizations, ...). Code which makes use of these advanced features can look complicated - especially when the compiler needs to be told about these things.
  • Without these optimizations compiled Lisp code is still faster than interpreted code, but slower than optimized compiled code.
  • Common Lisp provides CLOS, the Common Lisp Object System. CLOS code usually is slower than non-CLOS - where this comparison makes sense. A dynamic functional language tends to be faster than a dynamic object-oriented language.
  • If a language implementation uses a highly optimized runtime, for example for bignum arithmetic operations, a slow language implementation can be faster than an optimizing compiler. Some languages have many complex primitives implemented in C. Those tend to be fast, while the rest of the language can be very slow.
  • There can also be implementations of Python which generate and run machine code, like the JIT compiler in PyPy. Ruby also has a JIT compiler now, since Ruby 2.6.

Also some operations may look similar, but could be different. Is a for loop iterating over an integer variable really the same as a for loop which iterates over a range?
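
As a minimal sketch of that distinction in Ruby (absolute numbers will vary a lot between implementations; the point is only that the two loops do different amounts of work):

require 'benchmark'

N = 10_000_000

# Iterates a Range object: every step yields to a block.
range_time = Benchmark.realtime { (1..N).each { } }

# Just increments an integer variable: nothing is being iterated.
i = 0
while_time = Benchmark.realtime { i += 1 while i < N }

puts "Range#each: #{range_time.round(3)}s  while: #{while_time.round(3)}s"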

Ruby on Rails is slow...?

I'll agree with everyone else: you have to profile. There is no point in doing anything to your code until you know what specifically is causing the slowness. Trying to fix a problem without understanding the cause is like feeling ill and deciding to have lots of surgery until you feel better. Diagnose your problem first. It might be something small like a network setting, or it could be one bad line in your code.

Some tips for profiling:

How to Profile Your Rails Application

Performance Testing Rails Applications

At the Forge - Profiling Rails Applications

Once you have found the bottleneck you can figure out what to do.

I recommend these videos:
Railslab Scaling Rails

Revised now, based on the profiling results:

OK. Now that you can see that your problem is that you are doing some sort of calculation, using a query based on looping through the results of another ActiveRecord query, I'd advise you to look into building a custom SQL statement combining your initial selection criteria and the loop calculation to get what you need. You can definitely speed this up by optimizing the SQL.
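
A rough sketch of the kind of rewrite meant here, using hypothetical Order and LineItem models (the real query obviously depends on the actual schema):

# Before: loop over one query's results and run another query per row.
total = 0
Order.where(status: "shipped").find_each do |order|
  total += order.line_items.sum(:price)   # one extra query per order
end

# After: push the whole calculation into a single SQL statement.
total = LineItem.joins(:order)
                .where(orders: { status: "shipped" })
                .sum(:price)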


