What Blocks Ruby, Python to Get JavaScript V8 Speed

What blocks Ruby, Python to get JavaScript V8 speed?

Nothing.

Well, okay: money. (And time, people, resources, but if you have money, you can buy those.)

V8 has a team of brilliant, highly specialized, highly experienced (and thus highly paid) engineers working on it, who have decades of experience (I'm talking individually; collectively it's more like centuries) in creating high-performance execution engines for dynamic OO languages. They are basically the same people who also created the Sun HotSpot JVM (among many others).

Lars Bak, the lead developer, has been literally working on VMs for 25 years (and all of those VMs have led up to V8), which is basically his entire (professional) life. Some of the people writing Ruby VMs aren't even 25 years old.

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) that the V8 engine has?

Given that at least IronRuby, JRuby, MagLev, MacRuby and Rubinius have either monomorphic (IronRuby) or polymorphic inline caching, the answer is obviously no.
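To make "inline caching" concrete, here is a toy Ruby model of a polymorphic inline cache at a single call site. It is only a sketch: real VMs keep the cache next to the call instruction in generated code or bytecode, and the CallSite class, its max_entries parameter and the hit/miss counters are hypothetical names invented for illustration.

```ruby
# Toy model of a polymorphic inline cache (PIC) for one call site.
# All names here are hypothetical; no real VM exposes this API.
class CallSite
  attr_reader :hits, :misses

  def initialize(method_name, max_entries: 4)
    @method_name = method_name
    @max_entries = max_entries
    @cache = {} # receiver class => UnboundMethod
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    meth = @cache[klass]
    if meth
      @hits += 1 # fast path: the receiver class was seen before
    else
      @misses += 1
      meth = klass.instance_method(@method_name) # slow path: full method lookup
      @cache[klass] = meth if @cache.size < @max_entries
    end
    meth.bind(receiver).call(*args)
  end
end

site = CallSite.new(:to_s)
results = [1, 2, "three", 4.0, 5].map { |obj| site.call(obj) }
puts "hits=#{site.hits} misses=#{site.misses}"
```

In this sketch, passing max_entries: 1 models a monomorphic cache, which only remembers the first receiver class it sees.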

Modern Ruby implementations already do a great many optimizations. For example, for certain operations, Rubinius's Hash class is faster than YARV's. Now, this doesn't sound terribly exciting until you realize that Rubinius's Hash class is implemented in 100% pure Ruby, while YARV's is implemented in 100% hand-optimized C.

So, at least in some cases, Rubinius can generate better code than GCC!

Or is this rather a matter of resources put into the V8 project by Google?

Yes. Not just Google. The lineage of V8's source code is 25 years old now. The people who are working on V8 also created the Self VM (to this day one of the fastest dynamic OO language execution engines ever created), the Animorphic Smalltalk VM (to this day one of the fastest Smalltalk execution engines ever created), the HotSpot JVM (the fastest JVM ever created, probably the fastest VM period) and OOVM (one of the most efficient Smalltalk VMs ever created).

In fact, Lars Bak, the lead developer of V8, worked on every single one of those, plus a few others.

Performance of Ruby's own methods

As @theTinMan alluded to in the comments, you must understand the distinction between the language (syntax) and the logic (semantics). For example, suppose someone asked you to write a program that prints the number 1,000. You'd probably write it like this:

puts 1000

But you could also write any of these:

puts 1_000
puts 0b1111101000
puts 01750

These are all the same. Not "the same" as in they produce the same results, but "the same" as in Ruby parses and executes them exactly the same way. Their syntaxes are different but their semantics are identical.

The same is true for Ruby's different array syntaxes (and for its equivalent string syntaxes, Regexp literals, etc.). You can test this yourself using Ruby's --dump insns (dump instruction sequence) option:

$ ruby --dump insns -e 'arr = ["a", "b"]'
== disasm: <RubyVM::InstructionSequence:<main>@-e>======================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] arr
0000 trace            1                                               (   1)
0002 putstring        "a"
0004 putstring        "b"
0006 newarray         2
0008 dup
0009 setlocal_OP__WC__0 2
0011 leave

$ ruby --dump insns -e 'arr = %w(a b)'
== disasm: <RubyVM::InstructionSequence:<main>@-e>======================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] arr
0000 trace            1                                               (   1)
0002 putstring        "a"
0004 putstring        "b"
0006 newarray         2
0008 dup
0009 setlocal_OP__WC__0 2
0011 leave

Completely identical. The salient instructions, of course, are 0002–0006:

0002 putstring        "a"
0004 putstring        "b"
0006 newarray         2

These instructions say (more or less):

(0002) Push the string "a" onto the top of the stack.
(0004) Push the string "b" onto the top of the stack.
(0006) Pop the top two values from the stack, make an array from them and push it onto the stack.

These are the actual instructions that the MRI VM will execute in both cases. Ruby never knows that you used %w( ... ) instead of [ ... ] and there's no additional code it has to execute.
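The same comparison can be scripted instead of eyeballed. The sketch below assumes MRI, since RubyVM::InstructionSequence is specific to that implementation; instructions_for is a hypothetical helper name. It compiles each snippet, keeps only the instruction tuples, and checks whether the different spellings yield the same instructions.

```ruby
# MRI-only sketch: RubyVM::InstructionSequence is not part of other Ruby implementations.

# Hypothetical helper: compile a snippet and keep only the instruction tuples,
# dropping the line-number and event markers from the bytecode body.
def instructions_for(source)
  RubyVM::InstructionSequence.compile(source).to_a.last.grep(Array)
end

integer_sources = ["puts 1000", "puts 1_000", "puts 0b1111101000", "puts 01750"]
same_integers = integer_sources.map { |src| instructions_for(src) }.uniq.size == 1

array_sources = ['arr = ["a", "b"]', "arr = %w(a b)"]
same_arrays = array_sources.map { |src| instructions_for(src) }.uniq.size == 1

puts "integer literals compile identically: #{same_integers}"
puts "array literals compile identically:   #{same_arrays}"
```

Comparing instruction tuples rather than the raw disassembly text avoids spurious differences from the header line, which can embed source positions.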

Does different language = different performance in CouchDB lists?

You can use the V8 engine if you want for Couch. A guy from IrisCouch wrote couchjs to do this (I've seen him on Stack Overflow quite a bit too).

https://github.com/iriscouch/couchjs

Also, for views, filtered replication, and things like that, you can write the functions in Erlang instead of JavaScript. I've done that and seen around a 50% performance increase.

Seems you can write list functions in Erlang: http://tisba.de/2010/11/25/native-list-functions-with-couchdb/

Convert a partial to method/block for speed

Here is an example from one of my previous answers. It's extracted from PartialRenderer sources.

- local_names = [:i]
- partials = {}
- 1000.times do |i|
  - name = 'name_%s' % (i % 10)
  - partials[name] ||= lookup_context.find_template(name, lookup_context.prefixes, true, local_names)
  = partials[name].render(self, i: i)

I'd recommend wrapping it in a helper method. Keep in mind that the locals' names appear here twice: first in local_names as an array, and second as the keys of the hash passed as the second argument to the #render method.
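One way such a helper might look is sketched below. It assumes it is mixed into a Rails view context that provides lookup_context, and it reuses the find_template call from the snippet above unchanged; the module name FastPartialHelper and the method name render_cached_partial are hypothetical.

```ruby
# Hypothetical helper module; the names are illustrative, not part of Rails.
module FastPartialHelper
  # Looks up the partial template once per (name, local names) pair and
  # reuses it afterwards. The local names are derived from the locals hash,
  # so they only have to be written once at the call site.
  def render_cached_partial(name, locals = {})
    @_cached_partials ||= {}
    local_names = locals.keys
    cache_key = [name, local_names.sort]
    template = (@_cached_partials[cache_key] ||=
      lookup_context.find_template(name, lookup_context.prefixes, true, local_names))
    template.render(self, locals)
  end
end
```

Inside the loop, the view would then call render_cached_partial('name_%s' % (i % 10), i: i) instead of managing the partials hash by hand.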

Python vs. Ruby for metaprogramming

There's not really a huge difference between Python and Ruby, at least at an ideological level. For the most part, they're just different flavors of the same thing. Thus, I would recommend seeing which one matches your programming style more.

Interpreting a benchmark in C, Clojure, Python, Ruby, Scala and others

Rough answers:

Scala's static typing is helping it quite a bit here - this means that it uses the JVM pretty efficiently without too much extra effort.
I'm not exactly sure on the Ruby/Python difference, but I suspect that (2...n).all? in the function is-prime? is likely to be quite well optimised in Ruby (EDIT: sounds like this is indeed the case, see Julian's answer for more detail...)
Ruby 1.9.3 is just much better optimised
Clojure code can certainly be accelerated a lot! While Clojure is dynamic by default, you can use type hints, primitive maths etc. to get close to Scala / pure Java speed in many cases when you need to.

Most important optimisation in the Clojure code would be to use typed primitive maths within is-prime?, something like:

(set! *unchecked-math* true) ;; at top of file to avoid using BigIntegers

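(defn ^:static is-prime? [^long n]
  ;; Trial division over primitive longs. As written, this returns false
  ;; for n = 2, because the first test is (mod 2 2).
  (loop [i (long 2)]
    (if (zero? (mod n i))
      false
      (if (>= (inc i) n) true (recur (inc i))))))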

With this improvement, I get Clojure completing 10k in 0.635 secs (i.e. the second fastest on your list, beating Scala)

P.S. note that you have printing code inside your benchmark in some cases - not a good idea as it will distort the results, especially if using a function like print for the first time causes initialisation of IO subsystems or something like that!
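As a sketch of that last point in Ruby, the timed block below contains only the computation, and the printing happens after the clock has stopped. The prime? helper and the 2_000 upper bound are assumptions made for illustration; they are not the code from the original benchmark.

```ruby
require "benchmark"

# Hypothetical trial-division check, using the (2...n).all? form mentioned above.
def prime?(n)
  n > 1 && (2...n).all? { |i| n % i != 0 }
end

primes = nil

# Time only the computation; no IO happens inside the measured block.
elapsed = Benchmark.realtime do
  primes = (1..2_000).select { |n| prime?(n) }
end

# Printing happens after the measurement, so IO setup cost is not counted.
puts "found #{primes.size} primes in #{elapsed.round(4)}s"
```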

Dynamic .NET language performance?

IronPython and IronRuby are built on top of the DLR (Dynamic Language Runtime) and are compiled to CIL (the bytecode used by .NET) on the fly. They're slower than C# but faaaaaaar faster than their non-.NET counterparts. There aren't any decent benchmarks out there, to my knowledge, but you'll see the difference.

Are loops really faster in reverse?

It's not that i-- is faster than i++. Actually, they're both equally fast.

What takes time in ascending loops is evaluating, for each i, the size of your array. In this loop:

for(var i = array.length; i--;)

You evaluate .length only once, when you declare i, whereas for this loop

for(var i = 0; i < array.length; i++)

you evaluate .length each time you increment i, when you check if i < array.length.

In most cases you shouldn't even worry about this kind of optimization.

Do comments affect performance?

Am I correct to say that JavaScript code isn't compiled, not even JIT?

No. Although JavaScript is traditionally an "interpreted" language (although it needn't necessarily be), most JavaScript engines compile it on the fly whenever necessary. V8 (the engine in Chrome and Node.js) used to compile immediately and quickly, then go back and aggressively optimize any code that was used a lot (the old FullCodegen+TurboFan stack). A while back, having done lots of real-world measurement, they switched to initially parsing to bytecode and interpreting, and then compiling if code is reused much at all (the new Ignition+TurboFan stack), gaining significant memory savings by not compiling run-once setup code. Even engines that are less aggressive almost certainly at least parse the text into some form of bytecode, discarding comments early.

Remember that "interpreted" vs. "compiled" is usually more of an environmental thing than a language thing; there are C interpreters, and there are JavaScript compilers. Languages tend to be closely associated with environments (like how JavaScript tends to be associated with the web browser environment, even though it's always been used more widely than that, even back in 1995), but even then (as we've seen), there can be variation.

If so, does that mean that comments have an effect on performance...

A very, very, very minimal one, on the initial parsing stage. But comments are very easy to scan past, nothing to worry about.

If you're really worried about it, though, you can minify your script with tools like jsmin or the Closure Compiler (even with just simple optimizations). The former will just strip comments and unnecessary whitespace, stuff like that (still pretty effective); the latter does that and also actually understands the code and does some inlining and such. So you can comment freely, and then use those tools to remove whatever minuscule impact those comments may have when the script is first loaded.

Of course, the thing about JavaScript performance is that it's hard to predict reliably cross-engine, because the engines vary so much. So experiments can be fun:

Here's an experiment which (in theory) reparses/recreates the function every time
Here's one that just parses/creates the function once and reuses it

Result? My take is that there's no discernible difference within the measurement error of the test.
