Python, Ruby, Haskell - Do They Provide True Multithreading

Python, Ruby, Haskell - Do they provide true multithreading?

1) Do Python, Ruby, or Haskell support true multithreading?

This has nothing to do with the language. It is a question of the hardware (if the machine only has 1 CPU, it is simply physically impossible to execute two instructions at the same time), the Operating System (again, if the OS doesn't support true multithreading, there is nothing you can do) and the language implementation / execution engine.

Unless the language specification explicitly forbids or enforces true multithreading, this has absolutely nothing whatsoever to do with the language.

All the languages that you mention, plus all the languages that have been mentioned in the answers so far, have multiple implementations, some of which support true multithreading, some don't, and some are built on top of other execution engines which might or might not support true multithreading.

Take Ruby, for example. Here are just some of its implementations and their threading models:

MRI: green threads, no true multithreading
YARV: OS threads, no true multithreading
Rubinius: OS threads, true multithreading
MacRuby: OS threads, true multithreading
JRuby, XRuby: JVM threads, depends on the JVM (if the JVM supports true multithreading, then JRuby/XRuby does, too, if the JVM doesn't, then there's nothing they can do about it)
IronRuby, Ruby.NET: just like JRuby, XRuby, but on the CLI instead of on the JVM

See also my answer to another similar question about Ruby. (Note that that answer is more than a year old, and some of it is no longer accurate. Rubinius, for example, uses truly concurrent native threads now, instead of truly concurrent green threads. Also, since then, several new Ruby implementations have emerged, such as BlueRuby, tinyrb, Ruby Go Lightly, Red Sun and SmallRuby.)

Similar for Python:

CPython: native threads, no true multithreading
PyPy: native threads, depends on the execution engine (PyPy can run natively, or on top of a JVM, or on top of a CLI, or on top of another Python execution engine. Whenever the underlying platform supports true multithreading, PyPy does, too.)
Unladen Swallow: native threads, currently no true multithreading, but fix is planned
Jython: JVM threads, see JRuby
IronPython: CLI threads, see IronRuby

For Haskell, at least the Glorious Glasgow Haskell Compiler supports true multithreading with native threads. I don't know about UHC, LHC, JHC, YHC, HUGS or all the others.

For Erlang, both BEAM and HiPE support true multithreading with green threads.

2) If a program contains threads, will a Virtual Machine automatically assign work to multiple cores (or to physical CPUs if there is more than 1 CPU on the mainboard)?

Again: this depends on the Virtual Machine, the Operating System and the hardware. Also, some of the implementations mentioned above, don't even have Virtual Machines.

Does ruby have real multithreading?

Updated with Jörg's Sept 2011 comment

You seem to be confusing two very different things here: the
Ruby Programming Language and the specific threading model of one
specific implementation of the Ruby Programming Language. There
are currently around 11 different implementations of the Ruby
Programming Language, with very different and unique threading
models.

(Unfortunately, only two of those 11 implementations are actually
ready for production use, but by the end of the year that number
will probably go up to four or five.) (Update: it's now 5: MRI, JRuby, YARV (the interpreter for Ruby 1.9), Rubinius and IronRuby).

The first implementation doesn't actually have a name, which
makes it quite awkward to refer to it and is really annoying and
confusing. It is most often referred to as "Ruby", which is even
more annoying and confusing than having no name, because it
leads to endless confusion between the features of the Ruby
Programming Language and a particular Ruby Implementation.
It is also sometimes called "MRI" (for "Matz's Ruby
Implementation"), CRuby or MatzRuby.
MRI implements Ruby Threads as Green Threads within its
interpreter. Unfortunately, it doesn't allow those threads
to be scheduled in parallel, they can only run one thread at a
time.
However, any number of C Threads (POSIX Threads etc.) can run
in parallel to the Ruby Thread, so external C Libraries, or MRI
C Extensions that create threads of their own can still run in
parallel.
The second implementation is YARV (short for "Yet
Another Ruby VM"). YARV implements Ruby Threads as POSIX or
Windows NT Threads, however, it uses a Global Interpreter
Lock (GIL) to ensure that only one Ruby Thread can actually be
scheduled at any one time.
Like MRI, C Threads can actually run parallel to Ruby Threads.
In the future, it is possible, that the GIL might get broken
down into more fine-grained locks, thus allowing more and more
code to actually run in parallel, but that's so far away, it is
not even planned yet.
JRuby implements Ruby Threads as Native Threads,
where "Native Threads" in case of the JVM obviously means "JVM
Threads". JRuby imposes no additional locking on them. So,
whether those threads can actually run in parallel depends on
the JVM: some JVMs implement JVM Threads as OS Threads and some
as Green Threads. (The mainstream JVMs from Sun/Oracle use exclusively OS threads since JDK 1.3)
XRuby also implements Ruby Threads as JVM Threads. Update: XRuby is dead.
IronRuby implements Ruby Threads as Native Threads,
where "Native Threads" in case of the CLR obviously means
"CLR Threads". IronRuby imposes no additional locking on them,
so, they should run in parallel, as long as your CLR supports
that.
Ruby.NET also implements Ruby Threads as CLR
Threads. Update: Ruby.NET is dead.
Rubinius implements Ruby Threads as Green Threads
within its Virtual Machine. More precisely: the Rubinius
VM exports a very lightweight, very flexible
concurrency/parallelism/non-local control-flow construct, called
a "Task", and all other concurrency constructs (Threads in
this discussion, but also Continuations, Actors and
other stuff) are implemented in pure Ruby, using Tasks.
Rubinius can not (currently) schedule Threads in parallel,
however, adding that isn't too much of a problem: Rubinius can
already run several VM instances in several POSIX Threads in
parallel, within one Rubinius process. Since Threads are
actually implemented in Ruby, they can, like any other Ruby
object, be serialized and sent to a different VM in a different
POSIX Thread. (That's the same model the BEAM Erlang VM
uses for SMP concurrency. It is already implemented for
Rubinius Actors.)
Update: The information about Rubinius in this answer is about the Shotgun VM, which doesn't exist anymore. The "new" C++ VM does not use green threads scheduled across multiple VMs (i.e. Erlang/BEAM style), it uses a more traditional single VM with multiple native OS threads model, just like the one employed by, say, the CLR, Mono, and pretty much every JVM.
MacRuby started out as a port of YARV on top of the
Objective-C Runtime and CoreFoundation and Cocoa Frameworks. It
has now significantly diverged from YARV, but AFAIK it currently
still shares the same Threading Model with YARV.
Update: MacRuby depends on apples garbage collector which is declared deprecated and will be removed in later versions of MacOSX, MacRuby is undead.
Cardinal is a Ruby Implementation for the Parrot
Virtual Machine. It doesn't implement threads yet, however,
when it does, it will probably implement them as Parrot
Threads. Update: Cardinal seems very inactive/dead.
MagLev is a Ruby Implementation for the GemStone/S
Smalltalk VM. I have no information what threading model
GemStone/S uses, what threading model MagLev uses or even if
threads are even implemented yet (probably not).
HotRuby is not a full Ruby Implementation of its
own. It is an implementation of a YARV bytecode VM in
JavaScript. HotRuby doesn't support threads (yet?) and when it
does, they won't be able to run in parallel, because JavaScript
has no support for true parallelism. There is an ActionScript
version of HotRuby, however, and ActionScript might actually
support parallelism. Update: HotRuby is dead.

Unfortunately, only two of these 11 Ruby Implementations are
actually production-ready: MRI and JRuby.

So, if you want true parallel threads, JRuby is currently your
only choice – not that that's a bad one: JRuby is actually faster
than MRI, and arguably more stable.

Otherwise, the "classical" Ruby solution is to use processes
instead of threads for parallelism. The Ruby Core Library
contains the Process module with the Process.fork
method which makes it dead easy to fork off another Ruby
process. Also, the Ruby Standard Library contains the
Distributed Ruby (dRuby / dRb) library, which allows Ruby
code to be trivially distributed across multiple processes, not
only on the same machine but also across the network.

Haskell lightweight threads overhead and use on multicores

GHC's runtime provides an execution environment supporting billions of sparks, thousands of lightweight threads, which may be distributed over multiple hardware cores. Compile with -threaded and use the +RTS -N4 flags to set your desired number of cores.

sparks/threads/workers/cores

Specifically:

does this mean that creating a lot of them (like 1000) will not have a drastic impact on performance?

Well, creating 1,000,000 of them is certainly possible. 1000 is so cheap it won't even show up. You can see in thread creation benchmarks, such as "thread ring" that GHC is very, very good.

Doesn't the concept of lightweight threads prevent us from using the benefints of multicore architectures?

Not at all. GHC has been running on multicores since 2004. The current status of the multicore runtime is tracked here.

How does it do it? The best place to read up on this architecture is in the paper, "Runtime Support for Multicore Haskell":

The GHC runtime system supports millions of lightweight threads
by multiplexing them onto a handful of operating system threads,
roughly one for each physical CPU. ...

Haskell threads are executed by a set of operating system
threads, which we call worker threads. We maintain roughly one
worker thread per physical CPU, but exactly which worker thread
may vary from moment to moment ...

Since the worker thread may change, we maintain exactly one
Haskell Execution Context (HEC) for each CPU. The HEC is a
data structure that contains all the data that an OS worker thread
requires in order to execute Haskell threads

You can monitor your threads being created, and where they're executing, via threadscope.. Here, e.g. running the binary-trees benchmark:

threadscope

Which scripting languages support multi-core programming?

Thread syntax may be static, but implementation across operating systems and virtual machines may change

Your scripting language may use true threading on one OS and fake-threads on another.

If you have performance requirements, it might be worth looking to ensure that the scripted threads fall through to the most beneficial layer in the OS. Userspace threads will be faster, but for largely blocking thread activity kernel threads will be better.

can multiple threads can be implemented in every core in python

@huhnmonster's answer is basically correct: you want the multiprocessing package, rather than the multithreading one.

This is actually fairly easy to do with the Pool construct:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

The number of processes you want to select depends on both the number of cores, the frequency of I/O requests for each process, and the number of threads of execution each core can support at once.

For example, most Intel Core i7 processors have 4 cores, but enough on-chip registers to store context for 2 processes and execute some operations in parallel. As a rule of thumb, you don't get twice the execution speed from this, but you often see 25-30% speedups. So if you were running your code on a Core i7, you'd want 8 running processes at a time. If your code blocks on I/O about half the time (you can use a profiler to estimate this), then a good guess at the correct number of processes would be 12-16. A little bit of experimentation would let you figure out the right number pretty quickly.

How does multithreading utilizes multiple cores?

Your confusion lies here:

[...] while one process is running under one CPU core.
[...] threads created by one process should run only under that specific process, which means that it should only run under that very one CPU core.

This is not true. I think what the various explanations you have read meant that any process have at least one thread (where a 'thread' is a sequence of instructions ran by a CPU core).

If you have a multithreaded program, the process will have several threads (sequences of instructions ran by a CPU core) that can run concurrently on different CPU cores.

There are many processes executing on your computer at any given time. The Operating System (OS) is the program that allocates the hardware resources (CPU cores) to all these processes and decides which process can use which cores for what amount of time before another process gets to use the CPU. Whether or not a process gets to use multiple cores is not entirely up to the process. More confusing still, multithreaded programs can use more threads than there are cores on the computer's CPU. In that case you can be certain that all your threads do not run in parallel.

One more thing:

[...] threads utilizes multiple cores and make the whole program executes more effective

I am going to sound very pedantic, but it is more complicated than that. It depends on what you mean by "effective". Are we talking about total computation time, energy consumption ..?

A sequential (1 thread) program may be very effective in terms of power consumption but taking a very long time to compute. If you are able to use multiple threads, you may be able to reduce that computation time but it will probably incur new costs (synchronization between threads, additional protection mechanisms against concurrent accesses ...).

Also, multithreading cannot help for certain tasks that fall outside of the CPU realm. For example, unless you have some very specific hardware support, reading a file from the hard-drive with 2 or more concurrent threads cannot be parallelized efficiently.

Are thread pools needed for pure Haskell code?

The core issue, I imagine, is the network side. If you have 10,000 links and forkIO for each link, then you potentially have 10,000 sockets you're attempting to open at once, which, depending on how your OS is configured, probably won't even be possible, much less efficient.

However, the fact that we have green threads that get "virtually" scheduled across multiple os threads (which ideally are stuck to individual cores) doesn't mean that we can just distribute work randomly without regards to cpu usage either. The issue here isn't so much that the scheduling of the CPU itself won't be handled for us, but rather that context-switches (even green ones) cost cycles. Each thread, if its working on different data, will need to pull that data into the cpu. If there's enough data, that means pulling things in and out of the cpu cache. Even absent that, it means pulling things from the cache to registers, etc.

Even if a problem is trivially parallel, it is virtually never the right idea to just break it up as small as possible and attempt to do it "all at once".

Python, Ruby, Haskell - Do They Provide True Multithreading