Why Doesn't Ruby Have a Threadpool Built-In

Why doesn't Ruby have a ThreadPool built-in?

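Most likely the reason is that Ruby has historically not had "real" parallel threads. Ruby 1.8 used what are called green threads: the interpreter scheduled its threads itself without using any underlying OS threads. Since 1.9, the standard interpreter (MRI) backs each Ruby thread with an OS thread, but a global interpreter lock allows only one thread to run Ruby code at a time. This effectively makes Ruby single threaded for CPU-bound work, although threads still overlap while they wait on blocking I/O.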
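A minimal sketch of what that means in practice, assuming MRI 1.9 or later, where blocking calls such as sleep release the interpreter lock. The helper name blocking_elapsed is hypothetical, and sleep only stands in for blocking I/O.

```ruby
# Hypothetical helper: start `count` threads that each block for `seconds`
# and return the total wall-clock time until all of them have finished.
def blocking_elapsed(count, seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) { Thread.new { sleep seconds } }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = blocking_elapsed(3, 0.2)
puts format('3 threads blocking 0.2 s each took %.2f s in total', elapsed)
```

If the waits overlap, the total should be close to 0.2 seconds rather than 0.6. CPU-bound blocks would not show the same speed-up on MRI.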

Does Ruby have any construct similar to Clojure's pmap for parallel processing?

Here's a simple little example of one way to do this. Note that there's nothing limiting the number of threads it creates at once, so you might want to create some sort of thread pool if you're running lots of threads.

# Thread#value waits for the thread to finish and returns the block's result.
[1, 2, 3].map { |x| Thread.new { x + 1 } }.map(&:value)
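One way to cap the thread count is a small pool fed from a Queue. This is only a sketch: pmap and its workers: parameter are hypothetical names, not part of Ruby's standard library, and Queue#close needs Ruby 2.3 or later.

```ruby
# Hypothetical pmap: maps the block over items using at most `workers` threads.
def pmap(items, workers: 4, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # once closed and drained, pop returns nil instead of blocking

  results = Array.new(items.size)
  pool = Array.new([workers, items.size].min) do
    Thread.new do
      while (job = jobs.pop)
        item, index = job
        results[index] = block.call(item) # each index is written only once
      end
    end
  end

  pool.each(&:join) # join re-raises any exception raised inside a worker
  results
end

p pmap([1, 2, 3]) { |x| x + 1 }
```

Results are stored by index, so the output keeps the input order even when the workers finish out of order.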

Does Rails 3 have an equivalent to merb's run_later function built-in?

For running things at a later time, delayed_job is the community standard. I would recommend learning how to use this and apply it to your situation.

How to make an HTTP request without waiting for the response in Ruby

It's possible by opening a socket and closing it. This will establish a connection and close the connection without transferring any data within the connection's context...

...you will need to wait for the connection to open - although there might be a way to avoid that.

require 'socket'
# opens a connection, will wait for connection but not for data.
s = TCPSocket.new 'host.url.com', 80
# closes the connection
s.close

It's roughly the equivalent of a ping at the TCP level and will not start a new thread, although it's not totally asynchronous.

With an HTTP request, the code might look something like this:

require 'socket'
host = 'www.google.com'
# opens a connection, will wait for connection but not for data.
s = TCPSocket.new host, 80
# send a GET request for '/' .
s.puts "GET / HTTP/1.1\r\nHost: #{host}\r\n\r\n"
# closes the connection
s.close

You can search Stack Exchange for more information about HTTP requests and get some ideas.

Just to clarify (due to comments):

This WILL introduce the latency related to establishing the connection (and sending the request), but you WILL NOT have to wait for the reply to be processed and received.

Dropping the connection (closing your half of the socket) could have any of the following effects - all of which are assuming a decent web server:

If s.close completes BEFORE the web server has fully sent the response, the web server will first process the request, and an exception will then be raised on the web server's socket when it tries to send the data. The web server should then close the socket and release any resources.
If s.close completes AFTER the web server has fully sent the response, the server might either (1) close the socket immediately (normal HTTP/1.0 behavior) or (2) keep the connection alive until a timeout occurs (persistent connections are the default in HTTP/1.1). The timeout depends on the server's configuration and is often on the order of seconds.

Hitting the web server repeatedly at very small intervals might cause a DoS (denial-of-service) security flag to be raised and future connections to be blocked (this is true regardless of how you hit the web server).

I would probably opt to use a worker thread.

Running a separate thread might not be as expensive as you believe. It's possible to have one thread cycle through all the asynchronous web requests.

Here's a thought:

require 'socket'

REQUESTS_MUTEX = Mutex.new
REQUESTS_QUE = []
REQUESTS_THREAD = Thread.new do
  begin
    loop do
      # poll until a request has been queued
      sleep 0.5 while REQUESTS_QUE.empty?
      host, path = REQUESTS_MUTEX.synchronize { REQUESTS_QUE.shift }
      # Open a connection and send the request without reading the reply.
      # The built-in Net::HTTP API is easier to use, but it waits for
      # the response.
      s = TCPSocket.new host, 80
      s.puts "GET #{path} HTTP/1.1\r\nHost: #{host}\r\n\r\n"
      s.close
      # log here:
      puts "requested #{path} from #{host}."
    end
  rescue StandardError
    # Failed requests are dropped silently. Rescuing StandardError rather
    # than Exception avoids swallowing fatal errors.
    retry
  end
end

# Queues a request and returns whether the worker thread is still alive.
def asynch_request host, path = '/'
  REQUESTS_MUTEX.synchronize { REQUESTS_QUE << [host, path] }
  REQUESTS_THREAD.alive?
end

Now, for every request, you can simply call asynch_request, and the cycling thread should hit the web server as soon as it wakes up and notices the queue is not empty.

You can test it from the terminal by pasting a few requests in a bunch:

asynch_request 'www.google.com'
asynch_request 'www.yahoo.com'
asynch_request 'I.Dont.exist.com'
asynch_request 'very bad host address...'
asynch_request 'www.github.com'

Notice that the bad hosts fail silently (you can tweak the code to log the errors).
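If the half-second polling is a concern, the same idea can be sketched with Ruby's thread-safe Queue, whose pop blocks until a job arrives. This is an illustrative variant, not the code above: the class name FireAndForget, its port: option and its shutdown method are all hypothetical, and Queue#close needs Ruby 2.3 or later.

```ruby
require 'socket'

# Hypothetical worker: one background thread sends queued GET requests
# and never reads the replies.
class FireAndForget
  def initialize(port: 80)
    @port = port
    @jobs = Queue.new
    @thread = Thread.new { work }
  end

  # Queue a request and return immediately.
  def request(host, path = '/')
    @jobs << [host, path]
    self
  end

  # Stop accepting requests, let the worker drain the queue, wait for it.
  def shutdown
    @jobs.close
    @thread.join
  end

  private

  def work
    # pop blocks while the queue is empty and returns nil once it is
    # closed and drained, which ends the loop.
    while (job = @jobs.pop)
      send_request(*job)
    end
  end

  def send_request(host, path)
    socket = TCPSocket.new(host, @port)
    socket.write("GET #{path} HTTP/1.1\r\nHost: #{host}\r\nConnection: close\r\n\r\n")
  rescue StandardError => e
    # Report failures instead of dropping them silently.
    warn "request to #{host}#{path} failed: #{e.class}: #{e.message}"
  ensure
    socket&.close
  end
end
```

A caller would create one instance, call request('www.example.com', '/some/path') as often as needed, and call shutdown before the program exits so queued requests are not lost.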

Run Haskell operations in parallel or multithreaded

Parallel and Concurrent Programming in Haskell has a lot of good information, and async is a good library for this stuff.

At the bottom level though, you'll find forkIO to start a new lightweight thread.

Of course, that's concurrency, not deterministic parallelism; parallel is the library for that, and it is also covered in the book.

Your example translates to:

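import Control.Concurrent (forkIO, threadDelay)
import Data.Time.Clock (getCurrentTime)

main = do
    start <- getCurrentTime
    putStr "Started At " >> print start
    _ <- forkIO func1
    _ <- forkIO func2
    end <- getCurrentTime
    putStr "End at " >> print end
    -- The program exits when main returns, killing any forked threads,
    -- so wait (crudely) long enough for both of them to finish.
    threadDelay (7 * 1000000)

func1 = helper "func1" 2

func2 = helper "func2" 1

helper name sleepTime = go 0
 where
    go 3 = return ()
    go n = do
        now <- getCurrentTime
        putStr name >> putStr " at: " >> print now
        -- threadDelay takes microseconds; sleepTime is assumed to be seconds.
        threadDelay (sleepTime * 1000000)
        go $ succ n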

I recommend learning the parallel and/or async libraries mentioned above, though, instead of writing your own threading stuff, at least initially.

Here's a not-so-great example of running tests on 8-ish processors using parallel:

import Control.Parallel.Strategies

factorial = product . enumFromTo 1
pairs (lower, upper) =  map fst . filter snd . withStrategy sparkTest
                     $ [ ((m, n), b)
                       | m <- [lower..upper]
                       , n <- [lower..upper]
                       , let b = 1 + factorial n == m*m
                       ]
sparkTest = evalBuffer 8 $ evalTuple2 rseq rpar

Which scripting languages support multi-core programming?

Thread syntax may be the same everywhere, but the implementation may differ across operating systems and virtual machines.

Your scripting language may use true threading on one OS and fake-threads on another.

If you have performance requirements, it might be worth checking that the scripted threads fall through to the most beneficial layer in the OS. Userspace threads are cheaper to create and switch between, but for largely blocking thread activity kernel threads will be better, since a blocking call then stalls only the thread that made it.

How to speed up Ruby script? Or shell script alternative?

Here is one of your big problems:

if line != ""
  File.open(file_out, 'a') { |f| f.puts(line) }
end

So your program needs to open and close the output file millions of times because it is doing that for every single line. Each time it opens it, since it is being opened in append mode, your system might have to do a lot of work to find the end of the file.

You should really change your program to open the output file once at the beginning and only close it at the end. Also, run strace to see what your Ruby I/O operations are doing behind the scenes; it should buffer up the writes and then send them to the OS in blocks of about 4 kilobytes at a time; it shouldn't issue a write system call for every single line.
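A minimal sketch of that change, assuming the script reads an input file line by line and keeps the non-empty lines. The method name copy_non_empty_lines and both path parameters are hypothetical, since the original script is not shown in full.

```ruby
# Hypothetical helper: append every non-empty line of path_in to path_out.
# The output file is opened once, so Ruby can buffer the writes.
def copy_non_empty_lines(path_in, path_out)
  File.open(path_out, 'a') do |out|
    File.foreach(path_in) do |line|
      line = line.chomp
      out.puts(line) unless line.empty?
    end
  end # the block form flushes and closes the file here
end
```

The block form of File.open closes the file even if an exception is raised partway through.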

To further improve the performance, you should use a Ruby profiling tool to see which functions are taking the most time.
