Why doesn't Ruby have a ThreadPool built-in?
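The most likely reason is that, historically, Ruby did not have "real" threads. Ruby 1.8 used what are called green threads: the interpreter scheduled them itself without using any underlying OS threads. Since Ruby 1.9, MRI threads are backed by OS threads, but a global interpreter lock allows only one of them to run Ruby code at a time, so CPU-bound Ruby code is still effectively single-threaded.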
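Threads can still overlap while they wait on blocking calls. Here is a minimal sketch, assuming a current MRI in which sleep (used as a stand-in for blocking I/O) lets other threads run; the 0.2 second duration is arbitrary:

```ruby
# Three threads that each block for 0.2 s (sleep stands in for I/O).
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 3.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

# If the waits ran one after another this would take about 0.6 s.
puts format("elapsed: %.2f s", elapsed)
```

The same experiment with a CPU-bound block instead of sleep would not be expected to speed up on MRI.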
Does Ruby have any construct similar to Clojure's pmap for parallel processing?
Here's a simple little example of one way to do this. Note that there's nothing limiting the number of threads it creates at once, so you might want to create some sort of thread pool if you're running lots of threads.
[1,2,3].map{|x| Thread.start{x+1}}.map{|t| t.join.value}
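One way to cap the number of threads, sketched under the assumption that a fixed set of workers pulling from a queue is acceptable. The name bounded_pmap and the default of 4 workers are made up for illustration and are not part of any library:

```ruby
# Hypothetical helper: maps items in parallel with at most `workers` threads.
def bounded_pmap(items, workers: 4)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # lets pop return nil once the queue is drained

  results = Array.new(items.size)
  threads = Array.new([workers, items.size].min) do
    Thread.new do
      while (job = jobs.pop)
        item, index = job
        results[index] = yield(item) # each slot is written by exactly one thread
      end
    end
  end
  threads.each(&:join)
  results
end

p bounded_pmap([1, 2, 3], workers: 2) { |x| x + 1 } # prints [2, 3, 4]
```

Results are stored by index, so the output order matches the input order regardless of which worker finishes first.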
Does Rails 3 have an equivalent to merb's run_later function built-in?
For running things at a later time, delayed_job is the community standard. I would recommend learning how to use this and apply it to your situation.
How to make an HTTP request without waiting for the response in Ruby
It's possible by opening a socket and closing it. This will establish a connection and close the connection without transferring any data within the connection's context...
...you will need to wait for the connection to open - although there might be a way to avoid that.
require 'socket'
# opens a connection, will wait for connection but not for data.
s = TCPSocket.new 'host.url.com', 80
# closes the connection
s.close
It's probably the equivalent of a ping and will not open a new thread... although, it's not totally asynchronous.
With an HTTP request, the code might look something like this:
require 'socket'
host = 'www.google.com'
# opens a connection, will wait for connection but not for data.
s = TCPSocket.new host, 80
# send a GET request for '/' .
s.puts "GET / HTTP/1.1\r\nHost: #{host}\r\n\r\n"
# closes the connection
s.close
You can search Stack Exchange for more information about HTTP requests and get some ideas.
Just to clarify (due to comments):
This WILL introduce the latency related to establishing the connection (and sending the request), but you WILL NOT have to wait for the reply to be processed and received.
Dropping the connection (closing your half of the socket) could have any of the following effects - all of which are assuming a decent web server:
If s.close is completed BEFORE the response is fully sent by the web server, the web server will first process the request and then an exception will be raised on the web server's socket when it tries to send the data. The web server should then close the socket and release any resources.
If s.close is completed AFTER the response is fully sent by the web server, then the server might either: 1. close the socket immediately (normal HTTP 1 behavior) OR 2. keep the connection alive until a timeout occurs (optional HTTP 1.1 behavior). The timeout is usually about 10 seconds.
Hitting the web server repeatedly in very small intervals might cause a DOS security flag to be raised and future connections to be blocked (this is true regardless of how you hit the web server).
I would probably opt to use a worker thread. I believe that running a separate thread might not be as expensive as you think: a single thread can cycle through all the asynchronous web requests.
Here's a thought:
require 'socket'

REQUESTS_MUTEX = Mutex.new
REQUESTS_QUE = []
REQUESTS_THREAD = Thread.new do
  begin
    loop do
      # poll until a request is queued
      sleep 0.5 while REQUESTS_QUE.empty?
      host, path = REQUESTS_MUTEX.synchronize { REQUESTS_QUE.shift }
      # Open a connection and send the request without reading the reply.
      # The built-in Net::HTTP API is easier to use, but it waits for a response.
      s = TCPSocket.new host, 80
      s.puts "GET #{path} HTTP/1.1\r\nHost: #{host}\r\n\r\n"
      s.close
      # log here:
      puts "requested #{path} from #{host}."
    end
  rescue StandardError
    # failed requests are dropped silently; the loop then resumes
    retry
  end
end

# Queues a request and reports whether the worker thread is still running.
def asynch_request host, path = '/'
  REQUESTS_MUTEX.synchronize { REQUESTS_QUE << [host, path] }
  REQUESTS_THREAD.alive?
end
Now, for every request, you can simply call asynch_request, and the cycling thread should hit the web server as soon as it wakes up and notices the queue.
You can test it from the terminal by pasting a few requests in a bunch:
asynch_request 'www.google.com'
asynch_request 'www.yahoo.com'
asynch_request 'I.Dont.exist.com'
asynch_request 'very bad host address...'
asynch_request 'www.github.com'
Notice the silent fails (you can tweak the code).
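A variation on the same idea, sketched with Ruby's built-in Queue, which blocks on pop and so avoids both the explicit mutex and the sleep polling. The start_worker name and the handler block are hypothetical; the handler here only records the job instead of opening a socket, so the sketch runs without network access:

```ruby
# Sketch: a single worker thread fed by a thread-safe Queue.
# `start_worker` and the handler block are hypothetical names for illustration.
def start_worker(queue)
  Thread.new do
    # Queue#pop blocks until a job arrives, so no sleep-polling is needed;
    # it returns nil once the queue is closed and empty.
    while (job = queue.pop)
      begin
        yield(*job)
      rescue StandardError => e
        warn "job #{job.inspect} failed: #{e.message}"
      end
    end
  end
end

queue = Queue.new
handled = []
worker = start_worker(queue) do |host, path|
  raise ArgumentError, "bad host" if host.include?(" ")
  handled << "#{path} from #{host}" # stand-in for the real socket request
end

queue << ["www.example.com", "/"]
queue << ["very bad host", "/"]
queue << ["www.example.org", "/index"]
queue.close # no more jobs; the worker exits after draining the queue
worker.join
p handled
```

Failures are reported with warn rather than dropped, which makes bad hosts visible while keeping the worker alive.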
run haskell operations in parallel or multithreaded
Parallel and Concurrent Programming in Haskell has a lot of good information, and async is a good library for this stuff.
At the bottom level, though, you'll find forkIO to start a new lightweight thread.
Of course, that's concurrency, not deterministic parallelism; parallel is the library for that, and it is also covered in the book.
Your example translates to:
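import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.Time.Clock (getCurrentTime)

main :: IO ()
main = do
  start <- getCurrentTime
  putStr "Started At " >> print start
  done1 <- newEmptyMVar
  done2 <- newEmptyMVar
  _ <- forkIO (func1 >> putMVar done1 ())
  _ <- forkIO (func2 >> putMVar done2 ())
  -- wait for both threads; otherwise main may exit before they finish
  takeMVar done1
  takeMVar done2
  end <- getCurrentTime
  putStr "End at " >> print end

func1, func2 :: IO ()
func1 = helper "func1" 2
func2 = helper "func2" 1

helper :: String -> Int -> IO ()
helper name sleepTime = go (0 :: Int)
  where
    go 3 = return ()
    go n = do
      now <- getCurrentTime
      putStr name >> putStr " at: " >> print now
      threadDelay (sleepTime * 1000000)  -- threadDelay takes microseconds
      go $ succ n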
I recommend learning the parallel and/or async libraries mentioned above, though, instead of writing your own threading stuff, at least initially.
Here's a not-so-great example of running tests on 8-ish processors using parallel:
import Control.Parallel.Strategies

factorial = product . enumFromTo 1

pairs (lower, upper) = map fst . filter snd . withStrategy sparkTest
    $ [ ((m, n), b)
      | m <- [lower..upper]
      , n <- [lower..upper]
      , let b = 1 + factorial n == m*m
      ]

sparkTest = evalBuffer 8 $ evalTuple2 rseq rpar
Which scripting languages support multi-core programming?
Thread syntax may be static, but implementation across operating systems and virtual machines may change
Your scripting language may use true threading on one OS and fake-threads on another.
If you have performance requirements, it might be worth looking to ensure that the scripted threads fall through to the most beneficial layer in the OS. Userspace threads will be faster, but for largely blocking thread activity kernel threads will be better.
How to speed up Ruby script? Or shell script alternative?
Here is one of your big problems:
if line != ""
  File.open(file_out, 'a') { |f| f.puts(line) }
end
So your program needs to open and close the output file millions of times because it is doing that for every single line. Each time it opens it, since it is being opened in append mode, your system might have to do a lot of work to find the end of the file.
You should really change your program to open the output file once at the beginning and only close it at the end. Also, run strace to see what your Ruby I/O operations are doing behind the scenes; Ruby should buffer up the writes and then send them to the OS in blocks of about 4 kilobytes at a time; it shouldn't issue a write system call for every single line.
To further improve the performance, you should use a Ruby profiling tool to see which functions are taking the most time.
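Opening the output file once might look something like the sketch below. The lines array and the temporary output path are hypothetical stand-ins for the script's real input and file_out, used here so that the example is self-contained:

```ruby
require 'tempfile'

# Hypothetical stand-ins for the script's input lines and output path.
lines = ["first", "", "second", "", "third"]
out = Tempfile.new('output')
file_out = out.path

# Open the output once; writes are buffered and flushed when the block ends.
File.open(file_out, 'a') do |f|
  lines.each do |line|
    f.puts(line) if line != ""
  end
end

puts File.read(file_out)
```

The block form of File.open closes the file even if an exception is raised partway through the loop.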