How to Speed Up a Ruby/Rake Task

How can I speed up my Ruby/Rake task, which counts occurrences of dates among 300K date strings?

Yes, you don't need to parse the dates at all if they are formatted the same. Knowing your data is one of the most powerful tools you can have.

If the datetime strings are all in the same format (yyyy-mm-dd HH:MM:SS), then you could do something like

date_array.group_by { |datetime| datetime[0..9] }

This will give you a hash with the date strings as the keys and arrays of the matching datetime strings as the values

{
"2007-05-06" => [...],
"2007-05-07" => [...],
...
}

So you'd have to get the length of each array

date_array.group_by { |datetime| datetime[0..9] }.each do |date_string, datetimes|
  puts "#{date_string} occurred #{datetimes.length} times."
end

Of course, that method wastes memory by building arrays of dates when you don't need them.

A more memory-efficient method

date_counts = {}
date_array.each do |date_string|
  date = date_string[0..9]
  date_counts[date] ||= 0 # initialize count if necessary
  date_counts[date] += 1
end

You'll end up with a hash with the date strings as the keys and the counts as values

{
"2007-05-06" => 123,
"2007-05-07" => 456,
...
}
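The same counting loop can be written a little more compactly with a hash whose default value is 0. This is only a minimal sketch, assuming every string uses the yyyy-mm-dd HH:MM:SS format; the sample date_array below is made up for illustration.

```ruby
# Hypothetical sample data; the real date_array would hold ~300K strings.
date_array = [
  "2007-05-06 10:00:00",
  "2007-05-06 23:59:59",
  "2007-05-07 08:30:00"
]

# Hash.new(0) returns 0 for missing keys, so no explicit initialization is needed.
date_counts = Hash.new(0)
date_array.each { |date_string| date_counts[date_string[0..9]] += 1 }

date_counts.each do |date, count|
  puts "#{date} occurred #{count} times."
end
```

On Ruby 2.7 or later, date_array.map { |s| s[0..9] }.tally should produce the same counts, at the cost of one intermediate array.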

Putting everything together

require 'date'

date_counts = {}
date_array.each do |date_string|
  date = date_string[0..9]
  date_counts[date] ||= 0 # initialize count if necessary
  date_counts[date] += 1
end

# Walk every day in the range so that dates with no occurrences print 0
Date.parse('2007-03-23').upto(Date.parse('2011-10-06')) do |date_to_count|
  puts "#{date_to_count} occurred #{date_counts[date_to_count.to_s].to_i} times."
end

How to diagnose slow rails / rake / rspec tasks

Thanks to @MaxWilliams for the link to this post: "How do I debug a slow rails app boot time?"

I started using Mark Ellul's Bumbler - http://github.com/mark-ellul/Bumbler

It gave me exactly what I wanted: an insight into what's going on in the background and which gems are taking the time. Of course, I still need to speed up the slow ones (fog and authlogic seem to be two of the main culprits), but that's a different question.
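To get a rough feel for the kind of information such a tool reports, you can time individual require calls by hand. This is only a sketch using Ruby's Benchmark module, not a description of how Bumbler itself works; the library names are placeholders for the gems in your own Gemfile.

```ruby
require 'benchmark'

# Placeholder list: standard libraries stand in for your app's gems.
libraries = %w[json set time]

# Time each require; a library that is already loaded will report close to zero.
load_times = libraries.each_with_object({}) do |lib, times|
  times[lib] = Benchmark.realtime { require lib }
end

# Print the slowest first.
load_times.sort_by { |_, seconds| -seconds }.each do |lib, seconds|
  puts format('%-10s %.4fs', lib, seconds)
end
```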

How do I output performance times for rake tasks

There is a simple benchmarking library in Ruby's standard library:

require 'benchmark'

puts Benchmark.measure { "a"*1_000_000 }

You could drop that into your rake tasks. As for an automatic "benchmark all rake task executions", that would take a little digging into the innards of rake.

More info at: http://ruby-doc.org/stdlib/libdoc/benchmark/rdoc/index.html
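One way to drop this into a task is a small wrapper around the task body. The sketch below is an assumption about how you might structure it; with_timing is a hypothetical helper name, not part of rake or Benchmark.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block, prints the elapsed wall-clock time,
# and returns the block's result.
def with_timing(label)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  puts format('%s finished in %.3fs', label, elapsed)
  result
end

# Inside a Rakefile this could wrap a task body, for example:
#   task :count_dates do
#     with_timing('count_dates') { count_dates }
#   end
total = with_timing('sum') { (1..1_000_000).sum }
```

Benchmark.realtime returns only wall-clock seconds; Benchmark.measure is the one to use when you also want user and system CPU time.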

Ruby rake tasks thread optimization

When the number of such files is low, you do not care about the order of execution, and you can afford some extra memory, the simplest solution is to run them in different processes via cron (for example, with the 'whenever' gem).

If there are more, use an HTTP gem that supports parallel downloading, such as typhoeus, curb, or em-http-request.
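As a rough illustration of the parallel approach, the sketch below uses plain Ruby threads and a Queue from the standard library rather than any of the gems named above. The fetch lambda and the URLs are hypothetical stand-ins; in practice fetch would perform the actual HTTP request.

```ruby
# Hypothetical work items and fetcher; replace fetch with a real HTTP call.
urls  = %w[http://example.com/a http://example.com/b http://example.com/c]
fetch = ->(url) { sleep 0.05; "body of #{url}" }

queue = Queue.new
urls.each { |url| queue << url }

results = {}
lock    = Mutex.new

# A small pool of worker threads pulls URLs until the queue is empty.
workers = Array.new(2) do
  Thread.new do
    loop do
      url = begin
        queue.pop(true) # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break
      end
      body = fetch.call(url)
      lock.synchronize { results[url] = body }
    end
  end
end

workers.each(&:join)
puts "Fetched #{results.size} of #{urls.size} URLs"
```

Threads help here because the work is I/O-bound; the pool size of 2 is arbitrary and would need tuning for a real workload.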


