Ruby - Read file in batches
There's no universal way.
1) You can read the file in chunks:
File.open('filename', 'r') do |f|
  # f.read returns nil at end of file, which ends the loop
  while (chunk = f.read(2048))
    # ...
  end
end
Disadvantage: you can miss a substring that straddles two chunks. For example, if you are looking for "SOME_TEXT", "SOME_" may be the last 5 bytes of the first 2048-byte chunk and "TEXT" the first 4 bytes of the second chunk.
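One way around the chunk-boundary problem is to carry the tail of each chunk over into the next search window. The following is only a sketch under that assumption: the helper name stream_include? is made up for illustration, and it compares raw bytes rather than characters.

```ruby
require 'stringio'

# Hypothetical helper (not part of Ruby): returns true if `needle` occurs
# anywhere in `io`, reading at most `chunk_size` bytes at a time.
def stream_include?(io, needle, chunk_size = 2048)
  needle = needle.b                          # compare raw bytes
  keep = [needle.bytesize - 1, 0].max        # bytes to carry across a boundary
  tail = ''.b
  while (chunk = io.read(chunk_size))        # read returns nil at end of file
    window = tail + chunk
    return true if window.include?(needle)
    # Keep the last `keep` bytes so a match split across chunks is still seen.
    tail = window.byteslice(-keep, keep) || window
  end
  false
end
```

With a real file this could be called as File.open('filename', 'rb') { |f| stream_include?(f, 'SOME_TEXT') }.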
2) You can read the file line by line:
File.open('filename', 'r') do |f|
  # f.gets returns nil at end of file, which ends the loop
  while (line = f.gets)
    # ...
  end
end
Disadvantage: this can be 2x to 5x slower than the first method.
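For the line-by-line approach, a search that never spans a line break could look roughly like this. The helper name matching_line_numbers is hypothetical, and File.foreach is used here in place of the open/gets loop.

```ruby
# Hypothetical helper: returns the 1-based numbers of the lines in the file
# at `path` that contain `needle`. Only one line is held in memory at a time.
def matching_line_numbers(path, needle)
  numbers = []
  File.foreach(path).with_index(1) do |line, number|
    numbers << number if line.include?(needle)
  end
  numbers
end
```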
Read a file in chunks in Ruby
Adapted from the Ruby Cookbook page 204:
FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024
class File
  # Yields the file's contents in pieces of at most chunk_size bytes.
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end
File.open(FILENAME, "rb") do |f|
  f.each_chunk { |chunk| puts chunk }
end
Disclaimer: I'm a ruby newbie and haven't tested this.
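If reopening the File class feels too invasive, the same idea can be sketched as a plain method that takes any IO-like object. This is an untested-in-production variant, not the Cookbook's code; the standalone each_chunk(io, ...) signature is an assumption made for illustration.

```ruby
require 'stringio'

# Hypothetical standalone variant: yields successive pieces of `io`,
# each at most `chunk_size` bytes. Returns an Enumerator without a block.
def each_chunk(io, chunk_size = 1024 * 1024)
  return enum_for(:each_chunk, io, chunk_size) unless block_given?

  while (chunk = io.read(chunk_size)) # read returns nil at end of file
    yield chunk
  end
end
```

Because it only relies on read, it works with a File opened in "rb" mode as well as with a StringIO.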
Import CSV in batches of lines in Rails?
Try AR Import
Old answer
Have you tried using AR Extensions for bulk import?
You get impressive performance improvements when you are inserting thousands of rows into the database.
Visit their website for more details.
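Independently of which bulk-insert library is used, the "batches of lines" part can be sketched with the standard library alone. The helper name each_csv_batch is made up here, and what happens inside the block (for example a bulk insert) is left to the caller.

```ruby
require 'csv'

# Hypothetical helper: reads the CSV at `path` (first line = headers) and
# yields arrays of at most `batch_size` rows, each row converted to a Hash.
def each_csv_batch(path, batch_size = 1000)
  CSV.foreach(path, headers: true).each_slice(batch_size) do |rows|
    yield rows.map(&:to_h)
  end
end
```

Each yielded batch is a plain array of hashes, so it can be handed to whatever import call the application uses.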
optimizing reading database and writing to csv file
The problem here is that when you call emails.each, ActiveRecord loads all the records from the database and keeps them in memory. To avoid this, you can use the find_each method:
require 'csv'
BATCH_SIZE = 5000
def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}
    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end
By default, find_each loads records in batches of 1000. To load batches of 5000 records, pass the :batch_size option to find_each:
emails.find_each(:batch_size => 5000) do |email|
  ...
end
More information about the find_each method (and the related find_in_batches) can be found in the Ruby on Rails Guides.
I've used the CSV class to write the file instead of joining fields and lines by hand. This is not intended to be a performance optimization, since writing to the file shouldn't be the bottleneck here.
reading large csv files in a rails app takes up a lot of memory - Strategy to reduce memory consumption?
You can use CSV.foreach to read your CSV file one row at a time instead of loading the whole file into memory:
path = Rails.root.join('data/uploads/.../upload.csv') # or, whatever
CSV.foreach(path) do |row|
  # process row[i] here
end
If it's run in a background job, you could additionally call GC.start every n rows.
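Combining row-by-row reading with a periodic GC.start might look something like this. It is a sketch only: the helper name process_csv and the gc_every keyword are hypothetical, and whether forcing the garbage collector actually helps depends on the workload.

```ruby
require 'csv'

# Hypothetical helper: yields each row of the CSV at `path` and forces a
# garbage collection run every `gc_every` rows. Returns the row count.
def process_csv(path, gc_every: 10_000)
  count = 0
  CSV.foreach(path) do |row|
    yield row
    count += 1
    GC.start if (count % gc_every).zero?
  end
  count
end
```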
How it works
CSV.foreach operates on an IO stream, as you can see here:
def CSV.foreach(path, options = Hash.new, &block)
  # ...
  open(path, options) do |csv|
    csv.each(&block)
  end
end
The csv.each part reads the underlying file line by line, much like IO#each does (the rb_io_getline_1 invocation below), and leaves each line read to be garbage collected:
static VALUE
rb_io_each_line(int argc, VALUE *argv, VALUE io)
{
    // ...
    while (!NIL_P(str = rb_io_getline_1(rs, limit, io))) {
        rb_yield(str);
    }
    // ...
}
How to execute .bat file with batch parameters
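Just pass them as you normally would on the command line. Note that backticks process escape sequences like a double-quoted string, so backslashes in a Windows path need to be doubled (otherwise \t becomes a tab):
`path\\to\\.bat -some=flag another-way`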
Ruby - iterate tasks with files
You're very close. Dir.foreach() will return the names of the files, whereas File.open() is going to want the path. A crude example to illustrate this:
directory = 'example_directory'
Dir.foreach(directory) do |file|
  # Assuming Unix style filesystem, skip . and ..
  # (note that this also skips any other name starting with a dot)
  next if file.start_with? '.'
  # Simply puts the contents
  path = File.join(directory, file)
  puts File.read(path)
end
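A slightly more defensive variant, as a sketch: it assumes Ruby 2.5 or later for Dir.children (which already leaves out "." and ".."), and it skips subdirectories so that File.read is only called on regular files. The helper name read_all_files is hypothetical.

```ruby
# Hypothetical helper: returns { file name => contents } for the regular
# files directly inside `directory` (subdirectories are skipped).
def read_all_files(directory)
  Dir.children(directory).sort.each_with_object({}) do |name, contents|
    path = File.join(directory, name) # File.read needs the path, not the name
    next unless File.file?(path)

    contents[name] = File.read(path)
  end
end
```

Unlike the start_with?('.') check, this keeps dotfiles such as .env, which may or may not be what is wanted.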