Ruby - Read File in Batches

Ruby - Read file in batches

There's no universal way.

1) You can read the file in chunks:

File.open('filename', 'r') do |f|
  while (chunk = f.read(2048))
    # process chunk here
  end
end

Disadvantage: you can miss a substring if it spans two chunks, e.g. you are looking for "SOME_TEXT", but "SOME_" is the last 5 bytes of the first 2048-byte chunk and "TEXT" is the first 4 bytes of the second.
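One way around that: keep a tail of the previous chunk, one byte shorter than the pattern, and prepend it before searching. A rough sketch (not tuned or tested against every edge case):

PATTERN = 'SOME_TEXT'
OVERLAP = PATTERN.bytesize - 1

File.open('filename', 'r') do |f|
  tail = ''
  while (chunk = f.read(2048))
    if (tail + chunk).include?(PATTERN)
      puts 'found SOME_TEXT'
      break
    end
    tail = chunk[-OVERLAP..-1] || chunk # keep the last 8 bytes for the next iteration
  end
end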

2) You can read the file line by line:

File.open('filename', 'r') do |f|
  while (line = f.gets)
    # process line here
  end
end

Disadvantage: this is roughly 2x-5x slower than the first method.
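If you want to check that ratio on your own data, a quick sketch using Ruby's Benchmark module (same 2048-byte chunk size and placeholder 'filename' as above):

require 'benchmark'

Benchmark.bm(8) do |x|
  x.report('chunks') { File.open('filename', 'r') { |f| nil while f.read(2048) } }
  x.report('lines')  { File.open('filename', 'r') { |f| nil while f.gets } }
end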

Read a file in chunks in Ruby

Adapted from the Ruby Cookbook page 204:

FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024

class File
def each_chunk(chunk_size = MEGABYTE)
yield read(chunk_size) until eof?
end
end

open(FILENAME, "rb") do |f|
f.each_chunk { |chunk| puts chunk }
end

Disclaimer: I'm a Ruby newbie and haven't tested this.
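If you'd rather not reopen the File class, the same loop can live in a plain method; a small sketch reusing the FILENAME and MEGABYTE constants from above:

def each_chunk(io, chunk_size = MEGABYTE)
  yield io.read(chunk_size) until io.eof?
end

File.open(FILENAME, 'rb') do |f|
  each_chunk(f) { |chunk| puts chunk.bytesize }
end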

Import CSV in batches of lines in Rails?

Try AR Import (the activerecord-import gem).

Old answer

Have you tried using AR Extensions for bulk import?
You get impressive performance improvements when inserting thousands of rows into the database.
Visit their website for more details.
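For illustration, a minimal sketch of a batched import with the activerecord-import gem; the Email model, its columns, and emails.csv are hypothetical:

require 'csv'

batch = []
CSV.foreach('emails.csv', :headers => true) do |row|
  batch << Email.new(:email => row['email'], :name => row['name'])
  if batch.size >= 1000
    Email.import(batch) # one multi-row INSERT instead of 1000 single INSERTs
    batch.clear
  end
end
Email.import(batch) unless batch.empty?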

optimizing reading database and writing to csv file

The problem here is that when you call emails.each, ActiveRecord loads all the records from the database and keeps them in memory. To avoid this, you can use the method find_each:

require 'csv'

BATCH_SIZE = 5000

def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}

    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end

By default, find_each loads records in batches of 1000 at a time. If you want to load batches of 5000 records, you have to pass the :batch_size option to find_each:

emails.find_each(:batch_size => 5000) do |email|
  # ...
end

More information about the find_each method (and the related find_in_batches) can be found in the Ruby on Rails Guides.
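For comparison, find_in_batches yields whole batches (arrays of records) instead of single records; a sketch that could stand in for the find_each loop inside the CSV.open block above:

emails.find_in_batches(:batch_size => 5000) do |batch|
  # batch is an Array of up to 5000 records
  batch.each do |email|
    csv << [email.email, email.name, email.ip, email.created_at]
  end
end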

I've used the CSV class to write the file instead of joining fields and lines by hand. This is not intended to be a performance optimization, since writing to the file shouldn't be the bottleneck here.

reading large csv files in a rails app takes up a lot of memory - Strategy to reduce memory consumption?

You can make use of CSV.foreach to read your CSV file one row at a time instead of loading it all into memory:

path = Rails.root.join('data/uploads/.../upload.csv') # or, whatever
CSV.foreach(path) do |row|
  # process row[i] here
end

If it's run in a background job, you could additionally call GC.start every n rows.
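For example, a rough sketch building on the snippet above (the 1,000-row interval is arbitrary; tune it for your job):

CSV.foreach(path).with_index(1) do |row, i|
  # process row here
  GC.start if (i % 1000).zero?
end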


How it works

CSV.foreach operates on an IO stream, as you can see here:

def CSV.foreach(path, options = Hash.new, &block)
  # ...
  open(path, options) do |csv|
    csv.each(&block)
  end
end

The csv.each part is a call to IO#each, which reads the file line by line (an rb_io_getline_1 invocation) and leaves each line it has read free to be garbage collected:

static VALUE
rb_io_each_line(int argc, VALUE *argv, VALUE io)
{
    // ...
    while (!NIL_P(str = rb_io_getline_1(rs, limit, io))) {
        rb_yield(str);
    }
    // ...
}

How to execute .bat file with batch parameters

Just pass them as you normally would:

`path\to\.bat -some=flag another-way`

Ruby - iterate tasks with files

You're very close. Dir.foreach() returns the names of the files, whereas File.open() wants a path. A crude example to illustrate this:

directory = 'example_directory'
Dir.foreach(directory) do |file|
  # Assuming Unix-style filesystem, skip . and ..
  next if file.start_with? '.'

  # Simply puts the contents
  path = File.join(directory, file)
  puts File.read(path)
end
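An alternative sketch using Dir.glob, which yields paths directly and makes it easy to skip subdirectories; it assumes the same example_directory as above:

Dir.glob(File.join('example_directory', '*')).each do |path|
  next unless File.file?(path) # skip subdirectories and other non-regular files
  puts File.read(path)
end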

