Uploading Large File to S3 with Ruby Fails with Out of Memory Error, How to Read and Upload in Chunks?
The v2 AWS SDK for Ruby, the aws-sdk gem, supports streaming objects directly over the network without loading them into memory. Your example requires only a small correction to do this:
File.open(filepath, 'rb') do |file|
  resp = s3.put_object(
    :bucket => bucket,
    :key => s3key,
    :body => file
  )
end
This works because it allows the SDK to call #read on the file object, passing in a small number of bytes each time. Calling #read on a Ruby IO object, such as a file, without a first argument reads the entire object into memory and returns it as a single string. This is what caused your out-of-memory errors.
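To illustrate the difference between the two #read modes, here is a small self-contained sketch (using a Tempfile; not part of the original answer):

```ruby
require 'tempfile'

# Write 10,000 bytes to a temporary file.
file = Tempfile.new('demo')
file.write('a' * 10_000)
file.rewind

chunk = file.read(1024)  # with a length argument: reads at most 1024 bytes
whole = file.read        # without one: reads everything remaining into memory
puts chunk.bytesize  # => 1024
puts whole.bytesize  # => 8976
```

The SDK relies on the first form, repeatedly asking for small chunks until the stream is exhausted.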
That said, the aws-sdk gem provides another, more useful interface for uploading files to Amazon S3. This alternative interface automatically:
- Uses multipart APIs for large objects
- Can use multiple threads to upload parts in parallel, improving upload speed
- Computes MD5s of data client-side for service-side data integrity checks
A simple example:
# notice this uses Resource, not Client
s3 = Aws::S3::Resource.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
)

s3.bucket(bucket).object(s3key).upload_file(filepath)
This is part of the aws-sdk resource interfaces. There are quite a few helpful utilities in there; the Client class only provides basic API functionality.
How can I pipe a rack file upload directly to S3?
You're exactly right that you need to turn the request body into a readable stream. Specifically, S3 expects a Ruby IO-like object (in that it wants a #read method and an #eof? method). Rack request bodies don't have #eof? defined, however, so you have to make a little wrapper class:
class RackS3Wrapper
  def initialize(body)
    @body = body
    @eof = false
  end

  def read(*args)
    ret = @body.read(*args)
    @eof = true if ret.nil? || ret == ""
    ret
  end

  def eof?
    @eof
  end
end
Then you can use this wrapper to stream the request to S3 directly:
s3.buckets['com.mydomain.mybucket'].objects['filename'].write(
  :data => RackS3Wrapper.new(request.body),
  :content_length => request.env['CONTENT_LENGTH'].to_i
)
This hasn't been tested in production or anything, but it should work fine.
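You can at least sanity-check the wrapper locally, using a StringIO to stand in for a Rack request body (both respond to #read); the class is repeated here so the snippet runs on its own:

```ruby
require 'stringio'

# Wrapper from above, repeated so this snippet is self-contained.
class RackS3Wrapper
  def initialize(body)
    @body = body
    @eof = false
  end

  def read(*args)
    ret = @body.read(*args)
    @eof = true if ret.nil? || ret == ""
    ret
  end

  def eof?
    @eof
  end
end

# Read the "body" in 4-byte chunks until the wrapper reports EOF.
wrapped = RackS3Wrapper.new(StringIO.new("hello world"))
chunks = []
chunks << wrapped.read(4) until wrapped.eof?
puts chunks.compact.join  # => "hello world"
```

The final #read at end-of-stream returns nil, which is what flips @eof to true and terminates the loop.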
Read a file in chunks in Ruby
Adapted from the Ruby Cookbook page 204:
FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024

class File
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end

open(FILENAME, "rb") do |f|
  f.each_chunk { |chunk| puts chunk }
end
Disclaimer: I'm a ruby newbie and haven't tested this.
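For what it's worth, the chunked-read pattern itself is easy to verify with a Tempfile and a small chunk size (a self-contained sketch, not from the original answer):

```ruby
require 'tempfile'

CHUNK = 4  # deliberately tiny so the chunking is visible

tmp = Tempfile.new('chunks')
tmp.write('abcdefghij')  # 10 bytes
tmp.rewind

# Same "read a fixed-size chunk until EOF" loop as above.
chunks = []
chunks << tmp.read(CHUNK) until tmp.eof?
puts chunks.inspect  # => ["abcd", "efgh", "ij"]
```

Note the final chunk is shorter than CHUNK; #read with a length argument returns whatever remains.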
reading large csv files in a rails app takes up a lot of memory - Strategy to reduce memory consumption?
You can make use of CSV.foreach to read your CSV file one row at a time instead of loading it all into memory:
path = Rails.root.join('data/uploads/.../upload.csv') # or, whatever
CSV.foreach(path) do |row|
  # process row[i] here
end
If it's run in a background job, you could additionally call GC.start every n rows.
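The "GC.start every n rows" idea can be sketched like this (a self-contained example with a made-up temp file and n = 2, purely for illustration):

```ruby
require 'csv'
require 'tempfile'

N = 2  # hypothetical interval; tune for your workload

csv = Tempfile.new(['rows', '.csv'])
csv.write("a,1\nb,2\nc,3\nd,4\ne,5\n")
csv.rewind

# Stream the file row by row, nudging the GC every N rows.
count = 0
CSV.foreach(csv.path) do |row|
  count += 1
  GC.start if (count % N).zero?
end
puts count  # => 5
```

Since each row becomes garbage as soon as the block returns, the periodic GC.start keeps the process's heap from growing between major collections.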
How it works
CSV.foreach operates on an IO stream, as you can see here:
def IO.foreach(path, options = Hash.new, &block)
  # ...
  open(path, options) do |csv|
    csv.each(&block)
  end
end
The csv.each part is a call to IO#each, which reads the file line by line (an rb_io_getline_1 invocation) and leaves each line it has read to be garbage collected:
static VALUE
rb_io_each_line(int argc, VALUE *argv, VALUE io)
{
    // ...
    while (!NIL_P(str = rb_io_getline_1(rs, limit, io))) {
        rb_yield(str);
    }
    // ...
}