Is It Necessary to Close StringIO in Ruby

Is it necessary to close StringIO in ruby?

  • StringIO#close does not free any resources or drop its reference to the accumulated string. Therefore calling it has no effect upon resource usage.

  • Only StringIO#finalize, called during garbage collection, frees the reference to the accumulated string so that it can be freed (provided the caller does not retain its own reference to it).

  • StringIO.open, which briefly creates a StringIO instance, does not keep a reference to that instance after it returns; therefore that StringIO's reference to the accumulated string can be freed (provided the caller does not retain its own reference to it).

  • In practical terms, there is seldom a need to worry about a memory leak when using StringIO. Just don't hang on to references to StringIO once you're done with them and all will be well.


Diving into the source

The only resource used by a StringIO instance is the string it is accumulating. You can see that in stringio.c (MRI 1.9.3); here we see the structure that holds a StringIO's state:

struct StringIO {
    VALUE string;
    long pos;
    long lineno;
    int flags;
    int count;
};

When a StringIO instance is finalized (that is, garbage collected), its reference to the string is dropped so that the string may be garbage collected if there are no other references to it. Here's the finalize method, which is also called by StringIO.open(&block) in order to close the instance.

static VALUE
strio_finalize(VALUE self)
{
    struct StringIO *ptr = StringIO(self);
    ptr->string = Qnil;
    ptr->flags &= ~FMODE_READWRITE;
    return self;
}

Apart from the block form of StringIO.open, the finalize method is called only when the object is garbage collected. There is no other method of StringIO which frees the string reference.
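The effect of that finalize call can be observed from Ruby through the block form of StringIO.open. The following is a minimal sketch (the variable names are illustrative, not from the original answer): once the block returns, the instance no longer references its string, while a reference the caller saved inside the block keeps the string alive.

```ruby
require 'stringio'

kept = nil
sio = StringIO.open do |io|
  io.write('accumulated')
  kept = io.string   # the caller's own reference to the string
  io                 # return the instance so it can be inspected afterwards
end

released = sio.string.nil?   # the instance dropped its reference
closed   = sio.closed?       # the read/write flags were cleared as well
```

Here `kept` still holds "accumulated", which is the "provided the caller does not retain its own reference" case from the summary.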

StringIO#close just sets a flag. It does not free the reference to the accumulated string or in any other way affect resource usage:

static VALUE
strio_close(VALUE self)
{
    struct StringIO *ptr = StringIO(self);
    if (CLOSED(ptr)) {
        rb_raise(rb_eIOError, "closed stream");
    }
    ptr->flags &= ~FMODE_READWRITE;
    return Qnil;
}
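The same behaviour can be checked from Ruby. This is a small sketch with illustrative variable names: after close, the instance reports itself as closed, yet the accumulated string is still reachable through it.

```ruby
require 'stringio'

sio = StringIO.new
sio.write('payload')
sio.close

closed_flag = sio.closed?   # the stream now reports itself as closed
retained    = sio.string    # the accumulated string is still reachable
```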

And lastly, when you call StringIO#string, you get a reference to the exact same string that the StringIO instance has been accumulating:

static VALUE
strio_get_string(VALUE self)
{
    return StringIO(self)->string;
}
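In Ruby terms, repeated calls return the identical object rather than a copy. A short sketch (variable names are illustrative) that mutates the returned string and observes the change through the StringIO:

```ruby
require 'stringio'

sio = StringIO.new
sio.write('abc')

first  = sio.string
second = sio.string
same_object = first.equal?(second)   # identity, not just equality

first << 'def'                       # mutate through the returned reference
seen_by_sio = sio.string             # the StringIO sees the same mutation
```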

How to leak memory when using StringIO

All of this means that there is only one way for a StringIO instance to cause a resource leak: You must not close the StringIO object, and you must keep it around longer than you keep the string you got when you called StringIO#string. For example, imagine a class having a StringIO object as an instance variable:

require 'stringio'

class Leaker

  def initialize
    @sio = StringIO.new
    @sio.puts "Here's a large file:"
    @sio.puts
    @sio.write File.read('/path/to/a/very/big/file')
  end

  # Returns the very string object that @sio keeps referencing.
  def result
    @sio.string
  end

end

Imagine that the user of this class gets the result, uses it briefly, and then discards it, and yet keeps a reference to the instance of Leaker. You can see that the Leaker instance retains a reference to the result via the un-closed StringIO instance. This could be a problem if the file is very large, or if there are many extant instances of Leaker. This simple (and deliberately pathological) example can be fixed by simply not keeping the StringIO as an instance variable. When you can (and you almost always can), it's better to simply throw away the StringIO object than to go through the bother of closing it explicitly:

require 'stringio'

class NotALeaker

  attr_reader :result

  def initialize
    # sio is a local variable, so it becomes garbage when initialize returns.
    sio = StringIO.new
    sio.puts "Here's a large file:"
    sio.puts
    sio.write File.read('/path/to/a/very/big/file')
    @result = sio.string
  end

end

Add to all of this that these leaks only matter when the strings are large or the StringIO instances numerous and the StringIO instance is long lived, and you can see that explicitly closing StringIO is seldom, if ever, needed.

Should I close StringIO instances explicitly?

Well, reads and writes go straight to the underlying string; there are no extra buffers to flush, and no OS-level resources to return.

The only reason you might want to close the StringIO is to make subsequent IOs fail or if you needed to make closed? return true, which could be useful if you gave that StringIO to some other component. On the other hand, if you're just going to discard the StringIO a moment later, it doesn't matter in the slightest; the garbage collector doesn't care if it's marked as open or closed.
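As a sketch of that one use case, assume a hypothetical component (`log_line` below is made up for illustration) that keeps writing to whatever IO it was handed. Closing the StringIO makes its later writes fail with IOError:

```ruby
require 'stringio'

# Hypothetical component that writes to whatever IO-like object it receives.
def log_line(io, text)
  io.puts(text)
  :written
rescue IOError
  :rejected
end

sio = StringIO.new
before = log_line(sio, 'first')    # accepted while the stream is open
sio.close
after = log_line(sio, 'second')    # rejected once the stream is closed
```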

How can I clear a `StringIO` instance?

seek or rewind only affect next read/write operations, not the content of the internal storage.
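A quick sketch of why rewinding alone does not clear anything: the next write simply overwrites from the new position and leaves the rest of the content in place.

```ruby
require 'stringio'

io = StringIO.new
io.write('foobar')
io.rewind                # position is back at 0, content untouched
io.write('X')            # overwrites the first byte only
contents = io.string.dup # copy taken so later writes cannot change it
```

After this, `contents` is "Xoobar" rather than "X".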

You can use StringIO#truncate like File#truncate:

require 'stringio'
io = StringIO.new
io.write("foo")
io.string
# => "foo"
io.truncate(0) # <---------
io.string
# => ""

Alternative:

You can also use StringIO#reopen (NOTE: unlike IO#reopen, which takes another IO or a path, it takes the new string):

io.reopen("")
io.string
# => ""
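One detail worth checking when clearing with truncate: it shortens the string but, as far as I can tell from the implementation, leaves the position where it was, so a following write is padded with null bytes. The sketch below (variable names are illustrative) shows that, and pairs truncate with rewind to avoid it.

```ruby
require 'stringio'

io = StringIO.new
io.write('foo')
io.truncate(0)
pos_after_truncate = io.pos   # position is not reset by truncate
io.write('bar')
padded = io.string.dup        # copy, because io.string is modified in place below

io.truncate(0)
io.rewind                     # reset the position as well
io.write('bar')
clean = io.string.dup
```

With the rewind, `clean` is just "bar"; without it, `padded` starts with three null bytes.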

What are the advantages to using StringIO in Ruby as opposed to String?

Basically, it makes a string look like an IO object, hence the name StringIO.

The StringIO class has read and write methods, so it can be passed to parts of your code that were designed to read and write from files or sockets. It's nice if you have a string and you want it to look like a file for the purposes of testing your file code.

require 'stringio'

def foo_writer(file)
  file.write "foo"
end

def test_foo_writer
  s = StringIO.new
  foo_writer(s)
  raise 'fail' unless s.string == 'foo'
end
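The same trick works in the reading direction. As a sketch, assume a hypothetical helper (`count_entries` is made up for illustration) that was written to consume a file; a StringIO wrapping a literal string can stand in for the file in a test:

```ruby
require 'stringio'

# Hypothetical file-oriented helper: counts non-blank lines of any IO-like object.
def count_entries(io)
  io.each_line.count { |line| !line.strip.empty? }
end

fake_file = StringIO.new("alpha\n\nbeta\ngamma\n")
entries = count_entries(fake_file)
```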

Ruby's File.open and the need for f.close

I saw many times in ruby codes unmatched File.open calls

Can you give an example? I only ever see that in code written by newbies who lack the "common knowledge in most programming languages that the flow for working with files is open-use-close".

Experienced Rubyists either explicitly close their files, or, more idiomatically, use the block form of File.open, which automatically closes the file for you. Its implementation basically looks something like this:

def File.open(*args, &block)
  return open_with_block(*args, &block) if block_given?
  open_without_block(*args)
end

def File.open_without_block(*args)
  # do whatever ...
end

def File.open_with_block(*args)
  f = open_without_block(*args)
  yield f
ensure
  f.close if f  # f is nil when opening itself failed
end
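To see the ensure behaviour in practice, here is a small sketch that raises inside the block and then checks the file handle. The scratch file name and the `handle` variable are illustrative only.

```ruby
require 'tmpdir'

handle = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')   # hypothetical scratch file

  begin
    File.open(path, 'w') do |f|
      handle = f
      f.write('partial')
      raise 'something went wrong'
    end
  rescue RuntimeError
    # The error still propagates out of File.open; the file was closed on the way.
  end
end

closed_after_error = handle.closed?
```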

Scripts are a special case. Scripts generally run for such a short time, and use so few file descriptors, that it simply doesn't make sense to close them, since the operating system will close them anyway when the script exits.

Do we need to explicitly close?

Yes.

If yes then why does the GC autoclose?

Because after it has collected the object, there is no way for you to close the file anymore, and thus you would leak file descriptors.

Note that it's not the garbage collector that closes the files. The garbage collector simply executes any finalizers for an object before it collects it. It just so happens that the File class defines a finalizer which closes the file.

If not then why the option?

Because wasted memory is cheap, but wasted file descriptors aren't. Therefore, it doesn't make sense to tie the lifetime of a file descriptor to the lifetime of some chunk of memory.

You simply cannot predict when the garbage collector will run. You cannot even predict if it will run at all: if you never run out of memory, the garbage collector will never run, therefore the finalizer will never run, therefore the file will never be closed.

Ruby StringIO for concurrent reading and writing

You should consider using a Queue. If you do not need thread safety, then a simple array might be fine too.
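A minimal producer/consumer sketch with Queue, assuming one writer thread and one reader thread; the `:done` sentinel is a convention chosen for this example, not part of the Queue API:

```ruby
queue = Queue.new
received = []

producer = Thread.new do
  3.times { |i| queue << "chunk #{i}" }
  queue << :done                      # sentinel marking the end of the stream
end

consumer = Thread.new do
  while (item = queue.pop) != :done   # pop blocks until an item is available
    received << item
  end
end

[producer, consumer].each(&:join)
```

Unlike a shared StringIO, the queue needs no position bookkeeping between the two threads.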

Is there such a thing as opening a StringIO for writing?

Here's your problem:

# frozen_string_literal: true

Your string literal is frozen and can't be modified. StringIO expresses it with the abovementioned IOError.
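The effect can be reproduced without the magic comment by freezing a string explicitly. This sketch also shows one possible fix, passing an unfrozen copy via unary plus; other ways of obtaining a mutable string (such as dup) should work equally well.

```ruby
require 'stringio'

frozen_io = StringIO.new('abc'.freeze)   # a frozen string is opened read-only
error = begin
  frozen_io.write('x')
  nil
rescue IOError => e
  e
end

writable_io = StringIO.new(+'abc')       # unary plus yields an unfrozen string
writable_io.write('x')                   # overwrites from position 0
result = writable_io.string
```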

Why ruby StringIO does not give different encodings

Let's dissect your code...

a.read(2)

This reads two bytes from the stream and returns a String. As you are reading a specific number of bytes, Ruby can't guarantee any character boundaries. Because of this, it is specified that the returned string will be binary encoded, i.e. Encoding::ASCII_8BIT.

In your next line, you are using

a.read

You are thus reading until the end of the stream and returning all remaining data. The encoding of the returned string can either be given as an argument to the read method or default to your defined external encoding (in your case UTF-8).

Now, as you have read to the end of the stream, any subsequent reads will either result in an error or simply return an empty string. In the case of StringIO, this happens to be a binary string. Although I didn't find any documentation about this specific case, it's clearly defined in MRI's code of the StringIO class.

a.read

will thus return an empty string in binary encoding.
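The three reads can be reproduced with a short sketch. The sample string is an assumption (the original question's data is not shown), and the encoding of the final empty string is deliberately not asserted here, since it may differ between StringIO versions.

```ruby
require 'stringio'

a = StringIO.new("h\u00e9llo")   # UTF-8 string; the e-acute takes two bytes

partial = a.read(2)              # length given: raw bytes, binary encoded
rest    = a.read                 # no length: keeps the string's UTF-8 encoding
at_eof  = a.read                 # nothing left: an empty string

partial_encoding = partial.encoding
rest_encoding    = rest.encoding
```

Note that `partial` ends in the middle of the two-byte character, which is exactly why a byte-counted read cannot promise a text encoding.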


