File.open with block vs without

DarkDust already said that these methods are different. I'll explain blocks a little more, as I think I can guess why you asked this question ;-)

A block in Ruby is just another argument passed to a method; it is not merely a different syntax.

Methods that accept an optional block usually check whether they were called with a block or without one.

Consider this very simplified example (the real File.open is similar, but it also ensures the file is closed even if your block raises an error):

def open(fname)
  self.do_open(fname)
  if block_given?
    yield(self) # This will 'run' the block with the given parameter
    self.close
  else
    return self # This will just return some value
  end
end

In general, any method may behave differently depending on whether it is given a block. This should always be stated in the method's documentation.
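Here is a slightly fuller sketch of the same idea, this time with the ensure clause that the simplified example leaves out. The Resource class and its methods are hypothetical names made up for illustration; this is not how File is implemented, only the general pattern.

```ruby
# Hypothetical resource class, used only to illustrate the pattern.
class Resource
  attr_reader :name

  def initialize(name)
    @name = name
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end

  # Without a block: return the open resource; the caller must close it.
  # With a block: yield the resource, close it afterwards (even on error),
  # and return the block's value.
  def self.open(name)
    resource = new(name)
    return resource unless block_given?

    begin
      yield resource
    ensure
      resource.close
    end
  end
end

manual = Resource.open("a")                        # no block: we get the object back
upcased = Resource.open("b") { |r| r.name.upcase } # block: we get the block's value

seen = nil
begin
  Resource.open("c") do |r|
    seen = r
    raise "boom"
  end
rescue RuntimeError
  # The error still propagates, but the resource was closed first.
end

puts manual.closed?
puts upcased
puts seen.closed?
```

Note that the caller of the non-block form still has to call manual.close at some point; nothing does it automatically.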

What's the difference between File.open() with and without a block?

With:

File.open("some_file.txt") do |file|
  puts file.read
end

The open method calls your block with a File instance and cleans up once your block returns, flushing and closing the file as needed, so your application does not leak file handles. That is valuable, because we often forget to close files or overlook that file handling might raise exceptions.

When you do it like this:

file = File.open("some_file.txt")
puts file.read

The open method gives you the File instance, but now you are responsible for closing the file when you no longer need it. If you do it this way, you should wrap the work in a begin block and add an ensure clause that closes the file if anything goes wrong.

Unless you have a very specific need, avoid this second version. The first version is simpler and safer, and you don't have to worry about closing or cleaning up the file yourself.
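To make the comparison concrete, here is a minimal sketch of both versions side by side. The method names read_with_block and read_without_block and the file name are made up for this example; the file is created in a temporary directory so the snippet can be run as-is.

```ruby
require "tmpdir"

# Block form: File.open closes the file when the block returns
# and hands back the block's value.
def read_with_block(path)
  File.open(path) { |file| file.read }
end

# Non-block form: we own the handle, so we close it in an ensure clause.
def read_without_block(path)
  file = File.open(path)
  file.read
ensure
  file.close if file # file is nil if File.open itself raised
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "some_file.txt") # hypothetical file name
  File.write(path, "hello\n")

  puts read_with_block(path)
  puts read_without_block(path)
end
```

Both methods return the same content; the second one simply needs more code to be equally safe.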

Difference with open-uri with block and without it

The documentation for OpenURI is a little opaque to beginners, but the docs for #open can be found here.

Those docs say:

#open returns an IO-like object if block is not given. Otherwise it yields the IO object and return the value of the block.

The key words here are "IO-like object." We can infer from this that the object (file in your examples) will respond to the #close method.

While the documentation doesn't say so, by looking at the source we can see that #open will return either a StringIO or a Tempfile object, depending on the size of the data returned. OpenURI's internal Buffer class first initializes a StringIO object, but if the size of the output exceeds 10,240 bytes it creates a Tempfile and writes the data to it (to avoid storing large amounts of data in memory). Both StringIO and Tempfile have behavior consistent with IO, so it's good practice, when not passing a block to #open, to call #close on the object in an ensure:

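begin
  file = open(url) # with open-uri on Ruby 3.0 and later, use URI.open(url)
  # ...do some work...
ensure
  file.close if file # file is still nil if open itself raised
end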

Code in the ensure section always runs, even if code between begin and ensure raises an exception, so this will, well, ensure that file.close gets called even if an error occurs.
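If you want to see the order in which things run, here is a small sketch that needs no network access. It uses a StringIO as a stand-in for the IO-like object, and run_with_ensure is a made-up helper that records which steps were reached.

```ruby
require "stringio"

# Records which steps ran, with or without a simulated failure.
def run_with_ensure(io, fail_midway)
  events = []
  begin
    events << :work_started
    raise IOError, "simulated failure" if fail_midway
    events << :work_finished
  rescue IOError
    events << :rescued
  ensure
    io.close # runs on both paths
    events << :closed
  end
  events
end

ok_io = StringIO.new("payload")
failing_io = StringIO.new("payload")

p run_with_ensure(ok_io, false)
p run_with_ensure(failing_io, true)
puts ok_io.closed?
puts failing_io.closed?
```

In both runs :closed is the last recorded step, and both objects report closed? as true afterwards.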

If you're opening a file using the 'with' statement, do you still need to close the file object?

The answer to your immediate question is "No". The with block ensures that the file will be closed when control leaves the block, for whatever reason that happens, including exceptions (well, excluding someone yanking the power cord to your computer and some other rare events).

So it's good practice to use a with block.

Now arguably, having opened a file only for reading and then failing to close it is not that much of a problem. When garbage collection comes around (whenever that may be), that file will be closed, too, if there are no references to it anymore; at the latest that will happen when your program exits. In fact, several code samples in the official docs neglect closing a file that has been opened only for read access. When writing a file or when using the "read plus" mode like in your example, you definitely need to close the file. There are many questions here on SO dealing with incomplete/corrupted files because of a failure to close them properly.

Why does open(file, w) not block?

Well, the implementation of your open() function must be passing FILE_SHARE_WRITE to the kernel. Otherwise, one of the calls would return an error, not block.

Ruby's File.open and the need for f.close

I saw many times in ruby codes unmatched File.open calls

Can you give an example? I only ever see that in code written by newbies who lack the "common knowledge in most programming languages that the flow for working with files is open-use-close".

Experienced Rubyists either explicitly close their files, or, more idiomatically, use the block form of File.open, which automatically closes the file for you. Its implementation basically looks something like this:

def File.open(*args, &block)
  return open_with_block(*args, &block) if block_given?
  open_without_block(*args)
end

def File.open_without_block(*args)
  # do whatever ...
end

def File.open_with_block(*args)
  yield f = open_without_block(*args)
ensure
  f.close if f # f is nil if opening failed
end

Scripts are a special case. Scripts generally run for such a short time, and use so few file descriptors, that it simply doesn't make sense to close the files, since the operating system will close them anyway when the script exits.

Do we need to explicitly close?

Yes.

If yes then why does the GC autoclose?

Because after it has collected the object, there is no way for you to close the file anymore, and thus you would leak file descriptors.

Note that it's not the garbage collector that closes the files. The garbage collector simply executes any finalizers for an object before it collects it. It just so happens that the File class defines a finalizer which closes the file.

If not then why the option?

Because wasted memory is cheap, but wasted file descriptors aren't. Therefore, it doesn't make sense to tie the lifetime of a file descriptor to the lifetime of some chunk of memory.

You simply cannot predict when the garbage collector will run. You cannot even predict if it will run at all: if you never run out of memory, the garbage collector will never run, therefore the finalizer will never run, therefore the file will never be closed.

File.open and blocks in Ruby 1.8.7

To have the file closed after the block, pass the block to File.open() directly, not to each:

File.open('somefile.txt', 'r') do |f|
  f.each_line { |l| puts l }
end

File.open(…).each {…} just iterates over the opened file without closing it.
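Here is a small sketch of the difference, written for a current Ruby rather than 1.8.7. The three method names are made up for this example. It also shows File.foreach, which opens the file, yields each line and closes the file for you.

```ruby
require "tmpdir"

# Block passed to File.open: the handle is closed when the block returns.
def lines_with_block(path)
  File.open(path, "r") { |f| f.each_line.map(&:chomp) }
end

# File.foreach opens the file, iterates over its lines and closes it.
def lines_with_foreach(path)
  File.foreach(path).map(&:chomp)
end

# Block passed to each_line only: iteration works, but nobody closes the file.
def iterate_without_block(path)
  f = File.open(path, "r")
  f.each_line { |l| l }
  f # still open; the caller has to close it
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "somefile.txt")
  File.write(path, "one\ntwo\n")

  p lines_with_block(path)
  p lines_with_foreach(path)

  leaked = iterate_without_block(path)
  puts leaked.closed?
  leaked.close # clean up by hand
end
```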

File read using open() vs with open()

The with statement is not about performance. I don't think there is any performance gain or loss associated with it, as long as you perform the same cleanup that the with statement would perform automatically.

When you use the with statement with the open function, you do not need to close the file at the end, because with closes it for you automatically.

Also, the with statement is not just for opening files; with is used in conjunction with context managers. Basically, if you have an object that you want cleaned up once you are done with it, or when some error occurs, you can define it as a context manager, and the with statement will call its __enter__() and __exit__() methods on entry to and exit from the with block. According to PEP 0343 -

This PEP adds a new statement "with" to the Python language to make it possible to factor out standard uses of try/finally statements.

In this PEP, context managers provide __enter__() and __exit__() methods that are invoked on entry to and exit from the body of the with statement.

Also, performance testing of using with and not using it -

In [14]: def foo():
   ....:     f = open('a.txt','r')
   ....:     for l in f:
   ....:         pass
   ....:     f.close()
   ....:

In [15]: def foo1():
   ....:     with open('a.txt','r') as f:
   ....:         for l in f:
   ....:             pass
   ....:

In [17]: %timeit foo()
The slowest run took 41.91 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 186 µs per loop

In [18]: %timeit foo1()
The slowest run took 206.14 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 179 µs per loop

In [19]: %timeit foo()
The slowest run took 202.51 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 180 µs per loop

In [20]: %timeit foo1()
10000 loops, best of 3: 193 µs per loop

In [21]: %timeit foo1()
10000 loops, best of 3: 194 µs per loop

