Reading the Last N Lines of a File in Ruby

How do I read the nth line of a file efficiently in Ruby?

What about IO.foreach?

IO.foreach('filename') { |line| p line; break }

That should read the first line, print it, and then stop. It does not read the entire file; it reads one line at a time.
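The same one-line-at-a-time idea extends to an arbitrary line number. Here is a minimal sketch, assuming a 1-based line number; the helper name nth_line is made up for illustration and is not part of Ruby:

```ruby
# Hypothetical helper: returns the nth line (1-based) of the file,
# or nil if the file has fewer than n lines.
def nth_line(path, n)
  # with_index(1) numbers the lines starting at 1
  File.foreach(path).with_index(1) do |line, number|
    # returning early stops the iteration, so later lines are never read
    return line if number == n
  end
  nil
end
```

Lines before the nth one still have to be scanned, since there is no way to know where line n starts without counting newlines, but nothing after it is read.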

Ruby - how to read first n lines from file into array

Here is a one-line solution:

lines = File.foreach('file.txt').first(10)

I was worried that it might not close the file promptly (it might only close the file once the garbage collector reclaims the Enumerator returned by File.foreach). However, I checked with strace: if you call File.foreach without a block, it returns an enumerator, and each time you call first on that enumerator it opens the file, reads as much as it needs, and then closes the file. That's nice, because it means you can use the line of code above and Ruby will not keep the file open any longer than necessary.

What's the best way to remove the last n lines of a file in a ruby script?

This is about the simplest way I can think of to do this in pure Ruby that also works for large files, since it processes one line at a time instead of reading the whole file into memory:

INFILE = "input.txt"
OUTFILE = "output.txt"

total_lines = File.foreach(INFILE).inject(0) { |c, _| c+1 }
desired_lines = total_lines - 4

# open output file for writing
File.open(OUTFILE, 'w') do |outfile|
  # open input file for reading
  File.foreach(INFILE).with_index do |line, index|
    # stop after reaching the desired line number
    break if index == desired_lines

    # copy lines from infile to outfile
    outfile << line
  end
end

However, this is about twice as slow as what you posted on a 160 MB file I created. You can shave off about a third of the time by using wc to get the total line count and pure Ruby for the rest:

total_lines = `wc -l < #{INFILE}`.strip.to_i
# rest of the Ruby File code

Another caveat is that your CSV must not have its own line breaks within any cell content. If it does, you need a CSV parser, and CSV.foreach(INFILE) do |row| could be used instead, although it is quite a bit slower in my limited testing. You mentioned above, though, that your cells should be OK to be processed line by line.

That said, what you posted using wc and dd is much faster, so maybe you should keep using that.

Determine last line in Ruby

If you're iterating over the file with each, the end of the file has already been reached by the time the last line is passed to the block, because the last line is, by definition, the one that ends at EOF.

So just call file.eof? in the block.
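As a rough sketch of that check, assuming a regular file on disk (the helper name each_line_with_last_flag is invented for this example):

```ruby
# Hypothetical helper: yields each line together with a flag that is
# true only while the final line of the file is being processed.
def each_line_with_last_flag(path)
  File.open(path) do |file|
    file.each_line do |line|
      # after the last line has been read, nothing is left, so eof? is true
      yield line, file.eof?
    end
  end
end
```

A caller could then write each_line_with_last_flag('file.txt') { |line, last| ... } and branch on last inside the block.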

If you'd like to determine if it's the last non-empty line in the file, you'd have to implement some kind of readahead.

Ruby Count lines in file including last line(empty)

Interesting question (although your example file is cumbersome). Your editor shows a 21st line because the 20th line ends with a newline character. Without a trailing newline character, your editor would show 20 lines.

Here's a simpler example:

a = "foo\nbar"
b = "baz\nqux\n"

A text editor would show:

# file a
1 foo
2 bar

# file b
1 baz
2 qux
3

Ruby, however, sees 2 lines in either case:

a.lines       #=> ["foo\n", "bar"]
a.lines.count #=> 2

b.lines       #=> ["baz\n", "qux\n"]
b.lines.count #=> 2

You could trick Ruby into recognizing the trailing newline by adding an arbitrary character:

(a + '_').lines       #=> ["foo\n", "bar_"]
(a + '_').lines.count #=> 2

(b + '_').lines       #=> ["baz\n", "qux\n", "_"]
(b + '_').lines.count #=> 3

Or you could use a Regexp that matches either end of line ($) or end of string (\Z):

a.scan(/$|\Z/)       #=> ["", ""]
a.scan(/$|\Z/).count #=> 2

b.scan(/$|\Z/)       #=> ["", "", ""]
b.scan(/$|\Z/).count #=> 3
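A different way to get the editor-style count, not taken from the answer above, is to count the newline characters and add one. This is only a sketch, assuming "\n" is the sole line terminator; editor_line_count is a made-up name:

```ruby
# Hypothetical helper: counts lines the way a text editor displays them,
# so a trailing newline opens one more (empty) line.
def editor_line_count(text)
  text.count("\n") + 1
end

a = "foo\nbar"
b = "baz\nqux\n"

editor_line_count(a) #=> 2
editor_line_count(b) #=> 3
```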

How to read lines of a file in Ruby

I believe my answer covers your new concerns about handling any type of line endings since both "\r\n" and "\r" are converted to Linux standard "\n" before parsing the lines.

To support the "\r" EOL character along with the regular "\n", and "\r\n" from Windows, here's what I would do:

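line_num = 0
# File.read opens, reads, and closes the file in one call
text = File.read('xxx.txt')
# convert "\r\n" and lone "\r" line endings to "\n"
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  print "#{line_num += 1} #{line}"
end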

Of course this could be a bad idea on very large files since it means loading the whole file into memory.

Download Last 20k lines of file in Rails 5

Is there a way to do this without reading the last n lines...

Yes, you don't need to save the last n lines to a file to serve it. You might want to, for performance reasons, but you don't have to.

The trick is: how do you determine what the last n lines are without reading the whole file? You can't, not without some scanning. If lines were fixed size, that'd be another story. But as it is, the easiest way (in my opinion) would be to open the file and read only the last 64 KB of data (or whatever amount you need).

There's a very good chance that this chunk of data will start in the middle of some log line. So you just discard the head of this data block up to and including the first newline. What remains is whole log lines. Not exactly 20k of them, but close (if you calculate your average log line length and use it to work out how big a block to read).

Then you can use Rails' send_data to send that block as if it were a real file.
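The read-the-tail idea could look roughly like this. It is a sketch under assumptions: the helper name tail_chunk and the default of 64 KB are illustrative, and the file is assumed to use "\n" line endings:

```ruby
# Hypothetical helper: returns roughly the last chunk_size bytes of the
# file as whole lines, dropping the partial line at the start of the chunk.
def tail_chunk(path, chunk_size = 64 * 1024)
  File.open(path, 'rb') do |file|
    # small files are returned whole; there is nothing to trim
    return file.read if file.size <= chunk_size

    # jump to chunk_size bytes before the end and read the rest
    file.seek(-chunk_size, IO::SEEK_END)
    chunk = file.read

    first_newline = chunk.index("\n")
    # no newline at all means the chunk is one partial line
    return '' if first_newline.nil?

    # discard everything up to and including the first newline
    chunk[(first_newline + 1)..-1]
  end
end
```

In a Rails controller, the returned string could then be handed to send_data, for example send_data(tail_chunk(path), filename: 'tail.log', type: 'text/plain'), where path and the file name are placeholders. Note that when the chunk happens to begin exactly on a line boundary, this sketch still drops that first complete line.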


