How to Read a File from Bottom to Top in Ruby

How to read a file from bottom to top in Ruby?

The only correct way to do this that also works on enormous files is to read n bytes at a time from the end until you have the number of lines that you want. This is essentially how Unix tail works.

An example implementation of IO#tail(n), which returns the last n lines as an Array:

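class IO
  TAIL_BUF_LENGTH = 1 << 16

  # Returns the last n lines as an Array, reading backwards in chunks
  def tail(n)
    return [] if n < 1

    seek(0, SEEK_END)
    pos = self.pos
    buf = ""

    # Keep prepending chunks until we have more than n newlines
    # (or reach the start of the file, which also handles small files)
    while pos > 0 && buf.count("\n") <= n
      len = [TAIL_BUF_LENGTH, pos].min
      pos -= len
      seek(pos, SEEK_SET)
      buf = read(len) + buf
    end

    # last(n) returns fewer lines if the file has fewer than n
    buf.split("\n").last(n)
  end
end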

The implementation is a little naive, but a quick benchmark shows what a ridiculous difference this simple implementation can already make (tested with a ~25MB file generated with yes > yes.txt):

                            user     system      total        real
f.readlines[-200..-1]   7.150000   1.150000   8.300000 (  8.297671)
f.tail(200)             0.000000   0.000000   0.000000 (  0.000367)

The benchmark code:

require "benchmark"

FILE = "yes.txt"

Benchmark.bmbm do |b|
  b.report "f.readlines[-200..-1]" do
    File.open(FILE) do |f|
      f.readlines[-200..-1]
    end
  end

  b.report "f.tail(200)" do
    File.open(FILE) do |f|
      f.tail(200)
    end
  end
end

Of course, other implementations already exist. I haven't tried any, so I cannot tell you which is best.
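The same chunked-seek idea also works for iterating over a whole file from the last line to the first, not just grabbing a fixed number of lines. Below is a sketch of a hypothetical reverse_each_line helper (the name and the 64 KiB default chunk size are my own choices). It assumes "\n" line endings and yields lines without their newline, as binary (ASCII-8BIT) strings, so call force_encoding if you need something else.

```ruby
# Hypothetical helper: yields the lines of the file at +path+ from last to
# first, reading +chunk_size+ bytes at a time from the end of the file.
# Assumes "\n" line endings; lines are yielded without the newline, as
# binary (ASCII-8BIT) strings.
def reverse_each_line(path, chunk_size = 1 << 16)
  return enum_for(__method__, path, chunk_size) unless block_given?

  File.open(path, "rb") do |f|
    pos = f.size
    return if pos.zero?

    carry = "".b   # start of a line that may continue in the previous chunk
    at_end = true

    while pos > 0
      len = [chunk_size, pos].min
      pos -= len
      f.seek(pos)
      chunk = f.read(len)

      # Drop the file's final newline so it doesn't yield an extra empty line
      chunk = chunk.delete_suffix("\n") if at_end
      at_end = false

      pieces = (chunk + carry).split("\n", -1)
      # The first piece may be incomplete, so keep it for the next round
      carry = pieces.shift || "".b
      pieces.reverse_each { |line| yield line }
    end

    # Whatever is left over is the first line of the file
    yield carry
  end
end

# Usage: print the last three lines, newest first
# reverse_each_line("yes.txt").first(3).each { |line| puts line }
```

Because it returns an Enumerator when no block is given, first(3) stops reading as soon as it has three lines, and the file is still closed properly.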

Is there an elegant way to parse a text file backwards?

There's no hard software limit on the size of a Ruby array, though there are memory limitations (see the question "Array size too big - ruby").

Your approach would work much faster if you read everything into memory, operate on it there, and write it back to disk, assuming the file fits in memory, of course.
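A minimal sketch of that in-memory approach, assuming the whole file fits in RAM. rewrite_backwards is a hypothetical helper name: it reads every line with File.readlines, hands them to the block from last to first, and writes the block's results back in the original order with File.write.

```ruby
# Hypothetical helper: loads the whole file into memory, hands each line to
# the block from last to first, and writes the block's return values back in
# the original line order. Only suitable when the file fits in memory.
def rewrite_backwards(path)
  lines = File.readlines(path)
  results = lines.reverse_each.map { |line| yield line }
  File.write(path, results.reverse.join)
end

# Usage: number the lines of a file counting up from the bottom
# n = 0
# rewrite_backwards("data.txt") { |line| n += 1; "#{n}: #{line}" }
```

Each line keeps its trailing newline, so the block should preserve it (as the interpolation in the usage example does).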

Return line number with Ruby readline

Maybe something along these lines:

log_snapshot.each_with_index.reverse_each do |line, n|
  case line
  when /authorization:/
    puts '%d: %s' % [ n + 1, line ]
  end
end

Where each_with_index is used to generate 0-indexed line numbers. I've switched to a case style so you can have more flexibility in matching different conditions. For example, you can add the /i flag to do a case-insensitive match really easily or add \A at the beginning to anchor it at the beginning of the string.
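A sketch of that flexibility, with hypothetical sample lines standing in for log_snapshot: the first branch is anchored with \A and made case-insensitive with /i, and a second branch matches a different pattern. The matches are collected into an array instead of printed only so the result is easy to check.

```ruby
# Hypothetical sample data standing in for log_snapshot
log_snapshot = [
  "GET /index\n",
  "Authorization: Basic xyz\n",
  "x-authorization: token\n",
  "authorization: Bearer abc\n"
]

matches = []
log_snapshot.each_with_index.reverse_each do |line, n|
  case line
  when /\Aauthorization:/i   # must start the line; any letter case
    matches << format('%d: %s', n + 1, line)
  when /\AGET /               # a second, independent condition
    matches << format('%d: request %s', n + 1, line)
  end
end

puts matches
```

"x-authorization: token" is skipped because of the \A anchor, while "Authorization: Basic xyz" matches thanks to /i.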

Another thing to consider is using the block form of File.open, like this:

File.open(args[:apache_access_log], "r") do |f|
  f.readlines.each_with_index.reverse_each do |line, n|
    # ...
  end
end

This eliminates the need for an explicit close call: the file is closed automatically when the block ends, even if the block raises an exception.
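A small sketch confirming that behaviour, using a throwaway temporary file in place of args[:apache_access_log]: File.open returns whatever the block returns, and the File object is already closed once the block has finished.

```ruby
require "tmpdir"

reversed = nil
handle = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "access.log")   # illustrative file name
  File.write(path, "first\nsecond\n")

  # File.open yields the file, returns the block's value, then closes the file
  reversed = File.open(path, "r") do |f|
    handle = f
    f.readlines.reverse
  end
end

p reversed        # => ["second\n", "first\n"]
p handle.closed?  # => true
```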

Choose starting row for CSV.foreach or similar method? Don't want to load file into memory

I think you have the right idea. Since you've said you're not worried about fields spanning multiple lines, you can seek to a certain line in the file using IO methods and start parsing there. Here's how you might do it:

require "csv"

begin
  file = File.open(FILENAME)

  # Get the headers from the first line
  headers = CSV.parse_line(file.gets)

  # Read forward until we find a matching line
  match = "2,"
  while (line = file.gets)
    break if line.start_with?(match)
  end

  # Rewind the cursor to the beginning of the matching line
  # (bytesize rather than size, so multibyte characters don't skew the offset)
  file.seek(-line.bytesize, IO::SEEK_CUR) if line

  csv = CSV.new(file, headers: headers)

  # ...do whatever you want...
ensure
  # Don't forget to close the file (file is nil if File.open raised)
  file.close if file
end

The result of the above is that csv will be a CSV object whose first row is the row that starts with 2,.

I benchmarked this with an 8MB (170k rows) CSV file (from Lahman's Baseball Database) and found that it was much, much faster than using CSV.foreach alone. For a record in the middle of the file it was about 110x faster, and for a record toward the end about 66x faster. If you want, you can take a look at the benchmark here: https://gist.github.com/jrunning/229f8c2348fee4ba1d88d0dffa58edb7

Obviously 8MB is nothing like 10GB, so this is still going to take a long time. But I'm fairly confident it will be quite a bit faster for you, while also accomplishing your goal of not reading all of the data into memory at once.
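A variation on the same idea, sketched under the same assumption that no field spans multiple lines: instead of seeking back by the matched line's length, record the byte offset with IO#pos before each gets and seek straight back to it. seek_to_line is a hypothetical helper name, and a StringIO stands in for the real file so the example is self-contained.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: reads +io+ until a line starting with +prefix+, then
# leaves the cursor at the start of that line. Returns true if one was found.
def seek_to_line(io, prefix)
  loop do
    offset = io.pos           # byte offset of the line we are about to read
    line = io.gets
    return false if line.nil?

    if line.start_with?(prefix)
      io.seek(offset)         # jump back to the beginning of the matching line
      return true
    end
  end
end

data = StringIO.new("id,name\n1,alpha\n2,beta\n3,gamma\n")
headers = CSV.parse_line(data.gets)

rows = seek_to_line(data, "2,") ? CSV.new(data, headers: headers).map(&:to_h) : []
p rows   # rows for ids 2 and 3 only
```

Passing the headers array to CSV.new makes every remaining line a data row, so the row starting with 2, comes out first.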

Retrieve a file in Ruby

File#closed? returns whether that particular File object is closed, so there is no method that is going to make your current attempted solution work:

f1 = File.new("test.file")
f2 = File.new("test.file")
f1.close
f1.closed? # => true # Even though f2 still has the same file open

If possible, it would be best to keep the File object you're using and ask it directly whether it is closed.

If you really want to know if your current Ruby process has any File objects open for a particular path, something like this feels hack-ish but should mostly work:

def file_is_closed?(file_name)
  ObjectSpace.each_object(File) do |f|
    if File.absolute_path(f) == File.absolute_path(file_name) && !f.closed?
      return false
    end
  end

  true
end

I wouldn't vouch for how well that handles corner cases, but it seems to work for me in general:

f1 = File.new("test.file")
f2 = File.new("test.file")
file_is_closed?("test.file") # => false
f1.close
file_is_closed?("test.file") # => false
f2.close
file_is_closed?("test.file") # => true

If you want to know if any process has the file open, I think you'll need to resort to something external like lsof.

How to append a text to file succinctly

Yes. It's poorly documented, but you can use:

File.write('foo.txt', 'some text', mode: 'a+')
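A quick sketch of the append behaviour; the temporary directory and file name are only for illustration. Plain "a" is enough when you only write, while "a+" additionally opens the file for reading. In both modes each call adds to the end instead of truncating the file.

```ruby
require "tmpdir"

contents = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.txt")
  File.write(path, "first line\n")                 # default mode "w" truncates

  # mode: "a" appends (creating the file if it doesn't exist yet)
  File.write(path, "some text\n", mode: "a")
  File.write(path, "more text\n", mode: "a")

  contents = File.read(path)
end

puts contents
```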

Calculate and display the balances from bottom to top in ruby on rails

I still don't understand how the balance column [2500, 1500, 2000] is calculated, but I can infer something from the screenshot.

Basically, you are sorting by a column that doesn't exist in the model. So first you need to build that helper column, populate it, and then sort by it.
It should be possible to do this in SQL, but I'm showing it in plain Ruby, using an Array of Hashes as a fake database. You can easily adapt it to your case or look for a more efficient way (SQL).

Let's say data are the following:

expenses = [{date: 1, narration: :a, debit: 3.0, credit: 0},
            {date: 2, narration: :b, debit: 0.15, credit: 0},
            {date: 3, narration: :c, debit: 75.0, credit: 0}]

And the initial balance is:

balance = 1434.64

Now let's loop over the data, adding the new balance field, and sort at the end of the loop:

expenses.each do |h|
  balance += h[:credit] - h[:debit]
  h[:balance] = balance
end.sort_by! { |h| h[:balance] }

Now your sorted expenses are:

[
  {:date=>3, :narration=>:c, :debit=>75.0, :credit=>0, :balance=>1356.49},
  {:date=>2, :narration=>:b, :debit=>0.15, :credit=>0, :balance=>1431.49},
  {:date=>1, :narration=>:a, :debit=>3.0, :credit=>0, :balance=>1431.64}
]

You can do the calculation in the controller, then pass expenses to the view and loop over it without any need for calculation there.

For your Rails app, you could implement it as follows.

Add the temporary field balance to your model (there's no need to add a column to the database) and initialize it to 0:

class Expense < ApplicationRecord
  attr_accessor :balance

  after_initialize :init

  def init
    self.balance = 0
  end
end

Do the calculation in the controller. I'm using an initial balance value just to emulate the example:

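def index
  # Order chronologically so the running balance accumulates correctly
  # (assumes a date column, as in the example data)
  @expenses = Expense.order(:date).to_a

  balance = 1434.64
  @expenses.each do |e|
    balance += e.credit - e.debit
    e.balance = balance
  end

  @expenses = @expenses.sort_by(&:balance)
end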

Then in your view, just loop:

<% @expenses.each do |expense| %>
  <tr>
    <td><%= expense.narration %></td>
    <td><%= expense.debit %></td>
    <td><%= expense.credit %></td>
    <td><%= expense.balance %></td>
  </tr>
<% end %>

If you insert the records as in your example, you should end up with this result:

# ["c", "75.0", "0.0", "1356.49"]
# ["b", "0.15", "0.0", "1431.49"]
# ["a", "3.0", "0.0", "1431.64"]

Does the Ruby method lookup start from the bottom of a class and go up, or from the top and go down?

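Later definitions override earlier ones. The class body runs from top to bottom, so the second def hello replaces the first, and by the time f.hello is called only the second definition exists: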

class Foo
  def hello
    'hello first'
  end

  def hello
    'hello second'
  end
end

f = Foo.new

puts f.hello # hello second
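You can observe the top-to-bottom execution directly by calling the method between the two definitions. FIRST is a constant added only for this illustration.

```ruby
class Foo
  def hello
    'hello first'
  end

  # The class body is executed top to bottom, so at this point only the
  # first definition of hello exists
  FIRST = new.hello

  def hello
    'hello second'
  end
end

puts Foo::FIRST     # hello first
puts Foo.new.hello  # hello second
```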
