How to read a file from bottom to top in Ruby?
The only correct way to do this that also works on enormous files is to read n bytes at a time from the end until you have the number of lines that you want. This is essentially how Unix tail works.
An example implementation of IO#tail(n), which returns the last n lines as an Array:
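class IO
  TAIL_BUF_LENGTH = 1 << 16

  # Returns the last n lines as an Array, reading backwards in chunks.
  def tail(n)
    return [] if n < 1

    seek 0, SEEK_END
    offset = pos
    buf = ""
    # Prepend chunks until we hold more than n newlines or reach the start of
    # the file; never seek before offset 0 (that raises Errno::EINVAL).
    while offset > 0 && buf.count("\n") <= n
      len = [TAIL_BUF_LENGTH, offset].min
      offset -= len
      seek offset, SEEK_SET
      buf = read(len) + buf
    end

    # last(n) also copes with files that have fewer than n lines
    buf.split("\n").last(n)
  end
end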
The implementation is a little naive, but a quick benchmark shows what a ridiculous difference this simple implementation can already make (tested with a ~25MB file generated with yes > yes.txt):
                            user     system      total        real
f.readlines[-200..-1]   7.150000   1.150000   8.300000 (  8.297671)
f.tail(200)             0.000000   0.000000   0.000000 (  0.000367)
The benchmark code:
require "benchmark"

FILE = "yes.txt"

Benchmark.bmbm do |b|
  b.report "f.readlines[-200..-1]" do
    File.open(FILE) do |f|
      f.readlines[-200..-1]
    end
  end

  b.report "f.tail(200)" do
    File.open(FILE) do |f|
      f.tail(200)
    end
  end
end
Of course, other implementations already exist. I haven't tried any, so I cannot tell you which is best.
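For the title question itself (walking the lines bottom to top rather than grabbing the last n), the same read-chunks-from-the-end idea can be sketched roughly as follows. The method name each_line_reversed and its chunk_size parameter are made up for this illustration and are not part of Ruby's IO API. It assumes "\n" line endings, and because the file is read in binary mode the yielded strings are binary-encoded (call force_encoding on them if you need UTF-8).

```ruby
# Hypothetical helper (not part of Ruby's IO API): yields the lines of the file
# at `path` from last to first, reading `chunk_size` bytes at a time.
def each_line_reversed(path, chunk_size = 1 << 16)
  return enum_for(:each_line_reversed, path, chunk_size) unless block_given?

  File.open(path, "rb") do |f|
    offset = f.size
    return if offset.zero? # empty file: nothing to yield

    carry = "".b  # tail end of a line whose start lies further back in the file
    at_end = true
    while offset > 0
      len = [chunk_size, offset].min
      offset -= len
      f.seek(offset, IO::SEEK_SET)
      chunk = f.read(len) + carry
      # Ignore the newline that terminates the final line of the file.
      chunk = chunk.delete_suffix("\n") if at_end
      at_end = false

      pieces = chunk.split("\n", -1)
      carry = pieces.shift.to_s # possibly incomplete until we read further back
      pieces.reverse_each { |line| yield line }
    end
    yield carry # whatever is left is the first line of the file
  end
end
```

Called without a block it returns an Enumerator, so something like each_line_reversed("yes.txt").first(200) stops reading as soon as it has 200 lines.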
Is there an elegant way to parse a text file *backwards*?
There's no software limit on the size of a Ruby array. There are memory limitations, though; see the question "Array size too big - ruby".
Your approach would be much faster if you read everything into memory, operate on it there, and write it back to disk. This assumes the file fits in memory, of course.
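As a rough sketch of that in-memory approach (the helper name reverse_file_lines is made up here, and it assumes the file fits comfortably in memory):

```ruby
# Hypothetical helper: rewrites the file at `path` with its lines in reverse
# order. Reads the whole file into memory first.
def reverse_file_lines(path)
  lines = File.readlines(path, chomp: true) # every line, without its newline
  File.write(path, lines.reverse.map { |l| "#{l}\n" }.join)
end
```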
Return line number with Ruby readline
Maybe something along these lines:
log_snapshot.each_with_index.reverse_each do |line, n|
  case (line)
  when /authorization:/
    puts '%d: %s' % [ n + 1, line ]
  end
end
Here each_with_index is used to generate 0-indexed line numbers. I've switched to a case style so you have more flexibility in matching different conditions. For example, you can add the /i flag to do a case-insensitive match really easily, or add \A at the beginning of the pattern to anchor it at the beginning of the string.
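A small sketch of the /i flag and the \A anchor in use. The helper name authorization_lines and the sample lines are invented for illustration only:

```ruby
# Hypothetical helper: returns "line number: text" strings for matching lines,
# last line first.
def authorization_lines(lines)
  found = []
  lines.each_with_index.reverse_each do |line, n|
    case line
    when /\Aauthorization:/i # \A anchors at the start, /i ignores case
      found << format("%d: %s", n + 1, line)
    end
  end
  found
end

# Made-up sample data
log_snapshot = [
  "Authorization: Basic abc",
  "x-authorization: not at the start, so no match",
  "AUTHORIZATION: Bearer xyz"
]
puts authorization_lines(log_snapshot)
```

With this sample data only lines 3 and 1 are reported, in that order.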
Another thing to consider is using the block form of File.open, like this:
File.open(args[:apache_access_log], "r") do |f|
  f.readlines.each_with_index.reverse_each do |line, n|
    # ...
  end
end
That eliminates the need for an explicit close call. The end of the block closes the file for you automatically.
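If you want to convince yourself of the automatic close, a quick check along these lines should do. The helper name closed_inside_and_after is hypothetical, and a Tempfile stands in for the real log file:

```ruby
require "tempfile"

# Hypothetical helper: opens `path` with the block form and reports whether the
# handle was closed inside the block and after the block has finished.
def closed_inside_and_after(path)
  handle = nil
  inside = File.open(path, "r") do |f|
    handle = f
    f.closed? # still open while the block runs
  end
  [inside, handle.closed?]
end

Tempfile.create("access_log") do |tmp|
  p closed_inside_and_after(tmp.path) # prints [false, true]
end
```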
Choose starting row for CSV.foreach or similar method? Don't want to load file into memory
I think you have the right idea. Since you've said you're not worried about fields spanning multiple lines, you can seek to a certain line in the file using IO methods and start parsing there. Here's how you might do it:
require "csv"

file = File.open(FILENAME)
begin
  # Get the headers from the first line
  headers = CSV.parse_line(file.gets)

  # Read forward until we find a matching line
  match = "2,"
  while line = file.gets
    break if line.start_with?(match)
  end

  if line
    # Rewind the cursor to the beginning of the matching line
    # (bytesize rather than size, because seek offsets are in bytes)
    file.seek(-line.bytesize, IO::SEEK_CUR)
    csv = CSV.new(file, headers: headers)
    # ...do whatever you want...
  end
ensure
  # Don't forget to close the file
  file.close
end
The result of the above is that csv will be a CSV object whose first row is the row that starts with 2,.
I benchmarked this with an 8MB (170k rows) CSV file (from Lahman's Baseball Database) and found that it was much, much faster than using CSV.foreach alone. For a record in the middle of the file it was about 110x faster, and for a record toward the end about 66x faster. If you want, you can take a look at the benchmark here: https://gist.github.com/jrunning/229f8c2348fee4ba1d88d0dffa58edb7
Obviously 8MB is nothing like 10GB, so regardless this is going to take you a long time. But I'm pretty sure this will be quite a bit faster for you while also accomplishing your goal of not reading all of the data into memory at once.
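To try the idea on a tiny file, here is a self-contained sketch. The helper name csv_starting_at and the sample data are made up for illustration. It assumes the file has a header line and that no field spans multiple lines.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: positions `file` at the first data line that starts
# with `prefix` and returns a CSV reader over the rest of the file, or nil if
# no line matches.
def csv_starting_at(file, prefix)
  headers = CSV.parse_line(file.gets)
  while (line = file.gets)
    next unless line.start_with?(prefix)

    file.seek(-line.bytesize, IO::SEEK_CUR) # seek offsets are in bytes
    return CSV.new(file, headers: headers)
  end
  nil
end

Tempfile.create("players") do |f|
  f.write("id,name\n1,Ann\n2,Bob\n3,Cy\n") # made-up sample data
  f.rewind
  csv = csv_starting_at(f, "2,")
  csv.each { |row| puts "#{row['id']}: #{row['name']}" }
end
```

Only the rows from 2,Bob onward are parsed as CSV; everything before the match is read as plain lines, which is where the speedup comes from.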
Retrieve a file in Ruby
File#closed? returns whether that particular File object is closed, so there is no method that is going to make your current attempted solution work:
f1 = File.new("test.file")
f2 = File.new("test.file")
f1.close
f1.closed? # => true # Even though f2 still has the same file open
It would be best to retain the File object that you're using in order to ask it if it is closed, if possible.
If you really want to know if your current Ruby process has any File objects open for a particular path, something like this feels hack-ish but should mostly work:
def file_is_closed?(file_name)
  ObjectSpace.each_object(File) do |f|
    if File.absolute_path(f) == File.absolute_path(file_name) && !f.closed?
      return false
    end
  end

  true
end
I can't vouch for how well that handles corner cases, but it seems to work for me in general:
f1 = File.new("test.file")
f2 = File.new("test.file")
file_is_closed?("test.file") # => false
f1.close
file_is_closed?("test.file") # => false
f2.close
file_is_closed?("test.file") # => true
If you want to know if any process has the file open, I think you'll need to resort to something external like lsof.
How to append a text to file succinctly
Yes. It's poorly documented, but you can use:
File.write('foo.txt', 'some text', mode: 'a+')
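A quick way to see the append behaviour, using a temporary file. The helper name append_line is made up, and plain 'a' is used here since nothing is read back through the same handle:

```ruby
require "tempfile"

# Hypothetical helper: appends `text` plus a newline to the file at `path`.
def append_line(path, text)
  File.write(path, "#{text}\n", mode: "a")
end

Tempfile.create("foo") do |tmp|
  append_line(tmp.path, "some text")
  append_line(tmp.path, "more text")
  print File.read(tmp.path) # both lines are there; the second call did not truncate
end
```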
Calculate and display the balances from bottom to top in ruby on rails
I still don't understand how the balance column [2500, 1500, 2000] is calculated, but I can infer something from the screenshot.
Basically, you are sorting by a column that does not exist in the model. So first you need to build that helper column, populate it, and then sort by it.
It should be possible to do this in SQL, but I'm showing it in plain Ruby, using an Array of Hashes as a fake database. You can adapt it to your case easily or look for a more efficient way (SQL).
Let's say data are the following:
expenses = [{date: 1, narration: :a, debit: 3.0, credit: 0},
{date: 2, narration: :b, debit: 0.15, credit: 0},
{date: 3, narration: :c, debit: 75.0, credit: 0}]
And the initial balance is:
balance = 1434.64
Now let's loop over the data, adding the new field balance, and sort at the end of the loop:
expenses.each do |h|
  balance += h[:credit] - h[:debit]
  h[:balance] = balance
end.sort_by! { |h| h[:balance] }
Now your sorted expenses are:
[
  {:date=>3, :narration=>:c, :debit=>75.0, :credit=>0, :balance=>1356.49},
  {:date=>2, :narration=>:b, :debit=>0.15, :credit=>0, :balance=>1431.49},
  {:date=>1, :narration=>:a, :debit=>3.0, :credit=>0, :balance=>1431.64}
]
You can do the calculation in the controller, then pass expenses to the view and loop there without any further calculation.
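The whole calculation can also be wrapped in one non-mutating method, sketched below. The method name with_running_balance is hypothetical, and rounding each balance to two decimals is an assumption added here to keep Float noise out of the result.

```ruby
# Hypothetical helper: returns copies of the rows with a :balance key added,
# sorted by that balance (lowest first).
def with_running_balance(expenses, opening_balance)
  balance = opening_balance
  rows = expenses.map do |h|
    # Assumption: amounts are currency values, so round to 2 decimals
    balance = (balance + h[:credit] - h[:debit]).round(2)
    h.merge(balance: balance)
  end
  rows.sort_by { |h| h[:balance] }
end

expenses = [{date: 1, narration: :a, debit: 3.0, credit: 0},
            {date: 2, narration: :b, debit: 0.15, credit: 0},
            {date: 3, narration: :c, debit: 75.0, credit: 0}]

with_running_balance(expenses, 1434.64).each { |row| p row }
```

Note that sort_by takes a one-argument block that returns the sort key, whereas sort expects a two-argument comparison block.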
For your Rails app, you could implement it as follows.
Add the temporary field balance to your model (no need to add a column to the database) and initialize it to 0:
class Expense < ApplicationRecord
  attr_accessor :balance
  after_initialize :init

  def init
    self.balance = 0
  end
end
Do the calculation in the controller. I'm using an initial value of balance just to emulate the example:
def index
  @expenses = Expense.all
  balance = 1434.64
  @expenses.each do |e|
    balance += e.credit - e.debit
    e.balance = balance
  end
  # sort_by takes a one-argument block returning the sort key
  @expenses = @expenses.sort_by { |e| e.balance }
end
Then in your view, just loop:
<% @expenses.each do |expense| %>
<tr>
<td><%= expense.narration %></td>
<td><%= expense.debit %></td>
<td><%= expense.credit %></td>
<td><%= expense.balance %></td>
</tr>
<% end %>
If you insert the records as in your example, you should end up with this result (columns: narration, debit, credit, balance):
# ["c", "75.0", "0.0", "1356.49"]
# ["b", "0.15", "0.0", "1431.49"]
# ["a", "3.0", "0.0", "1431.64"]
Does the Ruby method lookup start from the bottom of a class and go up, or from the top and go down?
Later declarations override earlier ones:
class Foo
  def hello
    'hello first'
  end

  def hello
    'hello second'
  end
end

f = Foo.new
puts f.hello # hello second