Ruby: Start Reading at Arbitrary Point in Large File


Seeking to an arbitrary line is harder, because lines have variable length, but you can seek to an arbitrary byte offset within a file.

IO#seek and IO#pos= both let you move to a given byte offset within the file.
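Because lines have variable length, a common pattern is to seek to a byte offset and then skip forward to the next line boundary. The following is a minimal sketch of that idea. It assumes "\n" line endings, and the helper name each_line_from is hypothetical, not a standard Ruby method. The file is opened in binary mode so that getc reads exactly one byte and no newline conversion takes place.

```ruby
# Sketch: yield every complete line that begins at or after byte `offset`.
# Assumes "\n" line endings; `each_line_from` is a hypothetical helper name.
def each_line_from(path, offset)
  return enum_for(__method__, path, offset) unless block_given?

  File.open(path, "rb") do |f|
    if offset > 0
      # Look at the byte just before the offset; IO#getc advances to `offset`.
      f.seek(offset - 1, IO::SEEK_SET)
      # If that byte is not a newline we landed mid-line, so skip the rest
      # of the partial line with IO#gets.
      f.gets unless f.getc == "\n"
    end
    f.each_line { |line| yield line }
  end
end
```

For example, each_line_from("big.log", 1_000_000) { |line| puts line } would print every line that starts at or after byte 1,000,000 of a hypothetical big.log. The lines come back with binary (ASCII-8BIT) encoding, so call force_encoding if you need them as text.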

What's the fastest way to read a large file in Ruby?

Ruby will most likely use the same or very similar low-level C code to do the actual reading from disk for the first three options, so they should perform similarly. Given that, choose whichever is most convenient for you; that freedom is a large part of what makes languages like Ruby so useful. Since you will be reading a lot of data from disk, I would suggest using each_line and processing each line as you read it, rather than loading the whole file into memory first.
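Here is a minimal sketch of the each_line approach. It assumes the task is counting lines that match a pattern. The name count_matching_lines and the example pattern are illustrative and do not come from the original answer. The whole file is never held in memory at once.

```ruby
# Sketch: process a large file one line at a time with each_line.
# `count_matching_lines` is a hypothetical name for illustration.
def count_matching_lines(path, pattern)
  count = 0
  File.open(path, "r") do |f|
    f.each_line do |line|
      # String#match? (Ruby 2.4+) tests without building MatchData.
      count += 1 if line.match?(pattern)
    end
  end
  count
end
```

For example, count_matching_lines("huge.log", /ERROR/) would count lines containing ERROR in a hypothetical huge.log.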

I would not recommend bringing grep, sed, or any other such external utilities into the picture unless you have a very good reason, as they will make your code less portable and expose you to failures that may be difficult to diagnose.

How to read whole file in Ruby?

IO.read("filename")

or

File.read("filename")

How to efficiently parse large text files in Ruby

I just tested a 600,000-line file, and iterating over it took less than half a second. My guess is that the slowness is in the line parsing, not the file looping. Can you also paste your parsing code?
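One way to check that guess is to time bare iteration separately from iteration plus parsing. The sketch below uses a monotonic clock to do this. parse_line is a hypothetical stand-in for your own parsing code; here it just splits comma-separated fields.

```ruby
# Sketch: time bare iteration separately from iteration plus parsing.
# `parse_line` is a hypothetical placeholder for real parsing code.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

def parse_line(line)
  line.chomp.split(",")
end

def profile(path)
  loop_only  = elapsed { File.foreach(path) { |_line| } }
  with_parse = elapsed { File.foreach(path) { |line| parse_line(line) } }
  { loop_only: loop_only, with_parse: with_parse }
end
```

If with_parse is much larger than loop_only, most of the time is going into parsing rather than reading.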

Easier way to search through large files in Ruby?

  • fgrep (equivalent to grep -F), either standalone or called via system('fgrep ...'), may be a faster solution
  • File.readlines may be faster, but it is a time-space tradeoff, since it loads the entire file into memory
  • Look at this small comparison; the last approaches in it seem to be quite fast.
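For comparison with the readlines tradeoff, here is a minimal pure-Ruby sketch of a fixed-string search. It streams the file with File.foreach instead of loading it all, so memory use does not grow with file size. The name search_file is hypothetical, and filter_map requires Ruby 2.7 or later.

```ruby
# Sketch: fgrep-style fixed-string search that streams the file.
# Returns [line_number, line] pairs; `search_file` is a hypothetical name.
def search_file(path, needle)
  File.foreach(path).with_index(1).filter_map do |line, lineno|
    [lineno, line.chomp] if line.include?(needle)
  end
end
```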

Best way to work with large amounts of CSV data quickly

How about using a database?

Load the records into tables, and then query them using joins.

The import might take a while, but the database engine is optimized for the join and retrieval part.

Size Reading and Writing to File and conditional statements

To obtain the number of lines in the file, you need to read it. If possible, you want to avoid reading the file more than once.

Assuming the file is not excessively large, you could gulp it into an array, using IO::readlines:

arr = File.readlines("book.txt")
puts case arr.size
     when 0
       "There are no people in the book."
     when 1
       "There is one person in the book."
     when 2
       "There are two people in the book."
     else
       "There are some entries in the book."
     end
puts arr

Alternatively, you could gulp it into a string, using IO::read:

str = File.read("book.txt")
# Counts "\n" characters, so a last line without a trailing newline is not counted.
puts case str.count("\n")
     when 0
       "There are no people in the book."
     when 1
       "There is one person in the book."
     when 2
       "There are two people in the book."
     else
       "There are some entries in the book."
     end
puts str

If the file is so large that you need to read it one line at a time, you can use IO::foreach. This does require two passes through the file, however.

# First pass: count the lines without keeping them in memory.
puts case File.foreach("book.txt").count
     when 0
       "There are no people in the book."
     when 1
       "There is one person in the book."
     when 2
       "There are two people in the book."
     else
       "There are some entries in the book."
     end
# Second pass: stream the lines out.
File.foreach("book.txt") { |line| puts line }
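The counting pass does not need the lines themselves. Here is a minimal sketch, assuming "\n" line endings, that counts newline bytes in fixed-size chunks. This avoids creating one String object per line during the first pass. It returns the same number as File.foreach(path).count. The name count_lines and the 64 KiB chunk size are arbitrary choices for illustration.

```ruby
# Sketch: count lines by scanning fixed-size binary chunks for "\n".
# CHUNK_SIZE (64 KiB) is an arbitrary choice; `count_lines` is hypothetical.
CHUNK_SIZE = 64 * 1024

def count_lines(path)
  count = 0
  last_byte = nil
  File.open(path, "rb") do |f|
    # IO#read(n) returns nil once the end of the file is reached.
    while (chunk = f.read(CHUNK_SIZE))
      count += chunk.count("\n")
      last_byte = chunk[-1]
    end
  end
  # A final line without a trailing newline still counts as a line,
  # matching File.foreach(path).count.
  count += 1 if last_byte && last_byte != "\n"
  count
end
```

It can replace the first pass above, as in puts case count_lines("book.txt") ... end, followed by the same File.foreach loop.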

What are all the common ways to read a file in Ruby?

File.open("my/file/path", "r") do |f|
  f.each_line do |line|
    puts line
  end
end
# File is closed automatically at end of block

You can also open the file without a block and close it explicitly afterwards; passing a block to open, as above, closes it for you:

f = File.open("my/file/path", "r")
begin
  f.each_line do |line|
    puts line
  end
ensure
  f.close # runs even if an exception is raised while reading
end
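Here is a short sketch of several other standard ways to read a file: File.read, File.readlines (the chomp: keyword needs Ruby 2.4 or later), File.foreach, and a manual IO#gets loop. It writes a throwaway Tempfile so it runs on its own; substitute your real path.

```ruby
require "tempfile"

# Throwaway sample file for illustration; substitute your own path.
file = Tempfile.new("demo")
file.write("first\nsecond\n")
file.close
path = file.path

text  = File.read(path)                    # whole file as one String
lines = File.readlines(path, chomp: true)  # Array of lines, "\n" removed
File.foreach(path) { |line| print line }   # stream one line at a time

File.open(path, "r") do |f|                # manual loop; gets returns nil at EOF
  while (line = f.gets)
    print line
  end
end
```

The first two load the whole file, so they suit smaller files. The last two read one line at a time.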