Fastest Way to Skip Lines While Parsing Files in Ruby

Fastest way to skip lines while parsing files in Ruby?

file.each_line.drop(500).take(100) # lines 501-600 (IO#lines was removed in Ruby 3.0; each_line replaces it)

In general, you can't avoid reading the file from the start up to the line you are interested in, because lines can have different lengths. What you can avoid, though, is loading the whole file into one big array. Just read line by line, counting and discarding lines until you reach the ones you are looking for, pretty much as in your own example. You can just make it more Rubyish.
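The counting-and-discarding loop described above could look roughly like this. It is only a sketch: the helper name lines_in_range is made up for illustration, and a StringIO stands in for a real open File.

```ruby
require "stringio"

# Hypothetical helper: returns lines first..last (1-based, inclusive) from any
# IO-like object, reading sequentially and stopping once `last` is reached.
def lines_in_range(io, first, last)
  result = []
  io.each_line.with_index(1) do |line, number|
    next if number < first   # discard lines before the range
    result << line
    break if number >= last  # stop here; later lines are never read
  end
  result
end

# Stand-in for an open file with 1000 lines.
io = StringIO.new((1..1000).map { |n| "line #{n}\n" }.join)
p lines_in_range(io, 501, 600).first  # => "line 501\n"
```

Only one line is held at a time outside the requested range, so memory use depends on the size of the range rather than the size of the file.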

PS. the Tin Man's comment made me do some experimenting. While I didn't find any reason why drop would load the whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way to avoid that:

file.each_line.select.with_index { |_line, i| (500..599).cover?(i) } # the index is 0-based, so 500..599 is lines 501-600

PS2: Doh, the code above, while not building a huge array, still iterates through the whole file, even the lines after line 600. :( Here's a third version:

enum = file.each_line # IO#lines was removed in Ruby 3.0
500.times{enum.next} # skip 500
enum.take(100) # take the next 100

or, if you prefer FP:

file.each_line.tap { |enum| 500.times { enum.next } }.take(100)
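Another option, not part of the original answer, is a lazy enumerator, which skips and takes without building the intermediate array that a plain drop returns. A minimal sketch, assuming Ruby 2.0 or later; slice_lines is a hypothetical helper name and the StringIO stands in for an open File:

```ruby
require "stringio"

# Hypothetical helper: skips `skip` lines, then returns the next `count` lines.
# `lazy` defers evaluation, so drop discards lines as they stream past and
# first(count) stops the underlying each_line once enough lines are collected.
def slice_lines(io, skip, count)
  io.each_line.lazy.drop(skip).first(count)
end

io = StringIO.new((1..1000).map { |n| "line #{n}\n" }.join)
p slice_lines(io, 500, 100).size  # => 100
```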

Anyway, the upside of this monologue is that you get to see several ways to iterate over a file. ;)

Ruby - How to skip/ignore specific lines when reading a file?

You could do it like this:

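skip_patterns = ["#", "Feature", "In order", "As a", "I want"]
# File.foreach closes the file once the block finishes.
File.foreach(file) do |line|
  line.chomp!
  # Skip blank lines and lines containing any of the patterns.
  next if line.empty? || skip_patterns.any? { |pattern| line.include?(pattern) }
  # ... process the remaining lines here
end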
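If the list of markers grows, the same filter can be expressed with a single pattern. The sketch below assumes the markers are literal strings (Regexp.union escapes them); kept_lines is a made-up helper name and the sample text is invented for the demonstration.

```ruby
require "stringio"

# One regexp that matches any of the literal marker strings.
SKIP_PATTERN = Regexp.union("#", "Feature", "In order", "As a", "I want")

# Hypothetical helper: returns the chomped lines that are neither blank
# nor contain one of the markers.
def kept_lines(io)
  io.each_line.map(&:chomp).reject { |line| line.empty? || line.match?(SKIP_PATTERN) }
end

sample = "# comment\nFeature: Login\n\n  In order to use the site\n  Scenario: valid user\n"
p kept_lines(StringIO.new(sample))  # => ["  Scenario: valid user"]
```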

Skipping the first line when reading in a file in 1.9.3

Change each to each_with_index do |line, index| and add next if index == 0 as the first statement in the block; that skips the first line.
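As a rough illustration of that change (the helper name and the sample data are invented here, and a StringIO stands in for the file):

```ruby
require "stringio"

# Hypothetical helper: yields every line except the first one.
def each_line_after_first(io)
  io.each_with_index do |line, index|
    next if index == 0  # skip the first line
    yield line
  end
end

collected = []
each_line_after_first(StringIO.new("header\nfirst\nsecond\n")) { |line| collected << line.chomp }
p collected  # => ["first", "second"]
```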

Ruby CSV: How to skip the first two lines of file?

I didn't benchmark, but try this:

CSV.to_enum(:foreach, filename, col_sep: "\t").drop(2).each do |row|
# ...
end

Skip first 5 lines of CSV

You should be able to work around this by dropping the leading lines yourself and constructing a valid CSV string from the rest of your otherwise incompatible data:

CSV.parse(File.readlines(path).drop(5).join) do |row|
# ...
end

How to skip the first line of a CSV file and make the second line the header

I don't think there's an elegant way of doing it, but it can be done:

require "csv"

# Create a stream using the original file.
# Don't use `textmode` since it generates a problem when using this approach.
file = File.open "file.csv"

# Consume the first CSV row.
# `\r` is my row separator character. Verify your file to see if it's the same one.
loop { break if file.readchar == "\r" }

# Create your CSV object using the remainder of the stream.
csv = CSV.new file, headers: true
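When the rows are separated by an ordinary "\n", the same idea can be sketched with a single gets call instead of the readchar loop. This is only an assumption-laden variant: the helper name is hypothetical, the sample data is invented, and it would not be correct if the first row contained a quoted field with an embedded newline.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: discards the first physical line, then lets CSV treat
# the next line as the header row.
def parse_skipping_first_line(io)
  io.gets                          # consume and discard line 1 (assumes "\n" row separator)
  CSV.new(io, headers: true).read  # rest of the stream: header row plus data rows
end

data = "export from some tool\nname,age\nalice,30\nbob,25\n"
table = parse_skipping_first_line(StringIO.new(data))
p table.headers  # => ["name", "age"]
```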

Using ruby to find a word or phrase in a text file capture the word skip a line and then read the line until a blank (repeat)

Here's mine:

data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
lines.each_line do |line|
puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
end
end

That regex is ugly, but still better than looking at 30 lines.
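(?:(?!\n\n).)* matches a run of characters up to, but not including, the next pair of newlines: the negative lookahead (?!\n\n) is checked before each character is consumed. The (?:) makes the group non-capturing, so the '.' isn't captured on its own.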
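To see what the three capture groups hold, here is the same regex applied to a tiny made-up string (the sample text is invented for illustration only):

```ruby
# The regex from the answer, stored in a constant for reuse.
BLOCK_PATTERN = /(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m

sample = "MATCH ME foo\n\nrow1\nrow2\n\nother"

# Group 1: the marker, group 2: the rest of the marker line,
# group 3: everything after the blank line up to the next blank line.
p sample.scan(BLOCK_PATTERN)  # => [["MATCH ME", " foo", "row1\nrow2"]]
```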


