What Are All the Common Ways to Read a File in Ruby

File.open("my/file/path", "r") do |f|
  f.each_line do |line|
    puts line
  end
end
# File is closed automatically at end of block

You can also skip the block and close the file yourself; when you pass a block to File.open, as above, the file is closed for you:

f = File.open("my/file/path", "r")
f.each_line do |line|
  puts line
end
f.close
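
If you do manage the handle yourself, an exception raised while reading would skip the close. Below is a minimal sketch of guarding the close with ensure; the helper name read_lines_with_ensure and the temporary file are illustrative assumptions, not part of the original answer.

```ruby
require 'tempfile'

# Hypothetical helper: reads every line and closes the handle even if
# reading raises part-way through.
def read_lines_with_ensure(path)
  file = File.open(path, 'r')
  lines = []
  file.each_line { |line| lines << line }
  lines
ensure
  # file is nil if File.open itself raised, hence the safe navigation.
  file&.close
end

# Demo against a throwaway file so the sketch runs anywhere.
Tempfile.create('demo') do |tmp|
  tmp.write("first\nsecond\n")
  tmp.flush
  p read_lines_with_ensure(tmp.path)
end
```

The block form of File.open does the same thing for you, which is why it is usually the simpler choice.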

How to read lines of a file in Ruby

I believe my answer covers your new concerns about handling any type of line endings since both "\r\n" and "\r" are converted to Linux standard "\n" before parsing the lines.

To support the "\r" EOL character along with the regular "\n", and "\r\n" from Windows, here's what I would do:

line_num = 0
text = File.read('xxx.txt')   # File.open('xxx.txt').read would leave the handle open
text.gsub!(/\r\n?/, "\n")     # "\r\n" and lone "\r" both become "\n"
text.each_line do |line|
  print "#{line_num += 1} #{line}"
end

Of course this could be a bad idea on very large files since it means loading the whole file into memory.
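
To see the normalization at work, here is a small sketch run against a file that mixes all three line endings. The helper name normalized_lines and the sample content are assumptions made for the demo.

```ruby
require 'tempfile'

# Hypothetical helper: returns the file's lines with "\r\n" and lone "\r"
# rewritten to "\n". It still loads the whole file into memory.
def normalized_lines(path)
  File.read(path).gsub(/\r\n?/, "\n").lines
end

Tempfile.create('eol') do |tmp|
  tmp.binmode # write the bytes exactly as given
  tmp.write("unix\nwindows\r\nold mac\rlast")
  tmp.flush
  normalized_lines(tmp.path).each.with_index(1) do |line, number|
    puts "#{number} #{line}"
  end
end
```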

How to open and read a file in one line in ruby

File.read("/path/to/file")

It reads the whole file and returns its content as a single string.
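
For comparison, here is a sketch of three one-call readers side by side. The helper name read_three_ways is hypothetical, and the chomp: keyword assumes Ruby 2.4 or later.

```ruby
require 'tempfile'

# Hypothetical helper that gathers the results of three common one-liners.
def read_three_ways(path)
  {
    whole: File.read(path),                    # one String with everything
    lines: File.readlines(path, chomp: true),  # Array of lines, newlines removed
    count: File.foreach(path).count            # iterates line by line
  }
end

Tempfile.create('sample') do |tmp|
  tmp.write("alpha\nbeta\n")
  tmp.flush
  p read_three_ways(tmp.path)
end
```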

file read in ruby getting output as spaces in character

If you read the file with binread, you can see that there are extra multi-byte UTF-8 characters between the fields:

irb(main):013:0> f = File.binread('f2.txt')
=> "1,S1\xC2\xAD-88,S2\xC2\xAD-53,S3\xC2\xAD-69,S4\xC2\xAD-64"

\xC2\xAD is the UTF-8 encoding of U+00AD (SOFT HYPHEN), a normally invisible character, which is why it shows up as stray space in your output.

This may be because you copied the text from somewhere that inserted them, or they got into your file some other way; I don't know. Either way, the bytes above show that there are hidden characters in between your text.

This removes everything except whitespace and printable ASCII:

File.foreach('f2.txt') do |line|
  # Keep whitespace (\s) and printable ASCII (! through ~); drop the rest.
  puts line.gsub(/[^\s!-~]/, '')
end

=> 1,S1-88,S2-53,S3-69,S4-64
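
The same idea can be tried without the original file. This sketch builds a string containing soft hyphens and strips them; strip_invisible is a hypothetical helper name, and the sample string is shortened from the data above.

```ruby
# Hypothetical helper: keeps whitespace and printable ASCII, drops the rest.
def strip_invisible(text)
  text.gsub(/[^\s!-~]/, '')
end

raw = "1,S1\u00AD-88,S2\u00AD-53"

# Each soft hyphen is one character but two bytes (0xC2 0xAD) in UTF-8,
# so the byte count is larger than the character count.
puts "characters: #{raw.length}, bytes: #{raw.bytesize}"
p raw.b                  # binary view, shows the \xC2\xAD pairs
p strip_invisible(raw)
```

Note that this also removes every legitimate non-ASCII character, so it only suits data that is meant to be plain ASCII.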

Read file and write file

If you want to make your code work, change:

aFile = File.new("mydata.txt", "w")

to:

aFile = File.new("mydata.txt", "r+")

You can change:

 count = aFile.gets
 if (is_numeric(count))

to:

 count = aFile.gets.to_i
 if (count.is_a?(Integer))

and then get rid of the is_numeric?(obj) method. (Integer replaces Fixnum here: Fixnum was deprecated in Ruby 2.4 and removed in 3.2.)

Also you're not incrementing the counter, you can fix that as well.
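
Putting those suggestions together, including the missing increment, might look like the sketch below. The helper name increment_counter is hypothetical; it assumes the file already exists (mode "r+" does not create it) and holds a single integer.

```ruby
require 'tempfile'

# Hypothetical helper: reads the stored count, adds one, writes it back.
def increment_counter(path)
  File.open(path, 'r+') do |file|
    # gets returns nil for an empty file, and nil.to_i is 0.
    count = file.gets.to_i + 1
    file.rewind                  # go back to the start before writing
    file.write(count.to_s)
    file.truncate(file.pos)      # drop leftover bytes from the old value
    count
  end
end

Tempfile.create('mydata') do |tmp|
  tmp.write("9\n")
  tmp.flush
  puts increment_counter(tmp.path)
  puts File.read(tmp.path)
end
```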

Read any file with extension .txt inside a specified folder in ruby every 2 minutes

You could do something like:

dir = '/path-to-your-files'
Dir.foreach(dir) do |item|
  next unless File.extname(item) == '.txt'
  path = File.join(dir, item)   # Dir.foreach yields bare names, not full paths
  next if File.directory?(path)
  file = File.read(path)
  # do what you want with the file
  File.delete(path) if File.exist?(path)
end

As @engineersmnky says, you can simplify this a little using Dir.glob, as in:

Dir.glob('/path/to/files/*.txt') do |path|
  # the rest, without the code to skip irrelevant files
end
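
Spelled out in full, the glob version could look like this sketch. The helper name consume_txt_files is hypothetical, the demo runs in a temporary directory, and Dir.children assumes Ruby 2.5 or later.

```ruby
require 'tmpdir'

# Hypothetical helper: yields the path and content of each .txt file in dir,
# then deletes the file once the block has handled it.
def consume_txt_files(dir)
  Dir.glob(File.join(dir, '*.txt')).sort.each do |path|
    next unless File.file?(path)   # skip directories that end in .txt
    yield path, File.read(path)
    File.delete(path)
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'hello')
  File.write(File.join(dir, 'b.log'), 'ignored')
  consume_txt_files(dir) do |path, content|
    puts "#{File.basename(path)}: #{content}"
  end
  p Dir.children(dir)   # whatever was not a .txt file is left behind
end
```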

That should do what you're after. Let me know how you get on!

Edit: to do this every two minutes, you'll need a recurring job. The best way to get this running every two minutes is a cron task, and the whenever gem makes this a breeze. After setting it up, you can use, for example:

every 2.minutes do
  # run the code above
end
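
If cron is not an option, a long-running Ruby process can do the polling itself. This is a swapped-in alternative, not the gem-based approach above. The name poll_every and its parameters are hypothetical, and iterations and sleeper exist only so that the demo terminates and can be run without waiting.

```ruby
# Hypothetical helper: runs the block, waits interval_seconds, and repeats.
# With iterations: nil it loops forever.
def poll_every(interval_seconds, iterations: nil, sleeper: ->(seconds) { sleep(seconds) })
  runs = 0
  loop do
    yield
    runs += 1
    break if iterations && runs >= iterations
    sleeper.call(interval_seconds)
  end
  runs
end

# Demo with a zero-second interval; a real job would pass 120.
poll_every(0, iterations: 3) { puts 'scanning for .txt files' }
```

Unlike a cron entry, this stops when the process dies, so it needs something to keep it running.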

What is the most performant way of processing this large text file?

Run a benchmark, using Ruby's built-in Benchmark module, to find out which option is fastest for your data.

However, from experience, I've found that "slurping" the file, i.e., reading it all in at once, is not any faster than using a loop with IO.foreach or File.foreach. This is because Ruby and the underlying OS buffer the file as the reads occur, so your loop is served from memory, not directly from disk. Unlike split, foreach does not strip the line terminators for you, so add chomp to get a stripped copy of each line, or chomp! to mutate the line that was read in:

File.foreach('/path/to/file') do |li|
  puts li.chomp
end

File.foreach('/path/to/file') do |li|
  li.chomp!
  puts li
end
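
The difference between the two is easy to miss, so here is a short sketch of it. It also shows the chomp: keyword of File.foreach, which assumes Ruby 2.4 or later; chomped_lines is a hypothetical helper name.

```ruby
require 'tempfile'

line = "hello\n".dup
copy = line.chomp   # returns a new String; line itself is unchanged
p [line, copy]
line.chomp!         # strips the terminator from line in place
p line

# Hypothetical helper: lets foreach strip the terminators itself.
def chomped_lines(path)
  lines = []
  File.foreach(path, chomp: true) { |li| lines << li }
  lines
end

Tempfile.create('chomp') do |tmp|
  tmp.write("one\ntwo\n")
  tmp.flush
  p chomped_lines(tmp.path)
end
```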

Also, slurping does not scale: you could end up trying to read a file bigger than memory, bringing your machine to its knees, while reading line by line will never do that.

Here are some performance numbers:
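
As a sketch of what line-by-line processing buys you, the helper below stops reading as soon as it has enough matches, so only a small part of a large file is ever held in memory. The name first_matching_lines and the sample data are assumptions for the demo.

```ruby
require 'tempfile'

# Hypothetical helper: returns at most limit lines matching pattern,
# reading the file lazily instead of loading it whole.
def first_matching_lines(path, pattern, limit)
  File.foreach(path).lazy.map(&:chomp).grep(pattern).first(limit)
end

Tempfile.create('log') do |tmp|
  tmp.write("ok 1\nerror 2\nok 3\nerror 4\nerror 5\n")
  tmp.flush
  p first_matching_lines(tmp.path, /\Aerror/, 2)
end
```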

#!/usr/bin/env ruby

require 'benchmark'

FILENAME = 'test.txt'
LOOPS = 1

puts "Ruby Version: #{RUBY_VERSION}"
puts "Filesize being read: #{File.size(FILENAME)}"
puts "Lines in file: #{`wc -l #{FILENAME}`.split.first}"

Benchmark.bm(20) do |x|
  x.report('read.split')           { LOOPS.times { File.read(FILENAME).split("\n") }}
  x.report('read.lines.chomp')     { LOOPS.times { File.read(FILENAME).lines.map(&:chomp) }}
  x.report('readlines.map.chomp1') { LOOPS.times { File.readlines(FILENAME).map(&:chomp) }}
  x.report('readlines.map.chomp2') { LOOPS.times { File.readlines(FILENAME).map{ |s| s.chomp } }}
  x.report('foreach.map.chomp1')   { LOOPS.times { File.foreach(FILENAME).map(&:chomp) }}
  x.report('foreach.map.chomp2')   { LOOPS.times { File.foreach(FILENAME).map{ |s| s.chomp } }}
end

And the results:

Ruby Version: 1.9.3
Filesize being read: 42026131
Lines in file: 465440
                           user     system      total        real
read.split             0.150000   0.060000   0.210000 (  0.213365)
read.lines.chomp       0.470000   0.070000   0.540000 (  0.541266)
readlines.map.chomp1   0.450000   0.090000   0.540000 (  0.535465)
readlines.map.chomp2   0.550000   0.060000   0.610000 (  0.616674)
foreach.map.chomp1     0.580000   0.060000   0.640000 (  0.641563)
foreach.map.chomp2     0.620000   0.050000   0.670000 (  0.662912)

On today's machines a 42MB file can be read into RAM pretty safely. I have seen files a lot bigger than that which won't fit into the memory of some of our production hosts. While foreach is slower, it's also not going to bring a machine to its knees by sucking up all available memory.

On Ruby 1.9.3, map(&:chomp) is faster than the older form, map { |s| s.chomp }, as the chomp1 and chomp2 rows above show. That wasn't true with older versions of Ruby, so caveat emptor.

Also, note that all of the above processed the data in less than one second on my several-year-old Mac Pro. All in all I'd say that worrying about the load speed is premature optimization, and the real problem will be what is done after the data is loaded.

Reading a text file in Ruby gives wrong output

What happened is that your file had two "lines" separated by a carriage return character, and not a linefeed.

You showed the bytes in your file as

116 114 105 109 40 48 44 32 49 53 52 52 55 41 13 48 44 32 49 53 52 52 55

That 13 is a carriage return, which is sometimes "displayed" by the writer going back to the start of the line it is writing.

So first it wrote out

trim(0, 15447)

then it went back to the start of the same line and wrote

0, 15447

overlaying the initial line! What do you end up with?

0, 1544715447)

Your "problem" is probably best fixed by converting that text file of yours to use a better way to separate lines. On Unix systems, including OS X these days, the line terminator is character 10, known as LINE FEED. Windows uses the two-character combination 13 10 (CR LF). To my knowledge, only old Mac systems (before OS X) used a bare 13.

Many text editors today will allow you to select a "line ending" option, so you might be able to just open that file, then save it using a different line ending option. FWIW my guess is that you are using Windows now, which is known for rendering CRs and LFs differently than *Nix systems.
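
If you would rather handle such a file in Ruby than convert it, the bytes quoted above can be turned back into text and split on any of the three line-ending styles. This is a sketch only; lines_from_bytes is a hypothetical helper name.

```ruby
# Hypothetical helper: rebuilds the text from raw byte values and splits it
# on CR LF, lone CR, or LF.
def lines_from_bytes(bytes)
  bytes.pack('C*').split(/\r\n?|\n/)
end

bytes = [116, 114, 105, 109, 40, 48, 44, 32, 49, 53, 52, 52, 55, 41,
         13, 48, 44, 32, 49, 53, 52, 52, 55]

p bytes.pack('C*')        # inspect makes the "\r" visible
p lines_from_bytes(bytes)
```

Printing with p rather than puts keeps the carriage return visible instead of letting the terminal act on it.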
