What are all the common ways to read a file in Ruby?
File.open("my/file/path", "r") do |f|
  f.each_line do |line|
    puts line
  end
end
# File is closed automatically at end of block
It is also possible to close the file explicitly yourself (passing a block to open, as above, closes it for you):
f = File.open("my/file/path", "r")
f.each_line do |line|
  puts line
end
f.close
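One caveat with the explicit form: if each_line raises, f.close is never reached. A minimal sketch of guarding against that with ensure; print_lines and its out parameter are hypothetical names used only for illustration:

```ruby
# Hypothetical helper: prints every line of the file at +path+ to +out+.
def print_lines(path, out: $stdout)
  f = File.open(path, 'r')
  begin
    f.each_line { |line| out.puts line }
  ensure
    f.close # runs even if each_line raises
  end
end
```

The block form of File.open does the same thing for you, which is why it is usually the simpler choice.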
How to read lines of a file in Ruby
I believe my answer covers your new concerns about handling any type of line endings, since both "\r\n" and "\r" are converted to the Linux standard "\n" before parsing the lines.
To support the "\r" EOL character along with the regular "\n", and "\r\n" from Windows, here's what I would do:
line_num = 0
text = File.read('xxx.txt') # unlike File.open(...).read, this also closes the file
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  print "#{line_num += 1} #{line}"
end
Of course this could be a bad idea on very large files since it means loading the whole file into memory.
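For files too large to slurp, the same normalization can probably be done in chunks. This is only a sketch under my own assumptions: each_normalized_line and chunk_size are made-up names, the input is read as raw bytes, and the yielded lines are binary strings that you would still have to force_encoding yourself.

```ruby
# Yields each line of +io+ with "\r\n" and lone "\r" converted to "\n",
# reading at most +chunk_size+ bytes at a time.
def each_normalized_line(io, chunk_size: 64 * 1024)
  buffer = String.new # binary buffer holding text that has not been yielded yet
  while (chunk = io.read(chunk_size))
    buffer << chunk
    # A trailing "\r" may be the first half of a "\r\n" pair, so hold it back.
    held_cr = buffer.end_with?("\r") ? buffer.slice!(-1) : ''
    buffer.gsub!(/\r\n?/, "\n")
    while (idx = buffer.index("\n"))
      yield buffer.slice!(0..idx)
    end
    buffer << held_cr
  end
  # Whatever is left is the last line, possibly ending in a held-back "\r".
  buffer.gsub!(/\r\n?/, "\n")
  yield buffer unless buffer.empty?
end
```

You would call it with something like File.open('xxx.txt', 'rb') { |f| each_normalized_line(f) { |line| print line } }. Memory use then depends on the chunk size and the longest line, not on the file size.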
How to open and read a file in one line in ruby
File.read("/path/to/file")
It reads the whole file and returns its content as a single string.
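If you want the lines rather than one big string, File.readlines is the matching one-liner. A small sketch of the two side by side, assuming Ruby 2.4 or later for the chomp: keyword (slurp_both_ways is a hypothetical name):

```ruby
require 'tempfile'

# Hypothetical helper: returns the file as one string and as an array of
# lines with the trailing newlines removed.
def slurp_both_ways(path)
  [File.read(path), File.readlines(path, chomp: true)]
end

Tempfile.create('demo') do |tmp|
  tmp.write("first\nsecond\n")
  tmp.flush
  whole, lines = slurp_both_ways(tmp.path)
  p whole # "first\nsecond\n"
  p lines # ["first", "second"]
end
```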
file read in ruby getting output as spaces in character
If you binread the file, you can see that there are extra non-ASCII UTF-8 bytes in between:
irb(main):013:0> f = File.binread('f2.txt')
=> "1,S1\xC2\xAD-88,S2\xC2\xAD-53,S3\xC2\xAD-69,S4\xC2\xAD-64"
\xC2\xAD is the UTF-8 encoding of U+00AD, the soft hyphen. It is normally invisible, so it behaves much like a hidden whitespace character.
This may be because you copied the text from somewhere that inserted it, or it was introduced into your text some other way. Don't know. Either way, the binread output above shows there are hidden characters in between your text.
This will remove every character that is neither whitespace nor printable ASCII:
File.foreach('f2.txt') do |line|
  puts line.gsub(/[^\s!-~]/, '')
end
=> 1,S1-88,S2-53,S3-69,S4-64
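To see what the hidden bytes actually are before deleting them, you can list the non-ASCII code points. A quick sketch on a shortened copy of the string from the binread output; non_ascii_codepoints and strip_non_printable are hypothetical helper names:

```ruby
# Part of the binread output above, written as a UTF-8 string literal.
text = "1,S1\xC2\xAD-88,S2\xC2\xAD-53"

# Lists the code points of every character outside plain ASCII.
def non_ascii_codepoints(str)
  str.each_char.reject(&:ascii_only?).map { |c| format('U+%04X', c.ord) }.uniq
end

# Keeps only whitespace and printable ASCII ("!" through "~").
def strip_non_printable(str)
  str.gsub(/[^\s!-~]/, '')
end

p non_ascii_codepoints(text)   # ["U+00AD"]
puts strip_non_printable(text) # 1,S1-88,S2-53
```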
Read file and write file
If you want to make your code work, change:
aFile = File.new("mydata.txt", "w")
to:
aFile = File.new("mydata.txt", "r+")
You can change:
count = aFile.gets
if (is_numeric(count))
to:
count = aFile.gets.to_i
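if (count.is_a?(Integer)) # Fixnum was deprecated in Ruby 2.4 and removed in 3.2
and then get rid of the is_numeric?(obj) method. Note that to_i always returns an Integer (0 when the line does not start with a number), so this check always passes.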
Also you're not incrementing the counter, you can fix that as well.
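For the incrementing part, here is a rough sketch of a read, increment, write-back cycle using the "r+" mode. I don't have the rest of your code, so increment_counter is a hypothetical name, and I'm assuming the file already exists and holds just the number on its first line:

```ruby
# Hypothetical helper: reads the integer on the first line of +path+,
# adds one, writes it back, and returns the new value.
def increment_counter(path)
  File.open(path, 'r+') do |f|
    count = f.gets.to_i + 1 # gets returns nil on an empty file, and nil.to_i is 0
    f.rewind                # go back to the start before overwriting
    f.write(count.to_s)
    f.truncate(f.pos)       # drop any leftover bytes from the old content
    count
  end
end
```

Unlike "w", "r+" does not empty the file on open, which is why the old value can still be read first.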
Read any file with extension .txt inside a specified folder in ruby every 2 minutes
You could do something like:
dir = '/path-to-your-files'
Dir.foreach(dir) do |item|
  next unless File.extname(item) == '.txt'
  path = File.join(dir, item) # Dir.foreach yields bare names, not full paths
  next if File.directory?(path)
  file = File.read(path)
  # do what you want with the file
  File.delete(path) if File.exist?(path)
end
As @engineersmnky says, you can simplify this a little using Dir.glob, as in:
Dir.glob('/path/to/files/*.txt') do |path|
  # the rest, without the code to skip irrelevant files
end
That should do what you're after. Let me know how you get on!
Edit - to do this every two minutes, you'll have to use a reoccurring job. Sorry, slipped my mind when writing the question and I've got to go for now. Hopefully someone else'll cover this while I'm away, and I'll delete / update when I'm back online!
And to update upon this - the best way to get this running every two minutes will be to use a cron task, and the whenever gem makes this a breeze. After setting it up, you can use, for example:
every 2.minutes do
  # run the code above
end
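If cron is not available, a plain Ruby loop with sleep is a cruder alternative. To be clear, this is my own substitution and not what the gem does; process_txt_files, poll_txt_files, interval and max_runs are all hypothetical names:

```ruby
# Yields the path and content of every .txt file in +dir+, then deletes it.
def process_txt_files(dir)
  Dir.glob(File.join(dir, '*.txt')).sort.each do |path|
    next unless File.file?(path) # skip directories that happen to end in .txt
    yield path, File.read(path)
    File.delete(path)
  end
end

# Runs process_txt_files every +interval+ seconds. With max_runs: nil it
# loops forever; a number makes it stop after that many passes.
def poll_txt_files(dir, interval: 120, max_runs: nil, &handler)
  runs = 0
  loop do
    process_txt_files(dir, &handler)
    runs += 1
    break if max_runs && runs >= max_runs
    sleep interval
  end
  runs
end
```

A call like poll_txt_files('/path/to/files') { |path, content| puts "#{path}: #{content.size} bytes" } would then check the folder every two minutes. The downside compared to cron is that the process has to stay alive, and nothing restarts it if it crashes.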
What is the most performant way of processing this large text file?
You need to run a benchmark test, using Ruby's built-in Benchmark to figure out what is your fastest choice.
However, from experience, I've found that "slurping" the file, i.e., reading it all in at once, is not any faster than using a loop with IO.foreach or File.foreach. This is because Ruby and the underlying OS do file buffering as the reads occur, allowing your loop to occur from memory, not directly from disk. foreach will not strip the line-terminators for you, like split would, so you'll need to add a chomp, or a chomp! if you want to mutate the line read in:
File.foreach('/path/to/file') do |li|
  puts li.chomp
end
or
File.foreach('/path/to/file') do |li|
  li.chomp!
  puts li
end
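On newer Rubies (2.4 and later, as far as I know) foreach can also do the chomp for you through the chomp: keyword. A small sketch; count_non_empty_lines is a hypothetical helper, picked only to show a line-by-line pass that never holds the whole file in memory:

```ruby
# Hypothetical helper: counts the non-empty lines of the file at +path+,
# reading one line at a time.
def count_non_empty_lines(path)
  count = 0
  File.foreach(path, chomp: true) do |line|
    count += 1 unless line.empty? # line arrives without its "\n"
  end
  count
end
```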
Also, slurping has the problem of not being scalable: you could end up trying to read a file bigger than memory, bringing your machine to its knees, while reading line by line will never do that.
Here are some performance numbers:
#!/usr/bin/env ruby
require 'benchmark'
require 'fileutils'
FILENAME = 'test.txt'
LOOPS = 1
puts "Ruby Version: #{RUBY_VERSION}"
puts "Filesize being read: #{File.size(FILENAME)}"
puts "Lines in file: #{`wc -l #{FILENAME}`.split.first}"
Benchmark.bm(20) do |x|
  x.report('read.split') { LOOPS.times { File.read(FILENAME).split("\n") } }
  x.report('read.lines.chomp') { LOOPS.times { File.read(FILENAME).lines.map(&:chomp) } }
  x.report('readlines.map.chomp1') { LOOPS.times { File.readlines(FILENAME).map(&:chomp) } }
  x.report('readlines.map.chomp2') { LOOPS.times { File.readlines(FILENAME).map { |s| s.chomp } } }
  x.report('foreach.map.chomp1') { LOOPS.times { File.foreach(FILENAME).map(&:chomp) } }
  x.report('foreach.map.chomp2') { LOOPS.times { File.foreach(FILENAME).map { |s| s.chomp } } }
end
And the results:
Ruby Version: 1.9.3
Filesize being read: 42026131
Lines in file: 465440
                           user     system      total        real
read.split             0.150000   0.060000   0.210000 (  0.213365)
read.lines.chomp       0.470000   0.070000   0.540000 (  0.541266)
readlines.map.chomp1   0.450000   0.090000   0.540000 (  0.535465)
readlines.map.chomp2   0.550000   0.060000   0.610000 (  0.616674)
foreach.map.chomp1     0.580000   0.060000   0.640000 (  0.641563)
foreach.map.chomp2     0.620000   0.050000   0.670000 (  0.662912)
On today's machines a 42MB file can be read into RAM pretty safely. I have seen files a lot bigger than that which won't fit into the memory of some of our production hosts. While foreach is slower, it's also not going to take a machine to its knees by sucking up all memory if there isn't enough memory.
On Ruby 1.9.3, using the map(&:chomp) method, instead of the older form of map { |s| s.chomp }, is measurably faster (roughly 0.54 versus 0.62 seconds with readlines in the run above). That wasn't true with older versions of Ruby, so caveat emptor.
Also, note that all the above processed the data in less than one second on my several years old Mac Pro. All in all I'd say that worrying about the load speed is premature optimization, and the real problem will be what is done after the data is loaded.
Reading a text file in Ruby gives wrong output
What happened is that your file had two "lines" separated by a carriage return character, and not a linefeed.
You showed the bytes in your file as
116 114 105 109 40 48 44 32 49 53 52 52 55 41 13 48 44 32 49 53 52 52 55
That 13 is a carriage return, which is sometimes "displayed" by the writer going back to the start of the line it is writing.
So first it wrote out
trim(0, 15447)
then it went back to the start of the same line and wrote
0, 15447
overlaying the initial line! What do you end up with?
0, 1544715447)
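You can reproduce the overlay in Ruby to convince yourself. This is just a sketch: render_with_cr is a hypothetical helper that mimics a terminal treating "\r" as "go back to column 0" and ignores everything else a real terminal does.

```ruby
# The bytes quoted above.
BYTES = [116, 114, 105, 109, 40, 48, 44, 32, 49, 53, 52, 52, 55, 41,
         13, 48, 44, 32, 49, 53, 52, 52, 55].freeze

# Each segment after a "\r" overwrites the start of what is already on
# the line; whatever sticks out past the new segment stays visible.
def render_with_cr(text)
  text.split("\r").reduce('') do |screen, segment|
    segment + screen[segment.length..-1].to_s
  end
end

text = BYTES.pack('C*')
p text                    # "trim(0, 15447)\r0, 15447"
puts render_with_cr(text) # 0, 1544715447)
```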
Your "problem" is probably best fixed by re-encoding that text file of yours to use a better way to separate lines. On Unix systems, including OS X these days, the line terminator is character 10, known as LINE FEED. Windows uses the two-character combination 13 10 (CR LF). To my knowledge, only old Mac systems used a bare 13.
Many text editors today will allow you to select a "line ending" option, so you might be able to just open that file, then save it using a different line ending option. FWIW my guess is that you are using Windows now, which is known for rendering CRs and LFs differently than *Nix systems.