How to Remove Lines of Data in the Middle of a Text File with Ruby

How do I remove lines of data in the middle of a text file with Ruby

You can delete a line in several ways:

  • Simulate deletion. That is, just overwrite the line's content with spaces. Later, when you read and process the file, just ignore such blank lines.

    Pros: this is easy and fast. Cons: it's not real deletion of data (the file doesn't shrink) and you need to do more work when reading/processing the file.

    Code:

    f = File.new(filename, 'r+')
    f.each do |line|
      if should_be_deleted(line)
        # seek back to the beginning of the line (bytesize rather than length,
        # so multi-byte characters don't throw off the offset)
        f.seek(-line.bytesize, IO::SEEK_CUR)

        # overwrite the line with spaces and add a newline char
        f.write(' ' * (line.bytesize - 1))
        f.write("\n")
      end
    end
    f.close

    File.new(filename).each {|line| p line }

    # >> "Person1,will,23\n"
    # >> " \n"
    # >> "Person3,Mike,44\n"
  • Do real deletion. This means that the line will no longer exist, so you have to read the next line and overwrite the current line with it, then repeat this for all following lines until the end of the file is reached. This is an error-prone task (lines of different lengths, etc.), so here's a less error-prone alternative: open a temp file, write to it the lines up to (but not including) the line you want to delete, skip the line you want to delete, then write the rest to the temp file. Delete the original file and rename the temporary one to use its name. Done.

    While this is technically a total rewrite of the file, it does differ from what you asked. The file doesn't need to be loaded fully into memory; you need only one line at a time. Ruby provides a method for this: IO#each_line.

    Pros: no assumptions; lines really get deleted; the reading code doesn't need to be altered. Cons: a lot more work when deleting a line (not only the code, but also IO/CPU time).

    There is a snippet that illustrates this approach in @azgult's answer.
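
The temp-file approach from the second bullet can be sketched roughly as follows. This is only an illustration, not the snippet from the referenced answer: the method name delete_lines, its block-based test, and the ".tmp" suffix are all hypothetical choices, and the sketch assumes no file with that temporary name already exists.

```ruby
# Hypothetical helper: streams the file line by line and drops every line
# for which the block returns true. Only one line is in memory at a time.
def delete_lines(path)
  temp_path = "#{path}.tmp" # assumption: this name is free to use

  File.open(temp_path, 'w') do |out|
    File.foreach(path) do |line|
      out.write(line) unless yield(line)
    end
  end

  # Replace the original only after the temp file was written completely.
  File.rename(temp_path, path)
end

# Usage (file name and condition are made up):
#   delete_lines('people.csv') { |line| line.start_with?('Person2') }
```

Because the original is replaced only at the very end, an interruption while writing leaves the original file untouched.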

How to delete specific lines in text file?

Deleting lines cleanly and efficiently from a text file is "difficult" in the general case, but can be simple if you can constrain the problem somewhat.

Here are some questions from SO that have asked a similar question:

  • How do I remove lines of data in the middle of a text file with Ruby
  • Deleting a specific line in a text file?
  • Deleting a line in a text file
  • Delete a line of information from a text file

There are numerous others, as well.

In your case, if your input file is relatively small, you can easily afford to use the approach that you're using. Really, the only thing that would need to change to meet your criteria is to modify your input file loop and condition to this:

File.open('output.txt', 'w') do |out_file|
  File.foreach('input.txt').with_index do |line, line_number|
    out_file.puts line if line_number.even? # <== line numbers start at 0
  end
end

The changes are to capture the line number using the with_index method, which is possible because File.foreach returns an Enumerator when called without a block; the block now applies to with_index and gains the line number as a second block argument. Using the line number in your comparison gives you the criteria that you specified.

This approach will scale, even for somewhat large files, whereas solutions that read the entire file into memory have a fairly low upper limit on file size. With this solution, you're more constrained by available disk space and speed at which you can read/write the file; for instance, doing this to space-limited online storage may not work as well as you'd like. Writing to local disk or thumb drive, assuming that you have space available, should be no problem at all.
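
The same streaming pattern extends to other line-number criteria. As a minimal sketch, assuming you want to drop an arbitrary set of zero-based line numbers (the method name copy_without_lines and its parameters are hypothetical):

```ruby
require 'set'

# Hypothetical helper: copies input_path to output_path, skipping the
# zero-based line numbers listed in unwanted.
def copy_without_lines(input_path, output_path, unwanted)
  unwanted = unwanted.to_set # fast membership checks

  File.open(output_path, 'w') do |out_file|
    File.foreach(input_path).with_index do |line, line_number|
      out_file.write(line) unless unwanted.include?(line_number)
    end
  end
end

# Usage (file names are made up):
#   copy_without_lines('input.txt', 'output.txt', [1, 3])
```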

Deleting a specific line in a text file?

I think you can't do that safely because of file system limitations.

If you really want to do an in-place edit, you could read the file into memory, edit it, and then replace the old file. But beware that there are at least two problems with this approach. First, if your program stops in the middle of rewriting, you will get an incomplete file. Second, if your file is too big, it will eat your memory.

file_lines = ''

IO.readlines(your_file).each do |line|
  # Example condition; replace it with your own test for the unwanted line.
  file_lines += line unless line.chomp == "aaab"
end

# <extra string manipulation to file_lines if you wanted>

File.open(your_file, 'w') do |file|
  file.puts file_lines
end

Something along those lines should work, but using a temporary file is much safer and is the standard approach:

require 'fileutils'

File.open(output_file, "w") do |out_file|
  File.foreach(input_file) do |line|
    # Example condition; replace it with your own test for the unwanted line.
    out_file.puts line unless line.chomp == "aaab"
  end
end

FileUtils.mv(output_file, input_file)

Your condition could be anything that identifies the unwanted line. For example, file_lines += line unless line.chomp == "aaab" would remove the line "aaab".

Remove line of text from file on ruby

What about something like this?

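lines = File.readlines('file.txt')

# Remove one line at a random index from the array itself.
# (lines.shuffle.pop would only pop from a shuffled copy and leave lines intact.)
random_line = lines.delete_at(rand(lines.length))

File.open('file.txt', 'w') do |f|
  f.write(lines.join(''))
end

File.open('random.txt', 'a') do |f|
  f.write(random_line)
end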

Note that readlines has the effect of reading the whole file into memory, but it also means you get a truly random sample from the file. Your implementation is probably biased more heavily toward the end of the file since you do not know how many lines there are in advance.

As with anything that does manipulation in this way, there is a small chance that the file might be truncated if this program is halted unexpectedly. The usual method to avoid this is to write to a temporary file, then rename when that's successful. A better alternative is to use a database, even an embedded one like SQLite.
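
If reading the whole file into memory is a concern, a uniformly random line can also be picked in a single streaming pass without knowing the line count in advance. The sketch below uses reservoir sampling with a reservoir of size one; this technique is not part of the answer above, and the method name random_line and its rng parameter are hypothetical.

```ruby
# Hypothetical helper: returns one line chosen uniformly at random,
# or nil for an empty file. Only one candidate line is kept in memory.
def random_line(path, rng = Random.new)
  chosen = nil

  File.foreach(path).with_index(1) do |line, count|
    # Keep the current line with probability 1/count. The first line is
    # always kept (count == 1), later lines replace it less and less often.
    chosen = line if rng.rand(count).zero?
  end

  chosen
end
```

Removing the chosen line from the file afterwards would still need a second pass, for example with the temporary-file approach mentioned above.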

How to delete lines from multiple files

There are a number of things wrong with your code, and you're not safely handling your file changes.

Meditate on this untested code:

ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")

File.foreach('D:/mywork/list.txt') do |target|
  target = target.strip.sub(/,$/, '')

  ACCESS_FILES.each do |filename|
    new_filename = "#{filename}.new"
    old_filename = "#{filename}.old"

    File.open(new_filename, 'w') do |fileout|
      File.foreach(filename) do |line_in|
        fileout.puts line_in unless line_in[target]
      end
    end

    File.rename(filename, old_filename)
    File.rename(new_filename, filename)
    File.delete(old_filename)
  end
end

  • In your code you use:

    File.open('D:\\mywork\\list.txt').read

    Instead, a shorter and clearer way would be to use:

    File.read('D:/mywork/list.txt')

    Ruby will automatically adjust the pathname separators based on the OS so always use forward slashes for readability. From the IO documentation:

Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".

The problem with using read is that it isn't scalable. Imagine if you were doing this in a long-term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.

Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for

    value.gsub!(/\r\n?/, "\n")
    value.each_line do |line|
      line.chomp!

  • While

    Dir.glob("D:/new_work/*-access.txt") do |fn|

    is fine, its placement isn't. You're doing it for every line processed in your file being read, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.

  • Again,

    text = File.read(fn)

    has scalability issues. Using foreach is a better solution. Again.

  • Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:

    replace = text.gsub(line.strip, "")

  • Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:

    File.open(fn, "w") { |file| file.puts replace }

    A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.

A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.

Removing lines in a text file based on the beginning characters

Try

message_filtered = message_txt.lines.reject { |line|
  line.start_with?('>') || line =~ YOUR_EMAIL_REGEXP
}.join # lines keep their trailing newlines, so no separator is needed

To remove only the lines that start with > you can use:

message_filtered = message_txt.gsub(/^>.*\n?/, '') # also consumes the newline, so no blank lines are left behind
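
As a quick check of how the two approaches behave, here is a small sketch run on a sample message. The message text is invented for illustration, and the email-address condition is left out because YOUR_EMAIL_REGEXP is not defined here.

```ruby
# Sample message, invented for illustration only.
message_txt = "Thanks, that works.\n> quoted line one\n>\n> quoted line two\nBye\n"

# String#lines keeps each line's trailing newline, so a plain join
# reassembles the text without adding separators.
by_reject = message_txt.lines.reject { |line| line.start_with?('>') }.join

# The pattern also consumes each matched line's newline.
by_gsub = message_txt.gsub(/^>.*\n?/, '')

puts by_reject
puts by_reject == by_gsub
```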

Count the number of lines in a file without reading entire file into memory?

If you are in a Unix environment, you can just let wc -l do the work.

It will not load the whole file into memory. Since it is optimized for streaming files and counting words and lines, its performance is good enough that you don't need to stream the file yourself in Ruby.

SSCCE:

filename = 'a_file/somewhere.txt'
line_count = `wc -l "#{filename}"`.strip.split(' ')[0].to_i
p line_count

Or if you want a collection of files passed on the command line:

wc_output = `wc -l "#{ARGV.join('" "')}"`
line_count = wc_output.match(/^ *([0-9]+) +total$/).captures[0].to_i
p line_count
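
If wc is not available (for example, outside a Unix environment), the file can also be streamed in plain Ruby without loading it fully into memory. A minimal sketch follows; the method names count_lines and count_newlines and the 64 KB chunk size are hypothetical choices.

```ruby
# Counts lines by streaming them one at a time. A final line without a
# trailing newline is counted too, which differs from wc -l.
def count_lines(path)
  File.foreach(path).count
end

# Counts newline characters by reading fixed-size chunks, which matches
# what wc -l reports.
def count_newlines(path, chunk_size = 64 * 1024)
  count = 0
  File.open(path, 'rb') do |file|
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end
```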

