Ruby - Parsing a Text File

Parsing and structuring of a text file

If the answer to @mudasobwa's question "Do you want to grab everything having an 88 value?" is yes, this is the solution:

lines = File.readlines("file.txt", chomp: true) # read all lines without line breaks; closes the file

current_head = ""
res = []

lines.each do |line|
  case line
  when /Head \d+/
    current_head = line                 # remember the latest heading
  when /\w 88/
    res << "#{current_head}, #{line}"   # pair an "x 88" line with its heading
  end
end

puts res
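The same approach can be wrapped in a method that takes the text directly, which makes it easy to try without a file. The sample input below is hypothetical (the original file.txt isn't shown); it assumes headings look like "Head 1" and value lines like "A 88", and lines_with_88 is an illustrative name.

```ruby
# Collects every line matching "<word char> 88", prefixed by the most
# recent "Head N" line seen before it. The input format is an assumption.
def lines_with_88(text)
  current_head = ""
  text.each_line(chomp: true).each_with_object([]) do |line, res|
    case line
    when /Head \d+/
      current_head = line                 # remember the latest heading
    when /\w 88/
      res << "#{current_head}, #{line}"   # pair the match with its heading
    end
  end
end

sample = <<~TEXT
  Head 1
  A 88
  B 12
  Head 2
  C 88
  D 88
TEXT

puts lines_with_88(sample)
```

Because the heading is only remembered, not emitted, a heading with no "88" lines under it simply produces no output.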

Parsing a structured text file in Ruby

You should look for the indicator lines (description, quality, text and stats) in a loop and fill the hash while processing the document line by line.

Another option would be to use regular expressions and parse the whole document at once, but you don't really need regular expressions here, and if you're not familiar with them, I'd recommend against them.

UPDATE:

sections = []

File.open("deneme") do |f|
  current = {:description => "", :text => "", :quality => "", :stats => ""}
  in_description = false
  in_quality = false

  f.each_line do |line|
    if in_description
      if line.strip.empty?
        in_description = false          # a blank line ends the description
      else
        current[:description] += line
      end
    elsif in_quality
      current[:quality] = line.strip    # quality is the single line after "quality"
      in_quality = false
    elsif line.strip == "description"
      in_description = true
    elsif line.strip == "quality"
      in_quality = true
    elsif line.start_with?("text: ")
      current[:text] = line[6..-1].strip   # drop the 6-char "text: " prefix
    elsif line.start_with?("stats ")
      current[:stats] = line[6..-1].strip  # drop the 6-char "stats " prefix
      sections.push(current)               # "stats" closes a section
      current = {:description => "", :text => "", :quality => "", :stats => ""}
    end
  end
end
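To see what the loop produces, here is the same state machine wrapped in a method that reads from any IO-like object, so it can be fed a StringIO instead of the "deneme" file. The sample layout is hypothetical (the real file isn't shown), prefix checks use String#start_with?, and parse_sections / new_section are illustrative names.

```ruby
require "stringio"

# Returns a fresh, empty section record.
def new_section
  { description: "", text: "", quality: "", stats: "" }
end

# Line-by-line parser; accepts a File or a StringIO.
def parse_sections(io)
  sections = []
  current = new_section
  in_description = false
  in_quality = false

  io.each_line do |line|
    if in_description
      if line.strip.empty?
        in_description = false             # a blank line ends the description
      else
        current[:description] += line
      end
    elsif in_quality
      current[:quality] = line.strip       # quality is the single next line
      in_quality = false
    elsif line.strip == "description"
      in_description = true
    elsif line.strip == "quality"
      in_quality = true
    elsif line.start_with?("text: ")
      current[:text] = line[6..].strip
    elsif line.start_with?("stats ")
      current[:stats] = line[6..].strip
      sections << current                  # "stats" closes a section
      current = new_section
    end
  end
  sections
end

sample = StringIO.new(<<~TEXT)
  description
  A small example
  spanning two lines

  quality
  high
  text: hello world
  stats 42
TEXT

p parse_sections(sample)
```

Note that the description keeps its embedded newlines, since each line is appended as-is.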

Get file, parse text and create date object

Something like this:

require 'date'

holidays = File.readlines('holidays.txt', chomp: true).map do |row|
  date, holiday_name = row.split(';')
  # Date.parse's second argument is not a format string; Date.strptime takes one.
  date = Date.strptime(date, '%d.%m.%Y')
  [date, holiday_name]
end.to_h
# => {
#   #<Date: 2017-01-01 ((2457755j,0s,0n),+0s,2299161j)> => "New Year",
#   #<Date: 2017-04-16 ((2457860j,0s,0n),+0s,2299161j)> => "Easter",
#   #<Date: 2017-12-25 ((2458113j,0s,0n),+0s,2299161j)> => "Christmas"
# }
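A slightly more defensive sketch, assuming the same "dd.mm.yyyy;Name" rows as above: it skips blank lines and reports a bad date together with its line number instead of failing with a bare parse error. The method name parse_holidays and the sample rows are illustrative, not from the question.

```ruby
require "date"

# Parses "dd.mm.yyyy;Name" rows into a { Date => name } hash.
# Blank lines are skipped; a malformed date raises with its line number.
def parse_holidays(text)
  text.each_line(chomp: true).with_index(1).each_with_object({}) do |(row, lineno), holidays|
    next if row.strip.empty?
    date_str, name = row.split(";", 2)
    begin
      date = Date.strptime(date_str.strip, "%d.%m.%Y")
    rescue ArgumentError   # Date raises ArgumentError (Date::Error on newer Rubies)
      raise ArgumentError, "line #{lineno}: invalid date #{date_str.inspect}"
    end
    holidays[date] = name.to_s.strip
  end
end

holidays = parse_holidays("01.01.2017;New Year\n16.04.2017;Easter\n\n25.12.2017;Christmas\n")
puts holidays[Date.new(2017, 12, 25)]
```

Since the keys are Date objects, lookups work with any Date, for example holidays.key?(Date.today).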

Parsing lines of text from external file in Ruby

Try this:

parsed_email = {}

# The block form closes the file automatically when the block finishes.
File.open("sample-email.txt", "r") do |raw_email|
  raw_email.each_line do |line|
    case line.split(":")[0]   # the header name, e.g. "From"
    when "Delivered-To"
      parsed_email[:to] = line
    when "From"
      parsed_email[:from] = line
    when "Date"
      parsed_email[:date] = line
    when "Subject"
      parsed_email[:subject] = line
    end
  end
end

puts parsed_email
# {:to=>"Delivered-To: user1@example.com\n", :from=>"From: John Doe <user2@example.com>\n", :date=>"Date: Tue, 12 Dec 2017 13:30:14 -0500\n", :subject=>"Subject: Testing the parser\n"}

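Explanation
You need to split each line on ":" and take the first element, which is the header name: line.split(":")[0]. Each matching line is then stored whole, including the header name and the trailing newline.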
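If you only want the header values, a variation is to split with a limit of 2, so only the first colon separates name from value and colons inside the value (like the time in Date) survive. This sketch also stops at the first blank line, which ends the header block in an email, so "From:" lines in the body are not picked up. It does not handle folded (multi-line) headers; HEADER_KEYS and parse_headers are illustrative names.

```ruby
# Header names we care about, mapped to hash keys.
HEADER_KEYS = {
  "Delivered-To" => :to,
  "From"         => :from,
  "Date"         => :date,
  "Subject"      => :subject
}.freeze

# Stores only header values; split(":", 2) splits at the first colon only.
def parse_headers(text)
  text.each_line(chomp: true)
      .take_while { |line| !line.empty? }      # headers end at the first blank line
      .each_with_object({}) do |line, headers|
    name, value = line.split(":", 2)
    key = HEADER_KEYS[name]
    headers[key] = value.strip if key && value
  end
end

sample = <<~EMAIL
  Delivered-To: user1@example.com
  From: John Doe <user2@example.com>
  Date: Tue, 12 Dec 2017 13:30:14 -0500
  Subject: Testing the parser

  Body text.
EMAIL

p parse_headers(sample)
```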

How to parse a text file containing multiple lines of data and organized by numerical values and then convert to JSON

This is a very common type of encoding called Type-Length-Value (or Tag-Length-Value), for reasons I suppose are obvious. As with many such tasks in Ruby, String#unpack is a good fit:

def decode(data)
  return {} if data.empty?
  key, len, rest = data.unpack("a2 a2 a*")
  val = rest.slice!(0, len.to_i)
  { key => val }.merge(decode(rest))
end

p decode("HD040008000415350110XXXXXXXXXX0208XXXXXXXX0302EN0403USA0502EN0604000107014")
# => {"HD"=>"0008", "00"=>"1535", "01"=>"XXXXXXXXXX", "02"=>"XXXXXXXX", "03"=>"EN", "04"=>"USA", "05"=>"EN", "06"=>"0001", "07"=>"4"}

p decode("EM04000800030010112TME001205IQ50232Blue Point Coastal Cuisine. INC.0614565 5th Avenue0805921010909SAN DIEGO1008Downtown1102CA1203USA")
# => {"EM"=>"0008", "00"=>"001", "01"=>"TME001205IQ5", "02"=>"Blue Point Coastal Cuisine. INC.", "06"=>"565 5th Avenue", "08"=>"92101", "09"=>"SAN DIEGO", "10"=>"Downtown", "11"=>"CA", "12"=>"USA"}
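The recursive version above recurses once per field and silently treats a non-numeric length as 0. Here is an iterative sketch that walks the string with an index and raises on malformed input instead. It assumes, as in the examples, a 2-character tag and a 2-digit decimal length, and it swaps String#unpack for plain String#[] slicing (which keeps the string's original encoding); decode_tlv is an illustrative name.

```ruby
# Iterative TLV decoder: 2-char tag, 2-digit length, then `length` chars.
def decode_tlv(data)
  fields = {}
  pos = 0
  while pos < data.length
    tag = data[pos, 2]
    len = data[pos + 2, 2]
    unless len&.match?(/\A\d{2}\z/)
      raise ArgumentError, "bad length field #{len.inspect} at offset #{pos}"
    end
    n = len.to_i
    value = data[pos + 4, n]
    if value.nil? || value.length < n
      raise ArgumentError, "value for tag #{tag} is truncated"
    end
    fields[tag] = value
    pos += 4 + n   # skip tag (2) + length (2) + value (n)
  end
  fields
end

p decode_tlv("HD040008000415350110XXXXXXXXXX0208XXXXXXXX0302EN0403USA0502EN0604000107014")
```

As with merge in the recursive version, a repeated tag keeps the last value seen.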

If you want to read an entire file and return a JSON array of objects, something like this would suffice:

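#!/usr/bin/env -S ruby -n
# -n wraps the script in a "while gets ... end" loop; each input line is in $_.
# env -S splits "ruby -n" into separate arguments (needed on Linux).
BEGIN {
  require "json"
  def decode(data)
    # ...
  end
  arr = []
}

arr << decode($_.chomp)

END { puts arr.to_json }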

Then, supposing the script is called script.rb and is executable:

$ cat data.txt | ./script.rb > out.json
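If you'd rather not rely on the -n flag in the shebang, the same conversion can be written as an ordinary method over an IO-like object. In this sketch decode is copied from the answer, blank lines are skipped (an assumption), tlv_lines_to_json is an illustrative name, and the two sample records are made up for the example.

```ruby
require "json"
require "stringio"

# decode as given in the answer (recursive, String#unpack based).
def decode(data)
  return {} if data.empty?
  key, len, rest = data.unpack("a2 a2 a*")
  val = rest.slice!(0, len.to_i)
  { key => val }.merge(decode(rest))
end

# Decodes every non-blank line of an IO-like object into a JSON array.
def tlv_lines_to_json(io)
  io.each_line.map(&:chomp).reject(&:empty?).map { |line| decode(line) }.to_json
end

# In a real script you could pass ARGF, which reads the files named on the
# command line or standard input:  puts tlv_lines_to_json(ARGF)
sample = StringIO.new("HD0400080302EN\nHD0400090403USA\n")
puts tlv_lines_to_json(sample)
```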

Parsing a .txt file to key/value pairs in Ruby

You can do this:

array = []
# Open the file in read mode. With the block form you don't need to
# close the file by hand; it is closed automatically when the block
# finishes.
File.open('path/to/file', 'r') do |file|
  # each_line without a block returns an Enumerator, on which I call
  # each_slice(2) to take two lines at a time: the first line is the
  # question and the second one is the answer. chomp: true drops the "\n".
  file.each_line(chomp: true).each_slice(2) do |question, answer|
    array << {'Question' => question, 'Answer' => answer}
  end
end
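The same pairing can be tried on a string via StringIO. This sketch additionally drops blank lines before pairing (an assumption, so a stray empty line doesn't shift every later pair); read_qa_pairs and the sample questions are illustrative.

```ruby
require "stringio"

# Groups alternating question/answer lines into hashes.
# If the line count is odd, the last "Answer" is nil.
def read_qa_pairs(io)
  io.each_line.map(&:chomp)
    .reject { |line| line.strip.empty? }   # ignore blank lines
    .each_slice(2)
    .map { |question, answer| { "Question" => question, "Answer" => answer } }
end

sample = StringIO.new("What is 2 + 2?\n4\n\nCapital of France?\nParis\n")
p read_qa_pairs(sample)
```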

