What's the Best Way to Parse a Tab-Delimited File in Ruby

What's the best way to parse a tab-delimited file in Ruby?

Ruby's CSV library lets you specify the field delimiter through the col_sep option. As of Ruby 1.9, the standard csv library is the former FasterCSV. Something like this would work:

require "csv"
parsed_file = CSV.read("path-to-file.csv", col_sep: "\t")
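
As a minimal sketch of the same call end to end, the snippet below writes a small tab-separated file to a temporary location and reads it back, once as plain arrays and once with headers: true so fields can be looked up by column name. The file contents and the column names (name, color) are made up for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data; a real file would already exist on disk.
file = Tempfile.new(["people", ".tsv"])
file.write("name\tcolor\nBob\tred\nJoe\tbrown\n")
file.close

# Without headers: every line becomes an array of strings.
rows = CSV.read(file.path, col_sep: "\t")

# With headers: the first line names the columns of each CSV::Row.
names = []
CSV.foreach(file.path, col_sep: "\t", headers: true) do |row|
  names << row["name"]
end

file.unlink
```

CSV.foreach reads one row at a time, which tends to be the better fit for large files than loading everything with CSV.read.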

Tab delimited file parsing in Rails

I have had success with FasterCSV on Ruby 1.8.7 (I believe it became the core csv library in 1.9), using this:

table = FasterCSV.read(result_file.to_file.path, { :headers => true, :col_sep => "\t", :skip_blanks => true })
unless table.empty?
  header_arry = Array.new
  table.headers.each do |h|
    # your header logic, e.g.
    #   if h.downcase.include? 'pos'
    #     header_arry << 'position'
    #   end
    # simplest case here
    header_arry << h.downcase
    # which produces an array of column names called header_arry
  end

  rows = table.to_a
  # to_a puts the header row first, so drop it
  rows.delete_at(0)
  rows.each do |row|
    # convert to hash using the column names
    hash = Hash[header_arry.zip(row)]
    # do something with the row hash
  end
end
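
On current Rubies, where FasterCSV is simply the csv standard library, the header loop above can probably be folded into the parse call itself. The following is only a sketch under that assumption: the input string and the normalize lambda are hypothetical stand-ins for the uploaded file and for "your header logic".

```ruby
require "csv"

# Hypothetical input standing in for the uploaded file's contents.
data = "Name\tPOS\nAnn\t1\n\nBob\t2\n"

# Same idea as the header loop above, expressed as a header converter.
normalize = lambda do |header|
  name = header.downcase
  name.include?("pos") ? "position" : name
end

table = CSV.parse(data, headers: true, col_sep: "\t", skip_blanks: true,
                        header_converters: normalize)

# Each CSV::Row converts to a hash keyed by the normalized column names.
hashes = table.map(&:to_h)
```

This avoids building header_arry by hand and removing the header row from the array afterwards.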

Strategy for reading a tab delimited file and separating file for array with attr_reader

If you make your initialize method accept values for name, hair_color and gender, you can do it like this:

my_array = File.readlines('test.txt').map do |line|
  Person.new( *line.split("\t") )
end

If you can't modify your initialize method, you'll need to call the writer methods one by one like this:

my_array = File.readlines('test.txt').map do |line|
  name, hair_color, gender = line.split("\t")
  person = Person.new
  person.name = name
  person.hair_color = hair_color
  person.gender = gender
  person
end

The easiest way to make initialize accept the attributes as arguments, without having to set all the variables yourself, is to use Struct, which shortens your entire code to:

Person = Struct.new(:name, :hair_color, :gender)
my_array = File.readlines('test.txt').map do |line|
  Person.new( *line.split("\t") )
end
#=> [ #<struct Person name="Bob", hair_color="red_hair", gender="male\n">,
# #<struct Person name="Joe", hair_color="brown_hair", gender="male\n">,
# #<struct Person name="John", hair_color="black_hair", gender="male\n">,
# #<struct Person name="\n", hair_color=nil, gender=nil>]
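
The output above still carries the trailing "\n" on the last field and turns a blank line into a nearly empty struct. One way to avoid both, sketched here with an in-memory string as a hypothetical stand-in for test.txt, is to chomp each line and skip empty ones before splitting.

```ruby
Person = Struct.new(:name, :hair_color, :gender)

# Hypothetical stand-in for the contents of test.txt, including a trailing blank line.
text = "Bob\tred_hair\tmale\nJoe\tbrown_hair\tmale\n\n"

people = text.lines
             .map(&:chomp)      # drop the trailing "\n" from each line
             .reject(&:empty?)  # skip blank lines instead of building empty structs
             .map { |line| Person.new(*line.split("\t")) }
```

With a real file, File.readlines('test.txt') would take the place of text.lines.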

Writing and Reading to/from TAB-delimited CSV files

CSV.open can actually take three arguments: the path, the file mode, and the CSV options (the same options CSV.read accepts), so you can just do:

CSV.open("tipsoutput.csv", "w", col_sep: "\t") do |csv|
  csv << ["2017-07-27", "THU", "16:00-22:00", "21.00"]
end

This produces the file (the fields are separated by tab characters):

2017-07-27  THU 16:00-22:00 21.00

You will need to iterate through the array, though, as I'm not aware of anything that lets you write multiple arrays (rows) at once. So, something like:

CSV.open("tipsoutput.csv", "w", col_sep: "\t") do |csv|
  tips.each { |tip| csv << tip }
end
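
To check that writing and reading agree, a round trip along the following lines can help. It is a sketch only: the tips rows are hypothetical, and a temporary file is used in place of tipsoutput.csv so nothing is left behind.

```ruby
require "csv"
require "tempfile"

# Hypothetical rows; in the original question these come from the tips array.
tips = [
  ["2017-07-27", "THU", "16:00-22:00", "21.00"],
  ["2017-07-28", "FRI", "16:00-23:00", "35.50"]
]

file = Tempfile.new(["tipsoutput", ".csv"])
file.close

CSV.open(file.path, "w", col_sep: "\t") do |csv|
  tips.each { |tip| csv << tip }
end

raw        = File.read(file.path)               # tab-separated text as written
round_trip = CSV.read(file.path, col_sep: "\t") # parsed back into arrays

file.unlink
```

The same col_sep has to be passed on both sides; reading the file back with the default comma separator would return each line as a single field.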

How do I parse a tab-delimited line that contains a quote?

That's a malformed document if you're trying to adhere to the CSV standard. Instead, you might just brute-force it and pray there are no tabs in the data itself:

line.split(/\t/)

The CSV parsing library comes in handy when you're dealing with data like this:

"1\t2\t\"3a\t3b\"\t4"

Update: If you're prepared to abuse the CSV library a little then you can do this:

CSV.parse("11\tDave\tO\"malley", col_sep: "\t", quote_char: "\0")

That basically kills quote detection, so if there is other data that depends on that being processed correctly this may not work out.
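
For comparison, here is a small sketch of how the two approaches treat the well-formed quoted line shown earlier. It uses CSV.parse_line, which parses a single line, and the variable names are only illustrative.

```ruby
require "csv"

well_formed = "1\t2\t\"3a\t3b\"\t4"

# The CSV parser treats the quoted section as one field, tab included.
csv_fields = CSV.parse_line(well_formed, col_sep: "\t")

# A plain split knows nothing about quoting and cuts the field in two.
split_fields = well_formed.split("\t")
```

So split is only safe when no field can contain a tab, while the CSV parser is only safe when quotes in the data are balanced.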

Parse tab delimited CSV file to array of hashes in Ruby 2.0

It seems that the options CSV.parse accepts are the ones documented for CSV.new:

>> CSV.parse("qwe\tq\twe", col_sep: "\t"){|a| p a}
["qwe", "q", "we"]
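
To get from there to the array of hashes the question asks for, one possible approach is to add headers: true and convert each row. This sketch assumes the first line holds the column names; the sample data is invented, and converters: :numeric is an optional extra that turns numeric-looking fields into numbers.

```ruby
require "csv"

# Hypothetical two-row input with a header line.
data = "name\tqty\nwidget\t3\ngadget\t5\n"

rows = CSV.parse(data, col_sep: "\t", headers: true, converters: :numeric)
          .map(&:to_h)
```

Without the converters option every value would stay a string.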

Parse Tab Delimited Text from POST

The easiest way will be to use Ruby's CSV standard library:

require 'csv'

s = "\nuserName\tpassword\tfName\tlName\tuserPhone\tcompName\tcontName\taddr1\taddr2\tcity\tstate\tpostalCode\tcountry\tphone\tfax\temail\tbusnType\tDOTNumber\tMCNumber\ntest\tabc123\tTest\tName\t(555) 555-5555\t\t\t\t\t\t\t58638\tUS\t(555) 555-5555\t(555) 555-5555\t\tTest\t12345678\tMC000000\n"

csv = CSV.new(s, col_sep: "\t")
csv.each do |row|
  puts row.inspect
end

And the output is:

[]
["userName", "password", "fName", "lName", "userPhone", "compName", "contName", "addr1", "addr2", "city", "state", "postalCode", "country", "phone", "fax", "email", "busnType", "DOTNumber", "MCNumber"]
["test", "abc123", "Test", "Name", "(555) 555-5555", nil, nil, nil, nil, nil, nil, "58638", "US", "(555) 555-5555", "(555) 555-5555", nil, "Test", "12345678", "MC000000"]
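
The empty array at the top of that output comes from the leading blank line in the string. A sketch of one way to drop it and pair each record with the header names follows; it uses a shortened, hypothetical version of the POST body with only three columns.

```ruby
require "csv"

# Shortened, hypothetical version of the POST body above (leading blank line kept).
body = "\nuserName\tpassword\tcity\ntest\tabc123\t\n"

# skip_blanks: true drops the empty row produced by the leading newline.
rows = CSV.parse(body, col_sep: "\t", skip_blanks: true)
header, *records = rows

# Pair each record with the header names; empty fields stay nil.
users = records.map { |fields| header.zip(fields).to_h }
```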

Ruby - Parse a multi-line tab-delimited string into an array of arrays

This ought to do it:

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
str.scan(expr)

The expression is actually a lot less complex than it looks. It looks complex because we're matching square brackets, which have to be escaped, and also using character classes, which are enclosed in square brackets in the regular expression language. All together it adds a lot of noise.

Here it is split up:

expr = /
  (.+?)         # Capture #1: Any characters (non-greedy)

  \s+           # Whitespace
  \[            # Literal '['
  (             # Capture #2:
    [^\]]+      #   One or more characters that aren't ']'
  )
  \]            # Literal ']'

  (?:           # Non-capturing group
    \s+         #   Whitespace
    \[          #   Literal '['
    ([^\]]+)    #   Capture #3 (same as #2)
    \]          #   Literal ']'
  )?            # Preceding group is optional
/x

As you can see, the third part is identical to the second part, except it's in a non-capture group followed by a ? to make it optional.
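
To see what scan returns, here is a small sketch run against made-up input. The product names, versions and dates are hypothetical; only the shape (name, bracketed version, optional bracketed install note) follows the expression.

```ruby
# Hypothetical input in the shape the expression expects: a product name,
# a bracketed version, and an optional bracketed install note.
str = "Widget Pro\t[version 1.2]\t[Installed 2012-03-04]\nGadget\t[version 0.9]\n"

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
rows = str.scan(expr)
# rows[0] holds three captures; rows[1] has nil for the missing third part.
```

Because the expression has capture groups, scan returns one array of captures per match rather than the matched text.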

It's worth noting that this may fail if, e.g., the product name contains square brackets. If that's a possibility, one potential solution is to include the version and Installed text in the match, e.g.:

expr = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/

P.S. Here's a solution that uses String#split instead:

expr = /\]?\s+\[|\]$/
res = str.each_line.map {|ln| ln.strip.split(expr) }
         .reject {|arr| arr.empty? }
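
A quick sketch of the split variant on made-up input follows. The product data is hypothetical, and a blank line is included to show why the empty results are rejected.

```ruby
# Hypothetical input; the blank line shows why empty results are rejected.
str = "Widget Pro\t[version 1.2]\t[Installed 2012-03-04]\n\nGadget\t[version 0.9]\n"

expr = /\]?\s+\[|\]$/
res = str.each_line
         .map { |ln| ln.strip.split(expr) }
         .reject(&:empty?)
```

Unlike the scan version, rows without a third part simply come back shorter instead of being padded with nil.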

If you have brackets in your product names, a possible workaround here is to specify a minimum number of spaces between parts, e.g.:

expr = /\]?\s{3,}\[|\]$/

...which of course depends on product names never containing three or more consecutive whitespace characters.


