What's the best way to parse a tab-delimited file in Ruby?
Ruby's CSV library lets you specify the field delimiter with the col_sep option. Since Ruby 1.9, the standard library's CSV is based on FasterCSV. Something like this would work:
require "csv"
parsed_file = CSV.read("path-to-file.csv", col_sep: "\t")
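For a self-contained illustration, here is a minimal sketch that writes a small sample file and reads it back both ways. The file contents and the column names (name, city) are made up for the example; only CSV.read, CSV.foreach and the col_sep/headers options come from the library.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data; a real script would already have a file on disk.
file = Tempfile.new(["people", ".tsv"])
file.write("name\tcity\nAda\tLondon\nGrace\tNew York\n")
file.close

# CSV.read loads the whole file into an array of arrays.
all_rows = CSV.read(file.path, col_sep: "\t")

# CSV.foreach yields one row at a time, so a large file is never held in memory at once.
cities = []
CSV.foreach(file.path, col_sep: "\t", headers: true) do |row|
  cities << row["city"]
end

file.unlink
```

CSV.read is fine for small files; CSV.foreach is probably the safer default when the file size is unknown.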
Tab delimited file parsing in Rails
I have had success with FasterCSV on Ruby 1.8.7 (I believe it became the core csv library in 1.9), using this:
table = FasterCSV.read(result_file.to_file.path, { :headers => true, :col_sep => "\t", :skip_blanks => true })
unless table.empty?
  header_arry = Array.new
  table.headers.each do |h|
    # your header logic, e.g.
    #   if h.downcase.include? 'pos'
    #     header_arry << 'position'
    #   end
    # simplest case here
    header_arry << h.downcase
    # which produces an array of column names called header_arry
  end
  rows = table.to_a
  rows.delete_at(0)
  rows.each do |row|
    # convert to hash using the column names
    hash = Hash[header_arry.zip(row)]
    # do something with the row hash
  end
end
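On Ruby 1.9 and later, the same idea can probably be expressed more compactly with the standard csv library. This is a sketch, not the original poster's code: the sample data is invented, and the lambda passed to header_converters only reproduces the "simplest case" (downcasing) from the snippet above.

```ruby
require "csv"

# Hypothetical tab-delimited input with a blank line in the middle.
data = "Name\tPOS\nBob\t1\n\nJoe\t2\n"

table = CSV.parse(
  data,
  headers: true,
  col_sep: "\t",
  skip_blanks: true,
  header_converters: ->(h) { h.downcase } # same "simplest case" as above
)

# Each CSV::Row converts to a hash keyed by the converted header names.
row_hashes = table.map(&:to_h)
```

Any custom header logic (such as renaming 'pos' to 'position') would go inside the lambda.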
Strategy for reading a tab delimited file and separating file for array with attr_reader
If you make your initialize method accept values for name, hair_color and gender, you can do it like this:
my_array = File.readlines('test.txt').map do |line|
  Person.new( *line.split("\t") )
end
If you can't modify your initialize method, you'll need to call the writer methods one by one like this:
my_array = File.readlines('test.txt').map do |line|
  name, hair_color, gender = line.split("\t")
  person = Person.new
  person.name = name
  person.hair_color = hair_color
  person.gender = gender
  person
end
The easiest way to make initialize accept the attributes as arguments, without having to set all the variables yourself, is to use Struct, which shortens your entire code to:
Person = Struct.new(:name, :hair_color, :gender)
my_array = File.readlines('test.txt').map do |line|
  Person.new( *line.split("\t") )
end
#=> [ #<struct Person name="Bob", hair_color="red_hair", gender="male\n">,
# #<struct Person name="Joe", hair_color="brown_hair", gender="male\n">,
# #<struct Person name="John", hair_color="black_hair", gender="male\n">,
# #<struct Person name="\n", hair_color=nil, gender=nil>]
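The output above still carries the trailing "\n" on the last field, plus an extra record for the blank line. One way to avoid both, sketched here on an in-memory string with made-up contents matching that output, is to chomp each line and skip blank ones. With a real file, File.readlines('test.txt', chomp: true) should behave the same way on Ruby 2.4 or later.

```ruby
Person = Struct.new(:name, :hair_color, :gender)

# Hypothetical file contents, including the trailing blank line seen in the output above.
data = "Bob\tred_hair\tmale\nJoe\tbrown_hair\tmale\nJohn\tblack_hair\tmale\n\n"

people = data.each_line(chomp: true)                # chomp: true drops each line's "\n"
             .reject { |line| line.strip.empty? }   # skip blank lines
             .map { |line| Person.new(*line.split("\t")) }
```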
Writing and Reading to/from TAB-delimited CSV files
CSV::open can actually take 3 arguments, the third being the csv options (the same as for read), so you can just do:
CSV.open("tipsoutput.csv", "w", col_sep: "\t") do |csv|
  csv << ["2017-07-27", "THU", "16:00-22:00", "21.00"]
end
This produces the file:
2017-07-27 THU 16:00-22:00 21.00
You will need to iterate through the array, though, as I'm not aware of anything that lets you write multiple arrays (rows) at once, so something like:
CSV.open("tipsoutput.csv", "w", col_sep: "\t") do |csv|
  tips.each { |tip| csv << tip }
end
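To check the write and read sides together without touching the disk, a round trip through CSV.generate and CSV.parse can be sketched as below. The contents of tips are an assumption (the original question does not show them); the second row is invented.

```ruby
require "csv"

# Hypothetical rows; `tips` is assumed to be an array of arrays like this.
tips = [
  ["2017-07-27", "THU", "16:00-22:00", "21.00"],
  ["2017-07-28", "FRI", "16:00-22:00", "34.50"]
]

# CSV.generate builds the tab-delimited text in memory instead of writing a file.
tsv = CSV.generate(col_sep: "\t") do |csv|
  tips.each { |tip| csv << tip }
end

# Reading it back with the same col_sep returns the original rows.
round_trip = CSV.parse(tsv, col_sep: "\t")
```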
How do I parse a tab-delimited line that contains a quote?
That's a malformed document if you're trying to adhere to the CSV standard. Instead, you might just brute-force it and pray there are no tabs in the data itself:
line.split(/\t/)
The CSV parsing library comes in handy when you're dealing with data like this:
"1\t2\t\"3a\t3b\"\t4"
Update: If you're prepared to abuse the CSV library a little then you can do this:
CSV.parse("11\tDave\tO\"malley", col_sep: "\t", quote_char: "\0")
That basically kills quote detection, so if there is other data that depends on that being processed correctly this may not work out.
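The three cases discussed above can be compared side by side. This is only a sketch using the strings from this answer; the variable names are mine.

```ruby
require "csv"

# A properly quoted field may contain the delimiter; the parser keeps it as one value.
quoted = CSV.parse_line("1\t2\t\"3a\t3b\"\t4", col_sep: "\t")

# A stray quote inside an unquoted field is rejected by the default parser.
stray = "11\tDave\tO\"malley"
error = begin
  CSV.parse(stray, col_sep: "\t")
  nil
rescue CSV::MalformedCSVError => e
  e
end

# Using a character that never appears in the data as quote_char disables quote handling.
relaxed = CSV.parse(stray, col_sep: "\t", quote_char: "\0")
```

Rescuing CSV::MalformedCSVError and falling back to the relaxed parse (or to a plain split) is one possible strategy when the input quality is unknown.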
Parse tab delimited CSV file to array of hashes in Ruby 2.0
It seems that the options you can pass to parse are the ones listed under CSV::new:
>> CSV.parse("qwe\tq\twe", col_sep: "\t"){|a| p a}
["qwe", "q", "we"]
Parse Tab Delimited Text from POST
The easiest way will be to use Ruby's CSV standard library:
require 'csv'
s = "\nuserName\tpassword\tfName\tlName\tuserPhone\tcompName\tcontName\taddr1\taddr2\tcity\tstate\tpostalCode\tcountry\tphone\tfax\temail\tbusnType\tDOTNumber\tMCNumber\ntest\tabc123\tTest\tName\t(555) 555-5555\t\t\t\t\t\t\t58638\tUS\t(555) 555-5555\t(555) 555-5555\t\tTest\t12345678\tMC000000\n"
csv = CSV.new(s, col_sep: "\t")
csv.each do |row|
  puts row.inspect
end
And the output is:
[]
["userName", "password", "fName", "lName", "userPhone", "compName", "contName", "addr1", "addr2", "city", "state", "postalCode", "country", "phone", "fax", "email", "busnType", "DOTNumber", "MCNumber"]
["test", "abc123", "Test", "Name", "(555) 555-5555", nil, nil, nil, nil, nil, nil, "58638", "US", "(555) 555-5555", "(555) 555-5555", nil, "Test", "12345678", "MC000000"]
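The empty [] at the top of that output comes from the leading "\n" in the string. If that row is unwanted, the skip_blanks option should drop it. Here is a small sketch on a shortened, made-up version of the POST body (three columns instead of nineteen):

```ruby
require "csv"

# Shortened, hypothetical version of the POST body above: a leading blank line,
# a header line, and one record with an empty field.
s = "\nuserName\tcompName\tpostalCode\ntest\t\t58638\n"

# skip_blanks: true drops the empty first line; empty fields still come back as nil.
rows = CSV.parse(s, col_sep: "\t", skip_blanks: true)

# Separate the header line from the data records.
header, *records = rows
```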
Ruby - Parse a multi-line tab-delimited string into an array of arrays
This ought to do:
expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
str.scan(expr)
The expression is actually a lot less complex than it looks. It looks complex because we're matching square brackets, which have to be escaped, and also using character classes, which are enclosed in square brackets in the regular expression language. All together it adds a lot of noise.
Here it is split up:
expr = /
  (.+?)          # Capture #1: Any characters (non-greedy)
  \s+            # Whitespace
  \[             # Literal '['
  (              # Capture #2:
    [^\]]+       #   One or more characters that aren't ']'
  )
  \]             # Literal ']'
  (?:            # Non-capturing group
    \s+          #   Whitespace
    \[           #   Literal '['
    ([^\]]+)     #   Capture #3 (same as #2)
    \]           #   Literal ']'
  )?             # Preceding group is optional
/x
As you can see, the third part is identical to the second part, except it's in a non-capture group followed by a ? to make it optional.
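To see what scan returns, here is a quick sketch. The input string is hypothetical (the question's actual data is not reproduced here); it assumes each line holds a product name followed by one or two bracketed parts.

```ruby
# Hypothetical input; the original question's data is not shown here.
str = "Alpha Tool\t[version 1.2.3]\t[Installed 2014-01-05]\n" \
      "Beta Suite\t[version 4.0]\n"

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/

# scan returns one array of captures per match; a missing optional group is nil.
rows = str.scan(expr)
```

The second row has no Installed part, so its third capture comes back as nil rather than being left out.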
It's worth noting that this may fail if e.g. the product name contains square brackets. If that's a possibility, one potential solution is to include the version and Installed text in the match, e.g.:
expr = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/
P.S. Here's a solution that uses String#split instead:
expr = /\]?\s+\[|\]$/
res = str.each_line.map {|ln| ln.strip.split(expr) }
.reject {|arr| arr.empty? }
If you have brackets in your product names, a possible workaround here is to specify a minimum number of spaces between parts, e.g.:
expr = /\]?\s{3,}\[|\]$/
...which of course depends on product names never containing three or more consecutive whitespace characters.
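Both split patterns can be tried on sample lines. The inputs below are made up for illustration; the second one assumes the parts are separated by three spaces and that the product name has brackets in the middle.

```ruby
# Hypothetical input for the basic split pattern.
str = "Alpha Tool\t[version 1.2.3]\t[Installed 2014-01-05]\n\nBeta Suite\t[version 4.0]\n"

basic = /\]?\s+\[|\]$/
res = str.each_line.map { |ln| ln.strip.split(basic) }
         .reject { |arr| arr.empty? } # drops the blank line

# Hypothetical line whose product name contains brackets, separated by three spaces.
line = "Gamma [Pro] Edition   [version 2.0]   [Installed 2015-03-01]"

wide = /\]?\s{3,}\[|\]$/
parts = line.split(wide)
```

One caveat with the wider pattern: a product name that ends in ] would lose that bracket, since the leading \]? consumes it as part of the separator.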