FasterCSV: Several Separators

Solution 1:

One simple way is to let the user pick the separator used in their CSV file from a drop-down, and then pass that value as :col_sep in the CSV.read() call. But I guess you want it to be automatic. :-)

Solution 2:

You can read just the first line of the CSV file with plain file I/O (for example File.open(path, &:gets), since File.read() would load the whole file) and match it against /,/ and then against /\t/. Whichever Regexp matches tells you which single separator the file uses, and you then read the file with CSV.read(..., :col_sep => single_separator).
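A minimal sketch of that idea, assuming the file uses either commas or tabs. The helper names detect_col_sep and read_with_detected_sep are made up for this example, and the comma fallback for a line that matches neither pattern is my own assumption:

```ruby
require 'csv'

# Hypothetical helper: guess the separator from the first line only.
# The patterns are tried in the order described above: comma, then tab.
def detect_col_sep(path)
  first_line = File.open(path, &:gets).to_s # "" for an empty file
  case first_line
  when /,/  then ","
  when /\t/ then "\t"
  else ","                                  # assumed fallback: CSV's default
  end
end

# Hypothetical helper: read the whole file with the detected separator.
def read_with_detected_sep(path)
  CSV.read(path, col_sep: detect_col_sep(path))
end
```

Note that a tab-separated file whose first line happens to contain a comma would be misdetected by this order of checks, so you may want to count occurrences instead if your data can look like that.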

But beware:

At first it looks nice and elegant to pass ",\t" as the separator in the method call to allow both, but please note that this could introduce a nasty bug!

If a CSV file contains both tabs and commas, by accident or by chance, what do you do then?
Separate on both? How can you be sure that is right? I think that would be a mistake, because separators don't appear mixed like this in regular CSV files: it's always either ',' or "\t".

So I think you should not use ",\t". It could cause serious problems, and that's probably the reason why col_sep was not implemented to accept a Regexp.
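One more reason to avoid it: as far as I can tell, the standard CSV library treats a multi-character col_sep as one literal string, so ",\t" means "a comma followed by a tab" rather than "a comma or a tab". A small sketch to check this on your own Ruby version (the sample strings are made up):

```ruby
require 'csv'

# Every comma-plus-tab pair acts as one separator here.
pair_separated = CSV.parse_line("a,\tb,\tc", col_sep: ",\t")
# => ["a", "b", "c"]

# A line that uses commas and tabs on their own is not split at all.
mixed = CSV.parse_line("a,b\tc", col_sep: ",\t")
# => ["a,b\tc"]
```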

Rails: Use more than 1 col_sep

col_sep only accepts one value. You can see examples of how it's used here:

http://rxr.whitequark.org/mri/source/lib/csv.rb
(lines 1654 and 1803 are a couple of examples)

One workaround could be to replace all instances of one separator with the other, using something like gsub, before parsing. Not the silver bullet you were hoping for, but depending on your requirements it could do the trick!
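A rough sketch of the gsub workaround, assuming the data never contains a tab or comma inside a field (the sample string is made up):

```ruby
require 'csv'

raw = "a\tb,c\n1,2\t3\n"

# Turn every tab into a comma, then parse with the default separator.
rows = CSV.parse(raw.gsub("\t", ","))
# => [["a", "b", "c"], ["1", "2", "3"]]
```

Keep in mind that gsub also rewrites tabs inside quoted fields, so this is only safe when tabs never occur as real data.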

FasterCSV default options and their usage

FasterCSV replaced the former CSV module in the standard library (as of Ruby 1.9) and has since been renamed to 'CSV'. Have a look at the documentation of the CSV.new method for the options.
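If you just want to see the defaults, the library exposes them as the CSV::DEFAULT_OPTIONS hash. A short sketch that prints nothing and only looks a few of them up, then overrides two on a made-up sample string:

```ruby
require 'csv'

defaults = CSV::DEFAULT_OPTIONS
default_col_sep    = defaults[:col_sep]    # => ","
default_quote_char = defaults[:quote_char] # => "\""
default_headers    = defaults[:headers]    # => false

# Any option can be overridden per call, e.g. separator and header handling.
table = CSV.parse("name;age\nAnn;30\n", col_sep: ";", headers: true)
first_name = table[0]["name"] # => "Ann"
```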

How do I treat a string (not a file) as a line in a CSV file and parse it?

If you have arbitrary string data you want to parse as CSV you can just use parse. No need for a temporary file:

require 'csv'

commas = %Q[a,b,"c,d"]

CSV.parse(commas)
# => [["a", "b", "c,d"]]

tabs = %Q[a\tb\t"c\td"]

CSV.parse(tabs, col_sep: "\t")
# => [["a", "b", "c\td"]]

The col_sep option allows you to specify what separator is used.
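Since the string is a single line, CSV.parse_line may be closer to what you want: it returns one flat array of fields instead of an array of rows. A small sketch reusing the tab example:

```ruby
require 'csv'

fields = CSV.parse_line(%Q[a\tb\t"c\td"], col_sep: "\t")
# => ["a", "b", "c\td"]
```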

How do I parse data from a TXT file with a tab separator?

Here's one way to do it. We drop to a lower level, using shift to parse each row, and silence the MalformedCSVError exception so that processing continues with the next iteration. The problem with this is that the loop doesn't look so nice. If anyone can improve this, you're welcome to edit the code.

FasterCSV.open(filename, :quote_char => '"', :col_sep => "\t", :headers => true) do |csv|
  row = true
  while row
    begin
      row = csv.shift
      break unless row

      # Do things with the row here...
    rescue FasterCSV::MalformedCSVError
      # Skip the malformed row and try the next one.
      next
    end
  end
end
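On a current Ruby, where FasterCSV lives on as the standard CSV library, the plain case without error skipping can be sketched like this. The helper name read_tsv is made up for the example, and unlike the loop above it lets CSV::MalformedCSVError propagate instead of skipping bad rows:

```ruby
require 'csv'

# Hypothetical helper: read a tab-separated file with a header row and
# return each data row as a Hash keyed by the header names.
def read_tsv(path)
  CSV.foreach(path, col_sep: "\t", headers: true).map(&:to_h)
end
```

For a file containing "name\tage" on the first line and "Ann\t30" on the second, this should give [{"name" => "Ann", "age" => "30"}].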

