Changing Field Separator/Delimiter in Exported CSV Using Ruby CSV

Here's an example using a tab instead.

To a file:

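require 'csv'

# Pass the options as keyword arguments; a positional options hash
# is rejected on Ruby 3 and later.
CSV.open("myfile.csv", "w", col_sep: "\t") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end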

To a string:

require 'csv'

csv_string = CSV.generate(col_sep: "\t") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

Here's the current documentation on CSV: http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
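
If you need to read the tab-separated output back in, the parser has to be given the same separator. The following is a minimal sketch of a round trip; the variable names (rows, tsv_string, parsed) are only for illustration.

```ruby
require 'csv'

rows = [["row", "of", "CSV", "data"], ["another", "row"]]

# Write with a tab as the field separator.
tsv_string = CSV.generate(col_sep: "\t") do |csv|
  rows.each { |row| csv << row }
end

# Read it back. Without col_sep: "\t" each line would come back
# as a single field, because the default separator is a comma.
parsed = CSV.parse(tsv_string, col_sep: "\t")
```

Here parsed contains the same two rows that went in.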

Transpose CSV rows and columns during ETL process using Kiba (or plain Ruby)

Kiba author here!

I see at least two ways of doing this (no matter if you work with plain Ruby or with Kiba):

  • convert your HTML to a table, then work from that data
  • work directly with the HTML table (using Nokogiri and CSS selectors), which is only practical if the HTML is mostly clean

In all cases, because you are doing some scraping, I recommend that you write very defensive code (the HTML can change and may contain bugs or corner cases later), e.g. strong assertions that the lines and columns contain what you expect, verifications etc.
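
To give an idea of what such an assertion could look like, here is a minimal sketch. The helper assert_table_shape! and its expected_first_header parameter are hypothetical names made up for this example (they are not part of Kiba or Nokogiri), and the checks you actually need depend on your page.

```ruby
# Hypothetical helper: fail fast when the scraped table does not have
# the shape the rest of the code relies on.
def assert_table_shape!(rows, expected_first_header:)
  raise ArgumentError, "table is empty" if rows.empty?

  header = rows.first
  unless header.first == expected_first_header
    raise ArgumentError, "unexpected first header: #{header.first.inspect}"
  end

  # Every row must have as many cells as the header row.
  rows.each_with_index do |row, index|
    next if row.size == header.size
    raise ArgumentError,
          "row #{index} has #{row.size} cells, expected #{header.size}"
  end

  rows
end

table = [
  ["Blocks", "Teacher 1", "Teacher 2", "Teacher 3"],
  ["3:00 pm", "Stu A", "Stu B", ""],
  ["3:10 pm", "Stu B", "", "Stu C"]
]
assert_table_shape!(table, expected_first_header: "Blocks")
```

Raising early like this means a changed page layout stops the job instead of silently producing a wrong schedule.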

If you go with plain Ruby, you could do something like this (here modeling your data as comma-separated text to keep things clear):

task :default do
  data = <<~DOC
    Blocks , Teacher 1 , Teacher 2 , Teacher 3
    3:00 pm , Stu A , Stu B ,
    3:10 pm , Stu B , , Stu C
  DOC
  # The -1 limit keeps trailing empty cells, so every row has the same
  # width and transpose does not raise IndexError.
  data = data.split("\n").map { |line| line.split(",", -1).map(&:strip) }
  blocks, *teachers = data.transpose
  teachers.each do |teacher|
    pp blocks.zip(teacher)
  end
end

This will output:

[["Blocks", "Teacher 1"], ["3:00 pm", "Stu A"], ["3:10 pm", "Stu B"]]
[["Blocks", "Teacher 2"], ["3:00 pm", "Stu B"], ["3:10 pm", ""]]
[["Blocks", "Teacher 3"], ["3:00 pm", ""], ["3:10 pm", "Stu C"]]

You can then massage this into the shape you expect (but again: be very defensive and put assertions on all the data, including the number of cells in each row, or you'll get off-by-one errors, incorrect schedules etc).
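
As one possible way to do that massaging, here is a minimal sketch that turns the transposed columns into a hash per teacher and drops the empty slots. The target shape and the names (times, schedule) are my assumptions for illustration; adapt them to what you actually need.

```ruby
# The transposed columns, as produced by data.transpose.
blocks = ["Blocks", "3:00 pm", "3:10 pm"]
teachers = [
  ["Teacher 1", "Stu A", "Stu B"],
  ["Teacher 2", "Stu B", ""],
  ["Teacher 3", "", "Stu C"]
]

# Drop the "Blocks" label so only the time slots remain.
_label, *times = blocks

schedule = teachers.map do |name, *students|
  # Pair each time slot with its student and skip the empty slots.
  booked = times.zip(students).reject { |_time, student| student.empty? }
  [name, booked.to_h]
end.to_h
```

With the sample data, schedule["Teacher 2"] only contains the 3:00 pm slot, because the 3:10 pm cell is empty.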

If you want to use Kiba and CSS selectors, you could go like this:

task :default do
  html = <<~HTML
    <table>
      <tr>
        <th>Blocks</th>
        <th>Teacher 1</th>
        <th>Teacher 2</th>
        <th>Teacher 3</th>
      </tr>
      <tr>
        <td>3:00 pm</td>
        <td>Stu A</td>
        <td>Stu B</td>
        <td></td>
      </tr>
      <tr>
        <td>3:10 pm</td>
        <td>Stu B</td>
        <td></td>
        <td>Stu C</td>
      </tr>
    </table>
  HTML
  require 'nokogiri'
  require 'kiba'
  require 'kiba-common/sources/enumerable'
  require 'kiba-common/transforms/enumerable_exploder'
  Kiba.run do
    # just one doc here, but we could have a sequence instead
    source Kiba::Common::Sources::Enumerable, -> { [html] }

    transform { |r| Nokogiri::HTML(r) }

    transform do |doc|
      Enumerator.new do |y|
        blocks, *teachers = doc.search("table tr:first th").map(&:text)
        # you'd have to add more defensive checks here!!! important!
        teachers.each_with_index do |t, i|
          headers = doc.search("table>tr>:nth-child(1)").map(&:text)
          data = doc.search("table>tr>:nth-child(#{i + 2})").map(&:text)
          y << { teacher: t, data: headers.zip(data) }
        end
      end
    end

    transform Kiba::Common::Transforms::EnumerableExploder

    transform { |r| pp r }
  end
end

Which would give:

{:teacher=>"Teacher 1",
:data=>[["Blocks", "Teacher 1"], ["3:00 pm", "Stu A"], ["3:10 pm", "Stu B"]]}
{:teacher=>"Teacher 2",
:data=>[["Blocks", "Teacher 2"], ["3:00 pm", "Stu B"], ["3:10 pm", ""]]}
{:teacher=>"Teacher 3",
:data=>[["Blocks", "Teacher 3"], ["3:00 pm", ""], ["3:10 pm", "Stu C"]]}

I think I would prefer a blend of the two methods: first convert the HTML to a proper CSV file or in-memory table, then transpose from there in a second step.
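
The second step could look roughly like the sketch below, which uses only the standard CSV library. It assumes the first step has already produced CSV text like csv_text here; the variable names are illustrative.

```ruby
require 'csv'

# Assumed output of the first step (HTML converted to CSV).
csv_text = <<~CSV_TEXT
  Blocks,Teacher 1,Teacher 2,Teacher 3
  3:00 pm,Stu A,Stu B,
  3:10 pm,Stu B,,Stu C
CSV_TEXT

table = CSV.parse(csv_text)

# Defensive check: transpose needs every row to have the same width.
widths = table.map(&:size).uniq
raise "ragged table, row widths: #{widths.inspect}" unless widths.size == 1

# Write the transposed rows back out as CSV (empty cells stay empty).
transposed_csv = CSV.generate do |csv|
  table.transpose.each { |row| csv << row }
end
```

Each teacher becomes one row of transposed_csv, with the time blocks as the first row.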

Output array to CSV in Ruby

To a file:

require 'csv'
CSV.open("myfile.csv", "w") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

To a string:

require 'csv'
csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

Here's the current documentation on CSV: http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
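
If your data is already an array of arrays, you can loop over it instead of appending each row by hand. A minimal sketch, where rows stands in for your own data:

```ruby
require 'csv'

rows = [["row", "of", "CSV", "data"], ["another", "row"]]

# Append every inner array as one CSV line.
csv_string = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

# Array#to_csv (available after require 'csv') builds the same
# string one line at a time.
same_string = rows.map(&:to_csv).join
```

Both variables end up holding the same two-line CSV text.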

Escape Comma from CSV in Ruby




Use Ruby's built-in to_csv method.

If you haven't already done so, you'll need to require 'csv'.

Sell Date, Sell Amount
- @rows.each do |row|
  = [ row[0], number_to_currency(row[1], :precision => 2) ].to_csv( row_sep: nil ).html_safe

to_csv is available right on the Array and does all the escaping you'd expect it to do.

row_sep: nil prevents the \n at the end of each row since you're already doing that with each. Try it without that and you'll see that you get an extra blank line. If you were just generating a single CSV string then you'd need to keep the \n to separate the rows.

html_safe prevents the quote characters from being HTML-escaped, so they don't show up as &quot; in your CSV file.
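
To see the escaping that to_csv does outside of a view, here is a small plain-Ruby sketch. The sample values are made up for illustration.

```ruby
require 'csv'

# Fields containing commas or double quotes get wrapped in quotes,
# and embedded double quotes are doubled.
line = ["Jan 5, 2012", "$1,234.50", 'say "hi"'].to_csv
# line holds: "Jan 5, 2012","$1,234.50","say ""hi""" followed by a newline

# Parsing the line gives back the original three fields.
parsed = CSV.parse_line(line)
```

The commas inside the date and the amount no longer split the row into extra columns.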

That should do it!

JP


