Changing field separator/delimiter in exported CSV using Ruby CSV
Here's an example using a tab instead.
To a file:
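require 'csv'

# Pass col_sep as a keyword argument; on Ruby 3 a braced hash here is not treated as options
CSV.open("myfile.csv", "w", col_sep: "\t") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end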
To a string:
require 'csv'

csv_string = CSV.generate(col_sep: "\t") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end
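To read such output back, the same col_sep has to be passed when parsing. A minimal round-trip sketch (the "name"/"note" rows are made-up data); it also shows that a field containing the separator itself is quoted rather than breaking the row:

```ruby
require 'csv'

# Write two rows using a tab as the field separator.
tsv = CSV.generate(col_sep: "\t") do |csv|
  csv << ["name", "note"]
  csv << ["Alice", "has\ta tab"] # contains the separator, so CSV quotes this field
end

puts tsv

# Read it back with the same col_sep; the default (",") would not split these lines correctly.
rows = CSV.parse(tsv, col_sep: "\t")
p rows
```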
Here's the current documentation on CSV: http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
Transpose CSV rows and columns during ETL process using Kiba (or plain Ruby)
Kiba author here!
I see at least two ways of doing this (whether you work with plain Ruby or with Kiba):
- convert your HTML to a table, then work from that data
- work directly with the HTML table (using Nokogiri & selectors), which is applicable only if the HTML is mostly clean
In all cases, because you are scraping, I recommend writing very defensive code (HTML changes over time and can contain bugs or corner cases), e.g. strong assertions that the rows and columns contain what you expect, sanity checks, etc.
If you go with plain Ruby, you could for instance do something like this (here modeling your data as comma-separated text to keep things clear):
# In a Rakefile
task :default do
  data = <<~DOC
    Blocks , Teacher 1 , Teacher 2 , Teacher 3
    3:00 pm , Stu A , Stu B ,
    3:10 pm , Stu B , , Stu C
  DOC
  # A limit of -1 keeps trailing empty cells (e.g. after "Stu B ,"), so every row has 4 cells
  data = data.split("\n").map { |line| line.split(",", -1).map(&:strip) }
  blocks, *teachers = data.transpose
  teachers.each do |teacher|
    pp blocks.zip(teacher)
  end
end
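One pitfall with String#split: without a limit it drops trailing empty fields, so "a,b,".split(",") gives ["a", "b"] while "a,b,".split(",", -1) gives ["a", "b", ""], and a short row makes transpose fail. A sketch of the same transpose letting Ruby's CSV library do the parsing instead (it keeps trailing empty fields, as nil); this swaps in CSV.parse and is not the original answer's code:

```ruby
require 'csv'
require 'pp'

data = <<~DOC
  Blocks , Teacher 1 , Teacher 2 , Teacher 3
  3:00 pm , Stu A , Stu B ,
  3:10 pm , Stu B , , Stu C
DOC

# CSV.parse keeps trailing empty fields (as nil) instead of dropping them,
# so every row has 4 cells; cell.to_s.strip turns nil into "".
rows = CSV.parse(data).map { |row| row.map { |cell| cell.to_s.strip } }

blocks, *teachers = rows.transpose
pairs = teachers.map { |teacher| blocks.zip(teacher) }
pairs.each { |pair| pp pair }
```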
This will output:
[["Blocks", "Teacher 1"], ["3:00 pm", "Stu A"], ["3:10 pm", "Stu B"]]
[["Blocks", "Teacher 2"], ["3:00 pm", "Stu B"], ["3:10 pm", ""]]
[["Blocks", "Teacher 3"], ["3:00 pm", ""], ["3:10 pm", "Stu C"]]
You can then massage this into the shape you need (but again: be very defensive and put assertions on all the data, including the number of cells in each row, or you'll get off-by-one errors, incorrect schedules, etc.).
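Array#transpose already raises IndexError on ragged rows, but it cannot tell you whether the table is the one you expected. A minimal sketch of explicit checks to run before transposing; the helper name check_table! and its first_header: "Blocks" default are hypothetical, chosen to match the sample data:

```ruby
# Hypothetical helper: raise early if a scraped table is not shaped as expected.
def check_table!(rows, first_header: "Blocks")
  raise ArgumentError, "empty table" if rows.empty?

  # Every row must have as many cells as the header row.
  width = rows.first.size
  rows.each_with_index do |row, i|
    next if row.size == width
    raise ArgumentError, "row #{i} has #{row.size} cell(s), expected #{width}"
  end

  # The top-left cell should be the header we expect.
  unless rows.first.first == first_header
    raise ArgumentError, "unexpected first header: #{rows.first.first.inspect}"
  end

  rows
end

table = [["Blocks", "Teacher 1"], ["3:00 pm", "Stu A"]]
p check_table!(table).transpose

begin
  check_table!([["Blocks", "Teacher 1"], ["3:00 pm"]])
rescue ArgumentError => e
  puts e.message
end
```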
If you want to use Kiba and CSS selectors, you could go like this:
task :default do
  html = <<~HTML
    <table>
      <tr>
        <th>Blocks</th>
        <th>Teacher 1</th>
        <th>Teacher 2</th>
        <th>Teacher 3</th>
      </tr>
      <tr>
        <td>3:00 pm</td>
        <td>Stu A</td>
        <td>Stu B</td>
        <td></td>
      </tr>
      <tr>
        <td>3:10 pm</td>
        <td>Stu B</td>
        <td></td>
        <td>Stu C</td>
      </tr>
    </table>
  HTML

  require 'nokogiri'
  require 'kiba'
  require 'kiba-common/sources/enumerable'
  require 'kiba-common/transforms/enumerable_exploder'

  Kiba.run do
    # just one doc here, but we could have a sequence instead
    source Kiba::Common::Sources::Enumerable, -> { [html] }
    transform { |r| Nokogiri::HTML(r) }
    transform do |doc|
      Enumerator.new do |y|
        blocks, *teachers = doc.search("table tr:first th").map(&:text)
        # you'd have to add more defensive checks here!!! important!
        # the first column (block times) is the same for every teacher
        headers = doc.search("table>tr>:nth-child(1)").map(&:text)
        teachers.each_with_index do |t, i|
          data = doc.search("table>tr>:nth-child(#{i + 2})").map(&:text)
          y << { teacher: t, data: headers.zip(data) }
        end
      end
    end
    transform Kiba::Common::Transforms::EnumerableExploder
    transform { |r| pp r }
  end
end
Which would give:
{:teacher=>"Teacher 1",
:data=>[["Blocks", "Teacher 1"], ["3:00 pm", "Stu A"], ["3:10 pm", "Stu B"]]}
{:teacher=>"Teacher 2",
:data=>[["Blocks", "Teacher 2"], ["3:00 pm", "Stu B"], ["3:10 pm", ""]]}
{:teacher=>"Teacher 3",
:data=>[["Blocks", "Teacher 3"], ["3:00 pm", ""], ["3:10 pm", "Stu C"]]}
I would probably prefer a blend of the two methods: first convert the HTML to a proper CSV file or in-memory table, then transpose from there in a second step.
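A minimal sketch of that second step, assuming the first step has already produced CSV text with the same layout as the table (csv_text below is hypothetical). The output shape, one row per teacher and block with empty slots skipped, is also an assumption about what you want:

```ruby
require 'csv'

# Assumed output of step 1: the HTML table converted to CSV text.
csv_text = <<~CSV_TEXT
  Blocks,Teacher 1,Teacher 2,Teacher 3
  3:00 pm,Stu A,Stu B,
  3:10 pm,Stu B,,Stu C
CSV_TEXT

table = CSV.parse(csv_text, headers: true)
blocks_header, *teacher_headers = table.headers

# Step 2: one output row per (teacher, block) pair, skipping empty slots.
schedule = CSV.generate do |out|
  out << ["teacher", "block", "student"]
  teacher_headers.each do |teacher|
    table.each do |row|
      student = row[teacher]
      next if student.nil? || student.empty?
      out << [teacher, row[blocks_header], student]
    end
  end
end

puts schedule
```

Keeping the two steps separate means the CSV from step 1 can be inspected (and asserted on) before any transposing happens.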
Output array to CSV in Ruby
To a file:
require 'csv'

CSV.open("myfile.csv", "w") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end
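When the data is already an array of rows, each row can be appended in a loop, and CSV.read gives the same array back. A sketch of that round trip, using a temporary directory so nothing is left behind (the rows are made-up data):

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical data: an array of rows, each row an array of strings.
rows = [["id", "name"], ["1", "Alice"], ["2", "Smith, Bob"]]

read_back = Dir.mktmpdir do |dir|
  path = File.join(dir, "myfile.csv")

  # Write every row of the array to the file.
  CSV.open(path, "w") do |csv|
    rows.each { |row| csv << row }
  end

  # CSV.read returns the file's rows as an array of arrays of strings.
  CSV.read(path)
end

p read_back == rows
```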
To a string:
require 'csv'

csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end
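If the array holds hashes rather than arrays, one option is to write the keys as a header row and then each hash's values in that key order. A sketch, assuming every hash has the same keys (the people data is hypothetical):

```ruby
require 'csv'

# Hypothetical input: an array of hashes that all share the same keys.
people = [
  { name: "Alice", city: "Paris" },
  { name: "Smith, Bob", city: "Oslo" }
]

headers = people.first.keys

csv_string = CSV.generate do |csv|
  csv << headers.map(&:to_s)                                 # header row
  people.each { |person| csv << person.values_at(*headers) } # one row per hash, in header order
end

puts csv_string
```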
Here's the current documentation on CSV: http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
Escape Comma from CSV in Ruby
Use Ruby's built-in to_csv method.
If you haven't already done so, you'll need to require 'csv'.
Sell Date, Sell Amount
- @rows.each do |row|
  = [ row[0], number_to_currency(row[1], :precision => 2) ].to_csv( row_sep: nil ).html_safe
to_csv is available right on the Array and does all the escaping you'd expect it to do.
row_sep: nil prevents the \n at the end of each row, since you're already doing that with each. Try it without that and you'll see that you get an extra blank line. If you were just generating a single CSV string, then you'd need to keep the \n to separate the rows.
html_safe prevents the " characters from being HTML-escaped as &quot; in your CSV file.
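Outside of a HAML template, the escaping that to_csv does can be checked directly. A small sketch with made-up field values: fields containing a comma or double quotes get wrapped in quotes, embedded quotes are doubled, and CSV.parse_line turns the line back into the original fields:

```ruby
require 'csv'

# A row whose fields contain a comma and double quotes.
row = ["Jan 1", "$1,234.50", 'say "hi"']

# By default the generated line ends with "\n".
line = row.to_csv
puts line

# Parsing the line gives the original fields back.
parsed = CSV.parse_line(line)
p parsed
```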
That should do it!
JP