Ruby CSV - get current line/row number
Because CSV changed in recent Rubies, the solution needs updating. The original solution for Ruby prior to 2.6 appears farther down, along with the use of with_index, which continues to work regardless of the version.
For 2.6+ this'll work:
require 'csv'
puts RUBY_VERSION
csv_file = CSV.open('test.csv')
csv_file.each do |csv_row|
  puts '%i %s' % [csv_file.lineno, csv_row]
end
csv_file.close
If I read:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00
The code results in this output:
2.6.3
1 ["Year", "Make", "Model", "Description", "Price"]
2 ["1997", "Ford", "E350", "ac, abs, moon", "3000.00"]
3 ["1999", "Chevy", "Venture \"Extended Edition\"", "", "4900.00"]
4 ["1999", "Chevy", "Venture \"Extended Edition, Very Large\"", "", "5000.00"]
5 ["1996", "Jeep", "Grand Cherokee", "MUST SELL!\\nair, moon roof, loaded", "4799.00"]
The change is needed because we now have to go through the current file handle. Previously we could use the global $., which always carried a risk of failure because globals can get stomped on by other sections of called code. If we have the handle of the file being read, then we can use lineno without that concern.
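As a minimal sketch of the same idea, the block form of CSV.open hands over the handle and closes the file when the block exits. The scratch file name and its contents below are made up for illustration, and the rows deliberately contain no embedded line-ends:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'lineno_demo.csv')
File.write(path, "Year,Make\n1997,Ford\n1999,Chevy\n")

numbered = []
# The block form closes the file automatically when the block exits.
CSV.open(path) do |csv|
  csv.each do |row|
    # lineno is read from this CSV instance, not from a global.
    numbered << [csv.lineno, row]
  end
end

numbered.each { |lineno, row| puts format('%d %s', lineno, row) }
```

Collecting the pairs into an array first is only a convenience for inspecting them afterwards; printing inside the block works just as well.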
$.
Ruby prior to 2.6 would let us do this:
Ruby has a magic variable, $., which is the line number of the current file being read:
require 'csv'
CSV.foreach('test.csv') do |csv|
  puts $.
end
With the code above, I get:
1
2
3
4
5
$INPUT_LINE_NUMBER
$. is used all the time in Perl. In Ruby, it's recommended we use it the following way to avoid the "magical" side of it:
require 'English'
puts $INPUT_LINE_NUMBER
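A small sketch of the alias in use on a plain text file may help. The file name and contents are hypothetical, and the example reads with File#each_line rather than CSV:

```ruby
require 'English'
require 'tmpdir'

# Hypothetical scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'plain.txt')
File.write(path, "first\nsecond\nthird\n")

line_numbers = []
File.open(path) do |file|
  file.each_line do
    # $INPUT_LINE_NUMBER is the English-library alias for $.
    line_numbers << $INPUT_LINE_NUMBER
  end
end

p line_numbers
```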
If it's necessary to deal with embedded line-ends in fields, it's easily handled by a minor modification. Assuming a CSV file "test.csv" which contains a line with an embedded new-line:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
with_index
Using Enumerator's with_index(1) makes it easy to keep track of the number of times CSV yields to the block, effectively simulating using $. but honoring CSV's work when reading the extra lines necessary to deal with the line-ends:
require 'csv'
CSV.foreach('test.csv', headers: true).with_index(1) do |row, ln|
  puts '%-3d %-5s %-26s %s' % [ln, *row.values_at('Make', 'Model', 'Description')]
end
Which, when run, outputs:
$ ruby test.rb
1   Ford  E350                       ac, abs, moon
2   Chevy Venture "Extended Edition"
3   Jeep  Grand Cherokee             MUST SELL!
air, moon roof, loaded
4   Chevy Venture "Extended Edition, Very Large"
How to select specific rows in CSV files?
You can use the CSV.foreach method to iterate over the CSV and the with_index method to count the rows you read and skip rows you don't want to process. For example:
require 'csv'
CSV.foreach(file, headers: true).with_index(1) do |row, rowno|
  next if rowno < 5 # skips first four rows
  # process the row
end
In Ruby 1.9.3 this wouldn't work, since foreach doesn't return an Enumerator if no block is given. The code can be modified like this:
CSV.to_enum(:foreach, file, headers: true).with_index(1) do |row, rowno|
  # ...
end
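The same counting idea extends to picking out a window of rows. The sketch below is only an illustration: the file, its contents and the wanted range are assumptions, and break is used so that reading stops once the window has been passed:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'select_demo.csv')
File.write(path, "name,age\nann,31\nbob,25\ncy,47\ndee,52\n")

wanted = 2..3 # hypothetical window: data rows 2 and 3 (the header is not counted)
selected = []
CSV.foreach(path, headers: true).with_index(1) do |row, rowno|
  next if rowno < wanted.begin  # not in the window yet
  break if rowno > wanted.end   # past the window, stop reading
  selected << row['name']
end

p selected
```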
Not getting correct value of each line in CSV file
Setting :headers => true (an option documented under CSV.new) causes foreach to yield a CSV::Row object for each row instead of a plain Ruby array. See the CSV.new documentation for more information.
To access the actual row fields, you need to use row.fields:
CSV.foreach(filename, :headers => true) do |row|
  name, age = row.fields
  puts name
end
I'm not aware of a way to skip headers, use CSV.foreach, and still get a plain array for each row.
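One possible workaround, offered here as an assumption rather than as part of the original answer, is to leave out the headers option and drop the first row lazily. The file name and contents are hypothetical:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'plain_rows.csv')
File.write(path, "name,age\nalice,30\nbob,25\n")

plain_rows = []
# Without headers: true, each row is a plain Array. lazy.drop(1) discards
# the header line while still reading the file one row at a time.
CSV.foreach(path).lazy.drop(1).each do |row|
  plain_rows << row
end

p plain_rows
```

The trade-off is that fields can then only be reached by position, not by header name.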
count number of elements in a CSV in a row in Ruby
If each line contains the same number of elements, then:
CSV.open('test.csv', 'r') { |csv| puts csv.first.length }
If not, then count for each line:
CSV.foreach('test.csv', 'r') { |row| puts row.length }
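When it is unclear which of the two cases applies, a short check can report the rows whose field count differs from the first row. This is a sketch on a made-up file; the variable names are illustrative:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical scratch file with one short row, created only for this illustration.
path = File.join(Dir.mktmpdir, 'ragged.csv')
File.write(path, "a,b,c\n1,2\n3,4,5\n")

expected = nil
mismatched = []
CSV.foreach(path).with_index(1) do |row, rowno|
  expected ||= row.length # the first row sets the expected field count
  mismatched << [rowno, row.length] unless row.length == expected
end

p mismatched
```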
How to detect the last row in CSV (ruby)
File.open(file_path) do |file|
  file.each_line do |line|
    row = CSV.parse_line(line.scrub(""), col_sep: "\t", headers: headers, quote_char: '_')
    file.eof? # true only once the final line has been read
  end
end
I went with this solution because it does not require loading the entire CSV file before looping through it, which is helpful when dealing with large files.
By using File.open I can call file.eof? (end of file), which tells me when I've hit the last line.
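If CSV's own row parsing is preferred over parsing each physical line, one alternative (a sketch, not the method described above) is to look one row ahead with Enumerator#peek. The tab-separated file and its contents are hypothetical:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical tab-separated scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'last_row.tsv')
File.write(path, "a\t1\nb\t2\nc\t3\n")

flags = []
rows = CSV.foreach(path, col_sep: "\t") # no block given, so an Enumerator is returned
loop do
  row = rows.next
  # peek raises StopIteration when no further row exists.
  is_last = begin
    rows.peek
    false
  rescue StopIteration
    true
  end
  flags << [row.first, is_last]
end

p flags
```

Like the File.open version, this reads the file row by row instead of loading it all up front.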
CSV iteration in Ruby, and grouping by column value to get last line of each group
Let's first construct a CSV file.
str = <<~END
  ID,Name,Transaction Value,Running Total
  5,mike,5,5
  5,mike,2,7
  20,bob,1,1
  20,bob,15,16
  1,jane,4,4
END
CSVFile = 't.csv'
File.write(CSVFile, str)
#=> 107
I will first create a method that takes two arguments: an instance of CSV::Row and a boolean indicating whether that row is the last of its group (true if it is).
def process_row(row, is_last)
  puts "Do something with row #{row}"
  puts "last row: #{is_last}"
end
This method would of course be modified to perform whatever operations need be performed for each row.
Below are three ways to process the file. All three use the method CSV::foreach to read the file line-by-line. This method is called with two arguments, the file name and an options hash { headers: true, converters: :numeric } that indicates that the first line of the file is a header row and that strings representing numbers are to be converted to the appropriate numeric object. Here the values for "ID", "Transaction Value" and "Running Total" will be converted to integers.
Though it is not mentioned in the doc, when foreach is called without a block it returns an enumerator (in the same way that IO::foreach does).
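Both points, the enumerator return value and the numeric conversion, can be checked with a short sketch. The one-row file below is made up for illustration:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'numeric_demo.csv')
File.write(path, "ID,Name,Running Total\n5,mike,7\n")

# No block given, so foreach hands back an Enumerator.
rows = CSV.foreach(path, headers: true, converters: :numeric)
puts rows.class

first = rows.first
puts first['ID'].class   # numeric-looking field, converted
puts first['Name'].class # non-numeric field, left as a String
```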
We of course need:
require 'csv'
Chain foreach to Enumerable#chunk
I have chosen to use chunk, as opposed to Enumerable#group_by, because the lines of the file are already grouped by ID.
CSV.foreach(CSVFile, headers: true, converters: :numeric).
  chunk { |row| row['ID'] }.
  each do |_, (*arr, last_row)|
    arr.each { |row| process_row(row, false) }
    process_row(last_row, true)
  end
displays
Do something with row 5,mike,5,5
last row: false
Do something with row 5,mike,2,7
last row: true
Do something with row 20,bob,1,1
last row: false
Do something with row 20,bob,15,16
last row: true
Do something with row 1,jane,4,4
last row: true
Note that
enum = CSV.foreach(CSVFile, headers: true, converters: :numeric).
  chunk { |row| row['ID'] }.
  each
#=> #<Enumerator: #<Enumerator::Generator:0x00007ffd1a831070>:each>
Each element generated by this enumerator is passed to the block and the block variables are assigned values by a process called array decomposition:
_,(*arr,last_row) = enum.next
#=> [5, [#<CSV::Row "ID":5 "Name":"mike" "Transaction Value":5 "Running Total ":5>,
# #<CSV::Row "ID":5 "Name":"mike" "Transaction Value":2 "Running Total ":7>]]
resulting in the following:
_ #=> 5
arr
#=> [#<CSV::Row "ID":5 "Name":"mike" "Transaction Value":5 "Running Total ":5>]
last_row
#=> #<CSV::Row "ID":5 "Name":"mike" "Transaction Value":2 "Running Total ":7>
See Enumerator#next.
I have followed the convention of using an underscore for block variables that are not used in the block calculation (to alert readers of your code). Note that an underscore is a valid block variable.1
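The choice of chunk over group_by, and the decomposition of each chunk, can be seen on plain arrays. The data below is hypothetical and deliberately repeats ID 5 after a different ID, to show that chunk only groups consecutive elements:

```ruby
# Hypothetical [id, label] pairs; ID 5 reappears after ID 20 on purpose.
rows = [[5, 'a'], [5, 'b'], [20, 'c'], [5, 'd']]

chunked = rows.chunk(&:first).to_a  # groups runs of consecutive equal IDs
grouped = rows.group_by(&:first)    # groups all equal IDs, adjacent or not

# The same decomposition as in the block above: the key is ignored and
# the last element of each chunk is split off from the rest.
last_labels = chunked.map { |_, (*_rest, last)| last.last }

p chunked.map(&:first)
p grouped.keys
p last_labels
```

With the sample file the two methods would form the same groups, since each ID occupies a single run of lines; chunk simply avoids building a hash of the whole file first.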
Use Enumerable#slice_when in place of chunk
CSV.foreach(CSVFile, headers: true, converters: :numeric).
  slice_when { |row1, row2| row1['ID'] != row2['ID'] }.
  each do |*arr, last_row|
    arr.each { |row| process_row(row, false) }
    process_row(last_row, true)
  end
This displays the same information that is produced when chunk is used.
Use Kernel#loop to step through the enumerator CSV.foreach(CSVFile, headers:true)
enum = CSV.foreach(CSVFile, headers: true, converters: :numeric)
row = nil
loop do
  row = enum.next
  next_row = enum.peek
  process_row(row, row['ID'] != next_row['ID'])
end
process_row(row, true)
This displays the same information that is produced when chunk is used. See Enumerator#next and Enumerator#peek.
After enum.next returns the last CSV::Row object, enum.peek will raise a StopIteration exception. As explained in its doc, loop handles that exception by breaking out of the loop. row must be initialized to an arbitrary value before entering the loop so that row is visible after the loop terminates. At that time row will hold the CSV::Row object for the last line of the file.
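The interplay of next, peek and loop can be reproduced without any CSV file. The sketch below uses a plain array as a stand-in for the rows; the variable names are illustrative:

```ruby
enum = [1, 2, 3].each # stand-in for the CSV enumerator
pairs = []
current = nil         # declared outside the loop so it stays visible afterwards

loop do
  current = enum.next
  # On the final element peek raises StopIteration, which loop rescues,
  # so the line below is skipped for that element.
  pairs << [current, enum.peek]
end

p pairs
p current
```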
1 IRB uses the underscore for its own purposes, resulting in the block variable _ being assigned an erroneous value when the code above is run.