Fetching Second Row from CSV File in Ruby

Fetching a row from one csv file to search for a matching row in a different csv file

@Peter Smith has the right idea. Something like this should work.

void Main()
{
    List<Foo> records;

    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        HasHeaderRecord = false
    };

    using (var reader = new StreamReader("path\\to\\file.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var options = new TypeConverterOptions { Formats = new[] { "dd/MM/yyyy mm.ss" } };

        csv.Context.TypeConverterOptionsCache.AddOptions<DateTime>(options);

        csv.Context.RegisterClassMap<FooMap>();

        records = csv.GetRecords<Foo>().ToList();
    }

    using (var reader = new StreamReader("path\\to\\descriptions.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var descriptions = csv.GetRecords<dynamic>().ToList();

        records = records.Join(
            descriptions,
            record => record.Id,
            description => description.Field1,
            (record, description) => { record.Description = description.Field2; return record; }).ToList();
    }

    using (var writer = new StreamWriter("path\\to\\file.csv"))
    using (var csv = new CsvWriter(writer, config))
    {
        var options = new TypeConverterOptions { Formats = new[] { "dd/MM/yyyy mm.ss" } };

        csv.Context.TypeConverterOptionsCache.AddOptions<DateTime>(options);

        csv.WriteRecords(records);
    }
}

public class FooMap : ClassMap<Foo>
{
    public FooMap()
    {
        Map(x => x.Id).Index(1);
        Map(x => x.Timestamp).Index(0);
    }
}

public class Foo
{
    [Index(1)]
    public string Id { get; set; }
    [Index(0)]
    public DateTime Timestamp { get; set; }
    [Index(2)]
    public string Description { get; set; }
}
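
For readers working in Ruby, like the rest of the answers on this page, the same lookup can be roughly sketched with the standard csv library. The sample data, column order and variable names below are assumptions made for illustration. Unlike the Join call above, this version keeps rows that have no match and leaves their description as nil.

```ruby
require 'csv'

# Hypothetical sample data standing in for the two files.
# Column layout mirrors the example above: (timestamp, id) and (id, description).
records_csv = <<~ROWS
  01/02/2021 10.30,A1
  03/04/2021 11.45,B2
ROWS

descriptions_csv = <<~ROWS
  A1,First widget
  B2,Second widget
ROWS

# Build an id => description lookup from the second file (Array#to_h with a
# block needs Ruby 2.6+).
descriptions = CSV.parse(descriptions_csv).to_h { |id, text| [id, text] }

# Append the matching description (or nil when there is none) to each row.
joined = CSV.parse(records_csv).map do |timestamp, id|
  [timestamp, id, descriptions[id]]
end

puts joined.map(&:to_csv).join
```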

Ruby CSV - get current line/row number

Because of changes to CSV in current Rubies, the approach has to change. See farther down in the answer for the original solution for Ruby prior to 2.6, and for the use of with_index, which continues to work regardless of the version.

For 2.6+ this'll work:

require 'csv'

puts RUBY_VERSION

csv_file = CSV.open('test.csv')
csv_file.each do |csv_row|
  puts '%i %s' % [csv_file.lineno, csv_row]
end
csv_file.close

If I read:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00

The code results in this output:

2.6.3
1 ["Year", "Make", "Model", "Description", "Price"]
2 ["1997", "Ford", "E350", "ac, abs, moon", "3000.00"]
3 ["1999", "Chevy", "Venture \"Extended Edition\"", "", "4900.00"]
4 ["1999", "Chevy", "Venture \"Extended Edition, Very Large\"", "", "5000.00"]
5 ["1996", "Jeep", "Grand Cherokee", "MUST SELL!\\nair, moon roof, loaded", "4799.00"]

The change is needed because we have to get access to the current file handle. Previously we could use the global $., which always carried a risk of failure because globals can get stomped on by other sections of called code. If we have the handle of the file being read, we can use lineno without that concern.
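
As a small variation on the same idea, CSV.open also accepts a block, which closes the handle automatically. The sketch below writes a hypothetical three-line sample file to a Tempfile so that it can run on its own; the file contents and the numbered array are assumptions for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file so the sketch is self-contained.
sample = Tempfile.new(['test', '.csv'])
sample.write("Year,Make,Model\n1997,Ford,E350\n1999,Chevy,Venture\n")
sample.close

numbered = []
# The block form closes the CSV handle when the block exits.
CSV.open(sample.path) do |csv_file|
  csv_file.each do |csv_row|
    # lineno is read from the handle itself, not from the global $.
    numbered << [csv_file.lineno, csv_row]
  end
end

numbered.each { |lineno, row| puts '%i %s' % [lineno, row] }
sample.unlink
```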


$.

Ruby prior to 2.6 would let us do this:

Ruby has a magic variable $. which is the line number of the current file being read:

require 'csv'

CSV.foreach('test.csv') do |csv|
  puts $.
end

With the code above, I get:

1
2
3
4
5

$INPUT_LINE_NUMBER

$. is used all the time in Perl. In Ruby, it's recommended we use it the following way to avoid the "magical" side of it:

require 'English'

puts $INPUT_LINE_NUMBER

If it's necessary to deal with embedded line-ends in fields, it's easily handled by a minor modification. Assuming a CSV file "test.csv" which contains a line with an embedded new-line:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00

with_index

Using Enumerator's with_index(1) makes it easy to keep track of the number of times CSV yields to the block, effectively simulating using $. but honoring CSV's work when reading the extra lines necessary to deal with the line-ends:

require 'csv'

CSV.foreach('test.csv', headers: true).with_index(1) do |row, ln|
  puts '%-3d %-5s %-26s %s' % [ln, *row.values_at('Make', 'Model', 'Description')]
end

Which, when run, outputs:

$ ruby test.rb
1 Ford E350 ac, abs, moon
2 Chevy Venture "Extended Edition"
3 Jeep Grand Cherokee MUST SELL!
air, moon roof, loaded
4 Chevy Venture "Extended Edition, Very Large"

With Ruby, how to import a CSV file that has a row that is missing a value in the last column, and thus ends with a comma?

Your line endings are probably not \r\n then.

You can try using :row_sep => :auto (it'll look for different EOLs).

If your CSV file is not supposed to contain multi-line fields, I'd advise you to clean the whole thing with e.g. file.gsub(/\r\r?\n?/, "\n"), and use a simple "\n" for row_sep.
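
A minimal sketch of that clean-up, assuming the whole file fits in memory. The raw string here is hypothetical input with mixed line endings and a row that ends with a comma.

```ruby
require 'csv'

# Hypothetical input: mixed line endings, and a row whose last column is empty.
raw = "id,name,note\r\n1,foo,\r2,bar,baz\n"

# Collapse \r\n, \r\r\n and bare \r into a single \n, as suggested above.
normalized = raw.gsub(/\r\r?\n?/, "\n")

# With uniform line endings, a plain "\n" row separator is enough.
rows = CSV.parse(normalized, row_sep: "\n")
rows.each { |row| p row }
```

The missing value in the last column comes back as nil instead of raising an error.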

See https://stackoverflow.com/a/18969935/4640187

Ruby: How can I read a CSV file that contains two headers in Ruby?

You'll have to write your own logic. CSV is really just rows and columns and has no inherent idea of what each column or row represents; it's just raw data. CSV therefore has no concept of two header rows. That's a human convention, so you'll need to build your own heuristics.

Given that your data rows look like:

"721","Air Force","09/01/12",

When you start parsing your data, if the first column represents an integer, convert it to an int; if it's > 0, then you know you're dealing with a valid "row" and not a header.
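
One way that heuristic could look is sketched below. The two header rows and the second data row are made up for illustration, and data_row? is a hypothetical helper name.

```ruby
require 'csv'

# Hypothetical file contents: two header rows followed by data rows shaped
# like the one quoted above (note the trailing comma on every line).
text = <<~ROWS
  "Report","Generated","09/02/12",
  "ID","Branch","Date",
  "721","Air Force","09/01/12",
  "722","Navy","09/01/12",
ROWS

# A row counts as data when its first column is a positive integer.
def data_row?(row)
  first = row.first.to_s
  first.match?(/\A\d+\z/) && first.to_i > 0
end

data_rows = CSV.parse(text).select { |row| data_row?(row) }
data_rows.each { |row| p row }
```

The digit check comes before to_i because to_i silently returns 0 for non-numeric text and would also accept strings such as "721abc".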

How to remove a row from a CSV with Ruby

You should be able to use CSV::Table#delete_if, but you need to use CSV::table instead of CSV::read, because the former gives you a CSV::Table object, whereas the latter results in an Array of Arrays. Be aware that CSV::table will also convert the headers to symbols.

table = CSV.table(@csvfile)

table.delete_if do |row|
  row[:foo] == 'true'
end

File.open(@csvfile, 'w') do |f|
  f.write(table.to_csv)
end

Ignore header line when parsing CSV file

I have found the solution to the above question. Here is the way I did it in Ruby 1.9.x.

csv_contents = CSV.parse(File.read(file))
csv_contents.slice!(0)
csv = ""
csv_contents.each do |content|
  csv << CSV.generate_line(content)
end
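
As an alternative to slicing off the first row by hand, the csv library's headers: true option treats the first line as the header row, so it never shows up among the data rows. This is a different technique from the one above, sketched with hypothetical sample text standing in for File.read(file).

```ruby
require 'csv'

# Hypothetical contents standing in for File.read(file).
text = "name,age\nalice,30\nbob,25\n"

# With headers: true the first line is consumed as the header row,
# so iterating the resulting table only yields the data rows.
csv = +''
CSV.parse(text, headers: true).each do |row|
  csv << CSV.generate_line(row.fields)
end

puts csv
```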

