Using Ruby CSV to Extract One Column

Using Ruby CSV to extract one column

To pluck a column out of a CSV I'd probably do something like the following:

require 'csv'

col_data = []
CSV.foreach(FILENAME) { |row| col_data << row[COL_INDEX] } # FILENAME and COL_INDEX are placeholders

That should be substantially faster than any operations on CSV::Table.
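For comparison, here's what the CSV::Table route looks like (a sketch only; FILENAME and :column_name are placeholders for your own file and header). It loads the whole file into memory, which is why it tends to be slower:

require 'csv'

# CSV.table reads with headers: true and symbolizes the header names,
# so a column can be pulled out by its header symbol.
table = CSV.table(FILENAME)
col_data = table[:column_name]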

Selecting a single value field from CSV file

Yes, hashes are the way to go:

require 'csv'

data = 'Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
'

CSV.parse(data, headers: :first_row).map{ |row| row["Total $ spent"] }
# => ["2548", "5054", "4352", "6542"]

Pretend that

CSV.parse(data, headers: :first_row)

is really

CSV.foreach('some/file.csv', headers: :first_row)

and the data is really in a file.

The reason you want to use headers: :first_row is that it tells CSV to gobble up the first line as headers. It'll then return a hash-like row for each record, using the associated header fields as the keys, making it easier to retrieve specific fields by name.

From the documentation:

:headers

If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers.
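For instance, with the sample data above, each parsed row then behaves like a hash keyed by the header names:

row = CSV.parse(data, headers: :first_row).first
row['Name']           # => "Dan"
row['Total $ spent']  # => "2548"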

Alternate ways of doing this are:

spent = CSV.parse(data).map{ |row| row[2] }
spent.shift

spent
# => ["2548", "5054", "4352", "6542"]

spent.shift drops the first element from the array, which was the header field for that column, leaving the array containing only values.

Or:

spent = []
skip_headers = true
CSV.parse(data).each do |row|
  if skip_headers
    skip_headers = false
    next
  end

  spent << row[2]
end

spent
# => ["2548", "5054", "4352", "6542"]

Similar to the shift call above, next tells Ruby to skip to the next iteration of the loop without processing the rest of the instructions in the block, which results in the header record being skipped in the final output.

Once you have the values from the fields you want, you can selectively extract specific ones. If you want the values "2548" and "4352", you have to have a way of determining which rows those are in. Using arrays (the non-header method) makes that more awkward, so I'd do it using hashes again:

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

Notice that it's very clear what's going on, which is important in code. Using case and when allows me to easily add additional names to include. It acts like a chained "or" conditional test on an if statement, but without the additional noise.

each_with_object is similar to inject, except it is cleaner when we need to aggregate values into an Array, Hash or some object.
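For comparison, here's roughly the same selection written with inject; note that the block has to hand the accumulator back on every iteration, which is the noise each_with_object avoids:

spent = CSV.parse(data, headers: :first_row).inject([]) do |ary, row|
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
  ary # inject uses the block's return value as the next accumulator
end

spent
# => ["2548", "4352"]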

Summing the array is easy and there are many different ways to get there, but I'd use:

spent.map(&:to_i).inject(:+) # => 6900

Basically that converts the individual elements to integers and adds them together. (There's more to it but that's not important until farther up your learning curve.)
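These are equivalent ways of writing that total (sum needs Ruby 2.4 or later):

spent.map(&:to_i).inject(:+) # => 6900
spent.map(&:to_i).sum        # => 6900
spent.sum(&:to_i)            # => 6900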


I am just wondering if it is possible to replace the contents of the 'when' condition with an array of strings to iterate over rather than hard-coded strings?

Here's a solution using an Array:

NAMES = %w[Dan Carlos]

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case row['Name']
  when *NAMES
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

If the list of names is large, this solution will run slower than necessary. Arrays are great for storing data you'll process in order, as a queue, or for remembering order like a stack, but they're bad when you have to walk the whole thing just to find something. Even a sorted Array with a binary search is likely to be slower than a Hash lookup because of the extra steps involved. Here's an alternate way of doing this, but using a Hash:

NAMES = %w[Dan Carlos].map{ |n| [n, true] }.to_h

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case
  when NAMES[row['Name']]
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

But that can be refactored to be more readable:

NAMES = %w[Dan Carlos].each_with_object({}) { |a, h| h[a] = true }
# => {"Dan"=>true, "Carlos"=>true}

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  ary << row['Total $ spent'] if NAMES[row['Name']]
end

spent
# => ["2548", "4352"]

In Ruby, how to read data column wise from a CSV file?

This is the solution guys:

CSV.foreach(filename).map { |row| row[0] }

Sorry for posting it in the correct format so late.
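If the file has a header row, a variation of the same idea reads the column by name instead of position (a sketch; the name header is just an example):

require 'csv'

CSV.foreach(filename, headers: true).map { |row| row['name'] }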

How to check for a column data and extract data on the same row? CSV on Ruby

You can access Severity and Level, and any other value you define in your header, as object['header'].

In this case the object is each row for the iteration, and the values are Severity and Level, so:

require 'csv'

file = '/path_to_data/data.csv'
CSV.foreach(file, headers: true, col_sep: ';') do |row|
  p [row['Severity'], row['Level']]
end

# ["Least", "1"]
# ["Average", "2"]
# ["Normal", "3"]
# ["High", "4"]
# ["Severe", "5"]

Ruby - comparing adjacent entries in one column of a csv file

Here's one way to do it:

current_identifier = nil

(firstrow.to_i..lastrow.to_i).each do |row|
  if current_identifier != data.at(row).dln # current row has a new identifier
    if current_identifier # this is not the first row
      puts "Last row for #{current_identifier} is #{row - 1}\n"
    end
    current_identifier = data.at(row).dln # remember the current identifier
  end
end

# the last row is always the last row for the current identifier
puts "Last row for #{current_identifier} is #{lastrow.to_i}\n"

CSV.foreach Not Reading First Column in CSV File

I had a similar problem, though running your example worked.
I realized the problem (at least for me) was that I was creating the CSV file using "Save As UTF-8 CSV" from Excel.

This adds a BOM to the beginning of the file, before the first column header name, and consequently row['firstColumnName'] was returning nil.

Saving the file as plain CSV (not UTF-8 CSV) fixed the issue for me.
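If re-saving isn't an option, Ruby can also strip the BOM at read time by opening the file with a bom|utf-8 encoding (a sketch; data.csv stands in for your file, and firstColumnName for your header):

require 'csv'

CSV.foreach('data.csv', headers: true, encoding: 'bom|utf-8') do |row|
  p row['firstColumnName']
end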

Pulling a value from one CSV based on a value in another

I'm using 1.9, where FasterCSV is available as CSV in the standard lib. First I'd create a lookup hash out of lookup.csv:

cities = Hash[CSV.read('lookup.csv', :col_sep => ' | ').to_a[1..-1]]

If the file is very big, you might want to iterate over it with CSV.foreach and build the hash row by row:

cities = {}
CSV.foreach('lookup.csv', :col_sep => ' | ', :headers => true, :return_headers => false) do |line|
  cities[line['City']] = line['City ID']
end

Then iterate over master.csv, do a lookup of the city in the hash and write that to output.csv:

CSV.open('output.csv', "w", :headers => ['First Name', 'Last Name', 'City ID'], :write_headers => true) do |output|
  CSV.foreach('master.csv', :col_sep => ' | ', :headers => true, :return_headers => false) do |line|
    output << [line['First Name'], line['Last Name'], cities[line['City']]]
  end
end

How to read specific columns of a zipped CSV file

Though your example shows ZipFile, you're really asking a CSV question. First, you should check the docs in http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html

You'll find that if you parse your data with the :headers => true option, you'll get a CSV::Table object that knows how to extract a column of data as follows. (For obvious reasons, I wouldn't code it this way -- this is for example only.)

require 'zip'
require 'csv'

csv_table = nil
Zip::ZipFile.foreach("x.csv.zip") do |entry|
  istream = entry.get_input_stream
  data = istream.read
  csv_table = CSV.parse(data, :col_sep => " ", :headers => true)
end

With the data you gave, we need :col_sep => " " since you're using spaces as column separators. But now we can do:

>> csv_table["NAME"]   # extract the NAME column
=> ["NAME1", "NAME2"]

How to extract one column of a csv file

You could use awk for this. Change '$2' to the nth column you want.

awk -F "\"*,\"*" '{print $2}' textfile.csv
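For comparison, a minimal Ruby equivalent of that one-liner (index 1 is the second column, matching $2):

require 'csv'

puts CSV.read('textfile.csv').map { |row| row[1] }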

