import from CSV into Ruby array, with 1st field as hash key, then lookup a field's value given header row
To get the best of both worlds (very fast reading from a huge file AND the benefits of a native Ruby CSV object), my code has since evolved into this method:
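require 'csv'
$stock="XTEX"
# Options are passed as keywords; a positional options hash raises ArgumentError on Ruby 3+.
csv_data = CSV.parse(IO.read(%`|sed -n "1p; /^#{$stock},/p" stocks.csv`),
  :headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all)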
# Now the 1-row CSV object is ready for use, eg:
$company = csv_data[:company][0]
$volatility_month = csv_data[:volatility_month][0].to_f
$sector = csv_data[:sector][0]
$industry = csv_data[:industry][0]
$rsi14d = csv_data[:relative_strength_index_14][0].to_f
which is closer to my original method, but only reads in one record plus line 1 of the input CSV file containing the headers. The inline sed instructions take care of that, and the whole thing is noticeably instant. This is better than the last version because now I can access all the fields from Ruby, and associatively, no longer caring about column numbers as was the case with awk.
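Where sed is not available, roughly the same effect can be had in plain Ruby. The following is only a sketch under the same assumptions as the sed pattern (ticker in the first column, headers on line 1, no field spanning several lines); the helper name lookup_stock is made up for illustration.

```ruby
require 'csv'

# Hypothetical helper: returns the CSV::Row for one ticker, or nil if absent.
# Assumes the ticker is the first column and the first line holds the headers.
def lookup_stock(path, ticker)
  header_line = nil
  File.foreach(path) do |line|
    if header_line.nil?
      header_line = line                      # remember line 1 (the headers)
    elsif line.start_with?("#{ticker},")
      table = CSV.parse(header_line + line,
                        headers: true,
                        header_converters: :symbol,
                        converters: :all)
      return table.first                      # stop reading at the first match
    end
  end
  nil
end
```

A call such as lookup_stock('stocks.csv', 'XTEX') would then give a row that can be read with row[:company], just like csv_data above but without the trailing [0].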
Parse tab delimited CSV file to array of hashes in Ruby 2.0
It seems that the options you can pass to parse are the ones documented under CSV::new:
>> CSV.parse("qwe\tq\twe", col_sep: "\t"){|a| p a}
["qwe", "q", "we"]
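To get from there to the array of hashes the question title asks for, col_sep can be combined with headers: true. This is a small sketch; the helper name parse_tsv and the sample data are assumptions, not part of the original answer.

```ruby
require 'csv'

# Hypothetical helper: parses tab-separated text whose first line is a header
# row and returns one plain Hash per record.
def parse_tsv(text)
  CSV.parse(text, col_sep: "\t", headers: true).map(&:to_h)
end

rows = parse_tsv("name\tqty\napple\t3\npear\t5\n")
puts rows.first['name']   # prints apple
puts rows.last['qty']     # prints 5
```

Values stay strings unless a converters: option is added.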
Selecting a single value field from CSV file
Yes, hashes are the way to go:
require 'csv'
data = 'Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
'
CSV.parse(data, headers: :first_row).map{ |row| row["Total $ spent"] }
# => ["2548", "5054", "4352", "6542"]
Pretend that
CSV.parse(data, headers: :first_row)
is really
CSV.foreach('some/file.csv', headers: :first_row)
and the data is really in a file.
The reason you want to use headers: :first_row is that it tells CSV to gobble up the first line as headers. It then returns a CSV::Row for each record, which can be indexed like a hash using the associated header field as the key, making it easier to retrieve specific fields by name.
From the documentation:
:headers
If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers.
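A quick way to see what the header option changes is to inspect the first record. The sample string below is a shortened stand-in for the data above, used only for illustration.

```ruby
require 'csv'

sample = "Name,Total $ spent\nDan,2548\nMaria,5054\n"
rows = CSV.parse(sample, headers: true)

first = rows.first
puts first.class          # prints CSV::Row
puts first['Name']        # prints Dan

# Convert to real Hash objects when a plain hash is needed.
plain = rows.map(&:to_h)
puts plain.first.class    # prints Hash
```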
Alternate ways of doing this are:
spent = CSV.parse(data).map{ |row| row[2] }
spent.shift
spent
# => ["2548", "5054", "4352", "6542"]
spent.shift drops the first element from the array, which was the header field for that column, leaving the array containing only values.
Or:
spent = []
skip_headers = true
CSV.parse(data).each do |row|
  if skip_headers
    skip_headers = false
    next
  end
  spent << row[2]
end
spent
# => ["2548", "5054", "4352", "6542"]
Similar to the shift statement above, next tells Ruby to skip to the next iteration of the loop without processing the rest of the instructions in the block, which keeps the header record out of the final output.
Once you have the values from the fields you want you can selectively extract specific ones. If you want the values "2548" and "4352", you have to have a way of determining which rows those are in. Using arrays (the non-header method) makes it more awkward to do, so I'd do it using hashes again:
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]
Notice that it's very clear what is going on, which is important in code. Using case and when allows me to easily add more names to include. That acts like a chained "or" conditional test on an if statement, but without the additional noise.
each_with_object is similar to inject, except it is cleaner when we need to aggregate values into an Array, Hash or some other object.
Summing the array is easy and there are many different ways to get there, but I'd use:
spent.map(&:to_i).inject(:+) # => 6900
Basically that converts the individual elements to integers and adds them together. (There's more to it but that's not important until farther up your learning curve.)
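On Ruby 2.4 and later, Array#sum with a block does the conversion and the addition in one pass. A small sketch, with the two selected values written out literally:

```ruby
spent = ["2548", "4352"]           # the values selected above
total = spent.sum { |s| s.to_i }   # same result as map(&:to_i).inject(:+)
puts total                         # prints 6900
```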
I am just wondering if it is possible to replace the contents of the 'when' condition with an array of strings to iterate over rather than hard coded strings?
Here's a solution using an Array:
NAMES = %w[Dan Carlos]
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case row['Name']
  when *NAMES
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]
If the list of names is large, I think this solution will run slower than necessary. Arrays are great for storing data you're going to process in order, as a queue, or for remembering insertion order, like a stack, but they're bad when you have to walk them just to find something. Even a sorted Array with a binary search is likely to be slower than a Hash because of the extra steps involved. Here's an alternate way of doing this, using a Hash:
NAMES = %w[Dan Carlos].map{ |n| [n, true] }.to_h
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case
  when NAMES[row['Name']]
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]
But that can be refactored to be more readable:
NAMES = %w[Dan Carlos].each_with_object({}) { |a, h| h[a] = true }
# => {"Dan"=>true, "Carlos"=>true}
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  ary << row['Total $ spent'] if NAMES[row['Name']]
end
spent
# => ["2548", "4352"]
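Another option with the same constant-time lookup is a Set from the standard library, which avoids carrying dummy true values. This is a sketch rather than part of the original answer; the names WANTED and spend_for are made up for illustration.

```ruby
require 'csv'
require 'set'

WANTED = Set.new(%w[Dan Carlos])   # hypothetical constant name

# Hypothetical helper: collects the "Total $ spent" field for every row whose
# Name is in the given set. Set#include? is a hash lookup, not an array scan.
def spend_for(csv_text, names)
  CSV.parse(csv_text, headers: :first_row).each_with_object([]) do |row, ary|
    ary << row['Total $ spent'] if names.include?(row['Name'])
  end
end
```

Called as spend_for(data, WANTED) with the data string above, it should collect the same two values as the Hash version.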
How to read CSV data into a hash
Let's first create the CSV file.
str =<<~_
date,name,st,code,num
2020-03-25,AB,53,2585,130
2020-03-26,AB,53,3208,151
2020-03-26,BA,35,136,1
2020-03-27,BA,35,191,1
_
FName = 't'
File.write(FName, str)
#=> 120
Now we can simply read the file line-by-line, using CSV::foreach, which, without a block, returns an enumerator, and build the hash as we go along.
require 'csv'
CSV.foreach(FName, headers: true).
  with_object(Hash.new { |h,k| h[k] = [] }) do |row,h|
    h[row['name'].to_sym] << [row['date'], row['code']]
  end
#=> {:AB=>[["2020-03-25", "2585"], ["2020-03-26", "3208"]],
# :BA=>[["2020-03-26", "136"], ["2020-03-27", "191"]]}
I've used the method Hash::new with a block to create a hash h such that, if h does not have a key k, evaluating h[k] first sets h[k] to an empty array and then returns it. That way, h[k] << 123, when h has no key k, results in h[k] #=> [123].
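The block form matters here. A short sketch contrasting it with the default-object form Hash.new([]), which looks similar but shares a single array between all missing keys and never stores anything:

```ruby
# Block form: a missing key gets its own fresh array, which is stored in the hash.
per_key = Hash.new { |h, k| h[k] = [] }
per_key[:AB] << 1
per_key[:BA] << 2
p per_key[:AB]      # prints [1]
p per_key.keys      # prints [:AB, :BA]

# Default-object form: all missing keys share one array and nothing is stored.
shared = Hash.new([])
shared[:AB] << 1
shared[:BA] << 2
p shared.keys       # prints []
p shared[:other]    # prints [1, 2]
```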
Alternatively, one could write:
CSV.foreach(FName, headers: true).with_object({}) do |row,h|
  (h[row['name'].to_sym] ||= []) << [row['date'], row['code']]
end
One could also use a converter to convert the values of name to symbols, but some might see that as overkill here:
CSV.foreach(FName, headers: true,
  converters: [->(v) { v.match?(/\p{Alpha}+/) ? v.to_sym : v }] ).
  with_object(Hash.new { |h,k| h[k] = [] }) do |row,h|
    h[row['name']] << [row['date'], row['code']]
  end
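The converter above fires on any field containing a letter. A two-argument converter also receives a CSV::FieldInfo (index, line, header), so it can be limited to one column. This is a sketch of that variant; NAME_TO_SYM and group_by_name are names invented for illustration, and the text is parsed from a string rather than a file to keep it self-contained.

```ruby
require 'csv'

# Converts only fields under the 'name' header; other fields pass through.
NAME_TO_SYM = lambda do |value, info|
  info.header == 'name' ? value.to_sym : value
end

# Hypothetical helper: groups [date, code] pairs by the symbolized name.
def group_by_name(csv_text)
  CSV.parse(csv_text, headers: true, converters: [NAME_TO_SYM])
     .each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, h|
    h[row['name']] << [row['date'], row['code']]
  end
end
```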
How to parse a Hash of Hashes from a CSV file
I would only store rows in the data hash that are within the range. IMO that performs better, because it needs less memory than reading all the data into data and removing the unwanted entries in a second step.
require 'csv'
DATE_RANGE = (1403321503..1406082945)
data = {}  # the hash described above, keyed by the first column
CSV.foreach("sample_data.csv",
            :headers => true,
            :header_converters => :symbol,
            :converters => :all) do |row|
  # Pair every header except the first with its field value.
  attrs = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
  data[row.fields[0]] = attrs if DATE_RANGE.cover?(attrs[:created_at])
end
It might make sense to check the condition before actually creating the hash, by calling DATE_RANGE.cover? against the column directly (is created_at in row.fields[1]?).
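That early check could look something like the sketch below. It assumes the first column is the record id and that a created_at column holds integer epoch seconds; the helper name rows_in_window and the constant DATE_WINDOW are made up, and the text is parsed from a string so the example is self-contained.

```ruby
require 'csv'

DATE_WINDOW = (1403321503..1406082945)   # same bounds as DATE_RANGE above

# Hypothetical helper: builds the attribute hash only for rows inside the window.
def rows_in_window(csv_text, window)
  data = {}
  CSV.parse(csv_text, headers: true, header_converters: :symbol,
                      converters: :all) do |row|
    next unless window.cover?(row[:created_at])   # test before building the hash
    id, *values = row.fields
    data[id] = row.headers.drop(1).zip(values).to_h
  end
  data
end
```

Looking the field up by header name (row[:created_at]) avoids depending on its column position.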
Parse CSV into multiple lines where each value is printed after its header
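require 'csv'
# filename is the path to your CSV file.
headers = nil  # declared outside the block so it persists between iterations
line_number = 0
CSV.read(filename).each do |arr|
  if line_number == 0
    headers = arr  # the first row holds the column names
  else
    puts "line #{line_number}"
    headers.zip(arr).each do |a|
      puts "#{a.first} : #{a.last}"
    end
  end
  line_number += 1
end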
creates:
line 1
key1 : a
key2 : b
key3 : c
line 2
key1 : d
key2 :
key3 : f
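The same listing can be produced by letting CSV handle the header row. This is a sketch of that alternative, not the original answer's code; the helper name print_rows and its optional output argument are assumptions added so the result is easy to capture.

```ruby
require 'csv'

# Hypothetical helper: prints each record as "header : value" lines.
# With headers: true, each row yields [header, value] pairs directly.
def print_rows(csv_text, out = $stdout)
  CSV.parse(csv_text, headers: true).each_with_index do |row, index|
    out.puts "line #{index + 1}"
    row.each { |header, value| out.puts "#{header} : #{value}" }
  end
end

print_rows("key1,key2,key3\na,b,c\nd,,f\n")
```

An empty field comes back as nil, so it prints as a header with nothing after the colon.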
Parse CSV file with header fields as attributes for each row
Using Ruby 1.9 and above, you can get an indexable object:
CSV.foreach('my_file.csv', :headers => true) do |row|
  puts row['foo'] # prints 1 the 1st time, "blah" 2nd time, etc
  puts row['bar'] # prints 2 the first time, 7 the 2nd time, etc
end
It's not dot syntax but it is much nicer to work with than numeric indexes.
As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.
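If dot syntax is really wanted, one possible approach is to wrap each row in a Struct built from the headers. This is only a sketch: the helper name rows_as_structs is made up, and it assumes every header turns into a valid Ruby method name once symbolized.

```ruby
require 'csv'

# Hypothetical helper: returns one Struct instance per record, with one
# accessor method per header.
def rows_as_structs(csv_text)
  table = CSV.parse(csv_text, headers: true, header_converters: :symbol)
  record = Struct.new(*table.headers)
  table.map { |row| record.new(*row.fields) }
end

rows = rows_as_structs("foo,bar\n1,2\nblah,7\n")
puts rows.first.foo   # prints 1
puts rows.last.bar    # prints 7
```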