Parsing XLS Spreadsheet in Rails using Roo Gem
Looking at the source for Excel.new, it seems that it wants a file name, not a File object or handle. In other words, it needs a string containing the full path, including the filename, of the file you want to parse. It also checks the file's extension, so if the tempfile's name doesn't end with ".xls" you'll need to rename the file first.
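One way to handle the extension check, as a minimal sketch using only the standard library: copy the upload to a path that ends in ".xls" and pass that path string on. The helper name path_with_extension is hypothetical, and the commented-out call assumes an uploaded file object that exposes a tempfile, as Rails uploads do.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical helper: return a path string that ends in the wanted
# extension, copying the file first if its current name does not.
def path_with_extension(file, ext = '.xls')
  path = file.path
  return path if File.extname(path).casecmp?(ext)

  target = "#{path}#{ext}"
  FileUtils.cp(path, target)
  target
end

# Usage (assumption: an uploaded file in params[:file]):
#   book = Excel.new(path_with_extension(params[:file].tempfile))
```

Copying rather than renaming leaves the original tempfile in place, so its normal cleanup still works; the copy has to be deleted separately once parsing is done.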
Ruby Roo Gem - read Excel xlsx sheet into Hash
Work on sheet
# Open the workbook
wb = Roo::Spreadsheet.open '/Users/ankur/Desktop/wb.xlsx'
# Get first sheet
sheet = wb.sheet(0)
# Call #parse on that
sheet.parse(Fruits: "Fruits", Qty: "Qty", Location: "Location", clean: true)
#=> [{:Fruits=>"apples", :Qty=>5, :Location=>"Kitchen"}, {:Fruits=>"pearls", :Qty=>10, :Location=>"Bag"}, {:Fruits=>"plums", :Qty=>15, :Location=>"Bagpack"}]
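To make the shape of that result easier to see without a workbook at hand, here is a rough plain-Ruby picture of a header-to-key mapping. This is not Roo's implementation; the rows array and the mapping hash are hypothetical stand-ins for the sheet and for the arguments passed to parse.

```ruby
# Hypothetical in-memory rows standing in for the sheet, header row first.
rows = [
  ['Fruits', 'Qty', 'Location'],
  ['apples', 5, 'Kitchen'],
  ['pearls', 10, 'Bag']
]

# Each wanted hash key mapped to the header text it should be found under.
mapping = { Fruits: 'Fruits', Qty: 'Qty', Location: 'Location' }

header, *data = rows

# Resolve every header text to its column index once.
columns = mapping.transform_values { |name| header.index(name) }

# Build one hash per data row, keyed by the names from the mapping.
records = data.map do |row|
  columns.transform_values { |i| row[i] }
end

p records
```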
Using Roo with Ruby(Rails) to parse Excel
I made you a generic way to extract data out of a Roo spreadsheet based on a few header names which would be the convention to use by your uploaders.
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
first_row = xlsx.first_row
headers = ['CardName', 'Item']
headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
begin
  xlsx.drop(first_row).each do |row|
    p [row[CardName], row[Item]]
  end
rescue
  # the required headers are not all present
end
I suppose the only line that needs explaining is headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
For each header name, const_set defines a constant whose value is the index of the first cell in xlsx.row(first_row) (our header row) that matches the regular expression /#{h}/i. The #{} interpolates the value of h into the pattern ('CardName' in the first case), and the i at the end makes the match case-insensitive. So the constant CardName is assigned the index of the CardName cell in the header row, or nil if no cell matches.
Instead of the rather clumsy begin/rescue structure, you could use const_get to check that every required constant holds an index rather than nil, and act on that instead of catching the error.
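A minimal sketch of that explicit check, swapping the constants for a plain hash so the missing headers can be listed by name. The header_row and data_rows arrays are hypothetical sample data; with Roo they would come from xlsx.row(first_row) and xlsx.drop(first_row).

```ruby
# Hypothetical sample data standing in for the spreadsheet.
header_row = ['cardname', 'Item', 'Notes']
data_rows  = [['Alpha', 'Card', 'mint'], ['Beta', 'Card', nil]]

required = ['CardName', 'Item']

# Map each required header to its column index (nil when not found).
# Regexp.escape guards against header names containing regex characters.
index_of = required.map do |name|
  [name, header_row.index { |cell| cell.to_s =~ /#{Regexp.escape(name)}/i }]
end.to_h

missing = index_of.select { |_name, idx| idx.nil? }.keys

if missing.any?
  warn "Missing headers: #{missing.join(', ')}"
else
  data_rows.each do |row|
    p [row[index_of['CardName']], row[index_of['Item']]]
  end
end
```

Using a hash instead of const_set also avoids "already initialized constant" warnings if the code runs more than once in the same process, which is likely inside a Rails app.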
EDIT
Instead of the p [row[CardName], row[Item]] line you could check the values and do anything you like. Just keep in mind that if this is going to be part of a Rails app or another website, the interaction with the user is going to be trickier than in your puts and gets example. E.g. something like
headers = ['CardName', 'Item', 'Condition', 'Collection']
...
xlsx.drop(first_row).each do |row|
  if row[CardName].nil? || row[Item].nil?
    # let the user know or skip
  else
    condition, collection = row[Condition], row[Collection]
    # and do something with it
  end
end
Fastest way to read the first row of big XLSX file in Ruby
The Ruby gem roo does not support file streaming; it reads the whole file into memory. As you say, that works fine for smaller files but not so well for reading small sections of huge files.
You need to use a different library/approach. For example, you can use the creek gem, which describes itself as:
a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.
And, taking the example from the project's README, it's pretty straightforward to translate the code you wrote for roo into code that uses creek:
require 'creek'
creek = Creek::Book.new(file_path)
sheet = creek.sheets[0]
header = sheet.rows.first
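To illustrate why a streaming source helps here, the sketch below uses a plain Ruby Enumerator as a hypothetical stand-in for a row stream (it does not use creek itself). Taking the first element stops the iteration immediately, so the remaining rows are never produced.

```ruby
# Hypothetical stand-in for a streaming row source: yields rows one at a
# time and counts how many were actually produced.
produced = 0
rows = Enumerator.new do |yielder|
  1_000_000.times do |i|
    produced += 1
    yielder << ["row #{i}"]
  end
end

# Enumerable#first breaks out of the iteration after the first element.
header = rows.first

p header
p produced
```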
Note: A quick google of your StackOverflow question title led me to this blog post as the top search result. It's always worth searching on Google first.