Parsing XLS Spreadsheet in Rails Using Roo Gem

Looking at the source for Excel.new, it seems that it wants a file name, not a File object or handle. In other words, it needs a string containing the full path, including the file name, of the file you want to parse. It also checks the file's extension, so if the tempfile doesn't end with ".xls" you'll need to rename it first.
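
A minimal sketch of that step using only Ruby's standard library. It copies the upload rather than renaming it, which leaves the original tempfile untouched. The helper name path_with_extension is hypothetical (it is not part of Roo or Rails). It also returns the new Tempfile object, because Tempfile deletes its file when the object is garbage-collected, so you need to keep a reference until parsing is finished.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical helper (not part of Roo or Rails).
# Returns [path, holder]: a path ending in `extension`, plus the Tempfile
# that owns it (nil when the original path could be used unchanged).
# Keep `holder` referenced until parsing is done.
def path_with_extension(path, extension)
  return [path, nil] if File.extname(path).casecmp?(extension)

  copy = Tempfile.new(['upload', extension]) # file name ends in e.g. ".xls"
  copy.close                                 # close the handle; the file stays on disk
  FileUtils.cp(path, copy.path)              # copy the uploaded bytes into it
  [copy.path, copy]
end
```

In a Rails controller this might be called as path_with_extension(params[:file].path, '.xls'), assuming the upload arrives as params[:file]. Newer Roo versions also accept an extension: option, for example Roo::Spreadsheet.open(path, extension: :xls), which may make the copy unnecessary.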

Ruby Roo Gem - read Excel xlsx sheet into Hash

Select a sheet, then call parse on it, mapping each hash key you want to the header text used in the sheet:

require 'roo'

# Open the workbook
wb = Roo::Spreadsheet.open '/Users/ankur/Desktop/wb.xlsx'
# Select the first sheet
sheet = wb.sheet(0)
# Map each key to its header text; clean: true strips surrounding whitespace and control characters
sheet.parse(Fruits: "Fruits", Qty: "Qty", Location: "Location", clean: true)
#=> [{:Fruits=>"apples", :Qty=>5, :Location=>"Kitchen"}, {:Fruits=>"pearls", :Qty=>10, :Location=>"Bag"}, {:Fruits=>"plums", :Qty=>15, :Location=>"Bagpack"}]
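
To see what that result corresponds to, here is a plain-Ruby sketch of the same idea working on an array of rows instead of a Roo sheet. It is not Roo's implementation: rows_to_hashes is a hypothetical helper, and it assumes the header row is the first row and that a header matches when it is equal to the requested text, ignoring case and surrounding whitespace.

```ruby
# Hypothetical helper: turn rows (arrays) into hashes keyed by the symbols in
# header_map, locating each column by its header text in the first row.
def rows_to_hashes(rows, header_map)
  header_row = rows.first.map { |cell| cell.to_s.strip }
  # Column index for each key, e.g. { Fruits: 0, Qty: 1, Location: 2 }
  indexes = header_map.transform_values do |name|
    header_row.index { |cell| cell.casecmp?(name) } ||
      raise(KeyError, "header not found: #{name}")
  end
  # Every row after the header becomes { key => cell value }
  rows.drop(1).map { |row| indexes.transform_values { |i| row[i] } }
end

rows = [
  [' Fruits ', 'Qty', 'Location'],
  ['apples', 5, 'Kitchen'],
  ['plums', 15, 'Bagpack']
]
result = rows_to_hashes(rows, { Fruits: 'Fruits', Qty: 'Qty', Location: 'Location' })
```

Raising KeyError for a missing header makes a malformed upload fail loudly instead of silently producing nil columns.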

Using Roo with Ruby(Rails) to parse Excel

Here is a generic way to extract data from a Roo spreadsheet based on a few header names, which your uploaders would need to follow as a convention.

require 'roo'
require 'roo-xls' # Roo's .xls (Excel 97-2003) support lives in this separate gem

xlsx = Roo::Spreadsheet.open('Demo.xls')
first_row = xlsx.first_row # first non-empty row, used as the header row
headers = ['CardName', 'Item']
headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
begin
  xlsx.drop(first_row).each do |row|
    p [row[CardName], row[Item]]
  end
rescue
  # the required headers are not all present
end

I suppose the only line that needs explaining is headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}

For each header name, const_set defines a constant with that name whose value is the index of the matching cell in xlsx.row(first_row), our header row. The match uses the regular expression /#{h}/i: the #{} interpolates the value of h ('CardName' on the first pass) into the pattern, and the trailing i makes the match case-insensitive. So the constant CardName is assigned the column index of the first header cell containing CardName. If a header is missing, index returns nil, so the constant is set to nil and row[nil] raises a TypeError, which is what the rescue catches.
Instead of the rather clumsy begin/rescue structure, you could check up front that every required constant holds an index (for example, that Kernel.const_get(h) is not nil for each header) and act on that instead of catching the error.
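
A minimal sketch of that up-front check, in plain Ruby so it runs without a spreadsheet. Instead of defining global constants it keeps the column indexes in a Hash and lists the missing headers before any data row is read. The header_indexes helper is hypothetical, and header_row stands in for xlsx.row(first_row). Two small additions to the original pattern are assumptions of this sketch: Regexp.escape, so a header containing characters such as "." or "(" is matched literally, and to_s, so nil or numeric header cells don't break the match.

```ruby
# Hypothetical helper: map each required header to the index of the first
# header cell that matches it case-insensitively (nil when none does).
def header_indexes(header_row, headers)
  headers.to_h do |h|
    pattern = /#{Regexp.escape(h)}/i # same substring match as /#{h}/i, but literal
    [h, header_row.index { |cell| cell.to_s =~ pattern }]
  end
end

header_row = ['cardname', 'Set', 'ITEM', nil] # stands in for xlsx.row(first_row)
indexes = header_indexes(header_row, ['CardName', 'Item', 'Condition'])
missing = indexes.select { |_header, index| index.nil? }.keys

if missing.empty?
  # safe to read data rows, e.g. row[indexes['CardName']]
  puts 'All required headers found'
else
  puts "Missing required headers: #{missing.join(', ')}"
end
```

In a Rails app the messages would go back to the user (for example in the re-rendered upload form) rather than to standard output.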

EDIT

Instead of p [row[CardName], row[Item]] you can validate and process each row however you need. Keep in mind that if this is going to be part of a Rails app or another website, interacting with the user will be trickier than in your puts and gets example. For example, something like:

headers = ['CardName', 'Item', 'Condition', 'Collection']
...
xlsx.drop(first_row).each do |row|
  if row[CardName].nil? || row[Item].nil?
    # let the user know or skip
  else
    condition, collection = row[Condition], row[Collection]
    # and do something with it
  end
end
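
Because a web request can't stop and ask the user about each bad row, one option is to collect the problems and report them all at once. The sketch below does that in plain Ruby. The split_rows helper is hypothetical, rows stands in for the data rows of the sheet, and, as an assumption of this sketch, the column indexes are passed in as a Hash rather than as the constants used above.

```ruby
# Hypothetical helper: separate usable rows from rows that fail validation.
# indexes maps header names to column indexes; first_data_row is the
# spreadsheet row number of rows[0], so messages point at the right row.
def split_rows(rows, indexes, first_data_row)
  records = []
  errors = []
  rows.each_with_index do |row, offset|
    card_name = row[indexes['CardName']]
    item = row[indexes['Item']]
    if card_name.nil? || item.nil?
      errors << "Row #{first_data_row + offset}: CardName and Item are required"
    else
      records << {
        card_name: card_name,
        item: item,
        condition: row[indexes['Condition']],
        collection: row[indexes['Collection']]
      }
    end
  end
  [records, errors]
end

indexes = { 'CardName' => 0, 'Item' => 1, 'Condition' => 2, 'Collection' => 3 }
rows = [
  ['Card A', 'Foil', 'NM', 'Alpha'],
  [nil, 'Foil', 'LP', 'Beta']
]
records, errors = split_rows(rows, indexes, 2)
```

In a controller you could then save records and, if errors is not empty, show the messages (for example through flash or by re-rendering the form) instead of prompting row by row.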

Fastest way to read the first row of big XLSX file in Ruby

The Ruby gem roo's usual API (open, sheet, row) reads the whole file into memory. As you say, that works fine for smaller files but not so well for reading small sections of huge files. (Recent roo versions also provide Roo::Excelx#each_row_streaming for xlsx files, so it may be worth checking your version before switching.)

You need to use a different library or approach. For example, you can use the creek gem, which describes itself as:

a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.

And, taking the example from the project's README, it's pretty straightforward to translate the code you wrote for roo into code that uses creek:

require 'creek'
creek = Creek::Book.new(file_path)
sheet = creek.sheets[0]
# rows is an Enumerator, so use .first ([0] is not defined); it stops after the first row
header = sheet.rows.first
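
Each creek row is a Hash keyed by cell reference; creek's README shows rows such as {"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}. If you want the header as a plain array, like roo's row(1), a sketch such as the hypothetical row_values helper below orders the cells by column letter (so "AA" lands after "Z") and drops the row number. It assumes that key format, so check it against the creek version you install.

```ruby
# Convert a column label such as "A", "Z", "AA" to a zero-based index
# (column labels are bijective base 26: A=1 ... Z=26, AA=27).
def column_index(letters)
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) } - 1
end

# Hypothetical helper: turn {"A1"=>..., "B1"=>...} into an array ordered by column.
def row_values(row_hash)
  values = []
  row_hash.each do |ref, value|
    letters = ref[/\A[A-Z]+/] # "AB12" -> "AB"
    values[column_index(letters)] = value
  end
  values # columns absent from the hash come back as nil
end
```

Usage: row_values(header) on the first row returned above gives ["Content 1", nil, nil, "Content 3"] for the README example row.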

Note: a quick Google search for your Stack Overflow question title turned up this blog post as the top result. It's always worth searching Google first.