Parsing XLS and XLSX (MS Excel) Files with Ruby

Parsing XLS and XLSX (MS Excel) files with Ruby?

I just found roo, which might do the job. It works for my requirements: reading a basic spreadsheet.

Is there any Ruby gem to read both .xls and .xlsx files?

I've had success using roo with the roo-xls extension.
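For illustration, here is a minimal sketch of that combination. It assumes the roo and roo-xls gems are installed; `first_sheet_rows` is a hypothetical helper name, not part of either gem. The gems are required inside the method only so that defining the helper does not itself need them.

```ruby
# Hypothetical helper: returns every row of the first sheet as an array of
# cell values, for either an .xls or an .xlsx file.
def first_sheet_rows(path)
  # Required lazily so the helper can be defined without the gems present.
  require 'roo'
  require 'roo-xls' # adds legacy .xls support to Roo::Spreadsheet.open

  # Roo picks the reader from the file extension.
  book = Roo::Spreadsheet.open(path)
  sheet = book.sheet(0)

  rows = []
  sheet.each { |row| rows << row } # each yields one array per row
  rows
end
```

Usage would look like `first_sheet_rows('report.xls')` or `first_sheet_rows('report.xlsx')`, where the file names are placeholders.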

Fastest way to read the first row of big XLSX file in Ruby

The Ruby gem roo does not support file streaming; it reads the whole file into memory. As you say, that works fine for smaller files, but not so well for reading small sections of huge files.

You need to use a different library or approach. For example, you can use the creek gem, which describes itself as:

a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.

And, taking the example from the project's README, it's pretty straightforward to translate the code you wrote for roo into code that uses creek:

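require 'creek'

# Open the workbook; creek streams rows instead of loading the whole file.
creek = Creek::Book.new(file_path)
sheet = creek.sheets[0]
# rows is an enumerator, so take the first element rather than indexing with [0].
header = sheet.rows.first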
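To see why a streaming reader makes fetching the first row cheap, here is a sketch that uses only the standard library. It does not use creek at all: `first_of_lazy_rows` is a hypothetical stand-in for a parser that yields rows lazily, which is how I understand creek's rows enumerator to behave.

```ruby
# Hypothetical stand-in for a streaming parser. Rows are produced one at a
# time, and a counter records how many were actually "parsed".
def first_of_lazy_rows(total)
  parsed = 0
  rows = Enumerator.new do |yielder|
    total.times do |i|
      parsed += 1
      yielder << { "A#{i + 1}" => "value #{i + 1}" }
    end
  end

  # Enumerator#first stops the block as soon as one element has been yielded.
  [rows.first, parsed]
end

header, parsed = first_of_lazy_rows(1_000_000)
puts "first row keys: #{header.keys.join(', ')}"
puts "rows parsed: #{parsed}"
```

Only one of the million rows is ever generated, which is the property you want when you need just the header of a huge file.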

Note: A quick Google search for your Stack Overflow question title led me to this blog post as the top result. It's always worth searching on Google first.

Ruby Roo Gem - read Excel xlsx sheet into Hash

Work on the sheet:

# Open the workbook
wb = Roo::Spreadsheet.open '/Users/ankur/Desktop/wb.xlsx'
# Get first sheet
sheet = wb.sheet(0)
# Call #parse on that
sheet.parse(Fruits: "Fruits", Qty: "Qty", Location: "Location", clean: true)
#=> [{:Fruits=>"apples", :Qty=>5, :Location=>"Kitchen"}, {:Fruits=>"pearls", :Qty=>10, :Location=>"Bag"}, {:Fruits=>"plums", :Qty=>15, :Location=>"Bagpack"}]

Single Ruby Gem that parses BOTH xlsx and xls Excel files?

I would just combine the rubyXL gem and the spreadsheet gem if you're happy with the individual results both provide.
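One way to combine the two is to pick the library from the file extension. The sketch below assumes the rubyXL and spreadsheet gems are installed; `backend_for` and `read_rows` are hypothetical helper names, and the gems are required lazily so the extension check works on its own.

```ruby
# Hypothetical helper: choose a backend from the file extension.
def backend_for(path)
  case File.extname(path).downcase
  when '.xlsx' then :rubyxl
  when '.xls'  then :spreadsheet
  else raise ArgumentError, "unsupported spreadsheet type: #{path}"
  end
end

# Hypothetical helper: return the first sheet as an array of row arrays.
def read_rows(path)
  rows = []
  case backend_for(path)
  when :rubyxl
    require 'rubyXL'
    sheet = RubyXL::Parser.parse(path)[0]
    # Rows and cells can be nil for empty positions in rubyXL.
    sheet.each { |row| rows << (row ? row.cells.map { |cell| cell&.value } : []) }
  when :spreadsheet
    require 'spreadsheet'
    sheet = Spreadsheet.open(path).worksheet(0)
    sheet.each { |row| rows << row.to_a }
  end
  rows
end
```

Both branches return plain arrays, so the calling code does not need to know which gem did the parsing.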

How to parse a single excel row data in rails

I recommend using the BatchFactory gem.

It uses the Roo gem under the hood.

BatchFactory can read all rows of an Excel file as an array of hashes, which is very handy to work with.

require 'batch_factory'
factory = BatchFactory.from_file 'filename.xlsx', keys: [:header1, :header2]
factory.rows

This will give you:

[
{ header1: 'value11', header2: 'value12' },
{ header1: 'value21', header2: 'value22' },
...
]

In your case you can do:

factory = BatchFactory.from_file 'filename.xlsx', keys: [:firstname]
firstnames = factory.rows.map { |row| row[:firstname] }

This will give you an array of all values from the firstname column.

UPDATE

You can even omit rows in factory.rows.map, because BatchFactory implements method_missing, i.e.

firstnames = factory.map { |row| row[:firstname] }
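For readers unfamiliar with the pattern, here is a plain-Ruby sketch of how method_missing can forward calls to an underlying rows array. This is not BatchFactory's actual implementation; `RowSet` is a hypothetical class used only to illustrate the idea.

```ruby
# Hypothetical wrapper that forwards unknown methods to its rows array.
class RowSet
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Forward any method the array understands (map, each, size, ...).
  def method_missing(name, *args, &block)
    return rows.public_send(name, *args, &block) if rows.respond_to?(name)

    super
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    rows.respond_to?(name, include_private) || super
  end
end

set = RowSet.new([{ firstname: 'Ada' }, { firstname: 'Grace' }])
firstnames = set.map { |row| row[:firstname] }
puts firstnames.join(', ')
```

Because `RowSet` does not define `map` itself, the call falls through to method_missing and is answered by the array, which is why the explicit `.rows` can be dropped.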

