Import CSV in batches of lines in Rails?
Try AR Import
Old answer
Have you tried AR Extensions for bulk import?
You get impressive performance improvements when inserting thousands of rows into the DB.
Visit their website for more details.
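AR Extensions is unmaintained on modern Rails; its bulk-insert feature lives on in the activerecord-import gem. A minimal sketch of the idea, building the columns/values arrays that the gem's Model.import call expects (Book and the CSV content here are hypothetical, and the import call itself is shown as a comment since it needs the gem and a database):

```ruby
require 'csv'

# Build the columns/values arrays that activerecord-import's
# Model.import(columns, values) call expects, from raw CSV text.
csv_text = <<~CSV
  title,author
  Eloquent Ruby,Russ Olsen
  POODR,Sandi Metz
CSV

rows    = CSV.parse(csv_text, headers: true)
columns = rows.headers            # ["title", "author"]
values  = rows.map(&:fields)      # one array of field values per record

# With the activerecord-import gem loaded, this becomes a single
# multi-row INSERT instead of one INSERT per record:
#   Book.import columns, values
puts values.length  # => 2
```

The speedup comes from issuing one multi-row INSERT statement rather than one round trip per record.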
Ruby - Read file in batches
There's no universal way.
1) You can read the file in chunks:
File.open('filename', 'r') do |f|
  chunk = f.read(2048)
  ...
end
Disadvantage: you can miss a substring if it spans two chunks, i.e. you are looking for "SOME_TEXT", but "SOME_" is the last 5 bytes of the first 2048-byte chunk and "TEXT" is the first 4 bytes of the second.
2) You can read the file line by line:
File.open('filename', 'r') do |f|
  line = f.gets
  ...
end
Disadvantage: this way it will be 2x to 5x slower than the first method.
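The chunk-boundary problem from option 1 can be worked around by carrying a small overlap from one chunk into the next. A sketch, using StringIO in place of a real file and a hypothetical needle "SOME_TEXT":

```ruby
require 'stringio'

NEEDLE = "SOME_TEXT"
CHUNK  = 2048

def contains_needle?(io)
  overlap = ""
  while (chunk = io.read(CHUNK))
    # Search the tail of the previous chunk together with the new chunk,
    # so a needle split across the boundary is still found.
    return true if (overlap + chunk).include?(NEEDLE)
    # Keep the last NEEDLE.length - 1 bytes as the next overlap.
    overlap = chunk[-(NEEDLE.length - 1)..] || chunk
  end
  false
end

# Here the needle straddles the 2048-byte boundary:
# "SOME_" ends the first chunk, "TEXT" starts the second.
data = ("x" * 2043) + "SOME_TEXT" + ("y" * 100)
puts contains_needle?(StringIO.new(data))  # => true
```

Keeping an overlap of needle-length minus one byte is enough to guarantee no match can be lost at a boundary.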
How to best import and process very large csv files with rails
- To import CSV faster, my suggestion is to use the gem smarter_csv; you can check their website, tilo/smarter_csv.
- As stated on their site: > smarter_csv is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord, and parallel processing with Resque or Sidekiq
- I use this gem combined with Resque.
Below is sample code to import a file:
n = SmarterCSV.process(params[:file].path) do |chunk|
  Resque.enqueue(ImportDataMethod, chunk)
end
After it reads the file, it passes each chunk of records to Resque, which then imports them in the background (if you are using Rails 4.2 or above, you can combine this with Rails Active Job).
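If you'd rather not add a gem, the same chunk-and-enqueue shape can be sketched with the standard csv library. Here the enqueue step is stubbed with an array; in a real app it would be Resque.enqueue or an Active Job perform_later, and the Tempfile stands in for params[:file].path:

```ruby
require 'csv'
require 'tempfile'

# A small CSV file standing in for the uploaded params[:file].path.
file = Tempfile.new(['people', '.csv'])
file.write("name,age\nAlice,30\nBob,25\nCarol,41\n")
file.rewind

enqueued = []  # stub for Resque.enqueue(ImportDataMethod, chunk)

# CSV.foreach without a block returns an Enumerator, so each_slice
# groups the rows into chunks without loading the whole file at once.
CSV.foreach(file.path, headers: true).each_slice(2) do |chunk|
  enqueued << chunk.map(&:to_h)
end

puts enqueued.length  # => 2 (a chunk of 2 rows, then a chunk of 1)
```

Each chunk is a plain array of hashes, which serializes cleanly as a queue job argument.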
Ruby on Rails - Import Data from a CSV file
require 'csv'

csv_text = File.read('...')
csv = CSV.parse(csv_text, headers: true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
Importing CSV and updating records if exists in rails 4
The answer was multifaceted, and Madcow, Daiku, and Jake each had a piece of it.
The updated code in the original post now works fantastically. I had put an id column in the CSV file I was uploading, thinking it needed to be there for the import to work. It did not need to be there; in fact, it needed not to be there.
{"id"=>nil, "projectid"=>"IADTST1RWKP01", "batch"=>"1", "ppiho"=>"2015-11-02", "needby"=>"2015-10-02", "quantity"=>"192", "manufacturer"=>"Delta", "model"=>"US Cords", "CAR"=>nil, "cfnum"=>nil, "prnum"=>nil, "ponum"=>nil, "status"=>"Quote Completed", "contact"=>"Mike Salafia", "notes"=>nil}
When that hash was presented to the order, it was rejected because id was being set to nil, and it cannot be nil. However, I don't need id in the CSV file to find the order and update it or create a new one: projectid and batch can be used for the find, and id is auto-assigned.
So I needed a return; Daiku made me look at the hash harder, and Jake's code would likely also work.
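The same fix can also be done defensively in code: strip the id key from the row hash before handing it to ActiveRecord, and look the record up by projectid and batch. A sketch of the hash-cleaning part (pure Ruby; the Order calls are shown as comments since they need a database, and Order is the model name assumed here):

```ruby
# A trimmed version of the row hash from the CSV upload.
row_hash = {
  "id" => nil, "projectid" => "IADTST1RWKP01", "batch" => "1",
  "quantity" => "192", "status" => "Quote Completed"
}

# Drop the nil id so it can't clobber the primary key (Hash#except, Ruby 3.0+;
# on older Rubies, ActiveSupport provides the same method).
attrs = row_hash.except("id")

# With ActiveRecord this would update-or-create in one step:
#   order = Order.find_or_initialize_by(projectid: attrs["projectid"],
#                                       batch: attrs["batch"])
#   order.update!(attrs)
puts attrs.key?("id")  # => false
```

This makes the import robust even if someone leaves an id column in a future upload.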
Is there a way to write to Kiba CSV destination line by line or in batches instead of all at once?
Glad you like Kiba!
I'm going to make you happy by stating that your understanding is incorrect:
the rows are yielded and processed one by one in Kiba.
To see exactly how things work, I suggest you try this code:
class MySource
  def initialize(enumerable)
    @enumerable = enumerable
  end

  def each
    @enumerable.each do |item|
      puts "Source is reading #{item}"
      yield item
    end
  end
end

class MyDestination
  def write(row)
    puts "Destination is writing #{row}"
  end
end

source MySource, (1..10)
destination MyDestination
Run this and you'll see that each item is read then written.
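Outside of a Kiba script, you can drive the same two classes directly with a tiny plain-Ruby harness, which makes the one-row-at-a-time flow easy to observe (the classes are repeated here so the snippet runs standalone):

```ruby
# Same source/destination contract as above, driven without Kiba.
class MySource
  def initialize(enumerable)
    @enumerable = enumerable
  end

  def each
    @enumerable.each do |item|
      puts "Source is reading #{item}"
      yield item
    end
  end
end

class MyDestination
  def write(row)
    puts "Destination is writing #{row}"
  end
end

source = MySource.new(1..3)
destination = MyDestination.new
# Each yielded row goes straight to the destination before the
# source reads the next one, so reads and writes interleave.
source.each { |row| destination.write(row) }
```

The output alternates "reading"/"writing" lines, confirming rows stream through one at a time rather than being buffered.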
Now to your actual concrete case: what's above means that you can implement your source this way:
class ActiveRecordSource
  def initialize(model:)
    @model = model
  end

  def each
    @model.find_each do |record|
      yield record
    end
  end
end
then you can use it like this:
source ActiveRecordSource, model: Person.where("age > 21")
(You could also leverage find_in_batches if you wanted each row to be an array of multiple records, but that's probably not what you need here.)
Hope this properly answers your question!
Importing CSV file into multiple models at one time
I was able to get this to work; the following is what I used. Let me know if there is a smarter way to do this. NOTE: if you are trying to do this, place the CSV file in the root of your Rails directory and execute this script line by line in the console. At least that's how I got it working.
require 'csv'

csvfile = File.read("testimport.csv")
csv = CSV.parse(csvfile, headers: false)
csv.each do |row|
  c = Church.new
  c.name = row[0]
  c.url = row[10]
  c.locations.build(title: "Sanctuary", address: row[3], zipcode: row[5],
                    phone: row[6], email: row[2], city: row[4])
  c.save
  loc = c.locations.first
  loc.pastors.build(firstname: row[1])
  loc.save
end
Import records from CSV in small chunks (ruby on rails)
Read the rest of the CSV into an array, and outside the CSV.foreach loop write it back to the same CSV file, so that the file gets smaller each time. I suppose I don't have to give this in code, but if necessary leave a comment and I will.
If you want to keep the CSV whole, add a "processed" field to the CSV, fill it with 1 when a row is read, and filter those rows out next time.
EDIT: this isn't tested and could surely be better, but just to show what I mean:
require 'csv'

index = 1
# CSV::Writer.generate is the ancient Ruby 1.8 API; use CSV.open instead.
csv_out = CSV.open('new.csv', 'wb')
CSV.foreach('reviews.csv', headers: true) do |row|
  if index < 101
    Review.create(row.to_hash)
  else
    csv_out << row
  end
  index += 1
end
csv_out.close
Afterward, delete reviews.csv and rename new.csv to reviews.csv.
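The "processed" flag alternative mentioned above can be sketched like this, rewriting the file in place with the extra column. The Review.create call is commented out since it needs Rails, and the batch size and file contents are illustrative:

```ruby
require 'csv'
require 'tempfile'

BATCH = 2

# A small reviews file with a 'processed' column, all rows unprocessed.
tmp  = Tempfile.new(['reviews', '.csv'])
path = tmp.path
File.write(path, "title,processed\ngood,0\nbad,0\nugly,0\n")

table = CSV.read(path, headers: true)
done = 0
table.each do |row|
  next  if row['processed'] == '1'  # skip rows handled on a previous run
  break if done >= BATCH            # only import BATCH rows per run
  # Review.create(title: row['title'])
  row['processed'] = '1'            # mark the row as done
  done += 1
end
File.write(path, table.to_csv)      # persist the updated flags

puts CSV.read(path, headers: true).count { |r| r['processed'] == '1' }  # => 2
```

Each run picks up where the previous one left off, and the file never shrinks, so the original data stays intact.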