Import CSV in Batches of Lines in Rails

Import CSV in batches of lines in Rails?

Try AR Import

Old answer

Have you tried using AR Extensions for bulk import?
You get impressive performance improvements when inserting thousands of rows into the database.
Visit its website for more details.
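The bulk-import functionality of AR Extensions now lives in the activerecord-import gem, which exposes it through Model.import. Below is a minimal sketch of batched inserts assuming that gem and a hypothetical Product model and products.csv file (neither comes from the original answer):

require 'csv'

# Collect rows and insert them 1000 at a time, each batch as a single multi-row INSERT.
batch = []
CSV.foreach('products.csv', headers: true) do |row|
  batch << Product.new(row.to_hash)
  if batch.size >= 1000
    Product.import(batch)
    batch.clear
  end
end
Product.import(batch) unless batch.empty?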

Ruby - Read file in batches

There's no universal way.

1) You can read the file in chunks:

File.open('filename', 'r') do |f|
  while chunk = f.read(2048)
    # process the chunk here
  end
end

Disadvantage: you can miss a substring if it falls across a chunk boundary. For example, if you are looking for "SOME_TEXT", "SOME_" might be the last 5 bytes of the first 2048-byte chunk and "TEXT" the first 4 bytes of the second.

2) You can read the file line by line:

File.open('filename', 'r') do |f|
  while line = f.gets
    # process the line here
  end
end

Disadvantage: this is roughly 2x-5x slower than the first method.
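If the file is line-oriented (as a CSV is), a middle ground is to read it line by line but hand the lines off in batches; a minimal sketch using each_line with each_slice (the batch size of 500 is arbitrary):

File.open('filename', 'r') do |f|
  # Stream the file line by line, but process the lines 500 at a time.
  f.each_line.each_slice(500) do |lines|
    # process the batch of lines here
  end
end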

How to best import and process very large csv files with rails

  • To import CSV faster, my suggestion is to use the gem smarter_csv; you can check its website, tilo/smarter_csv.
  • As stated on its site: > smarter_csv is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord, and parallel processing with Resque or Sidekiq
  • I use this gem combined with Resque.

Below is sample code to import a file:

n = SmarterCSV.process(params[:file].path) do |chunk|
  Resque.enqueue(ImportDataMethod, chunk)
end

After it reads the file, it passes each chunk of records to Resque, which then imports them in the background (if you are using Rails 4.2 or above, you can combine this with Rails Active Job).
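For completeness, a minimal sketch of what the ImportDataMethod worker could look like; the queue name and the ImportRecord model are assumptions, not part of the original answer:

class ImportDataMethod
  @queue = :csv_import

  # Resque calls this with the chunk enqueued above
  # (an array of row hashes produced by smarter_csv).
  def self.perform(chunk)
    chunk.each do |attributes|
      ImportRecord.create!(attributes)
    end
  end
end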

Ruby on Rails - Import Data from a CSV file

require 'csv'

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
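If the file is large, the same approach can be made to stream rows instead of loading the whole file into memory; a minimal variant using CSV.foreach:

require 'csv'

# Reads the file row by row instead of slurping it all with File.read.
CSV.foreach('...', :headers => true) do |row|
  Moulding.create!(row.to_hash)
end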

Importing CSV and updating records if exists in rails 4

The answer was multifaceted; Madcow, Daiku, and Jake, to an extent, all had a piece of it.

The updated code in the original post now works fantastically. I had put an id column in the CSV file that I was uploading, thinking it needed to be there for this to work. It did not need to be there; in fact, it needed to not be there.

{"id"=>nil, "projectid"=>"IADTST1RWKP01", "batch"=>"1", "ppiho"=>"2015-11-02", "needby"=>"2015-10-02", "quantity"=>"192", "manufacturer"=>"Delta", "model"=>"US Cords", "CAR"=>nil, "cfnum"=>nil, "prnum"=>nil, "ponum"=>nil, "status"=>"Quote Completed", "contact"=>"Mike Salafia", "notes"=>nil}

When that hash was presented to the order, it was rejected because id was being set to nil, and it cannot be nil. However, I don't need id in the CSV file to find the order and update it, or to create a new order, since projectid and batch can be used for the lookup and the id is auto-assigned.

So I needed a return, Daiku made me look at the hash harder, and Jake's code would also likely work.
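For reference, a minimal sketch of the upsert loop described above; the Order model, the orders.csv filename, and the exact column handling are assumptions based on the hash shown, not code from the original post:

require 'csv'

CSV.foreach('orders.csv', :headers => true) do |row|
  attributes = row.to_hash.except('id')   # drop the nil id so it cannot clobber the primary key
  order = Order.find_or_initialize_by(projectid: attributes['projectid'],
                                      batch: attributes['batch'])
  order.update!(attributes)
end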

Is there a way to write to Kiba CSV destination line by line or in batches instead of all at once?

Glad you like Kiba!

I'm going to make you happy by stating that your understanding is incorrect.

The rows are yielded & processed one by one in Kiba.

To see exactly how things work, I suggest you try this code:

class MySource
  def initialize(enumerable)
    @enumerable = enumerable
  end

  def each
    @enumerable.each do |item|
      puts "Source is reading #{item}"
      yield item
    end
  end
end

class MyDestination
  def write(row)
    puts "Destination is writing #{row}"
  end
end

source MySource, (1..10)
destination MyDestination

Run this and you'll see that each item is read then written.

Now for your actual concrete case: the above means you can implement your source this way:

class ActiveRecordSource
  def initialize(model:)
    @model = model
  end

  def each
    @model.find_each do |record|
      yield record
    end
  end
end

then you can use it like this:

source ActiveRecordSource, model: Person.where("age > 21")

(You could also leverage find_in_batches if you wanted each row to be an array of multiple records, but that's probably not what you need here).
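In case it helps, a minimal sketch of that batched variant; ActiveRecordBatchSource is just an illustrative name:

class ActiveRecordBatchSource
  def initialize(model:, batch_size: 1000)
    @model = model
    @batch_size = batch_size
  end

  def each
    # Each yielded "row" is an array of up to batch_size records.
    @model.find_in_batches(batch_size: @batch_size) do |batch|
      yield batch
    end
  end
end

source ActiveRecordBatchSource, model: Person.where("age > 21"), batch_size: 500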

Hope this properly answers your question!

Importing CSV file into multiple models at one time

I was able to get this to work; the following is what I used. Let me know if there is a smarter way to do this. NOTE: if you are trying to do this, place the CSV file in the root of your Rails directory and execute this script line by line in the console. At least that's how I got it working.

require 'csv'

csvfile = File.read("testimport.csv")
csv = CSV.parse(csvfile, :headers => false)
csv.each do |row|
  c = Church.new
  c.name = row[0]
  c.url = row[10]
  c.locations.build(:title => "Sanctuary", :address => row[3], :zipcode => row[5], :phone => row[6], :email => row[2], :city => row[4])
  c.save
  loc = c.locations.first
  loc.pastors.build(:firstname => row[1])
  loc.save
end

Import records from CSV in small chunks (ruby on rails)

Read the rest of the CSV into an array and, outside the CSV.foreach loop, write it back to the same CSV file, so that the file gets smaller each time. I suppose I don't have to spell this out in code, but if necessary leave a comment and I will.

If you want to keep the CSV whole, add a "processed" field to the CSV and set it to 1 once a row has been read; on the next run, filter those rows out (a sketch of this variant appears after the code below).

EDIT: this isn't tested and could surely be better, but it shows what I mean:

require 'csv'

index = 1
csv_out = CSV.open('new.csv', 'wb')
CSV.foreach('reviews.csv', :headers => true) do |row|
  if index < 101
    Review.create(row.to_hash)
  else
    csv_out << row   # rows beyond the first 100 are written to new.csv for the next run
  end
  index += 1
end
csv_out.close

Afterward, delete reviews.csv and rename new.csv to reviews.csv.
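For the second variant mentioned above (keeping the CSV whole and marking rows as processed), here is a rough sketch; it assumes you have already added a processed column to reviews.csv yourself:

require 'csv'

table = CSV.read('reviews.csv', :headers => true)
imported = 0
table.each do |row|
  break if imported >= 100            # only import 100 new rows per run
  next  if row['processed'] == '1'    # skip rows already imported on an earlier run
  Review.create(row.to_hash.except('processed'))
  row['processed'] = '1'
  imported += 1
end
File.write('reviews.csv', table.to_csv)   # write the flags back so the next run skips these rows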


