How to Write Rake Task to Import Data to Rails App

How to write Rake task to import data to Rails app?

I wouldn't delete the products and vendors tables on every cycle. Is this a rails app? If so there are some really nice ActiveRecord helpers that would come in handy for you.

If you have a Product active record model, you can do:

p = Product.find_or_initialize_by_identifier(<id you get from file>)
p.name = <name from file>
p.size = <size from file>
etc...
p.save!

find_or_initialize will look up the product in the database by the identifier you specify, and if it can't find one, it will build a new, unsaved record. (On Rails 4 and later the same call is written Product.find_or_initialize_by(identifier: ...).) The really handy thing about doing it this way is that ActiveRecord will only write to the database if any of the data has changed, and it will automatically update any timestamp fields you have in the table (updated_at) accordingly. One more thing: since you would be looking up records by the identifier (the id from the file), I would make sure to add an index on that field in the database.

To make a rake task to accomplish this, I would add a rake file to the lib/tasks directory of your rails app. We'll call it data.rake.

Inside data.rake, it would look something like this:

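namespace :data do
  desc "Import data from files to database"
  task :import => :environment do
    # File.foreach reads line by line and closes the file when done
    File.foreach(<file to import>) do |line|
      attrs = line.chomp.split(":")
      # Rails 4+: Product.find_or_initialize_by(identifier: attrs[0])
      p = Product.find_or_initialize_by_identifier(attrs[0])
      p.name = attrs[1]
      # ...set the remaining attributes the same way
      p.save!
    end
  end
end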

Then, to call the rake task, run "rake data:import" from the command line.
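
The positional attrs[0], attrs[1] lookups in the task above are easy to get wrong when the file layout changes. A minimal sketch of one alternative, assuming a colon-separated layout of identifier:name:size (the field list and the parse_line helper are hypothetical, not from the original answer), maps each line to a hash of named attributes:

```ruby
# Hypothetical field order for the colon-separated file; adjust to the real layout.
FIELDS = %i[identifier name size].freeze

# Turn one raw line into a hash of named attributes.
# Missing trailing fields come back as nil.
def parse_line(line)
  values = line.chomp.split(":", FIELDS.size)
  FIELDS.zip(values).to_h
end

attrs = parse_line("A100:Widget:large\n")
puts attrs.inspect
```

Inside the rake task, the resulting hash could then be applied with something like p.assign_attributes(attrs.except(:identifier)) before calling p.save!.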

How to write a Rake task that imports data and handles deletions?

You definitely should not delete all the records and then recreate them all from the data. This will create all sorts of problems, e.g. breaking any foreign key fields in other tables that used to point to the object before it was deleted. It's like knocking a house down and rebuilding it in order to have a different coloured door. So the "see if it's there; if it is, update it (if it's different); if it's not, create it" approach is the right strategy to use.

You don't say what your criteria for deletion are, but if it is "any record which isn't mentioned in the import data should be deleted" then you just need to keep track of some unique field from your input data and then delete all records whose own unique field isn't in that list.

So, your code to do the import could look something like this (copying the code from the other question; this code sets the data in a horribly clunky way, but I'm not going to address that here):

namespace :data do
  desc "Import data from files to database"
  task :import => :environment do
    identifiers = []
    File.foreach(<file to import>) do |line|
      # disclaimer: setting the data from attrs[0], attrs[1] etc. is crappy and fragile and is not how I would do it
      attrs = line.chomp.split(":")
      identifier = attrs[0]
      identifiers << identifier
      # find_or_initialize always returns a record, so no nil check is needed
      p = Product.find_or_initialize_by_identifier(identifier)
      p.name = attrs[1]
      # ...set the remaining attributes the same way
      p.save!
    end
    # destroy any which didn't appear in the import data
    Product.where("identifier not in (?)", identifiers).each(&:destroy)
  end
end
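
The deletion step boils down to a set difference between the identifiers already stored and the identifiers seen in the file. A small plain-Ruby sketch of that logic follows; the stale_identifiers helper and the choice to delete nothing when the import is empty are assumptions for illustration, not part of the original answer:

```ruby
require 'set'

# Return the stored identifiers that did not appear in the import data.
# Assumption: an empty import more likely means a bad file than "delete everything",
# so nothing is reported as stale in that case.
def stale_identifiers(stored, imported)
  return [] if imported.empty?

  seen = imported.to_set
  stored.reject { |identifier| seen.include?(identifier) }
end

puts stale_identifiers(%w[A1 B2 C3], %w[A1 C3 D4]).inspect # prints ["B2"]
```

In the task itself, the where("identifier not in (?)", identifiers) query expresses the same difference in SQL.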

How can I import a CSV file via a rake task?

Under your project folder, in lib/tasks, create a rake file, say "import_incidents_csv.rake".

Follow this answer:
Ruby on Rails - Import Data from a CSV file

In the rake file, put the following code:

require 'csv'
namespace :import_incidents_csv do
  task :create_incidents => :environment do
    "code from the link"  
  end
end

You can call this task as "rake import_incidents_csv:create_incidents"
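
Since the linked code is not reproduced here, the following is only a rough sketch of what the body of such a task typically does: read the CSV with headers and turn each row into a hash of attributes. The column names (title, status) and the parse_incidents helper are made up for illustration and are not taken from the linked answer:

```ruby
require 'csv'

# Parse CSV text with a header row into an array of attribute hashes.
def parse_incidents(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

sample = "title,status\nServer down,open\nDisk full,closed\n"
parse_incidents(sample).each do |attrs|
  # Inside the rake task this is where something like Incident.create!(attrs)
  # would go (Incident is an assumed model name).
  puts attrs["title"]
end
```

For a file on disk, CSV.foreach(path, headers: true) yields the rows one at a time instead of loading the whole file.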

creating rake task for importing data from csv file

I had the same issue while writing a rake task to populate data in the database.
In my case the error was the same, and the cause was simply that I was running the rake task the wrong way.

Judging by the error, I guess you are doing the same.

You are running rake tech:temp, which Rake reads as task temp inside namespace tech. Rake expects the namespace first and then the task name, and it appears that in your file the namespace is temp and the task is tech, so the two are the wrong way round.

So the right command is

rake temp:tech

I hope this works. It's a silly mistake, I know.

Writing TestCase for CSV import rake task

I haven't worked with engines, but is there a way to just put the CSV importing logic into its own class?

namespace :web_import do
  desc 'Import users from csv'

  task users: :environment do
    WebImport.new(url: 'http://blablabla.com/content/people.csv').call
  end
end

class WebImport # (or whatever name you want)
  # keyword argument, to match WebImport.new(url: ...) in the task above
  def initialize(url:)
    @url = url
  end

  def call
    # counter, CSV parse, etc...
  end
end

That way you can jump into the Rails console to run WebImport by hand, and you can also test WebImport in isolation. When you write Rake tasks and jobs (Sidekiq etc.), you want the Rake task to be as thin a wrapper as possible around the actual meat of the code (which in this case is the CSV parsing). Separate the "trigger the CSV parse" code from the "actually parse the CSV" code into their own classes or files.
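
As a rough illustration of that separation, here is a sketch in which the importer takes CSV text and a callable that stores each row, so it can be exercised without Rails or a network connection. The CsvUserImport name, its keyword arguments, and the name/email columns are all hypothetical:

```ruby
require 'csv'

# Plain Ruby object holding the import logic; the rake task would only call it.
class CsvUserImport
  def initialize(csv_text:, save_row:)
    @csv_text = csv_text
    @save_row = save_row # any object responding to #call, e.g. a lambda
  end

  # Feed every row to save_row and return how many rows it reported as saved.
  def call
    count = 0
    CSV.parse(@csv_text, headers: true).each do |row|
      count += 1 if @save_row.call(row.to_h)
    end
    count
  end
end

saved = []
importer = CsvUserImport.new(
  csv_text: "name,email\nAnn,ann@example.com\nBob,bob@example.com\n",
  save_row: ->(attrs) { saved << attrs }
)
puts importer.call # prints 2
```

In a test, the lambda can collect rows in an array as above; in the rake task it would wrap the real model call instead.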

Rails 5 - Rake task to import data from CSV file

There might be two problems here.

1. Encoding in your CSV file. ArgumentError: invalid byte sequence in UTF-8
2. Undefined local variable. NameError: undefined local variable or method 'randd_fields' for main:Object

I guess you are trying to count created/imported records:
```
for_code = Randd::Field.create(anz_reference: row["anz_reference"], title: title)
counter += 1 if randd_field.persisted?
```
The record you created is for_code, but you are checking randd_field.
This should fix it:
```
counter += 1 if for_code.persisted?
```
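
For the first problem, the encoding error, a common cause is a CSV saved in a legacy encoding such as ISO-8859-1 being read as UTF-8. The sketch below assumes that is the case here (the actual encoding of the file is not known from the error alone), and to_utf8 is a hypothetical helper name:

```ruby
# Return the text as valid UTF-8.
# Assumption: bytes that are not valid UTF-8 are really ISO-8859-1 (Latin-1).
def to_utf8(bytes, source_encoding = Encoding::ISO_8859_1)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# "\xE9" is the Latin-1 byte for an accented e, which is invalid on its own in UTF-8.
puts to_utf8("caf\xE9".b)
```

When reading straight from disk, CSV.foreach(path, encoding: "ISO-8859-1:UTF-8") performs the same transcoding while reading, again assuming the file really is ISO-8859-1.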

Updated:

$ bundle exec rake import:randd_fields --trace
** Invoke import:randd_fields (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import:randd_fields
nil
rake aborted!
NameError: undefined local variable or method `title' for main:Object

This is because the title variable is not defined. I guess you want to use row[] here:

for_code = Randd::Field.create(anz_reference: row["anz_reference"], title: row['title'])

Updated 2:

You have a typo in your rake task name

Updated 3:

I think you are calling bundle exec rake import:randd_fields inside the Rails console.
Running it directly in the terminal should fix it.

Rails Rake Task drop table and import from csv

Try truncating the table with a custom SQL command:

require 'csv'

namespace :csvimportproducts do
  desc "Import Products CSV Data."
  task :import_products_csv_data => :environment do
    # Empty the table before re-importing (TRUNCATE support depends on the database)
    ActiveRecord::Base.connection.execute("TRUNCATE TABLE products")

    csv_file_path = '/home/jay/workspace/db/import_tables/products.csv'
    CSV.foreach(csv_file_path) do |row|
      Product.create!(
        :product_id => row[0],
        :product_name => row[1]
      )
    end
  end
end

Rails (rake) Data Import Concurrency

For the question,

is it possible to use the parallel gem with find_each? I cannot find anything in their documentation or examples online doing such. Is there another solution I can do to for iterating over the Customers concurrently?

I would recommend using ActiveRecord's find_in_batches. You can query for a batch of records and then iterate over each element in the batch using Parallel. For example, it could look something like this:

User.find_in_batches do |batch|
  Parallel.each(batch, in_processes: 8) do |user|
    ...
  end
end
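
To show the shape of that approach without a database, the sketch below uses each_slice in place of find_in_batches and plain threads from the standard library in place of the parallel gem. Both substitutions, the process_in_batches name, and the batch and worker sizes are assumptions for illustration only:

```ruby
# Split records into batches, then work through each batch with a few threads.
# each_slice stands in for find_in_batches; Thread stands in for Parallel.
def process_in_batches(records, batch_size: 4, workers: 2)
  results = Queue.new
  records.each_slice(batch_size) do |batch|
    chunk_size = (batch.size / workers.to_f).ceil
    threads = batch.each_slice(chunk_size).map do |chunk|
      Thread.new { chunk.each { |record| results << yield(record) } }
    end
    threads.each(&:join) # finish this batch before loading the next one
  end
  Array.new(results.size) { results.pop }
end

doubled = process_in_batches((1..10).to_a) { |n| n * 2 }
puts doubled.sort.inspect
```

Results arrive in no particular order, hence the sort. With the real parallel gem and in_processes, each worker process typically needs its own database connection, so check the gem's README before using it with ActiveRecord.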
