Rails 3, check CSV file encoding before import
You can use Charlock Holmes, a character encoding detection library for Ruby.
https://github.com/brianmario/charlock_holmes
To use it, read the file and pass its contents to the detect method.
contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}
You can also convert the encoding to UTF-8 if it is not in the correct format:
utf8_encoded_content = CharlockHolmes::Converter.convert contents, detection[:encoding], 'UTF-8'
This saves users from having to convert the file themselves and upload it again.
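If adding a gem is not an option, a much rougher check is possible with the standard library alone. The sketch below is not Charlock Holmes and does not detect anything: it only tests whether the bytes are valid UTF-8 and otherwise assumes a single fallback encoding. The helper name `to_utf8` and the ISO-8859-1 fallback are assumptions made for illustration.

```ruby
# Hypothetical helper: return the given bytes as a valid UTF-8 string.
# Assumption: anything that is not valid UTF-8 is in the fallback encoding.
def to_utf8(bytes, fallback: Encoding::ISO_8859_1)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # Reinterpret the raw bytes as the fallback encoding, then transcode.
  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

already_utf8 = to_utf8("café".b)     # valid UTF-8 bytes pass through
from_latin1  = to_utf8("caf\xE9".b)  # 0xE9 is "é" in ISO-8859-1
```

Because many byte sequences are valid in more than one encoding, this only helps when the set of possible encodings is known in advance; a detector such as Charlock Holmes is the better choice otherwise.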
Before Action on Import from CSV
You don't need a before action.
You need a preprocessor, or rather, you need to preprocess the rows yourself.
Your CSV comes with columns: column 0, 1, 2, 3, etc. (since you don't use headers).
Suppose, for the sake of the example, that your text columns are columns 1, 3 and 5:
def self.import(file)
  text_cols = [1, 3, 5] # for example
  SmarterCSV.process(file.path) do |row|
    text_cols.each do |column|
      row[column] = CGI.unescape(row[column]).force_encoding('UTF-8')
    end
    Player.create(row)
  end
end
Or simply, for your particular case:
def self.import(file)
  SmarterCSV.process(file.path) do |row|
    # There is no `first=` setter, so assign through the index instead.
    row[0] = CGI.unescape(row[0]).force_encoding('UTF-8')
    Player.create(row.first)
  end
end
Ruby/Rails CSV parsing, invalid byte sequence in UTF-8
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file=File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open the file read-only and to treat its contents as ISO-8859-1.
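The mode string can also name a second, internal encoding, in which case Ruby transcodes the data while reading. A minimal sketch, assuming the file really is ISO-8859-1 and that UTF-8 strings are wanted; the sample file and its contents are made up so the example runs on its own:

```ruby
require 'tempfile'

# Write a small ISO-8859-1 sample file (hypothetical data).
sample = Tempfile.new(['latin1', '.csv'])
sample.binmode
sample.write("name,city\nJos\xE9,M\xE1laga\n".b)
sample.close

# "r:ISO-8859-1:UTF-8" means: read-only, the file is ISO-8859-1,
# and strings handed to the program are transcoded to UTF-8.
lines = File.open(sample.path, 'r:ISO-8859-1:UTF-8') do |f|
  f.readlines(chomp: true)
end
```

With only `"r:ISO-8859-1"` the strings stay tagged as ISO-8859-1, which is fine until they are mixed with UTF-8 strings elsewhere in the application.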
CSV importing in Rails - invalid byte sequence in UTF-8 with non-english characters
I solved it with a different approach. This is a much easier way to import CSV files into a Rails 3 model than using an external gem:
require 'csv'
CSV.foreach('doc/socios_full.csv') do |row|
  record = Associate.new(
    :media_format => row[0],
    :group => row[0],
    :member => row[1],
    :family_relationship_code => row[2],
    :family_relationship_description => row[3],
    :last_name => row[4],
    :names => row[5],
    ...
  )
  record.save!
end
It works flawlessly, even with non-English characters (I just tried it on a 75k import file). Hope it's helpful for someone.
How to read data from a CSV file of two possible encodings?
Once you know which encoding your file has, you can pass it in the CSV options, e.g.:
external_encoding: Encoding::ISO_8859_15,
internal_encoding: Encoding::UTF_8
(This establishes that the file is ISO-8859-15, but that you want the strings internally as UTF-8.)
So the strategy is to decide first, before opening the file, which encoding to use, and then pass the appropriate options Hash.
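The strategy could be sketched as follows. This assumes the two candidate encodings are UTF-8 and ISO-8859-15, and it decides between them with a plain UTF-8 validity check, which is one possible way to "know" the encoding and not something the answer prescribes. The helper names and sample data are hypothetical, and the sketch uses the combined `encoding: "external:internal"` option of `CSV.read` in place of the two separate keys.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: choose the CSV :encoding option for a file that is
# assumed to be either UTF-8 or ISO-8859-15.
def csv_encoding_for(path)
  utf8 = File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
  utf8 ? 'UTF-8' : 'ISO-8859-15:UTF-8' # "external:internal"
end

# Decide on the encoding first, then open the file with that option.
def read_rows(path)
  CSV.read(path, encoding: csv_encoding_for(path))
end

# Build one sample file per encoding so the sketch runs on its own.
def sample_file(bytes)
  file = Tempfile.new(['sample', '.csv'])
  file.binmode
  file.write(bytes)
  file.close
  file
end

latin_file = sample_file("item,price\nCaf\xE9,5\xA4\n".b) # 0xA4 is the euro sign in ISO-8859-15
utf8_file  = sample_file("item,price\nCafé,5€\n".b)

rows_latin = read_rows(latin_file.path)
rows_utf8  = read_rows(utf8_file.path)
```

Either way the caller receives UTF-8 strings, so the rest of the import code does not need to know which encoding the upload used.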