When importing CSV data, how to eliminate invalid byte sequence in UTF-8
Ruby 1.9's CSV has a new parser that works with m17n (multilingualization). The parser works in the Encoding of the IO or String object being read. The methods ::foreach, ::open, ::read, and ::readlines accept an optional :encoding option with which you can specify the Encoding.
For example:
CSV.read('/path/to/file', :encoding => 'windows-1251:utf-8')
This converts all strings to UTF-8.
You can also use the more standard encoding name 'ISO-8859-1':
CSV.read('/..', :headers => true, :col_sep => ';', :encoding => 'ISO-8859-1')
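As a minimal sketch of the "source:target" form of the encoding option, the snippet below writes a small ISO-8859-1 sample to a temporary file and reads it back as UTF-8. The helper name read_csv_as_utf8 and the sample data are made up for illustration; only CSV.read and its :headers, :col_sep and :encoding options come from the answer above.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: read a semicolon-separated CSV file stored in
# `source_encoding` and return its rows as hashes of UTF-8 strings.
def read_csv_as_utf8(path, source_encoding)
  # "SOURCE:UTF-8" means: decode the file as SOURCE, transcode to UTF-8.
  CSV.read(path, headers: true, col_sep: ';',
                 encoding: "#{source_encoding}:UTF-8").map(&:to_h)
end

# Made-up sample: "café" in ISO-8859-1, where é is the single byte 0xE9.
rows = Tempfile.create(['sample', '.csv']) do |f|
  f.binmode
  f.write("name;qty\ncaf\xE9;1\n".b)
  f.flush
  read_csv_as_utf8(f.path, 'ISO-8859-1')
end

p rows
```

Reading the same bytes without the encoding option is what typically triggers "invalid byte sequence in UTF-8", because 0xE9 on its own is not a valid UTF-8 sequence.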
Rails Import CSV Error: invalid byte sequence in UTF-8
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
# your code here
end
Ruby/Rails CSV parsing, invalid byte sequence in UTF-8
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file = File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open the file read-only with the encoding ISO-8859-1.
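If you also want the strings you read to be UTF-8, the mode string can carry a second (internal) encoding. This is a sketch under that assumption; the helper name read_latin1_lines and the sample bytes are hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper: read an ISO-8859-1 file and return its lines as UTF-8.
def read_latin1_lines(path)
  # "r:ISO-8859-1:UTF-8" = read-only, external encoding ISO-8859-1,
  # internal encoding UTF-8, so Ruby transcodes while reading.
  File.open(path, 'r:ISO-8859-1:UTF-8') { |file| file.readlines(chomp: true) }
end

# Made-up sample: "niño" in ISO-8859-1, where ñ is the single byte 0xF1.
lines = Tempfile.create('input_file') do |f|
  f.binmode
  f.write("ni\xF1o\n".b)
  f.flush
  read_latin1_lines(f.path)
end

p lines
```

With only "r:ISO-8859-1" the strings stay tagged as ISO-8859-1, which is valid but may need an explicit encode('UTF-8') before being mixed with UTF-8 data.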
invalid byte sequence for encoding UTF8
If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".
But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.
If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.
$ file yourfilename
yourfilename: UTF-8 Unicode English text
(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.
If you use that same utility on a file that came from Windows systems (that is, a file that's not encoded in UTF8), it will probably show something like this:
$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators
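Where the file utility isn't available (Windows, for instance), a rough equivalent is to ask Ruby whether the bytes are valid UTF-8. This is only a sketch: the helper names are hypothetical, and it answers "is this valid UTF-8?" rather than guessing the actual encoding.

```ruby
# Hypothetical helper: true when the given bytes form valid UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: the same check for a whole file, read as raw bytes.
def valid_utf8_file?(path)
  valid_utf8?(File.binread(path))
end

puts valid_utf8?("caf\xC3\xA9".b)  # "café" encoded as UTF-8
puts valid_utf8?("caf\xE9".b)      # "café" encoded as ISO-8859-1
puts valid_utf8?("plain ascii")    # ASCII is a subset of UTF-8
```

Note that a pure ASCII file passes this check, since every ASCII byte sequence is also valid UTF-8.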
If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)
You can use the iconv utility to change encoding of the input data.
iconv -f original_charset -t utf-8 originalfile > newfile
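If iconv isn't installed, the same conversion can be sketched in Ruby with String#encode. The helper name convert_file_to_utf8 and the sample file contents are assumptions for illustration; you still need to know the original charset.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring `iconv -f source_charset -t utf-8 src > dest`.
def convert_file_to_utf8(src, dest, source_charset)
  raw = File.binread(src)  # raw bytes, no decoding yet
  utf8 = raw.force_encoding(source_charset).encode(Encoding::UTF_8)
  File.binwrite(dest, utf8)  # write the UTF-8 bytes unchanged
end

converted = Dir.mktmpdir do |dir|
  src  = File.join(dir, 'originalfile')
  dest = File.join(dir, 'newfile')
  File.binwrite(src, "se\xF1or\n".b)  # made-up sample: "señor" in ISO-8859-1
  convert_file_to_utf8(src, dest, 'ISO-8859-1')
  File.read(dest, encoding: 'UTF-8')
end

puts converted
```

String#encode raises an error if the input contains bytes that are not defined in the source charset, which is usually a sign that the guessed charset is wrong.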
You can change psql (the client) encoding following the instructions on Character Set Support. On that page, search for the phrase "To enable automatic character set conversion".
Invalid byte sequence importing CSV created with R to Postgres
From my comment:
write.csv(df, out_file, fileEncoding = "UTF-8")
# write.csv(df, con)
Either of the above will work. If the encoding option is added to the connection, I don't think it affects the file itself.
PostgreSQL invalid byte sequence for encoding utf8 0xbf
Your COPY statement is correct, but your data are not in UTF8 encoding.
They are probably in Latin-1 or Windows-1252, where 0xBF is ¿.
Specify the encoding correctly, e.g.:
COPY edmonton.general_filtered (descriptive)
FROM 'D:/property_own/descriptive_details.csv'
(FORMAT 'csv', HEADER, ENCODING 'WIN1252');
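To check how an offending byte decodes under the candidate encodings before choosing one for COPY, a quick Ruby sketch can help. The helper name decode_bytes is hypothetical.

```ruby
# Hypothetical helper: interpret raw bytes as `encoding_name`, return UTF-8.
def decode_bytes(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode(Encoding::UTF_8)
end

byte = "\xBF".b
%w[ISO-8859-1 Windows-1252].each do |name|
  puts "#{name}: #{decode_bytes(byte, name)}"
end

# A lone 0xBF is a continuation byte, so on its own it is not valid UTF-8.
puts byte.dup.force_encoding(Encoding::UTF_8).valid_encoding?
```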
invalid byte sequence for encoding “UTF8”
Posted in another thread: use the iconv command to strip these characters out of your file. Greenplum is instantiated using a character set, UTF-8 by default, and requires that all characters be of the designated character set. You can also choose to log these errors with the LOG ERRORS clause of the EXTERNAL TABLE. This will trap the bad data and allow you to continue up to the LIMIT that you specify when creating the table.
iconv -f utf-8 -t utf-8 -c file.txt
will clean up your UTF-8 file, skipping all the invalid characters.
-f is the source format
-t the target format
-c skips any invalid sequence
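For comparison, roughly the same cleanup can be sketched in Ruby with String#scrub (available since Ruby 2.1). The helper name strip_invalid_utf8 is hypothetical, and like iconv -c it silently drops data, so it only makes sense when losing the bad bytes is acceptable.

```ruby
# Hypothetical helper: drop every byte sequence that is not valid UTF-8.
def strip_invalid_utf8(bytes)
  # scrub('') replaces each invalid sequence with the empty string.
  bytes.dup.force_encoding(Encoding::UTF_8).scrub('')
end

puts strip_invalid_utf8("caf\xE9 ok".b)   # stray ISO-8859-1 byte is removed
puts strip_invalid_utf8("caf\xC3\xA9".b)  # valid UTF-8 passes through unchanged
```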