Rails 3 - How to handle PG Error incomplete multibyte character
This:
"Joaqu\xEDn"
is the ISO-8859-1 encoded version of "Joaquín"
so it is not valid UTF-8 and your databases are right to complain about it. If possible, fix your mobile clients to use UTF-8 in the JSON; if you can't do that then you can fix the encoding with this:
params[:mobile_user][:name].force_encoding('iso-8859-1').encode!('utf-8')
on the server. The problem with fixing it on the server is that you have to guess what the incoming encoding is, and your guess might not be correct. There is no way to reliably guess the encoding of a particular string. There is rchardet, but it doesn't work with recent versions of Ruby and appears to have been abandoned; you might be able to fix the gem to work with modern Ruby. There are a few other guessing libraries, but they all seem to have been abandoned as well.
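One conservative way to apply the server-side fix is to transcode only when the incoming bytes are not already valid UTF-8. This is a minimal sketch, assuming ISO-8859-1 is the only legacy encoding your clients send; the helper name to_utf8 is made up for illustration:

```ruby
# Hypothetical helper: return a UTF-8 copy of +raw+.
# Assumption: anything that is not valid UTF-8 is ISO-8859-1.
def to_utf8(raw, fallback = 'ISO-8859-1')
  utf8 = raw.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?

  # Reinterpret the same bytes as the fallback encoding, then transcode.
  # Every byte is a valid ISO-8859-1 character, so a wrong guess here
  # produces garbled text rather than an error.
  raw.dup.force_encoding(fallback).encode('UTF-8')
end

latin1 = "Joaqu\xEDn".b # raw ISO-8859-1 bytes, as sent by the client
puts to_utf8(latin1)    # prints the name with a proper UTF-8 "í"
```

Strings that are already valid UTF-8 pass through unchanged, so the helper is safe to call on every incoming parameter.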
JSON text is always, by definition, Unicode and UTF-8 encoded by default:
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
Any client that sends you JSON that isn't in UTF-8 is IMO broken, because almost everything will assume that JSON is UTF-8. Of course, there might be an encoding header somewhere that specifies ISO 8859-1, or maybe the headers say UTF-8 even though the body is ISO 8859-1.
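If the clients do send a charset in the Content-Type header, the server can honor it instead of guessing. The following is only a sketch: decode_body is a hypothetical helper, and how you obtain the raw body and the header value depends on your framework.

```ruby
# Hypothetical helper: decode a raw request body using the charset named in
# the Content-Type header, assuming UTF-8 when no charset is given.
def decode_body(raw, content_type)
  charset = content_type.to_s[/charset=\s*"?([^\s;"]+)/i, 1] || 'UTF-8'
  text = raw.dup.force_encoding(Encoding.find(charset))

  # A body that claims UTF-8 but is really ISO-8859-1 is caught here.
  # The reverse case cannot be detected, since any byte is valid ISO-8859-1.
  raise ArgumentError, "body is not valid #{charset}" unless text.valid_encoding?

  text.encode('UTF-8')
end

puts decode_body("Joaqu\xEDn".b, 'application/json; charset=ISO-8859-1')
```

Raising on a mismatch, rather than silently repairing it, makes a client with lying headers visible early.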
PGError: incomplete multibyte character error in Rails test environment
The error did turn out to be with the YML fixtures. I had been using the YAML DB gem to export some production data to be used as fixtures, and it apparently doesn't output the files as UTF-8.
I ended up manually going through and removing "extra" spaces at the ends of the YML field declarations and everything started working.
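Rather than hunting for the bad bytes by eye, a short script can report which fixture lines are not valid UTF-8. This is a rough sketch; the test/fixtures path is an assumption about the project layout, and invalid_utf8_lines is a made-up helper name:

```ruby
# Hypothetical helper: return the 1-based numbers of the lines in +path+
# whose bytes are not valid UTF-8.
def invalid_utf8_lines(path)
  bad = []
  # Read in binary mode so Ruby does not interpret the bytes on the way in.
  File.foreach(path, mode: 'rb').with_index(1) do |line, number|
    bad << number unless line.force_encoding('UTF-8').valid_encoding?
  end
  bad
end

# Assumption: fixtures live under test/fixtures; adjust the glob as needed.
Dir.glob('test/fixtures/**/*.yml').each do |path|
  lines = invalid_utf8_lines(path)
  puts "#{path}: invalid UTF-8 on lines #{lines.join(', ')}" unless lines.empty?
end
```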
Invalid multibyte char (US-ASCII) error for ä, ü, ö, ß which are Ascii!
Put the magic comment # coding: utf-8
at the beginning of your script (on the second line if you're using a shebang).
#!/usr/local/bin/ruby
# coding: utf-8
puts "i like my chars: ä, ü, ö and ß!"
android, UTF8 - How do I ensure UTF8 is used for a shared preference
The key is to understand the difference between UTF-8 and Unicode.
- Java processes characters and strings in memory using Unicode. Each char is stored in two bytes (one UTF-16 code unit).
- When text is transmitted between processes (e.g. to a web server) or is written to or read from disk, the internal representation is converted to or from an over-the-wire format. This is the encoding or decoding. UTF-8 is the most popular, but other formats include:
- UTF-16
- ISO 8859-1
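To make the difference concrete, the same text can be encoded into each of these wire formats and the resulting bytes compared. The sketch below uses Ruby rather than Java, to match the server side discussed in the earlier answers:

```ruby
name = "Joaqu\u00EDn" # "Joaquín" as a UTF-8 Ruby string

%w[UTF-8 UTF-16BE ISO-8859-1].each do |encoding|
  bytes = name.encode(encoding).bytes
  hex = bytes.map { |b| format('%02X', b) }.join(' ')
  puts "#{encoding.ljust(10)} #{bytes.length} bytes: #{hex}"
end
```

The "í" comes out as the two bytes C3 AD in UTF-8 but as the single byte ED in ISO 8859-1, which is the same \xED that PostgreSQL rejected in the first answer.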
In your question, you mention that the XML files are encoded in utf-8: That is good, and you will be able to put foreign characters in the files, but that specifies the encoding only for that specific XML file.
These XML files will be compiled into Android resources and will contain the correct values (you can check it if you like in the debugger, or by preserving the intermediate Java resource files from the build chain).
The problem is almost certainly where you send data to and receive data from the HTTP server, specifically where that data is converted between the bytes on the network and a Java String. Currently you are not setting the encoding in the request; this can be done as described in the documentation for Apache HTTPClient.
Although the server might already require/assume this, it's certainly a good thing to state clearly in the request.
You also need to ensure that the server (the one in Rails 3 - How to handle PG Error incomplete multibyte character):
- Is expecting UTF-8
- Decodes the request using a UTF-8 decoder
- Encodes the response using UTF-8 encoding
(Sorry, but I don't know Ruby on Rails so I don't know how to specifically help there).
Back on the Android end, you also need to ensure that your HTTP library is decoding the response with the UTF-8 decoder. If you handle this yourself, ensure that the String constructor you use is this one, and that the charsetName argument is "utf-8":
- public String (byte[] data, String charsetName)
Once BOTH the client and the server are using UTF-8, your problems will be resolved.
To help debugging here, I suggest:
- A number of logging statements on server and client that print the relevant strings as close as possible to the HTTP code
- Running with the client configured to talk through a debugging proxy. Examine the request and response and check that they are indeed UTF-8. Proxies include:
- Charles
- WebScarab
- Fiddler
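For the logging statements on the server side, printing the string alone can hide the problem, because the terminal or log viewer applies its own decoding. A small sketch of a log helper that also shows the encoding and the raw bytes (describe_string is a made-up name):

```ruby
# Hypothetical helper for log lines: shows the declared encoding, whether the
# bytes are valid in that encoding, and the raw bytes in hex.
def describe_string(label, str)
  hex = str.bytes.map { |b| format('%02X', b) }.join(' ')
  "#{label}: encoding=#{str.encoding} valid=#{str.valid_encoding?} bytes=#{hex}"
end

# ISO-8859-1 bytes that have been mislabeled as UTF-8.
suspect = "Joaqu\xEDn".b.force_encoding('UTF-8')
puts describe_string('name', suspect)
# prints: name: encoding=UTF-8 valid=false bytes=4A 6F 61 71 75 ED 6E
```

A lone ED byte where C3 AD was expected is a strong hint that the client sent ISO 8859-1.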