Rails 3 - How to Handle Pg Error Incomplete Multibyte Character

This:

"Joaqu\xEDn"

is the ISO-8859-1 encoded version of "Joaquín" so it is not valid UTF-8 and your databases are right to complain about it. If possible, fix your mobile clients to use UTF-8 in the JSON; if you can't do that then you can fix the encoding with this:

params[:mobile_user][:name].force_encoding('iso-8859-1').encode!('utf-8')

on the server. The problem with fixing it on the server is that you have to guess what the incoming encoding is, and your guess might not be correct. There is no way to reliably guess the encoding of a particular string. There is rchardet, but it doesn't work with recent versions of Ruby and it appears to have been abandoned; you might be able to fix the gem to work with modern Ruby. There are a few other guessing libraries, but they all seem to have been abandoned as well.
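
If you do end up fixing this on the server, a slightly safer variant of the one-liner above is to leave strings that are already valid UTF-8 alone and only reinterpret the rest. This is a minimal sketch, assuming ISO-8859-1 is the only other encoding your clients send; the `to_utf8` helper name is made up for illustration:

```ruby
# Hypothetical helper: returns a UTF-8 copy of `raw`.
# Assumption: anything that is not valid UTF-8 is ISO-8859-1.
def to_utf8(raw, fallback = Encoding::ISO_8859_1)
  # Relabel a copy as UTF-8 (no bytes change) and check whether it is valid.
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Otherwise relabel the bytes as the fallback encoding and transcode.
  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

puts to_utf8("Joaqu\xEDn".b)
```

Keep in mind that this is still a guess: ISO-8859-1 assigns a character to essentially every byte, so a wrong fallback tends to produce garbage silently rather than raise an error.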

JSON text is, by definition, always Unicode, and it is UTF-8 encoded by default (RFC 4627, section 3):

3.  Encoding

JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.

Any client that sends you JSON that isn't in UTF-8 is IMO broken, because almost everything will assume that JSON is UTF-8. Of course, there might be an encoding header somewhere that specifies ISO 8859-1, or maybe the headers say UTF-8 even though the body is ISO 8859-1.

PGError: incomplete multibyte character error in Rails test environment

The error did turn out to be with the YML fixtures. I had been using the YAML DB gem to export some production data to be used as fixtures, and it apparently doesn't output the files as UTF-8.

I ended up manually going through and removing "extra" spaces at the ends of the YML field declarations and everything started working.
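
Rather than hunting through the fixtures by eye, you can let Ruby point at the offending lines. This is only a sketch: the `invalid_utf8_lines` helper and the `test/fixtures` path are assumptions for illustration, not something the YAML DB gem provides.

```ruby
# Hypothetical helper: returns the 1-based numbers of the lines in `path`
# whose bytes are not valid UTF-8.
def invalid_utf8_lines(path)
  bad = []
  # Read raw bytes so Ruby does not apply any default external encoding.
  File.binread(path).each_line.with_index(1) do |line, number|
    bad << number unless line.force_encoding(Encoding::UTF_8).valid_encoding?
  end
  bad
end

if __FILE__ == $PROGRAM_NAME
  # Assumed fixture location; adjust to your project layout.
  Dir.glob("test/fixtures/**/*.yml").each do |file|
    lines = invalid_utf8_lines(file)
    puts "#{file}: invalid UTF-8 on lines #{lines.join(', ')}" unless lines.empty?
  end
end
```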

Invalid multibyte char (US-ASCII) error for ä, ü, ö, ß which are Ascii!

Put the magic comment # coding: utf-8 at the beginning of your script (on the second line if you're using a shebang).

#!/usr/local/bin/ruby
# coding: utf-8

puts "i like my chars: ä, ü, ö and ß!"

android, UTF8 - How do I ensure UTF8 is used for a shared preference

The key is to understand the difference between UTF-8 and Unicode.

  • Java holds characters and strings in memory as Unicode, specifically UTF-16. Each char is a two-byte code unit (characters outside the Basic Multilingual Plane take two chars).
  • When text is transmitted between processes (e.g. to a web server) or is written to or read from disk, the internal representation is converted to or from an over-the-wire format. This is the encoding or decoding step. UTF-8 is the most popular format, but others include:

    • UTF-16
    • ISO 8859-1

In your question, you mention that the XML files are encoded in utf-8: That is good, and you will be able to put foreign characters in the files, but that specifies the encoding only for that specific XML file.

These XML files will be compiled into Android resources and will contain the correct values (you can check it if you like in the debugger, or by preserving the intermediate Java resource files from the build chain).

The problem is almost certainly where you send data to and receive data from the HTTP server, specifically where that data is converted between the bytes on the network and a Java String. Currently you are not setting the character encoding in the request; this can be done as described in the documentation for Apache HttpClient.

Although the server might already require/assume this, it's certainly a good thing to state clearly in the request.

You also need to ensure that the server (the one in Rails 3 - How to handle PG Error incomplete multibyte character):

  • Is expecting UTF-8
  • Decodes the request using a UTF-8 decoder
  • Encodes the response using UTF-8 encoding
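
For what it's worth, the three server-side points above can be sketched in plain Ruby without any Rails specifics. Everything here is an assumption for illustration: `handle_request` is a made-up handler, the `[status, headers, body]` return value is only loosely modelled on a Rack response, and the `mobile_user`/`name` keys are taken from the params in the first answer.

```ruby
require "json"

JSON_UTF8 = { "Content-Type" => "application/json; charset=utf-8" }.freeze

# Hypothetical handler: takes the raw request body (bytes) and returns
# [status, headers, body].
def handle_request(raw_body)
  # Decode: label the incoming bytes as UTF-8 and verify that they really are.
  text = raw_body.dup.force_encoding(Encoding::UTF_8)
  unless text.valid_encoding?
    error = JSON.generate({ "error" => "request body is not valid UTF-8" })
    return [400, JSON_UTF8, error]
  end

  name = JSON.parse(text).dig("mobile_user", "name")

  # Encode: JSON.generate produces a UTF-8 string, and the header says so.
  [200, JSON_UTF8, JSON.generate({ "name" => name })]
end
```

Rejecting a badly encoded request with a 400 keeps the encoding problem visible at the HTTP boundary instead of letting it surface later as a database error.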

(Sorry, but I don't know Ruby on Rails so I don't know how to specifically help there).

Back on the Android end, you also need to ensure that your HTTP library decodes the response with a UTF-8 decoder. If you handle this yourself, make sure you use the String constructor that takes a charset name, and pass "utf-8" as that argument:

  • public String (byte[] data, String charsetName)

Once BOTH the client and the server are using UTF-8, your problems will be resolved.

To help debugging here, I suggest:

  1. A number of logging statements on server and client that print the relevant strings as close as possible to the HTTP code
  2. Running with the client configured to talk through a debugging proxy. Examine the request and response and check that they are indeed UTF-8. Proxies include:

    • Charles
    • WebScarab
    • Fiddler
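
For the logging statements on the server side, it helps to print the raw bytes along with the string, since a terminal or log viewer may itself mangle non-ASCII characters. A small sketch in Ruby (the `describe_string` helper is a made-up name):

```ruby
# Hypothetical logging helper: shows the encoding label, whether the bytes
# are valid for that encoding, and the bytes themselves in hex.
def describe_string(label, str)
  hex = str.bytes.map { |b| format("%02X", b) }.join(" ")
  "#{label}: encoding=#{str.encoding} valid=#{str.valid_encoding?} bytes=#{hex}"
end

puts describe_string("name", "Joaquín")
# name: encoding=UTF-8 valid=true bytes=4A 6F 61 71 75 C3 AD 6E
```

A lone ED byte where you expect C3 AD is the signature of ISO-8859-1 data being treated as UTF-8.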

