Set Utf-8 as Default String Encoding in Heroku

Set UTF-8 as default string encoding in Heroku

As per the Heroku support staff, this is the magic thing:


heroku config:add LANG=en_US.UTF-8

Although heroku console will keep reporting strings encoding as ASCII-8BIT, your actuall app will be running with the correct encoding, based on the LANG config var.
You can double check that by doing this:


$ heroku run bash
Running bash attached to terminal... up, run.2
u20415@022e95bf-3ab6-4291-97b1-741f95e7fbda:/app$ irb
irb(main):001:0> "a".encoding
=> #<Encoding:UTF-8>

Why doesn't Heroku specify a default character encoding on their virtual machines?

Heroku support was extremely helpful in answering my question. Essentially, they just leave the encoding up to the buildpack for each language. Additionally they gave me links to how this is implemented in the buildpacks they maintain for Python and Ruby.

I found the Python script particularly helpful in setting the encoding automatically for Haskell apps (I copied parts of it for the Haskell buildpack). Here's the PR I created to set UTF-8 as a default in Haskell. You should specify the default character encoding in two places in the compile script in the buildpack API. First, it's probably a good idea to export the language before compiling the compiler itself, as they do for Python. Second (and most importantly for your application) you should set it in the .profile.d script so that it gets picked up as the context for your application.

Hope this is helpful info to folks making buildpacks for other languages!

Heroku and Rails: how to set utf-8 as the default encoding

In your config/application.rb,

  config.encoding = "utf-8"

in the database.yml,

 development:
adapter: mysql2(whatever your db)
host: localhost
encoding: utf8

you also have to add(including the hash)

# encoding: UTF-8

source:http://craiccomputing.blogspot.com/2011/02/rails-utf-8-and-heroku.html

Heroku app does not display Cyrillic characters with UTF-8 as default encoding

If you have installed the locale buildpack, remove the .locales file you created and the buildpack by going to your Heroku app's settings.

I fixed the problem by applying the following settings in my application.properties file:

spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true

Also, on a side note, the first build after these settings are applied may be extra slow - if you are building in Eclipse and are used to your Heroku app building in around 1 minute, beware that this may take over 5 minutes to build.

Default Charset on Heroku (US-ASCII) causing problems

Finally, I got in touch with the friendly staff at Heroku--they gave the following suggestion to over-ride file.encoding property via the JAVA_OPTS env-variable.

Issued the following from my Heroku Toolbelt, & things began working now.

heroku config:add JAVA_OPTS='-Xmx384m -Xss512k -XX:+UseCompressedOops -Dfile.encoding=UTF-8'

This way, the JVM picks it up, & now Charset.defaultCharset( ) returns UTF-8, with special characters appearing as they should!

They also said, we could alternatively do the following as well:

heroku config:add JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF-8'

Also, it would be a good idea to embed this property right into the Procfile of the app, so that our code behaves the same when we push it to a new Heroku app.

Heroku: GET data is not retrieved as UTF-8

The support team of Heroku could answer my problem. If you are experiencing the same problem, this is the solution.

If you are using the latest version of webapp-runner (7.0.57.2), you
can add the option --uri-encoding in your Procfile.

java ... -jar target/dependency/webapp-runner.jar --port $PORT --uri-encoding UTF-8 --expand-war target/*.war`

Ruby converting string encoding from ISO-8859-1 to UTF-8 not working

You assign a string, in UTF-8. It contains ä. UTF-8 represents ä with two bytes.

string = 'ä'
string.encoding
# => #<Encoding:UTF-8>
string.length
# 1
string.bytes
# [195, 164]

Then you force the bytes to be interpreted as if they were ISO-8859-1, without actually changing the underlying representation. This does not contain ä any more. It contains two characters, Ã and ¤.

string.force_encoding('iso-8859-1')
# => "\xC3\xA4"
string.length
# 2
string.bytes
# [195, 164]

Then you translate that into UTF-8. Since this is not reinterpretation but translation, you keep the two characters, but now encoded in UTF-8:

string = string.encode('utf-8')
# => "ä"
string.length
# 2
string.bytes
# [195, 131, 194, 164]

What you are missing is the fact that you originally don't have an ISO-8859-1 string, as you would from your Web-service - you have gibberish. Fortunately, this is all in your console tests; if you read the response of the website using the proper input encoding, it should all work okay.

For your console test, let's demonstrate that if you start with a proper ISO-8859-1 string, it all works:

string = 'Norrlandsvägen'.encode('iso-8859-1')
# => "Norrlandsv\xE4gen"
string = string.encode('utf-8')
# => "Norrlandsvägen"

EDIT For your specific problem, this should work:

require 'net/https'
uri = URI.parse("https://rusta.easycruit.com/intranet/careerbuilder_se/export/xml/full")
options = {
:use_ssl => uri.scheme == 'https',
:verify_mode => OpenSSL::SSL::VERIFY_NONE
}
response = Net::HTTP.start(uri.host, uri.port, options) do |https|
https.request(Net::HTTP::Get.new(uri.path))
end
body = response.body.force_encoding('ISO-8859-1').encode('UTF-8')


Related Topics



Leave a reply



Submit