file.encoding has no effect, LC_ALL environment variable does it
Note: So finally I think that I have nailed it down. I am not confirming that it is right. But with some code reading and tests this is what I found out and I don't have additional time to look into it. If anyone is interested they can check it out and tell if this answer is right or wrong - I would be glad :)
The reference I used was from this tarball available at OpenJDK's site:
openjdk-6-src-b25-01_may_2012.tar.gz
Java natively translates all strings to the platform's local encoding in this function: jdk/src/share/native/common/jni_util.c - JNU_GetStringPlatformChars(). The system property sun.jnu.encoding is used to determine the platform's encoding.
The value of sun.jnu.encoding is set at jdk/src/solaris/native/java/lang/java_props_md.c - GetJavaProperties() using the setlocale() function of libc. The environment variable LC_ALL is used to set the value of sun.jnu.encoding. A value given at the command prompt using the -Dsun.jnu.encoding option to Java is ignored.
The call to File.exists() is coded in the file jdk/src/share/classes/java/io/File.java and it returns as:
return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);
getBooleanAttributes() is natively coded (and I am skipping steps in code browsing through many files) in jdk/src/share/native/java/io/UnixFileSystem_md.c in the function Java_java_io_UnixFileSystem_getBooleanAttributes0(). Here the macro WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) converts the path string to the platform's encoding.
So a conversion to the wrong encoding will actually send a wrong C string (char array) to the subsequent call to stat(), and it will return with the result that the file cannot be found.
LESSON: LC_ALL is very important
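To make the effect of that conversion concrete, here is a minimal sketch in plain Java. It does not call the JDK's native code; it only imitates the idea by encoding one file name with three different charsets. The class name FileNameBytes and the sample name are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FileNameBytes {
    // Encode a file name with the given charset and return the bytes as hex,
    // roughly what would end up in the C string handed to stat().
    static String toHex(String name, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : name.getBytes(charset)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // "caf\u00E9.txt" is "cafe.txt" with an acute accent on the e
        // (hypothetical file name, written with an escape on purpose).
        String name = "caf\u00E9.txt";
        System.out.println("UTF-8:      " + toHex(name, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + toHex(name, StandardCharsets.ISO_8859_1));
        // US-ASCII cannot represent the accented character, so it becomes '?' (3F).
        System.out.println("US-ASCII:   " + toHex(name, StandardCharsets.US_ASCII));
    }
}
```

The three lines differ only in the bytes for the accented character. If the file was created with UTF-8 bytes on disk but the JVM converts the name using another encoding, the byte sequence passed down no longer matches any directory entry, which is consistent with File.exists() returning false.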
How to better set up JVM encoding properties to UTF-8
We can set the source encoding and output encoding by passing arguments on the command line as follows:
mvn -Dproject.build.sourceEncoding=UTF-8 -Dproject.reporting.outputEncoding=UTF-8 clean deploy
Or by adding these lines to pom.xml:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<redis.version>1.3.5.RELEASE</redis.version>
</properties>
Chef ENV settings not working
The solution that worked for me was to copy /etc/default/lang.sh to the box before any recipes are run, either in the bootstrap shell script or as inline shell. (So it should be the first thing done in the Vagrantfile after the box definitions.)
lang file:
export LANGUAGE=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
From here the database should get set up with the UTF-8 encoding.
Hope this helps. I spent days searching for solutions to this and pieced things together from various discussions before realizing that the problem was the timing of when the values are set.
Wrong File Encoding in JVM after Linux Update
Thanks to icza. I googled a little for JAVA_OPTS and found that I should use JAVA_TOOL_OPTIONS instead (see "How do I use the JAVA_OPTS environment variable?") or _JAVA_OPTIONS (see "Running java with JAVA_OPTS env variable").
Both work just fine, for the runtime and the compiler:
>export JAVA_TOOL_OPTIONS=-Dfile.encoding=ISO8859-1
>java Test
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=ISO8859-1
ISO8859-1
>javac Test.java
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=ISO8859-1
>export _JAVA_OPTIONS=-Dfile.encoding=ISO8859-1
>java Test
Picked up _JAVA_OPTIONS: -Dfile.encoding=ISO8859-1
ISO8859-1
>javac Test.java
Picked up _JAVA_OPTIONS: -Dfile.encoding=ISO8859-1
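The Test class used in the transcript above is not shown. Judging by its output, it presumably just prints the file.encoding system property, so a stand-in could look like the sketch below. The class name ShowFileEncoding is an assumption chosen here, not the original Test.java.

```java
class ShowFileEncoding {
    // Returns file.encoding as seen by this JVM, including any value
    // injected through JAVA_TOOL_OPTIONS or _JAVA_OPTIONS.
    static String fileEncoding() {
        return System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(fileEncoding());
    }
}
```

Running it once with the environment variable exported and once without it should show whether the option is actually being picked up.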
How to specify a char set for file name (not content) in Java?
The portable Java API does not have a concept of a file system character encoding, as that wouldn't be portable: Windows, for example, saves file names as Unicode no matter the locale. On Linux, however, the LC_CTYPE facet of your locale determines the encoding of the file system. So by exporting LC_CTYPE=en_US.utf8 or similar to the environment before you launch your Java application, your application will use that for file name handling.
Also see file.encoding has no effect, LC_ALL environment variable does it which talks about some of the internals behind this conversion.
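One way to check from inside the application what the locale gave you is sketched below. It reads sun.jnu.encoding, which is a JDK-internal property (named in the linked answer, not part of the portable API), and asks whether a given file name can be represented in that charset. The class and method names are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FileNameEncodingCheck {
    // Charset the JVM reports for file names. Falls back to the default
    // charset if the internal property is missing or names an unknown charset.
    static Charset fileNameCharset() {
        String name = System.getProperty("sun.jnu.encoding");
        try {
            return name != null ? Charset.forName(name) : Charset.defaultCharset();
        } catch (IllegalArgumentException e) {
            return Charset.defaultCharset();
        }
    }

    // True if every character of the file name can be encoded in the charset.
    static boolean canRepresent(String fileName, Charset charset) {
        return charset.newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        String sample = "caf\u00E9.txt"; // hypothetical non-ASCII file name
        Charset cs = fileNameCharset();
        System.out.println("file name charset: " + cs);
        System.out.println("can represent sample: " + canRepresent(sample, cs));
        // For comparison: UTF-8 can always represent the sample name.
        System.out.println("UTF-8 can represent sample: "
                + canRepresent(sample, StandardCharsets.UTF_8));
    }
}
```

If the second line prints false, the locale in effect when the JVM started probably cannot express the file names you need, and exporting a UTF-8 locale before launch is worth trying.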
Character encoding in R
I found the answer myself. The problem was with the transformation from UTF-8 to the system locale (the default encoding in R) through fileEncoding. As I use RStudio, I just changed the default encoding to UTF-8 and removed the fileEncoding="UTF-8-BOM" from read.csv. Then the entire csv file was read and RStudio displayed all characters correctly.
Why doesn't Encoding.default_external respect LANG?
I figured it out. Not only does the LANG environment variable need to be set, but the locale it specifies must have been generated for the OS. On a stock Linux image, the default locale may be something that is not UTF-8. In my particular case, I'm using Debian 7.7 and the default locale is "POSIX". I was able to set the default locale by installing the locales package and following the interactive prompts to generate the en_US.UTF-8 locale:
$ apt-get -y install locales
If the locales package is already installed, you can just reconfigure it instead:
$ dpkg-reconfigure locales
Now setting LANG will change the current system locale, and Ruby's Encoding.default_external will be set properly:
$ export LANG=en_US.UTF-8
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:UTF-8>
For an example of how to automate the generation and configuration of the default locale instead of doing it interactively, take a look at this Docker image.