Set locale to system default UTF-8
Answering my own question: On Ubuntu the default LANG is defined in /etc/default/locale:
jeroen@dev:~⟫ cat /etc/default/locale
# Created by cloud-init v. 0.7.7 on Wed, 29 Jun 2016 11:02:51 +0000
LANG="en_US.UTF-8"
So in R we could do something like:
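# Load the KEY="value" pairs from the system locale file into R's environment
readRenviron("/etc/default/locale")
LANG <- Sys.getenv("LANG")
# Only switch the locale if LANG was actually defined
if (nchar(LANG))
  Sys.setlocale("LC_ALL", LANG)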
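The R snippet relies on readRenviron() to parse the KEY="value" format. For comparison, here is a minimal Python sketch of the same idea. It assumes the file contains only simple KEY=value lines with optional quotes and # comments (no shell expansions), and the helper names parse_locale_file and apply_system_lang are hypothetical.

```python
from __future__ import annotations

import locale


def parse_locale_file(text: str) -> dict[str, str]:
    """Parse KEY="value" lines as found in /etc/default/locale.

    Assumption: only simple KEY=value lines; blank lines and '#' comments
    are skipped, and surrounding single or double quotes are stripped.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def apply_system_lang(path: str = "/etc/default/locale") -> str | None:
    """Set LC_ALL from LANG in the given file, if LANG is defined there."""
    try:
        with open(path, encoding="utf-8") as fh:
            env = parse_locale_file(fh.read())
    except FileNotFoundError:
        return None
    lang = env.get("LANG", "")
    if lang:
        # Raises locale.Error if this locale is not installed on the system
        return locale.setlocale(locale.LC_ALL, lang)
    return None


sample = '''# Created by cloud-init v. 0.7.7 on Wed, 29 Jun 2016 11:02:51 +0000
LANG="en_US.UTF-8"
'''
print(parse_locale_file(sample))  # {'LANG': 'en_US.UTF-8'}
```

As in the R version, nothing changes when LANG is empty; the difference is that Python raises locale.Error instead of returning an empty string when the named locale is missing.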
Apache also has a line in /etc/apache2/envvars that can be uncommented to enable this.
I've set the system locale on Windows 10 to use the beta UTF-8 support, but RStudio does not recognize it
As it turns out, the problem was in how I was reading the data. read.csv() read the file using the encoding set by the locale. Switching to readr::read_csv(), which assumes UTF-8 by default, made sure the file was read with its actual encoding, UTF-8.
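The underlying issue is not specific to R: a reader that falls back to the locale's encoding will misread a UTF-8 file whenever the locale uses something else. The Python sketch below illustrates this. Windows-1252 (cp1252) stands in for a hypothetical non-UTF-8 locale default, and the CSV content is made up for the demo.

```python
import locale
import os
import tempfile

text = "name,city\nJürg,Zürich\n"

# Write the file as UTF-8 bytes, like the original CSV.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
    path = tmp.name

try:
    # What open() would use on this machine if no encoding were given.
    print("locale default:", locale.getpreferredencoding(False))

    # Simulate a cp1252 locale default: the UTF-8 bytes come out garbled.
    with open(path, encoding="cp1252") as fh:
        garbled = fh.read()

    # Declaring the file's own encoding decodes it correctly.
    with open(path, encoding="utf-8") as fh:
        correct = fh.read()
finally:
    os.remove(path)

print(garbled.splitlines()[1])  # JÃ¼rg,ZÃ¼rich
print(correct.splitlines()[1])  # Jürg,Zürich
```

The fix is the same as with readr: state the file's encoding explicitly instead of relying on the locale.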
Why do I get a locale error even though it is set?
Making the "comment crowned by success" an answer:
sudo locale-gen en_US en_US.UTF-8
sudo dpkg-reconfigure locales
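After generating the locales, you can check whether a given name is actually usable by asking the C library to switch to it. In this Python sketch, locale_available is a hypothetical helper that restores the previous setting afterwards. On glibc-based systems a locale that was never generated is rejected, but some other C libraries, such as musl, may accept arbitrary names.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_ALL locale."""
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # One common cause of a "locale error" despite LANG being set:
        # the locale is named in the environment but was never generated.
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)


for name in ("C", "en_US.UTF-8", "xx_INVALID.UTF-8"):
    print(name, locale_available(name))
```

If en_US.UTF-8 reports False, running the locale-gen command above should fix it.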
Best practice: Should I try to change to UTF-8 as locale or is it safe to leave it as is?
This is not a perfect answer, but it is a good workaround: as Roland pointed out, changing the locale can be dangerous, so leave it as is. If you run into trouble with a file, just search it for non-UTF-8 characters as described here for RStudio. From what I have seen, most editors have such a feature.
Furthermore, this answer gives more insight into what you can do when you source() a file.
For a way to deal with locales when collation plays a crucial part, see here.
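If your editor lacks a search for non-UTF-8 characters, a short script can do the same job. The following is a minimal Python sketch that assumes the file is meant to be UTF-8 throughout. find_non_utf8_lines is a hypothetical helper, and the sample bytes are made up: the second line is deliberately encoded as Latin-1.

```python
from __future__ import annotations


def find_non_utf8_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (1-based line number, raw line) for lines that are not valid UTF-8."""
    bad = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, raw))
    return bad


sample = (
    "ok line\n".encode("utf-8")
    + "caf\u00e9\n".encode("latin-1")  # é as the single byte 0xE9
    + "Zürich\n".encode("utf-8")
)
for lineno, raw in find_non_utf8_lines(sample):
    print(lineno, raw)  # 2 b'caf\xe9'
```

To scan a real file, pass it the bytes from open(path, "rb").read().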
Why is PHP not taking over system default locale settings?
To make PHP use the locale settings from the OS, you have to call setlocale(LC_ALL, "") at the very beginning of your code.
The setlocale manual at https://www.php.net/manual/en/function.setlocale.php states the following:
// If locales is the empty string "", the locale names will be set from
// the values of environment variables with the same names as the above
// categories, or from "LANG".
// On Windows, setlocale(LC_ALL, '') sets the locale names from the
// system's regional/language settings (accessible via Control Panel).
Your example then looks as follows:
abc@ced4c553207d:~/$ locale -a
C
C.UTF-8
de_CH.utf8
en_US.utf8
POSIX
abc@ced4c553207d:~/$ locale
LANG=de_CH.UTF-8
LANGUAGE=de_CH.UTF-8
LC_CTYPE="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_MESSAGES="de_CH.UTF-8"
LC_PAPER="de_CH.UTF-8"
LC_NAME="de_CH.UTF-8"
LC_ADDRESS="de_CH.UTF-8"
LC_TELEPHONE="de_CH.UTF-8"
LC_MEASUREMENT="de_CH.UTF-8"
LC_IDENTIFICATION="de_CH.UTF-8"
LC_ALL=de_CH.UTF-8
abc@ced4c553207d:~/$ php -r "echo setlocale(LC_MONETARY, 0).\"\n\";"
C
abc@ced4c553207d:~/$ php -r " setlocale(LC_ALL, ''); echo setlocale(LC_MONETARY, 0).\"\n\";"
de_CH.UTF-8
abc@ced4c553207d:~/$
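Python's locale module follows the same C-library convention as PHP. A fresh interpreter leaves categories such as LC_MONETARY at "C", and passing an empty string adopts the names from LC_ALL, the LC_* variables or LANG. Here is a minimal sketch; adopt_environment_locale is a hypothetical name.

```python
import locale
import os


def adopt_environment_locale() -> str:
    """Mirror PHP's setlocale(LC_ALL, ""): take the locale names from
    LC_ALL, the individual LC_* variables, or LANG."""
    return locale.setlocale(locale.LC_ALL, "")


print("LANG =", os.environ.get("LANG"), "| LC_ALL =", os.environ.get("LC_ALL"))

# Passing no locale argument only queries the current setting.
before = locale.setlocale(locale.LC_MONETARY)
try:
    adopt_environment_locale()
except locale.Error as err:
    # Raised when the environment names a locale that is not installed.
    print("cannot adopt environment locale:", err)
after = locale.setlocale(locale.LC_MONETARY)
print(before, "->", after)
```

On a machine set up like the one above, this should print C -> de_CH.UTF-8, matching the PHP output.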