Cyrillic encoding output in R
I know the pain of encoding troubles :(
Hope this helps:
> Sys.setlocale(,"ru_RU")
[1] "ru_RU/ru_RU/ru_RU/C/ru_RU/C"
> test = c("привет","пока")
> write(test, file="test.txt")
You can even use Cyrillic variable names after that Sys.setlocale(,"ru_RU") call:
> привет <- rnorm(100)
> min(привет)
[1] -2.54578
So good luck! :)
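If you'd rather not leave the whole session in a different locale, one option is to switch LC_CTYPE only for the duration of a single call. A minimal sketch, assuming a made-up helper name with_ctype (it is not from any package) and that the locale name you pass exists on your system; names like "ru_RU" are platform-specific:

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to `locale`,
# then restore the previous setting even if `expr` fails.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (plus a warning) when the request cannot be honoured.
  got <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(got)) message("locale '", locale, "' is unavailable; staying on '", old, "'")
  invisible(force(expr))  # `expr` is lazy, so it runs only now, under the new locale
}

before <- Sys.getlocale("LC_CTYPE")
out <- tempfile(fileext = ".txt")
# The "\u..." escapes spell "привет" and "пока", so the script itself stays ASCII.
test <- c("\u043f\u0440\u0438\u0432\u0435\u0442", "\u043f\u043e\u043a\u0430")
with_ctype("ru_RU", write(test, file = out))
identical(Sys.getlocale("LC_CTYPE"), before)  # the locale is back to what it was
```

The on.exit() call is what keeps the change from leaking into the rest of the session.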
Working with Cyrillic in R
The problem seems to lie in your R(Studio) locale (see the reproduction code below). I would 1) use readxl for reading XLSX files, and 2) not mess with locales. I had the same problem reading CSV files some time ago: instead of just setting encoding = "UTF-8", I changed the locale, and it ruined the RStudio output completely; only updating RStudio helped. So I would try restarting or reinstalling RStudio (especially if you can update it at the same time :).
f <- "C:/Users/Alexey/Downloads/Kyiv DFRR.xlsx"
df <- readxl::read_excel(f)
Sys.setlocale("LC_CTYPE", "ukrainian")
head(df)
# A tibble: 6 x 10
Object
<chr>
1 "друга нитка Головного міського каналізаційного колектора \r\n"
2 "об'єкт по вул. Воровського, 2, - реставрація з пристосуванням під розміщення Державного спеціалізованого мистецького навчального
3 велика окружна дорога на ділянці від просп. Маршала Рокоссовського до вул. Богатирської з будівництвом транспортної розв'язки на
4 "будівля бюджетної сфери - школа-дитячий садок N 173 \"Райдуга\" по вул. Блюхера, 3а"
5 будівля бюджетної сфери - дошкільний навчальний заклад N 300 по вул. Радунській, 22/9а
6 стадіон із штучним покриттям по вул. Драйзера, 2б, у Деснянському районі
# ... with 9 more variables: Type <chr>, Planned <dbl>, `Planned ( 9 months)` <dbl>, Paid <dbl>, `% paid/ planned (9
# months)` <dbl>, latitude <dbl>, longitude <dbl>, `Kyiv city district` <chr>, MP <chr>
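For the CSV case mentioned above, here is a minimal sketch of declaring the encoding at read time instead of touching the locale. The file is a throwaway temp file written as raw UTF-8 bytes, and the column names id and word are made up for illustration:

```r
# The "\u..." escapes spell "привет" and "пока", so the script itself stays ASCII.
words <- c("\u043f\u0440\u0438\u0432\u0435\u0442", "\u043f\u043e\u043a\u0430")

# Write a tiny CSV as UTF-8 bytes, bypassing any locale-dependent re-encoding.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(c("id,word", paste0(1:2, ",", enc2utf8(words))), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" tells read.csv() to mark the strings it reads as UTF-8.
df <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
Encoding(df$word)
identical(df$word, words)
```

Note that encoding only marks the strings as UTF-8; if you want the input re-encoded while it is read, the argument to look at is fileEncoding.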
Decoding Cyrillic string in R
These steps seem to do the trick
word <- "РѕР±РµР·РїРµС‡РµРЅ"
xx <- iconv(word, from="UTF-8", to="cp1251")
Encoding(xx) <- "UTF-8"
xx
# [1] "обезпечен"
target <- "обезпечен"
xx == target
# [1] TRUE
So it seems that at some point the bytes that make up the UTF-8 target value were misinterpreted as cp1251-encoded, and some process then converted them to UTF-8 using the cp1251->UTF-8 mapping rules. When you run that conversion on data that isn't really cp1251-encoded, you get weird values:
iconv(target, from="cp1251", to="UTF-8")
# "РѕР±РµР·РїРµС‡РµРЅ"
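The same repair can be wrapped in a function and checked with a round trip. This is only a sketch: fix_cp1251_mojibake is a made-up name, the damage is simulated rather than taken from real data, and the encoding name "CP1251" is assumed to be known to your platform's iconv (check iconvlist() if unsure).

```r
# The "\u..." escapes spell "обезпечен", so the script itself stays ASCII.
target <- "\u043e\u0431\u0435\u0437\u043f\u0435\u0447\u0435\u043d"

# Simulate the damage: read the UTF-8 bytes as if they were cp1251
# and convert that misreading to UTF-8.
garbled <- iconv(target, from = "CP1251", to = "UTF-8")

# Hypothetical helper that undoes it: map back to the original bytes,
# then declare that those bytes are UTF-8.
fix_cp1251_mojibake <- function(x) {
  bytes <- iconv(x, from = "UTF-8", to = "CP1251")
  Encoding(bytes) <- "UTF-8"
  bytes
}

fixed <- fix_cp1251_mojibake(garbled)
identical(fixed, target)
```

iconv() returns NA for strings it cannot convert, so text that was never damaged this way may come back as NA or as invalid UTF-8; it is worth checking a few values before applying this to a whole column.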
R Wrong encoding in console captured object (Cyrillic encoding)
I'm not sure how to fix the encoding, but you could use summary.default instead. It won't show you any data samples, but at least you'll get the column names and data types with the right encoding.
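A small illustration of what summary.default gives you on a data frame. The data frame here is invented, with one Cyrillic column name spelled via \u escapes:

```r
# Made-up data frame; "\u0438\u043c\u044f" spells "имя" ("name").
df <- data.frame(a = c("x", "y"), value = c(1.5, 2.5), stringsAsFactors = FALSE)
names(df)[1] <- "\u0438\u043c\u044f"

# Calling the default method directly treats the data frame as a plain list:
# one row per column with its length, class and mode, and no data values.
info <- summary.default(df)
print(info)
rownames(info)  # the column names
```

The Mode column is the one that tells you the basic type of each column (character, numeric and so on).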