Export UTF-8 BOM to .csv in R
The help page for Encoding (help("Encoding")) describes the special encoding "bytes". Using this, I was able to generate the CSV file with:
v <- "נווה שאנן"
X <- data.frame(v1=rep(v,3), v2=LETTERS[1:3], v3=0, stringsAsFactors=FALSE)
Encoding(X$v1) <- "bytes"
write.csv(X, "test.csv", row.names=FALSE)
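Note that the snippet above marks the strings as raw bytes but does not itself write a BOM, which the question asks for. If the consumer (e.g. Excel) needs a UTF-8 BOM, one option — a sketch using base R only, with hypothetical file name `test_bom.csv` — is to write the three BOM bytes first and then append the table:

```r
X <- data.frame(v1 = "\u00e1", v2 = LETTERS[1:3], v3 = 0,
                stringsAsFactors = FALSE)

f <- "test_bom.csv"
# Write the UTF-8 BOM (0xEF 0xBB 0xBF) first; this truncates any existing file
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), f)
# Then append the CSV body after the BOM; suppressWarnings() hides
# write.table()'s note about appending column names to an existing file
suppressWarnings(
  write.table(X, f, sep = ",", row.names = FALSE, append = TRUE)
)
```

Checking the first three bytes of the file with `readBin(f, "raw", n = 3)` should show `ef bb bf`.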
Take care about the difference between factor and character columns. The following should work:
id_characters <- which(sapply(X,
    function(x) is.character(x) && any(Encoding(x) == "UTF-8")))
for (i in id_characters) Encoding(X[[i]]) <- "bytes"
id_factors <- which(sapply(X,
    function(x) is.factor(x) && any(Encoding(levels(x)) == "UTF-8")))
for (i in id_factors) Encoding(levels(X[[i]])) <- "bytes"
write.csv(X, "test.csv", row.names=FALSE)
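The two loops above can be wrapped in a small helper; `mark_utf8_as_bytes` is a hypothetical name, not a base R function:

```r
# Hypothetical helper: mark every UTF-8 character/factor column as "bytes"
mark_utf8_as_bytes <- function(df) {
  for (i in seq_along(df)) {
    x <- df[[i]]
    if (is.character(x) && any(Encoding(x) == "UTF-8")) {
      Encoding(df[[i]]) <- "bytes"
    } else if (is.factor(x) && any(Encoding(levels(x)) == "UTF-8")) {
      Encoding(levels(df[[i]])) <- "bytes"
    }
  }
  df
}

X <- data.frame(v = "\u00e1", f = factor("\u00e9"),
                stringsAsFactors = FALSE)
X <- mark_utf8_as_bytes(X)
Encoding(X$v)          # "bytes"
Encoding(levels(X$f))  # "bytes"
```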
How can I specify the encoding in fwrite() when exporting a CSV file in R?
You should post a reproducible example, but I would guess you could do this by making sure the data in DT is in UTF-8 within R, then setting the encoding of each column to "unknown". R will then assume the data is encoded in the native encoding when you write it out.
For example,
DF <- data.frame(text = "á", stringsAsFactors = FALSE)
DF$text <- enc2utf8(DF$text) # Only necessary if Encoding(DF$text) isn't "UTF-8"
Encoding(DF$text) <- "unknown"
data.table::fwrite(DF, "DF.csv", bom = TRUE)
If the columns of DF are factors, you'll need to convert them to character vectors before this will work.
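A minimal sketch of that conversion step, using only base R:

```r
DF <- data.frame(text = "\u00e1")   # may be a factor under stringsAsFactors = TRUE
DF$text <- factor(DF$text)          # force a factor regardless of R version

# Replace every factor column with its character equivalent
DF[] <- lapply(DF, function(x) if (is.factor(x)) as.character(x) else x)
is.factor(DF$text)  # FALSE
```

After this, the enc2utf8()/Encoding() steps above can be applied to each column.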
Loading a .csv file with UTF-8 encoding: error "no lines available in input" - byte order mark (BOM) ï..
You have the usual BOM problem. The BOM is normally used to indicate the byte order in UTF-16 and UTF-32 (where byte order actually matters).
Microsoft apparently thinks that changing the interpretation of an existing standard is the way to go (who cares about interoperability with non-Microsoft systems?), so they use the BOM to indicate that a file is UTF-8, distinguishing it from the legacy encodings used by DOS and Windows. (Note: Linux and Apple changed their default encoding to UTF-8 without breaking things or adding a BOM, and much more quickly.)
So UTF-8 files created by Microsoft usually have a BOM (the bytes 0xEF 0xBB 0xBF), which shows up as ï»¿ when read as cp1252 (the Microsoft extension to Latin-1).
But most non-Microsoft tools mishandle the BOM: some follow the standard and interpret it as invisible white space (the original meaning of the code point, which modern Unicode standards redefined as the BOM), while others treat it as binary data and reject or ignore the file.
For this reason we now have the utf-8-bom encoding, which simply skips the initial BOM on reading (or writes one on writing). This usually fixes the issue.
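In R specifically, read.csv() can skip the BOM via fileEncoding = "UTF-8-BOM"; without it, the BOM bytes can leak into the first column name (the ï.. prefix from the question title, typically on Windows). A sketch, with hypothetical file name `bom.csv`:

```r
# Create a small CSV that starts with a UTF-8 BOM (0xEF 0xBB 0xBF)
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("x\n1\n")), "bom.csv")

# "UTF-8-BOM" tells R to skip the marker when reading
d <- read.csv("bom.csv", fileEncoding = "UTF-8-BOM")
names(d)  # "x"
```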
How to export a csv in utf-8 format?
Try opening a UTF-8 connection:
con <- file("filename", encoding = "UTF-8")
write.csv(..., file = con, ...)
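A minimal round trip along these lines (the file name `out.csv` is just an example; write.csv() opens and closes the unopened connection itself):

```r
# Write through a UTF-8 connection
con <- file("out.csv", encoding = "UTF-8")
write.csv(data.frame(v = "\u00e1", stringsAsFactors = FALSE),
          file = con, row.names = FALSE)

# Read back, declaring the encoding so the strings are marked UTF-8
back <- read.csv("out.csv", encoding = "UTF-8", stringsAsFactors = FALSE)
back$v
```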