Importing an Excel File with Greek Characters into R in the Correct Encoding

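Try setting the locale to Greek before reading the file:

# "Greek" is a Windows locale name; on Linux/macOS a name such as "el_GR.UTF-8" is typically used instead
Sys.setlocale(category = "LC_ALL", locale = "Greek")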

How to read utf-8 encoded text in R

I don't have access to your data, so I can't verify this, but try passing the encoding explicitly:

data <- xlsx::read.xlsx("file.xlsx", sheetIndex = 1, encoding = "UTF-8")

Is it possible to force Excel to recognize UTF-8 CSV files automatically?

Alex is correct, but since you have to export to CSV, you can give users this advice for opening the CSV files:

  1. Save the exported file as a CSV
  2. Open Excel
  3. Import the data using Data --> Import External Data --> Import Data
  4. Select the file type "csv" and browse to your file
  5. In the import wizard, change the File Origin to "65001 UTF" (or choose the correct language character identifier)
  6. Change the delimiter to comma
  7. Select where to import to and click Finish

This way the special characters should show correctly.
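
If you control the export step, a commonly used alternative to the manual import is to write the CSV with a UTF-8 byte order mark (BOM), which Excel generally uses as a hint that the file is UTF-8 when it is opened directly. The sketch below is in Python purely for illustration; the function name, sample rows, and file name are hypothetical, and whether a given Excel version honors the BOM should be checked on the target machines.

```python
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    """Write rows as comma-separated UTF-8 text preceded by a byte order mark."""
    # The "utf-8-sig" codec emits the BOM bytes EF BB BF before the data.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample data and file name, for illustration only.
sample_rows = [["name", "city"], ["Αλέξανδρος", "Θεσσαλονίκη"]]
sample_path = os.path.join(tempfile.gettempdir(), "greek_sample.csv")
write_csv_with_bom(sample_path, sample_rows)
```

The same idea applies in other languages: emit the three BOM bytes first, then the UTF-8 encoded CSV text.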

Excel to CSV with UTF8 encoding

A simple workaround is to use Google Sheets. Paste the data (values only, if you have complex formulas) or import the sheet, then download it as CSV. I tried a few characters and it works rather well.

NOTE: Google Sheets does have limitations when importing.

NOTE: Be careful of sensitive data with Google Sheets.

EDIT: Another alternative is to use a VB macro or an add-in to force Excel to save as UTF-8. I have not tried any of these solutions, but they sound reasonable.
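
If the CSV has already been saved in a legacy code page, it can also be re-encoded afterwards. The sketch below is a minimal Python illustration and assumes the source file is in Windows-1253 (the Greek ANSI code page); that is an assumption, not something the file itself guarantees, so check the actual encoding first. The function and file names are hypothetical.

```python
import os
import tempfile


def reencode_file(src_path, dst_path, src_encoding="cp1253", dst_encoding="utf-8"):
    """Copy a text file, decoding with src_encoding and encoding with dst_encoding."""
    # newline="" on both sides keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


# Hypothetical demo: create a small Windows-1253 file, then convert it.
demo_text = "όνομα;πόλη\r\nΆννα;Αθήνα\r\n"
src_demo = os.path.join(tempfile.gettempdir(), "greek_cp1253.csv")
dst_demo = os.path.join(tempfile.gettempdir(), "greek_utf8.csv")
with open(src_demo, "wb") as f:
    f.write(demo_text.encode("cp1253"))
reencode_file(src_demo, dst_demo)
```

If the wrong source encoding is assumed, the conversion may still succeed but produce the wrong characters, so it is worth spot-checking the output.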

Reading a CSV file containing greek characters

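File.ReadAllLines has an overload that takes an Encoding along with the file path. Note that Encoding.Unicode means UTF-16 little-endian; pass Encoding.UTF8 for a UTF-8 file:

// Requires: using System.IO; using System.Linq; using System.Text;
var lines = File.ReadAllLines(@"c:\test.csv", Encoding.Unicode)
    .Select(line => line.Split(';'));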

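Testing:

File.WriteAllText(@"c:\test.csv", "ϗϡϢϣϤ", Encoding.Unicode);

// ReadAllText returns a string; passing the string[] from ReadAllLines to WriteLine would print the type name
Console.WriteLine(File.ReadAllText(@"c:\test.csv", Encoding.Unicode));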

will print:

ϗϡϢϣϤ

To find out which encoding the file was actually written in, use the following snippet:

using (var r = new StreamReader(@"c:\test.csv", detectEncodingFromByteOrderMarks: true))
{
    // Detection only happens on the first read, so peek before checking CurrentEncoding
    r.Peek();
    Console.WriteLine(r.CurrentEncoding.BodyName);
}

In my scenario, it prints:

utf-8
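
The BOM check itself is simple enough to sketch by hand. The Python example below is only an illustration of the idea and not how StreamReader is implemented; the function name is hypothetical. It compares the first bytes of a file against the standard byte order marks. A file without a BOM cannot be identified this way, so the function returns None in that case.

```python
import codecs

# Check the UTF-32 marks first: the UTF-32 LE mark starts with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding implied by a BOM at the start of data, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

To use it on a file, open the file in binary mode and pass the first four bytes, for example `sniff_bom(open(path, "rb").read(4))`.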

