How do I write out a text file in C# with a code page other than UTF-8?
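using System.IO;
using System.Text;

string myfilename = "output.txt"; // path of the file to create
// Encoding.Unicode (UTF-16LE) is one example; pass whichever Encoding you want.
using (StreamWriter sw = new StreamWriter(File.Open(myfilename, FileMode.Create), Encoding.Unicode))
{
    sw.WriteLine("my text...");
}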
An alternative way of getting your encoding is to look it up by name with Encoding.GetEncoding:
using System.IO;
using System.Text;

// FileMode.CreateNew throws an IOException if the file already exists.
using (var sw = new StreamWriter(File.Open(@"c:\myfile.txt", FileMode.CreateNew), Encoding.GetEncoding("iso-8859-1")))
{
    sw.WriteLine("my text...");
}
Check out the docs for the StreamWriter constructor.
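One thing the snippets above do not show: on .NET Core and .NET 5+, only a handful of encodings (the Unicode ones, ASCII and ISO-8859-1) are built in, and Encoding.GetEncoding(1252) throws a NotSupportedException until CodePagesEncodingProvider is registered. The sketch below assumes a modern .NET runtime and writes to a hypothetical temp file; it registers the provider, writes with Windows-1252 and inspects the raw bytes. On .NET Framework the registration line is unnecessary (and the provider type may require the System.Text.Encoding.CodePages package).

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: .NET Core / .NET 5+, where legacy Windows code pages such as 1252
// are only available after registering CodePagesEncodingProvider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1252 = Encoding.GetEncoding(1252);

// Hypothetical output path in the temp folder, used only for this example.
string path = Path.Combine(Path.GetTempPath(), "cp1252-example.txt");

using (var sw = new StreamWriter(File.Open(path, FileMode.Create), windows1252))
{
    sw.WriteLine("café");
}

// Code page 1252 has no BOM and stores 'é' as the single byte 0xE9.
byte[] written = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(written));
```

The trailing newline bytes differ between Windows and other platforms, because WriteLine uses Environment.NewLine.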
How to read text files with ANSI encoding and non-English letters?
var text = File.ReadAllText(file, Encoding.GetEncoding(codePage));
List of code pages: https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers?redirectedfrom=MSDN
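As a minimal sketch of the one-liner above, the following assumes a hypothetical file of Russian text saved in the Windows-1251 (Cyrillic) ANSI code page, and a .NET Core / .NET 5+ runtime where the code page provider must be registered first. Reading with the matching code page recovers the letters; reading without an encoding falls back to UTF-8 and produces U+FFFD replacement characters.

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: .NET Core / .NET 5+, so legacy code pages must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical sample file: Russian text saved in code page 1251.
int codePage = 1251;
string path = Path.Combine(Path.GetTempPath(), "cp1251-example.txt");
File.WriteAllBytes(path, Encoding.GetEncoding(codePage).GetBytes("Привет"));

// Reading with the matching code page recovers the original letters...
string correct = File.ReadAllText(path, Encoding.GetEncoding(codePage));
// ...while the default (UTF-8) decoding yields replacement characters.
string garbled = File.ReadAllText(path);

Console.WriteLine(correct);
Console.WriteLine(garbled);
```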
How can I detect the encoding/codepage of a text file?
You can't detect the codepage, you need to be told it. You can analyse the bytes and guess it, but that can give some bizarre (sometimes amusing) results. I can't find it now, but I'm sure Notepad can be tricked into displaying English text in Chinese.
Anyway, this is what you need to read:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Specifically, Joel says:
The Single Most Important Fact About Encodings
If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII.
There Ain't No Such Thing As Plain Text. If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
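To illustrate why the bytes alone cannot tell you the code page, here is a small sketch (assuming .NET Core / .NET 5+ with the code page provider registered) that decodes one hypothetical four-byte sequence three ways. Both single-byte code pages produce plausible text, and nothing in the data says which one is right.

```csharp
using System;
using System.Text;

// Assumption: .NET Core / .NET 5+, where Windows code pages come from this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical bytes read from a file with no BOM and no metadata.
byte[] bytes = { 0x43, 0x61, 0x66, 0xE9 };

// Both single-byte decodings succeed and look reasonable.
string western = Encoding.GetEncoding(1252).GetString(bytes);   // Western European
string cyrillic = Encoding.GetEncoding(1251).GetString(bytes);  // Cyrillic
// A lone 0xE9 is not valid UTF-8, so UTF-8 substitutes U+FFFD for it.
string utf8 = Encoding.UTF8.GetString(bytes);

Console.WriteLine(western);
Console.WriteLine(cyrillic);
Console.WriteLine(utf8);
```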
C# encoding when reading files
The reason is that, unless you pass an encoding (or the file starts with a byte order mark), File.ReadAllText, File.ReadLines and StreamReader decode files as UTF-8.
Encoding.Default is not (despite its name) the default encoding used when reading files! A much better name for Encoding.Default would have been Encoding.UsingCurrentCodePage, in my opinion. ;)
(That describes .NET Framework. On .NET Core and .NET 5+, Encoding.Default always returns UTF-8, whatever the system code page is.)
Also note that rather than using File.ReadLines(filePath, Encoding.GetEncoding(1252)) you could use File.ReadLines(filePath, Encoding.Default).
You would do that if your code has to read files created in a code page other than 1252, and that code page is the current code page of the system on which the code is running (again, only on .NET Framework).
The only reason you should be using code pages is if you are reading or writing legacy files.
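A short sketch of the default-UTF-8 behaviour described above, using a hypothetical legacy file written in code page 1252 and assuming .NET Core / .NET 5+. Without an encoding argument the accented letters are lost; with the file's actual code page they survive. It also prints Encoding.Default.CodePage so you can see what your runtime uses.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Assumption: .NET Core / .NET 5+, so code page 1252 needs the provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1252 = Encoding.GetEncoding(1252);

// Hypothetical legacy file written in code page 1252.
string filePath = Path.Combine(Path.GetTempPath(), "legacy-1252.txt");
File.WriteAllBytes(filePath, cp1252.GetBytes("naïve café"));

// Without an encoding argument, File.ReadLines decodes as UTF-8.
string defaultRead = File.ReadLines(filePath).First();
// With the code page the file was actually written in, the text survives.
string explicitRead = File.ReadLines(filePath, cp1252).First();

// 65001 (UTF-8) on .NET Core / .NET 5+; the ANSI code page on .NET Framework.
Console.WriteLine($"Encoding.Default code page: {Encoding.Default.CodePage}");
Console.WriteLine(defaultRead);
Console.WriteLine(explicitRead);
```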
Effective way to find any file's Encoding
The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's encoding and endianness by analyzing its byte order mark (BOM). If the file does not have a BOM, this approach cannot determine the file's encoding.
*Updated 4/08/2020 to include UTF-32LE detection and to return the correct encoding for UTF-32BE.
using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when detection of the text file's endianness fails.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the BOM (up to 4 bytes; the file may be shorter, so keep the count)
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE (FF FE 00 00) must be tested before UTF-16LE (FF FE).
    // Note: Encoding.UTF7 is obsolete in .NET 5+ (warning SYSLIB0001).
    if (read >= 3 && bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (read >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE
    if (read >= 2 && bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
    if (read >= 2 && bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
    if (read >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE

    // We actually have no idea what the encoding is if we reach this point, so
    // you may wish to return null instead of defaulting to ASCII
    return Encoding.ASCII;
}
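StreamReader performs a similar BOM check itself when detectEncodingFromByteOrderMarks is true. One likely reason CurrentEncoding seems unreliable is that, per its documentation, autodetection only happens on the first read; before that the property simply returns the encoding you passed in (and without a BOM it stays that way). The sketch below uses a hypothetical UTF-16BE temp file to show the value before and after the first Peek.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical test file: UTF-16 big-endian text preceded by its BOM (FE FF).
string path = Path.Combine(Path.GetTempPath(), "utf16be-example.txt");
Encoding utf16be = Encoding.BigEndianUnicode;
File.WriteAllBytes(path, utf16be.GetPreamble().Concat(utf16be.GetBytes("hello")).ToArray());

using var reader = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true);

// Before anything is read, CurrentEncoding is just the fallback passed in (UTF-8).
int before = reader.CurrentEncoding.CodePage;

// Peek forces the first buffer fill, which is when the BOM is inspected.
reader.Peek();
int after = reader.CurrentEncoding.CodePage;

// UTF-8 is code page 65001, UTF-16BE is 1201.
Console.WriteLine($"before: {before}, after: {after}");
```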
File encoding doesn't work
You can use other overloads of Encoding.GetEncoding to control what happens whenever a Unicode character can't be converted to your target code page. More information is in the MSDN documentation.
The same can be achieved by explicitly setting the Encoding.EncoderFallback property (see MSDN); note that the instances returned by Encoding.GetEncoding are read-only, so Clone() one before setting it.
For example, you can use the following to throw an exception whenever a Unicode character cannot be converted:
Encoding enc = Encoding.GetEncoding(28605, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
Note: The default EncoderFallback is System.Text.InternalEncoderBestFitFallback, which substitutes a best-fit character where one exists and a question mark for unknown code points.
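A minimal sketch of the exception fallback in action, assuming .NET Core / .NET 5+ (where ISO-8859-15, code page 28605, comes from CodePagesEncodingProvider). The euro sign exists in ISO-8859-15, so it encodes; a CJK character has no mapping, so an EncoderFallbackException is thrown instead of a silent substitute.

```csharp
using System;
using System.Text;

// Assumption: .NET Core / .NET 5+, where code page 28605 comes from this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding strict = Encoding.GetEncoding(28605,
    EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// '€' exists in ISO-8859-15 (byte 0xA4), so this converts cleanly.
byte[] euro = strict.GetBytes("€");

// '中' has no ISO-8859-15 mapping, so the exception fallback throws.
bool threw = false;
try
{
    strict.GetBytes("中");
}
catch (EncoderFallbackException ex)
{
    threw = true;
    Console.WriteLine($"Cannot encode U+{(int)ex.CharUnknown:X4}");
}

Console.WriteLine($"€ -> 0x{euro[0]:X2}, threw for 中: {threw}");
```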
Set UTF8 encoding on streamwriter
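// FileMode.Create truncates an existing file; FileMode.Open would leave stale bytes if the new text is shorter.
System.IO.StreamWriter streamWriter = new System.IO.StreamWriter(new System.IO.FileStream(dlg.FileName, System.IO.FileMode.Create), System.Text.Encoding.UTF8);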
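Note that Encoding.UTF8 makes StreamWriter emit a UTF-8 byte order mark (EF BB BF) at the start of the file, which some consumers do not expect. The sketch below writes the same text into a MemoryStream twice, once with Encoding.UTF8 and once with new UTF8Encoding(false), and compares the bytes. The helper name WriteWith is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes "A" through a StreamWriter and returns the raw bytes.
static byte[] WriteWith(Encoding encoding)
{
    var ms = new MemoryStream();
    using (var writer = new StreamWriter(ms, encoding))
    {
        writer.Write("A");
    }
    return ms.ToArray(); // ToArray still works after the writer has closed the stream
}

// Encoding.UTF8 emits the EF BB BF byte order mark before the text...
byte[] withBom = WriteWith(Encoding.UTF8);
// ...whereas a UTF8Encoding constructed with false omits it.
byte[] withoutBom = WriteWith(new UTF8Encoding(false));

Console.WriteLine(BitConverter.ToString(withBom));
Console.WriteLine(BitConverter.ToString(withoutBom));
```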
My C# code doesn't read special characters from file
The letters č, ć, š, đ, ž suggest that this could be one of the ANSI code pages of Eastern Europe. A recommendation is then to try CodePagesEncodingProvider.Instance.GetEncoding(1250) as the encoding.
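A small sketch of that recommendation, using hypothetical file contents built from the letters in the question. CodePagesEncodingProvider.Instance can be queried directly without registering it (it ships with .NET Core 3.0+ / .NET 5+; older targets may need the System.Text.Encoding.CodePages package). Decoding the same single-byte data as ISO-8859-1 gives different, Western European characters.

```csharp
using System;
using System.Text;

// The provider instance can be queried directly, without Encoding.RegisterProvider.
Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250);

// Hypothetical file contents: the letters from the question, saved in code page 1250.
byte[] fileBytes = cp1250.GetBytes("č, ć, š, đ, ž");

// Decoding with the right code page restores the letters...
string right = cp1250.GetString(fileBytes);
// ...while ISO-8859-1 maps the same bytes to different characters.
string wrong = Encoding.GetEncoding(28591).GetString(fileBytes);

Console.WriteLine(right);
Console.WriteLine(wrong);
```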
Sadly, there's no easy way to guess the code page of an 8-bit file. UTF-8 (and the other Unicode encodings) were designed to overcome exactly this kind of problem. So if you have any control over how the source files are created, strongly recommend that they be saved as UTF-8 (another Unicode encoding would also work, but there is no need).