How to Make Notepad Save Text in UTF-8 Without the BOM

How can I make Notepad save text in UTF-8 without the BOM?

  1. Use Notepad++. It is free and much better than Notepad, and it can save text without a BOM via Encoding -> Encode in UTF-8 without BOM:

    Notepad++ v6 and older:

    [Screenshot: Notepad++ v6.7.9.2 menu bar -> Encoding -> Encode in UTF-8 without BOM]

    Notepad++ v7+:

    [Screenshot: Notepad++ v7+ menu bar -> Encoding, where the BOM-less option appears as "Encode in UTF-8" and the BOM variant as "Encode in UTF-8-BOM"]

  2. When I encountered this problem in Java, I didn't find any library to parse these first three bytes (BOM). So my advice:

    • Use PushbackInputStream(in, 3).
    • Read the first three bytes
    • If it's not BOM (EF BB BF), push them back
    • Process the stream as UTF-8
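
The four steps above can be sketched in Java roughly as follows. This is a minimal sketch, not a library-grade implementation: the class name BomSkipper and the method name skipUtf8Bom are made up for illustration, and readNBytes/readAllBytes assume Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for this example.
final class BomSkipper {

    /** Wraps the stream and consumes a leading UTF-8 BOM (EF BB BF) if present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // A pushback buffer of 3 bytes is enough to "un-read" the peeked bytes.
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3); // may return fewer than 3 at end of stream
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees them again.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            // Process the rest of the stream as UTF-8.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints "hi"
        }
    }
}
```

Streams shorter than three bytes are handled by pushing back only the bytes that were actually read.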

Remove a BOM character in a file

Look in the same menu and click "Convert to UTF-8".


Writing UTF-8 without BOM

"A" written using UTF-8 without a BOM produces exactly the same file as "A" written using ASCII, ISO-8859-*, or any other ASCII-compatible encoding. That file contains a single byte with the decimal value 65.

Think of it this way:

  • "A".getBytes("UTF-8") returns a new byte[] { 65 }
  • "A".getBytes("ISO-8859-1") returns a new byte[] { 65 }
  • You write the results of those calls into a file
  • How is the consumer of the file supposed to distinguish the two?

There's nothing in that file that suggests that UTF-8 needs to be used to decode it.

Try writing "Käsekuchen" or something else that's not encodable in ASCII and see if Notepad++ guesses the encoding correctly (because that's exactly what it does: it makes an educated guess, there's no metadata that tells it which encoding to use).
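
The point can be checked with a few lines of Java. This is only an illustrative sketch; the class name EncodingComparison and the method sameBytes are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are assumptions for this example.
final class EncodingComparison {

    /** Returns true if the text encodes to identical bytes in UTF-8 and ISO-8859-1. */
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        // Pure ASCII text: both encodings produce the single byte 65.
        System.out.println(sameBytes("A")); // prints "true"
        // "Käsekuchen": the a-umlaut is C3 A4 in UTF-8 but E4 in ISO-8859-1.
        System.out.println(sameBytes("K\u00e4sekuchen")); // prints "false"
    }
}
```

Only in the second case does the file contain bytes that give an editor something to base its guess on.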

Write text files without Byte Order Mark (BOM)?

In order to omit the byte order mark (BOM), your stream must use an instance of UTF8Encoding other than System.Text.Encoding.UTF8 (which is configured to generate a BOM). There are two easy ways to do this:

1. Explicitly specifying a suitable encoding:

  1. Call the UTF8Encoding constructor with False for the encoderShouldEmitUTF8Identifier parameter.

  2. Pass the UTF8Encoding instance to the stream constructor.

' VB.NET:
Dim utf8WithoutBom As New System.Text.UTF8Encoding(False)
Using sink As New StreamWriter("Foobar.txt", False, utf8WithoutBom)
    sink.WriteLine("...")
End Using

// C#:
var utf8WithoutBom = new System.Text.UTF8Encoding(false);
using (var sink = new StreamWriter("Foobar.txt", false, utf8WithoutBom))
{
    sink.WriteLine("...");
}

2. Using the default encoding:

If you do not supply an Encoding to StreamWriter's constructor at all, StreamWriter will by default use a UTF-8 encoding without a BOM, so the following should work just as well:

' VB.NET:
Using sink As New StreamWriter("Foobar.txt")
    sink.WriteLine("...")
End Using

// C#:
using (var sink = new StreamWriter("Foobar.txt"))
{
    sink.WriteLine("...");
}

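Finally, note that omitting the BOM is only safe for UTF-8, not for UTF-16: UTF-8 has no byte-order ambiguity, whereas UTF-16 text without a BOM leaves the reader to guess or assume the byte order.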

What's the difference between UTF-8 and UTF-8 with BOM?

The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess a file as being encoded in UTF-8.

Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.
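
To make the endianness point concrete, the following Java sketch encodes the BOM character (U+FEFF) in three encodings and prints the resulting bytes. The class name BomBytes and the helper hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are assumptions for this example.
final class BomBytes {

    /** Formats bytes as space-separated upper-case hex, e.g. "EF BB BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF"; // the byte order mark as a single character
        // UTF-8 has one fixed byte sequence, so the mark carries no byte-order information.
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_8)));    // prints "EF BB BF"
        // In UTF-16 the two byte orders differ, which is what the mark signals.
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16BE))); // prints "FE FF"
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16LE))); // prints "FF FE"
    }
}
```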

According to the Unicode standard, the BOM for UTF-8 files is not recommended:

2.6 Encoding Schemes


... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information.

UTF-8 without BOM

BOM or Byte Order Mark is sometimes quite annoying. Visual Studio does not change the file unless you save it (as Hans said).

And here is the solution to your problem:
If you want to save a file with a different encoding, choose Save As, click the arrow next to the Save button in the file dialog, and select "Save with Encoding". Or, if you want to change this setting permanently, open the File menu, select "Advanced Save Options", and there choose "UTF-8 without signature" (and that also answers your last question :). Yes, "UTF-8 without signature" is the same as UTF-8 without BOM.

How do I convert an ANSI encoded file to UTF-8 with Notepad++?

Regarding this part:

When I convert it to UTF-8 without bom and close file, the file is again ANSI when I reopen.

The easiest solution is to avoid the problem entirely by properly configuring Notepad++.

Try Settings -> Preferences -> New document -> Encoding -> choose UTF-8 without BOM, and check Apply to opened ANSI files.

[Screenshot: Notepad++ Preferences -> New document, with UTF-8 without BOM selected and "Apply to opened ANSI files" checked]

That way all the opened ANSI files will be treated as UTF-8 without BOM.

For an explanation of what's going on, read the comments below this answer.

To fully learn about Unicode and UTF-8, read this excellent article from Joel Spolsky.

Notepad++ convert to UTF-8 multiple files

Got my mistake. My Notepad++ is in German, so take care: "Encoding" is called "Kodierung" in my case, and "Convert to UTF-8 without BOM" is "Konvertiere zu UTF-8 ohne BOM".

That helped me out!

How do I save my file in UTF-16 LE encoding without BOM in VS2015?

It's not an option in VS2015. Even the popular Notepad++ doesn't have an option for UTF-16 without BOM.

If you are just doing this for experimentation, use any hex editor to remove the BOM after saving. There is a binary editor built into VS2015. After saving a file as UTF-16LE with BOM, close the file, re-open it with the Binary Editor, and remove the BOM. I found that VS2015 couldn't open the file correctly without it, though, which may be why the option isn't available.
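
If a hex editor is not at hand, the same experiment can be done with a short Java program that drops the first two bytes when they are the UTF-16LE BOM (FF FE). This is a sketch under assumptions: the class name Utf16LeBomStripper and the method stripUtf16LeBom are made up, the file is rewritten in place, and the check does not tell a UTF-16LE BOM apart from a UTF-32LE one (FF FE 00 00).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class; the names are assumptions for this example.
final class Utf16LeBomStripper {

    /** Returns the data without a leading FF FE; other input is returned unchanged. */
    static byte[] stripUtf16LeBom(byte[] data) {
        boolean hasBom = data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xFE;
        return hasBom ? Arrays.copyOfRange(data, 2, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf16LeBomStripper.java <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Overwrites the file in place; keep a backup when experimenting.
        Files.write(file, stripUtf16LeBom(Files.readAllBytes(file)));
    }
}
```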


