XDocument: Saving XML to File Without BOM

XDocument: saving XML to file without BOM

Use an XmlTextWriter and pass it to the XDocument's Save() method. That gives you more control over the encoding used:

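// Requires: using System.Text; using System.Xml; using System.Xml.Linq;
var doc = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("root", new XAttribute("note", "boogers"))
);
// UTF8Encoding(false) means "do not emit a BOM".
using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
{
    doc.Save(writer);
}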

The UTF8Encoding class constructor has an overload that takes a boolean specifying whether to emit the BOM (byte order mark); in your case, pass false.

The result of this code was verified using Notepad++ to inspect the file's encoding.
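The same check can also be done in code rather than in an editor. The following is a minimal sketch, not part of the original answer: the BomCheck class and its method names are hypothetical, and it assumes the file is small enough to read into memory. It looks for the three bytes EF BB BF that make up the UTF-8 BOM.

```csharp
using System;
using System.IO;
using System.Text;

static class BomCheck
{
    // Returns true if the bytes start with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF
            && bytes[1] == 0xBB
            && bytes[2] == 0xBF;
    }

    // Convenience wrapper: reads the whole file, which is fine for small XML files.
    public static bool FileStartsWithUtf8Bom(string path)
    {
        return StartsWithUtf8Bom(File.ReadAllBytes(path));
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Written with a BOM-less encoding: no BOM expected.
            File.WriteAllText(path, "<root />", new UTF8Encoding(false));
            Console.WriteLine(FileStartsWithUtf8Bom(path)); // False

            // Written with Encoding.UTF8, which emits a BOM.
            File.WriteAllText(path, "<root />", Encoding.UTF8);
            Console.WriteLine(FileStartsWithUtf8Bom(path)); // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```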

Force no BOM when saving XML

You can create a UTF8Encoding instance which doesn't use the BOM, instead of using Encoding.UTF8.

using (TextWriter sw = new StreamWriter(file, false, new UTF8Encoding(false))) {
    doc.Save(sw);
}

You can save this in a static field if you're worried about the cost of instantiating it repeatedly:

private static readonly Encoding UTF8NoByteOrderMark = new UTF8Encoding(false);

...

using (TextWriter sw = new StreamWriter(file, false, UTF8NoByteOrderMark)) {
    doc.Save(sw);
}
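To see why Encoding.UTF8 and new UTF8Encoding(false) behave differently, you can compare their preambles, which are the bytes a writer emits at the start of the output. This is a small illustrative sketch; the PreambleDemo class and PreambleLength helper are hypothetical names, not from the original answer.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // The preamble is the BOM the encoding asks writers to emit (empty if none).
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    static void Main()
    {
        // Encoding.UTF8 carries the three-byte BOM EF BB BF.
        Console.WriteLine(PreambleLength(Encoding.UTF8));           // 3

        // An instance created with encoderShouldEmitUTF8Identifier = false has none.
        Console.WriteLine(PreambleLength(new UTF8Encoding(false))); // 0
    }
}
```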

PowerShell: XmlDocument save as UTF-8 without BOM

Unfortunately, when the declaration of an XML document contains an explicit encoding="utf-8" attribute, .NET's [xml] (System.Xml.XmlDocument) type, when given a file path, .Save()s the document as a UTF-8-encoded file with a BOM, which can indeed cause problems (even though it shouldn't[1]).

A request to change this has been green-lighted in principle, but is not yet implemented as of .NET 6.0 (due to a larger discussion about changing [System.Text.Encoding]::UTF8 to not use a BOM, in which case .Save() would automatically not create a BOM anymore either).

Somewhat ironically, the absence of an encoding attribute causes .Save() to create UTF-8-encoded files without a BOM.

A simple solution is therefore to remove the encoding attribute[2]; e.g.:

# Create a sample XML document:
$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'

# Remove the 'encoding' attribute from the declaration.
# Without this, the .Save() method below would create a UTF-8 file *with* BOM.
$xmlDoc.ChildNodes[0].Encoding = $null

# Now, saving produces a UTF-8 file *without* a BOM.
$xmlDoc.Save("$PWD/out.xml")

[1] Per the XML W3C Recommendation: "entities encoded in UTF-8 MAY begin with a Byte Order Mark" (BOM).

[2] This is safe to do, because the XML W3C Recommendation effectively mandates UTF-8 as the default in the absence of both a BOM and an encoding attribute.

How to make sure an XDocument is saved with UTF-8 file encoding?

If you want fine-grained control over the encoding, you probably want to control the TextWriter yourself; the example below uses UTF-8 without a BOM. If possible, you could also write directly to a file via a FileStream.

using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        var bytes = new Program().Serialize();
        File.WriteAllBytes("my.xml", bytes);
    }

    public byte[] Serialize()
    {
        using (var stream = new MemoryStream())
        {
            WriteXmlToStream(stream);

            return stream.ToArray();
        }
    }

    private void WriteXmlToStream(Stream stream)
    {
        var document =
            new XDocument(
                new XElement("Coleta",
                    new XElement("Operador", "foo"),
                    new XElement("Sujeito", "bar"),
                    new XElement("Início", DateTime.Now),
                    new XElement("Descrição", "Descrição")
                    // and so on
                )
            );
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
        {
            document.Save(writer);
        }
    }
}

The above works fine, and encodes correctly.

To write directly to a file instead:

public void Serialize(string path)
{
    using (var stream = File.Create(path))
    {
        WriteXmlToStream(stream);
    }
}

XDocument: how to save without Byte Order Mark AND preserve formatting/whitespace

Problem solved (Updated due to issue with creating unnecessary whitespace):

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = new UTF8Encoding(false);
using (var writer = XmlWriter.Create(file, settings))
{
    xdoc.Save(writer);
}

Issue with XDocument and the BOM (Byte Order Mark)

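If you're writing the XML with an XmlWriter, you can set the Encoding to one that has been initialized to leave out the BOM.

For example, System.Text.UTF8Encoding's constructor takes a boolean that specifies whether you want the BOM. Set the encoding on an XmlWriterSettings instance and pass it to XmlWriter.Create(); the Settings property of an already created writer is read-only, so assigning writer.Settings.Encoding afterwards throws an exception:

var settings = new XmlWriterSettings { Encoding = new System.Text.UTF8Encoding(false) };
using (XmlWriter writer = XmlWriter.Create("foo.xml", settings))
{
    myXDocument.WriteTo(writer);
}

This creates an XmlWriter that writes UTF-8 without the byte order mark.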

C#: How to load a memory stream into an XmlDocument without BOM? (My memory stream is generated from XmlTextWriter without BOM)

I don't know why you're overwriting the MemoryStream again; that seems like a bad idea.

But if you need the XDocument then you can use the XDocument.Save(XmlWriter writer) option and make sure you create the writer with the no-BOM settings as in the beginning of your code.
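A minimal sketch of that approach follows. The question's original code is not shown here, so the NoBomMemoryStream class, the ToUtf8BytesWithoutBom method, and the sample document are all hypothetical; the sketch only assumes you want the XDocument serialized into a MemoryStream as UTF-8 without a BOM.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class NoBomMemoryStream
{
    // Serializes the document to UTF-8 bytes with no leading BOM.
    public static byte[] ToUtf8BytesWithoutBom(XDocument document)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // false = do not emit a BOM
            Indent = true
        };

        using (var stream = new MemoryStream())
        {
            // Dispose the writer before reading the stream so everything is flushed.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                document.Save(writer);
            }
            return stream.ToArray();
        }
    }

    static void Main()
    {
        var doc = new XDocument(new XElement("root", new XElement("child", "value")));
        byte[] bytes = ToUtf8BytesWithoutBom(doc);

        // With no BOM, the first byte is the '<' of the XML declaration.
        Console.WriteLine(bytes[0] == (byte)'<'); // True
    }
}
```

The resulting bytes can then be loaded back (for example with XDocument.Load on a new MemoryStream over them) without building a second stream over the first.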
