Serializing an object as UTF-8 XML in .NET
Your code doesn't end up with UTF-8 in memory, because you read it back into a string again. At that point it is no longer UTF-8 but UTF-16 once more (though ideally it's best to think of strings at a higher level than any particular encoding, except when forced to do otherwise).
To get the actual UTF-8 octets you could use:
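// Requires: using System.IO; using System.Xml.Serialization;
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);
serializer.Serialize(streamWriter, entry);
streamWriter.Flush(); // make sure everything buffered in the StreamWriter has reached memoryStream
byte[] utf8EncodedXml = memoryStream.ToArray(); // begins with the UTF-8 BOM (EF BB BF)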
I've left out the same disposal you left out. I slightly favour the following (with normal disposal put back in):
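// Requires: using System.IO; using System.Xml; using System.Xml.Serialization;
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
byte[] utf8;
using (var memStm = new MemoryStream())
{
    using (var xw = XmlWriter.Create(memStm)) // XmlWriter.Create defaults to UTF-8
    {
        serializer.Serialize(xw, entry);
    }
    // the XmlWriter has been disposed, and therefore flushed, by this point
    utf8 = memStm.ToArray();
}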
This is much the same amount of complexity, but it does show that at every stage there is a reasonable choice to do something else, the most pressing being to serialise somewhere other than memory, such as to a file, a TCP/IP stream or a database. All in all, it's not really that verbose.
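Picking up the point about serialising somewhere other than memory, here is a minimal sketch, assuming you are happy to go through XmlWriter rather than a StreamWriter. The helper writes UTF-8 XML (without a BOM, which is my choice here) into whatever Stream it is handed, so the same code serves a MemoryStream, a FileStream or a network stream. XmlToStream and SerializeUtf8 are hypothetical names for illustration only.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper: XmlToStream/SerializeUtf8 are illustrative names, not framework types.
public static class XmlToStream
{
    // Serializes obj as UTF-8 XML into any writable stream (MemoryStream, FileStream, NetworkStream, ...).
    public static void SerializeUtf8<T>(T obj, Stream destination)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            CloseOutput = false                  // leave the stream open; the caller owns it
        };
        using (var writer = XmlWriter.Create(destination, settings))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, obj);
        } // disposing the XmlWriter flushes everything into destination
    }
}
```

To write to a file instead, the caller would pass a FileStream (for example from File.Create(path), inside its own using block) in place of the MemoryStream; nothing else in the helper changes.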
How to return xml as UTF-8 instead of UTF-16
Encoding of the Response
I am not very familiar with this part of the framework, but according to MSDN you can set the content encoding of an HttpResponse like this:
httpContextBase.Response.ContentEncoding = Encoding.UTF8;
Encoding as seen by the XmlSerializer
After reading your question again, I see that this is the tough part. The problem lies in the use of StringWriter. Because .NET strings are always stored as UTF-16, StringWriter reports UTF-16 as its encoding, and so the XmlSerializer writes the XML declaration as
<?xml version="1.0" encoding="utf-16"?>
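A small sketch to reproduce this, with hypothetical class and method names: serializing through a plain StringWriter yields a declaration that names utf-16, since that is what StringWriter.Encoding reports.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical demo type: reproduces the utf-16 declaration described above.
public static class StringWriterDemo
{
    // Serializes through a plain StringWriter, the setup from the question.
    public static string SerializeWithStringWriter<T>(T obj)
    {
        using (var writer = new StringWriter())
        {
            new XmlSerializer(typeof(T)).Serialize(writer, obj);
            return writer.ToString(); // the declaration reads encoding="utf-16"
        }
    }
}
```

The declaration is produced from the writer's Encoding property, which is why the utf-16 label appears no matter how you later encode the string.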
To work around that, you can write into a MemoryStream like this:
using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
XmlSerializer xml = new XmlSerializer(typeof(T));
xml.Serialize(writer, Data);
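// make sure everything buffered in the StreamWriter has reached the stream
writer.Flush();
// ToArray() copies the buffer; stream.WriteTo(httpContextBase.Response.OutputStream) would avoid that copy
httpContextBase.Response.BinaryWrite(stream.ToArray());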
}
Other approaches
Another edit: I just noticed this SO answer linked by jtm001. In short, the solution there is to give the XmlSerializer a custom XmlWriter that is configured to use UTF-8 as its encoding.
Athari proposes deriving from StringWriter and advertising the encoding as UTF-8.
As far as I understand, both solutions should work as well. I think the takeaway here is that you will need one kind of boilerplate code or another.
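A minimal sketch of the StringWriter route, assuming C# 6 or later for the expression-bodied property. Utf8StringWriter and Utf8StringSerializer are hypothetical names, not framework types. Only the label in the declaration changes: the result is still an ordinary .NET string, so it must still be encoded as UTF-8 when it is written out.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper type: a StringWriter that advertises UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class Utf8StringSerializer
{
    // The returned string is still UTF-16 in memory; only the XML declaration says utf-8.
    public static string Serialize<T>(T obj)
    {
        using (var writer = new Utf8StringWriter())
        {
            new XmlSerializer(typeof(T)).Serialize(writer, obj);
            return writer.ToString();
        }
    }
}
```

Before sending the result anywhere, Encoding.UTF8.GetBytes(result) gives bytes that actually match the declaration.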
How to Specify XML Encoding when Serializing an Object in C#
If I change from Encoding.Unicode to Encoding.UTF8, the file is generated properly. Perhaps you're looking at an old version of your file?
On an unrelated note, you should use using blocks for deterministic disposal of objects that implement IDisposable:
XmlSerializer xmlSerializer = new XmlSerializer(typeof(MyObject));
using (Stream stream = new FileStream(@".\doc.xml", FileMode.Create))
using (XmlWriter xmlWriter = new XmlTextWriter(stream, Encoding.UTF8))
{
xmlSerializer.Serialize(xmlWriter, myObject);
}
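If you suspect you are looking at a stale file, it can help to check what actually ended up on disk. A rough sketch with hypothetical names: DetectBom inspects the first bytes for a byte order mark, and DeclaredEncoding reads the encoding attribute of the XML declaration through XmlReader. For a file written by the code above, both should report utf-8.

```csharp
using System.IO;
using System.Xml;

// Hypothetical diagnostic helpers for checking which encoding a file really uses.
public static class XmlFileInspector
{
    // Returns the BOM found at the start of the file: "utf-8", "utf-16LE", "utf-16BE" or "none".
    public static string DetectBom(string path)
    {
        byte[] head = File.ReadAllBytes(path); // fine for small files like these
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF) return "utf-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE) return "utf-16LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF) return "utf-16BE";
        return "none";
    }

    // Returns the encoding named in the XML declaration, or null if there is no declaration.
    public static string DeclaredEncoding(string path)
    {
        using (var reader = XmlReader.Create(path))
        {
            reader.Read(); // the first node is the XmlDeclaration when one is present
            return reader.NodeType == XmlNodeType.XmlDeclaration ? reader.GetAttribute("encoding") : null;
        }
    }
}
```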
XmlSerializer change encoding
Here is code that takes the encoding as a parameter. Please read the comments to see why there is a SuppressMessage attribute for code analysis.
/// <summary>
/// Serialize an object into an XML string
/// </summary>
/// <typeparam name="T">Type of object to serialize.</typeparam>
/// <param name="obj">Object to serialize.</param>
/// <param name="enc">Encoding of the serialized output.</param>
/// <returns>Serialized (xml) object.</returns>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Usage", "CA2202:Do not dispose objects multiple times")]
internal static String SerializeObject<T>(T obj, Encoding enc)
{
using (MemoryStream ms = new MemoryStream())
{
XmlWriterSettings xmlWriterSettings = new System.Xml.XmlWriterSettings()
{
// If set to true XmlWriter would close MemoryStream automatically and using would then do double dispose
// Code analysis does not understand that. That's why there is a suppress message.
CloseOutput = false,
Encoding = enc,
OmitXmlDeclaration = false,
Indent = true
};
using (System.Xml.XmlWriter xw = System.Xml.XmlWriter.Create(ms, xmlWriterSettings))
{
XmlSerializer s = new XmlSerializer(typeof(T));
s.Serialize(xw, obj);
}
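// XmlWriter writes enc's preamble (BOM) first; skip it so the returned string does not start with U+FEFF
byte[] bytes = ms.ToArray();
int preambleLength = enc.GetPreamble().Length;
return enc.GetString(bytes, preambleLength, bytes.Length - preambleLength);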
}
}
Easier way to serialize C# class as XML text
A little shorter :-)
var yourList = new List<int>() { 1, 2, 3 };
using (var writer = new StringWriter())
{
new XmlSerializer(yourList.GetType()).Serialize(writer, yourList);
var xmlEncodedList = writer.GetStringBuilder().ToString();
}
There is a flaw in this approach that's worth pointing out, though: because it uses a StringWriter, it generates a utf-16 header, so it is not exactly equivalent to your code. To get a utf-8 header we should use a MemoryStream and an XmlWriter, which costs a few additional lines of code:
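var yourList = new List<int>() { 1, 2, 3 };
string xmlEncodedList;
using (var stream = new MemoryStream())
{
    // new UTF8Encoding(false): UTF-8 without a BOM, so the decoded string won't start with U+FEFF
    using (var writer = XmlWriter.Create(stream, new XmlWriterSettings { Encoding = new UTF8Encoding(false) }))
    {
        new XmlSerializer(yourList.GetType()).Serialize(writer, yourList);
    }
    xmlEncodedList = Encoding.UTF8.GetString(stream.ToArray());
}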
Objects not serialising to XML (UTF-8) as expected .net?
What you're seeing is the byte order mark (BOM) that is often used at the start of text files or streams to indicate the byte order and the Unicode variant.
Your serializer is very strange. If you encode a string with some encoding such as UTF-8, you have to return it as an array of bytes. By first encoding the XML in UTF-8 and then decoding the UTF-8 stream back to a string, you gain nothing (except introducing the problematic BOM).
Either go with UTF-16 only or return a byte array. As the function stands, the encoding just introduces problems.
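To make the BOM problem concrete, here is a small sketch with hypothetical names. It prepends the UTF-8 preamble to some encoded text, which mirrors what a StreamWriter or XmlWriter using Encoding.UTF8 puts in front of the output, and then decodes the bytes again. The BOM does not disappear: it comes back as an invisible U+FEFF character at the start of the string.

```csharp
using System;
using System.Text;

// Hypothetical demo: what happens when BOM-prefixed UTF-8 bytes are decoded back into a string.
public static class BomDemo
{
    public static string DecodeWithBom(string text)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // EF BB BF
        byte[] body = Encoding.UTF8.GetBytes(text);    // GetBytes itself never adds a BOM
        byte[] withBom = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, withBom, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, withBom, preamble.Length, body.Length);
        return Encoding.UTF8.GetString(withBom);       // the BOM survives as U+FEFF
    }
}
```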
Update:
Based on the code in the comment below, I see two approaches:
Approach 1: Create a string with the serialized data and convert it to UTF-8 later
Public Shared Function SerializeObject(ByVal obj As Object) As String
Dim serializer As New XmlSerializer(obj.GetType)
Using strWriter As New IO.StringWriter()
serializer.Serialize(strWriter, obj)
Return strWriter.ToString
End Using
End Function
....
Dim serialisedObject As String = SerializeObject(object)
Dim postData As Byte() = New Text.UTF8Encoding(True).GetBytes(serialisedObject)
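If you need a different encoding, change the last line. Note that GetBytes never writes a byte order mark, whichever value you pass to UTF8Encoding(); the flag only changes what GetPreamble() returns, which is what writers such as StreamWriter emit at the start of a stream. Also keep in mind that the XML declaration inside the string still says encoding="utf-16", because it was produced by a StringWriter, even though the bytes you send are UTF-8.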
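A minimal sketch (hypothetical names) comparing the two places a BOM could come from: the bytes returned by Encoding.GetBytes, and the bytes a StreamWriter produces when it writes the encoding's preamble before the text. Only the second is affected by the flag passed to UTF8Encoding.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo of what the UTF8Encoding(bool) flag actually controls.
public static class BomFlagDemo
{
    // Encoding.GetBytes: no BOM, whatever the constructor flag says.
    public static byte[] ViaGetBytes(string s, bool emitBom) => new UTF8Encoding(emitBom).GetBytes(s);

    // StreamWriter: writes GetPreamble() before the text, so the flag matters here.
    public static byte[] ViaStreamWriter(string s, bool emitBom)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, new UTF8Encoding(emitBom)))
            {
                writer.Write(s);
            } // disposing the writer flushes it (and closes ms, which ToArray tolerates)
            return ms.ToArray();
        }
    }
}
```

For "<a/>", both GetBytes calls return 4 bytes, while the StreamWriter produces 7 bytes with the flag set and 4 without it.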
Approach 2: Create the properly encoded data in the first place and continue with a byte array
Public Shared Function SerializeObject(ByVal obj As Object, Optional ByVal encoding As Text.Encoding = Nothing) As Byte()
Dim serializer As New XmlSerializer(obj.GetType)
If encoding Is Nothing Then
encoding = Text.Encoding.Unicode
End If
Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
serializer.Serialize(xtWriter, obj)
Return stream.ToArray()
End Using
End Function
....
Dim postData As Byte() = SerializeObject(object)
In this case, the XmlTextWriter encodes the data directly with the correct encoding. And since we already have a byte array, the last step is shorter: we directly have the data to send to the client.
Setting StandAlone = Yes in .Net when serializing an object
If you want to do this, you'll need to use the WriteProcessingInstruction method and write the declaration out manually.
Using stream As New IO.MemoryStream, xtWriter As Xml.XmlWriter = Xml.XmlWriter.Create(stream, settings)
xtWriter.WriteProcessingInstruction("xml", "version=""1.0"" encoding=""UTF-8"" standalone=""yes""")
serializer.Serialize(xtWriter, obj)
Return encoding.GetString(stream.ToArray())
End Using
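As an alternative to writing the processing instruction by hand, XmlWriter.WriteStartDocument(bool standalone) emits the declaration with a standalone attribute. This is a sketch under an assumption that matches my reading of XmlSerializer: it only writes its own declaration while the writer is still in WriteState.Start, so calling WriteStartDocument(true) first leaves exactly one declaration. StandaloneXml is a hypothetical name, and UTF-8 without a BOM is my choice rather than something from the answer above.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper: writes the declaration itself so it can carry standalone="yes".
public static class StandaloneXml
{
    public static string Serialize<T>(T obj)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (var ms = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(ms, settings))
            {
                // Assumption: once the declaration is written, the writer has left WriteState.Start,
                // so XmlSerializer does not try to write a second declaration.
                writer.WriteStartDocument(true);
                new XmlSerializer(typeof(T)).Serialize(writer, obj);
            }
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }
}
```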