Should I be worried about encoding during serialization?
    public string Serialize(BackgroundJobInfo info)
    {
        var stringBuilder = new StringBuilder();
        using (var stringWriter = new StringWriter(stringBuilder, CultureInfo.InvariantCulture))
        {
            var writer = XmlWriter.Create(stringWriter);
            ...
By default, StringWriter will advertise itself as being in UTF-16. Usually XML is in UTF-8. So I can fix this by subclassing StringWriter:
    public class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding
        {
            get { return Encoding.UTF8; }
        }
    }
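To make "advertise" concrete, here is a minimal sketch, assuming a plain console project. DeclarationDemo, WriteSample and the element name job are hypothetical names used only for illustration, and Utf8StringWriter is repeated from above so the block stands alone. It writes the same tiny document through XmlWriter into a plain StringWriter and into a Utf8StringWriter; the only expected difference is the encoding attribute in the XML declaration, which XmlWriter takes from the writer's Encoding property.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Same subclass as in the question: it only changes the Encoding that is reported.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class DeclarationDemo
{
    // Writes a tiny document through XmlWriter into the given StringWriter.
    // The element name "job" is hypothetical, used only for illustration.
    public static string WriteSample(StringWriter target)
    {
        using (XmlWriter writer = XmlWriter.Create(target))
        {
            writer.WriteStartDocument();                 // emits <?xml version="1.0" encoding="..."?>
            writer.WriteElementString("job", "\u00E9");  // the character é
            writer.WriteEndDocument();
        }                                                // disposing the XmlWriter flushes it into target
        return target.ToString();
    }

    public static void Main()
    {
        // Declaration reports encoding="utf-16" (taken from StringWriter.Encoding).
        Console.WriteLine(WriteSample(new StringWriter()));
        // Declaration reports encoding="utf-8"; the element content is identical.
        Console.WriteLine(WriteSample(new Utf8StringWriter()));
    }
}
```

In both cases the result is an ordinary .NET string; only the text of the declaration differs.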
But why should I worry about that? What will happen if I decide to use StringWriter (as I did) instead of Utf8StringWriter? Will I get a bug?
After serializing, I write this string to MongoDb.
StringWriter's Encoding property is actually not that useful, because the underlying thing it writes to is a StringBuilder, which produces a .NET string. .NET strings are encoded internally in UTF-16, but that's an implementation detail you don't have to worry about. Encoding is just a property inherited from TextWriter, because a TextWriter can potentially write to targets where encoding does matter (Stream, byte[], ...).
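A small sketch of that distinction, using only System.IO and System.Text (TargetDemo, WriteToString and WriteToStream are hypothetical helper names): with StringWriter the text is simply stored as a string, while with a StreamWriter over a MemoryStream the chosen Encoding decides which bytes, and how many, end up in the stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class TargetDemo
{
    // StringWriter: the text just lands in a StringBuilder; Encoding is only metadata.
    public static string WriteToString(string text)
    {
        using (var writer = new StringWriter())
        {
            writer.Write(text);
            return writer.ToString();
        }
    }

    // StreamWriter: here the Encoding decides which bytes are written to the stream.
    public static byte[] WriteToStream(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }                          // disposing flushes the writer (and closes the stream)
        return stream.ToArray();   // ToArray still works on a closed MemoryStream
    }

    public static void Main()
    {
        // Encodings without a byte order mark, so only the text itself is counted.
        var utf8 = new UTF8Encoding(false);
        var utf16 = new UnicodeEncoding(false, false);

        Console.WriteLine(WriteToString("job"));                // job
        Console.WriteLine(WriteToStream("job", utf8).Length);   // 3
        Console.WriteLine(WriteToStream("job", utf16).Length);  // 6
    }
}
```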
In the end, you will end up with a plain old string. The encoding used to serialize that string later on is not fixed yet, and if you're using a MongoDb client implementation that takes a string as an argument, it is not even your concern!
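Where the declaration can still matter is if something later turns the string into bytes using a different encoding than the declaration claims, and then parses those bytes. A minimal sketch, assuming the stored string looks like StringWriter output (ParseDemo and the element name job are hypothetical): parsing the string directly does not use the encoding attribute, while parsing UTF-8 bytes of the same text is expected to fail with an XmlException, because a byte-based reader trusts the utf-16 declaration. As long as the value is stored and read back as a string and parsed with something like XDocument.Parse, you stay on the first path.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class ParseDemo
{
    // The kind of string StringWriter + XmlWriter produce: the declaration says utf-16.
    // The element name "job" is hypothetical.
    const string DeclaredUtf16 = "<?xml version=\"1.0\" encoding=\"utf-16\"?><job>abc</job>";

    // Parsing the string directly: no bytes are decoded, so the encoding attribute is not used.
    public static string ParseString() => XDocument.Parse(DeclaredUtf16).Root.Value;

    // Encoding the same string as UTF-8 bytes and parsing those bytes:
    // the reader now honours the declaration, which contradicts the actual bytes.
    public static bool ParseUtf8BytesFails()
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(DeclaredUtf16);
        try
        {
            XDocument.Load(new MemoryStream(bytes));
            return false;
        }
        catch (XmlException)
        {
            return true; // typically reported as a missing Unicode byte order mark
        }
    }

    public static void Main()
    {
        Console.WriteLine(ParseString());         // abc
        Console.WriteLine(ParseUtf8BytesFails()); // True
    }
}
```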
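On a side note, even if StringWriter actually performed any encoding, overriding the getter of the Encoding property would not change how that encoding happens inside it. The override only changes the encoding the writer reports, which XmlWriter then copies into the encoding attribute of the XML declaration.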
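A minimal sketch of that point (OverrideDemo and Inspect are hypothetical names; Utf8StringWriter is repeated from the question so the block stands alone): after writing "é" through Utf8StringWriter, the writer reports utf-8, but the stored .NET string still holds a single UTF-16 code unit, and UTF-8 bytes only exist once you encode the string yourself.

```csharp
using System;
using System.IO;
using System.Text;

// Same subclass as in the question.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class OverrideDemo
{
    // Writes text through the subclass and reports what was actually stored.
    public static (string Text, string Reported, int Chars, int Utf8Bytes) Inspect(string text)
    {
        using (var writer = new Utf8StringWriter())
        {
            writer.Write(text);
            string stored = writer.ToString();
            return (stored,
                    writer.Encoding.WebName,             // "utf-8": only what is reported
                    stored.Length,                        // UTF-16 code units in the .NET string
                    Encoding.UTF8.GetByteCount(stored));  // bytes appear only when you encode
        }
    }

    public static void Main()
    {
        var r = Inspect("\u00E9"); // the character é
        Console.WriteLine($"{r.Reported} {r.Chars} {r.Utf8Bytes}"); // utf-8 1 2
    }
}
```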