Should I be worried about encoding during serialization?

public string Serialize(BackgroundJobInfo info)
{
    var stringBuilder = new StringBuilder();
    using (var stringWriter = new StringWriter(stringBuilder, CultureInfo.InvariantCulture))
    {
        var writer = XmlWriter.Create(stringWriter);
        ...

By default, StringWriter advertises itself as being in UTF-16, but XML is usually in UTF-8. So I can fix this by subclassing StringWriter:

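using System.IO;
using System.Text;

// Reports UTF-8 instead of UTF-16, so an XmlWriter created on top of it
// writes encoding="utf-8" in the XML declaration.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}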
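To see what "advertising UTF-16" means in practice, here is a minimal sketch that writes the same tiny document through a plain StringWriter and through a Utf8StringWriter like the one above. The `DeclarationDemo` class, the `WriteDoc` helper and the `<job>` element are hypothetical names chosen for illustration. The only difference between the two results is the `encoding` attribute that XmlWriter copies from the writer's Encoding property into the XML declaration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Mirrors the question's Utf8StringWriter: only the reported Encoding changes.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class DeclarationDemo
{
    // Hypothetical helper: writes a tiny document through the given StringWriter
    // and returns the resulting string.
    public static string WriteDoc(StringWriter target)
    {
        using (XmlWriter writer = XmlWriter.Create(target))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("job", "42");
            writer.WriteEndDocument();
        } // disposing the XmlWriter flushes everything into the StringWriter
        return target.ToString();
    }
}
```

Calling `DeclarationDemo.WriteDoc(new StringWriter())` yields a declaration with `encoding="utf-16"`, while `DeclarationDemo.WriteDoc(new Utf8StringWriter())` yields `encoding="utf-8"`; the element content is identical in both strings.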

But why should I worry about that? What will happen if I use a plain StringWriter (as I did) instead of Utf8StringWriter? Will I get a bug?

After that, I will write this string to MongoDB.


StringWriter's Encoding property is actually not that useful, because the underlying target it writes to is a StringBuilder, which produces a .NET string. .NET strings are stored internally as UTF-16, but that's an implementation detail you don't have to worry about. Encoding is just a property inherited from TextWriter, because a TextWriter can potentially write to targets where encoding does matter (Stream, byte[], ...).
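The contrast between a character target and a byte target can be shown with a minimal sketch; the `TargetDemo` class and its two methods are hypothetical names. A StreamWriter really applies its encoding, so the same text becomes a different number of bytes depending on the encoding passed in, whereas a StringWriter only appends characters to its StringBuilder.

```csharp
using System.IO;
using System.Text;

public static class TargetDemo
{
    // Writes text through a StreamWriter and returns the raw bytes that reached the stream.
    // Here the encoding is actually applied, because the target is a byte Stream.
    public static byte[] WriteToStream(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            } // disposing the StreamWriter flushes it (and closes the MemoryStream)
            return stream.ToArray(); // ToArray still works after the MemoryStream is closed
        }
    }

    // Writes text through a StringWriter: the target is a StringBuilder of chars, no bytes involved.
    public static string WriteToString(string text)
    {
        var sb = new StringBuilder();
        using (var writer = new StringWriter(sb))
        {
            writer.Write(text);
        }
        return sb.ToString();
    }
}
```

For the text `"A\u00e9"`, UTF-8 without a byte order mark (`new UTF8Encoding(false)`) produces 3 bytes and UTF-16 little-endian without a byte order mark (`new UnicodeEncoding(false, false)`) produces 4, while `WriteToString` simply returns the same two-character string.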

In the end, you end up with a plain old string. The encoding used to serialize that string later on is not fixed yet, and if you're using a MongoDB client that takes a string as an argument, it is not even your concern!
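As a sketch of why the declared encoding does not hurt when the XML only ever travels as a string, the round trip below serializes with a plain StringWriter (so the declaration says utf-16) and reads the value back through a StringReader. It assumes the string is stored and retrieved unchanged, for example by a client that accepts and returns strings; the `RoundTripDemo` class, its methods and the `<job>` element are hypothetical. Because the reader is given already-decoded characters, no byte decoding takes place.

```csharp
using System.IO;
using System.Xml;

public static class RoundTripDemo
{
    // Serializes a value with a plain StringWriter, so the declaration reads encoding="utf-16".
    public static string Serialize(string value)
    {
        var sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("job", value);
            writer.WriteEndDocument();
        }
        return sw.ToString();
    }

    // Reads the value back from the string. The input is a TextReader of characters,
    // so the XML is parsed from chars rather than decoded from bytes.
    public static string Deserialize(string xml)
    {
        using (var text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        {
            reader.ReadToFollowing("job");
            return reader.ReadElementContentAsString();
        }
    }
}
```

Special characters survive the trip because XmlWriter escapes them and XmlReader unescapes them again; the encoding only becomes a real decision if you later turn the string into bytes yourself.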


On a side note, even if encoding were actually involved in StringWriter, overriding the getter of the Encoding property would not change how the text is encoded internally.
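A minimal sketch of that point, reusing the question's Utf8StringWriter idea (the `OverrideDemo` class and `CharsWritten` method are hypothetical): even when the writer reports UTF-8, the StringBuilder still holds one .NET char per UTF-16 code unit, not UTF-8 bytes.

```csharp
using System.IO;
using System.Text;

// Same override as in the question: it changes what Encoding reports, nothing else.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class OverrideDemo
{
    // Writes text through the given StringWriter and returns how many chars
    // (UTF-16 code units) ended up in its StringBuilder.
    public static int CharsWritten(StringWriter writer, string text)
    {
        writer.Write(text);
        return writer.GetStringBuilder().Length;
    }
}
```

Writing `"\u00e9\u20ac"` stores 2 chars with either writer, even though the same two characters would take 5 bytes in UTF-8.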

Link: http://www.djcxy.com/p/32776.html