Should I be worried about encoding during serialization?
    public string Serialize(BackgroundJobInfo info)
    {
        var stringBuilder = new StringBuilder();
        using (var stringWriter = new StringWriter(stringBuilder, CultureInfo.InvariantCulture))
        {
            var writer = XmlWriter.Create(stringWriter);
            ...
By default, StringWriter advertises itself as being UTF-16, while XML is usually UTF-8. I can fix this by subclassing StringWriter:
    public class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding
        {
            get { return Encoding.UTF8; }
        }
    }
But why should I worry about that? What happens if I use a plain StringWriter (as I did) instead of Utf8StringWriter? Will that cause a bug?
After that, I will write this string to MongoDB.
StringWriter's Encoding property is actually not that useful, because the underlying thing it writes to is a StringBuilder, which produces a .NET string. .NET strings are encoded internally as UTF-16, but that is an implementation detail you don't have to worry about. Encoding is just a property inherited from TextWriter, because a TextWriter can potentially write to targets where encoding does matter (Stream, byte[], ...).
In the end, you end up with a plain old string. The encoding used to serialize that string later on is not fixed yet, and if you're using a MongoDB client implementation that takes a string as an argument, it is not even your concern!
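To illustrate that the encoding is only decided when the string is turned into bytes, here is a minimal sketch. The class name EncodingDemo, the helper names, and the sample XML fragment are made up for illustration; only StringWriter and the Encoding class come from the discussion above.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical sample: 16 characters, one of them non-ASCII (U+00E9).
    public static string BuildSample()
    {
        var writer = new StringWriter();
        writer.Write("<job name=\"\u00e9\" />");
        return writer.ToString(); // a plain .NET string, no byte encoding chosen yet
    }

    // The encoding only matters here, when the caller asks for bytes.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Length;
    }

    public static void Main()
    {
        string xml = BuildSample();
        Console.WriteLine(xml.Length);                       // 16 characters
        Console.WriteLine(ByteCount(xml, Encoding.UTF8));    // 17 bytes: U+00E9 takes two
        Console.WriteLine(ByteCount(xml, Encoding.Unicode)); // 32 bytes: two per character
    }
}
```

The same string yields different byte sequences depending on the Encoding chosen by whoever serializes it, which is why the StringWriter itself has nothing to decide.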
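On a side note, overriding the getter of the Encoding property only changes what the writer reports about itself (XmlWriter uses that value for the encoding name in the XML declaration). It would not change the way encoding happens inside, even if encoding were actually involved in StringWriter.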
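As a rough sketch of what the override does and does not affect, the following compares a plain StringWriter with the Utf8StringWriter from the question. The DeclarationDemo class, the WriteEmptyDocument helper, and the "job" element name are hypothetical and exist only for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Same idea as the subclass in the question, in expression-bodied form.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class DeclarationDemo
{
    // Writes one empty element through XmlWriter and returns the resulting string.
    public static string WriteEmptyDocument(StringWriter target)
    {
        using (var writer = XmlWriter.Create(target))
        {
            writer.WriteStartElement("job");
            writer.WriteEndElement();
        }
        return target.ToString();
    }

    public static void Main()
    {
        // The declaration names utf-16, taken from StringWriter.Encoding.
        Console.WriteLine(WriteEmptyDocument(new StringWriter()));

        // The declaration names utf-8, yet the result is still an ordinary .NET string.
        Console.WriteLine(WriteEmptyDocument(new Utf8StringWriter()));
    }
}
```

Both calls return ordinary strings. Only the encoding name in the XML declaration differs, and that matters only if something later reads the declaration and the bytes it actually receives disagree with it.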
