Ignoring specified encoding when deserializing XML
I am trying to read some XML received from an external interface over a socket. The problem is that the encoding is specified incorrectly in the XML header: it says iso-8859-1, but the data is actually UTF-16BE. The interface documentation states that the encoding is UTF-16BE, so apparently they forgot to set the correct encoding in the header.
To ignore the encoding when I deserialize I use a StringReader like this:
    private static T DeserializeXmlData<T>(byte[] xmlData)
    {
        var xmlString = Encoding.BigEndianUnicode.GetString(xmlData);
        using (var reader = new StringReader(xmlString))
        {
            reader.ReadLine(); // Eat header line
            using (var xmlReader = XmlReader.Create(reader))
            {
                var serializer = new XmlSerializer(typeof(T));
                return (T)serializer.Deserialize(xmlReader);
            }
        }
    }
The above actually works fine, but I don't like the part where I just skip the header line by calling ReadLine. Is there a less brittle way to bypass the encoding specified in the XML-header?
Solution with StreamReader
Using a StreamReader lets me override the encoding specified in the XML header. Setting XmlReaderSettings.IgnoreProcessingInstructions made no difference either way. Interestingly, the StreamReader ignores the encoding passed to its constructor if it finds a Unicode byte-order mark.
To recap: the most robust solution seems to be a StreamReader, since it uses the byte-order mark if one is present and otherwise falls back to the encoding I specify.
    private static T DeserializeXmlData<T>(byte[] xmlData)
    {
        using (var xmlDataStream = new MemoryStream(xmlData))
        {
            using (var reader = new StreamReader(xmlDataStream, Encoding.BigEndianUnicode))
            {
                using (var xmlReader = XmlReader.Create(reader))
                {
                    var serializer = new XmlSerializer(typeof(T));
                    return (T)serializer.Deserialize(xmlReader);
                }
            }
        }
    }
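The byte-order-mark behaviour mentioned above can be checked with a small sketch. It is not part of the original solution: the class `BomDemo` and its methods are hypothetical names used only for illustration. The reader is told to expect UTF-16BE, but is fed UTF-16LE bytes that start with a little-endian BOM.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Decodes the bytes with a StreamReader that is told to expect UTF-16BE.
    // The third constructor argument is detectEncodingFromByteOrderMarks.
    public static string Decode(byte[] data, bool detectBom)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.BigEndianUnicode, detectBom))
        {
            return reader.ReadToEnd();
        }
    }

    // Builds "<a/>" as UTF-16LE bytes, prefixed with the little-endian BOM (FF FE).
    public static byte[] LittleEndianWithBom()
    {
        var littleEndian = Encoding.Unicode;
        return littleEndian.GetPreamble()
            .Concat(littleEndian.GetBytes("<a/>"))
            .ToArray();
    }
}
```

With `detectBom` set to `true` (the default for the two-argument constructor used above), the BOM wins and the text decodes correctly. With `false`, the bytes are forced through UTF-16BE and come out garbled. If the sender is known never to send a BOM, either choice should behave the same.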
I would simply use a StreamReader, constructed with the correct encoding, and pass it to the XmlReader.Create(TextReader) method:
    using (var sr = new StreamReader(@"c:\temp\bad.xml", Encoding.BigEndianUnicode)) {
        using (var xr = XmlReader.Create(sr, new XmlReaderSettings())) {
            // etc...
        }
    }
If there are no other relevant processing instructions, you can ignore them by setting XmlReaderSettings.IgnoreProcessingInstructions.
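As a rough check of how the mislabelled header and IgnoreProcessingInstructions interact, here is a minimal sketch. The class `MislabelledXmlDemo`, the `Sample` document and the `ReadRootText` helper are all made up for illustration. The sample claims iso-8859-1 in its header, and the caller is assumed to encode it as UTF-16BE before passing in the bytes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class MislabelledXmlDemo
{
    // Hypothetical sample: the header says iso-8859-1, but the caller
    // encodes this string as UTF-16BE before handing over the bytes.
    public const string Sample =
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><root>h\u00e9llo</root>";

    // Reads the text of the root element, decoding the bytes as UTF-16BE
    // regardless of what the XML header claims.
    public static string ReadRootText(byte[] xmlData, bool ignoreProcessingInstructions)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreProcessingInstructions = ignoreProcessingInstructions
        };

        using (var stream = new MemoryStream(xmlData))
        using (var reader = new StreamReader(stream, Encoding.BigEndianUnicode))
        using (var xmlReader = XmlReader.Create(reader, settings))
        {
            xmlReader.MoveToContent(); // skips the XML declaration
            return xmlReader.ReadElementContentAsString();
        }
    }
}
```

Calling `ReadRootText(Encoding.BigEndianUnicode.GetBytes(MislabelledXmlDemo.Sample), flag)` should return the same text for both values of `flag`, which matches the asker's observation that the setting makes no difference here. The characters are already decoded by the StreamReader before the XmlReader sees them.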