HttpUtility.HtmlEncode escaping too much?
In our ASP.NET MVC 3 project, the HttpUtility.HtmlEncode method seems to be escaping too many characters. Our web pages are served as UTF-8, but the method still escapes characters like ü or the yen sign ¥, even though these characters can be represented in UTF-8.
So when my ASP.NET MVC view contains the following piece of code:
@("<strong>ümlaut</strong>")
Then I would expect the encoder to escape the HTML tags, but not the ü:
&lt;strong&gt;ümlaut&lt;/strong&gt;
But instead it gives me the following HTML:
&lt;strong&gt;&#252;mlaut&lt;/strong&gt;
For completeness: the responseEncoding in the web.config is explicitly set to utf-8, so I would expect the HtmlEncode method to respect this setting.
<globalization requestEncoding="utf-8" responseEncoding="utf-8" />
Yes, I have faced the same issue with my web pages. If we look at the code of HtmlEncode, there is a branch that translates exactly this set of characters. Here is the code that causes these characters to be encoded as well:
    // Characters from U+00A0 up to (but not including) U+0100
    // are written as decimal numeric entities.
    if ((ch >= '\x00a0') && (ch < '\u0100'))
    {
        output.Write("&#");
        output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
        output.Write(';');
    }
    else
    {
        output.Write(ch);
    }
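To see which characters that branch catches, here is a minimal standalone sketch. It is not the framework's code: the class and method names (Latin1RangeDemo, EncodeLatin1Range) are made up for illustration, and it reproduces only the range check, not the escaping of the tag characters.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class Latin1RangeDemo
{
    // Hypothetical helper: applies only the U+00A0..U+00FF range check
    // from the snippet above and leaves every other character untouched.
    public static string EncodeLatin1Range(string value)
    {
        var builder = new StringBuilder(value.Length);
        foreach (char ch in value)
        {
            if (ch >= '\x00a0' && ch < '\u0100')
            {
                // Write the code point as a decimal numeric entity.
                builder.Append("&#");
                builder.Append(((int)ch).ToString(NumberFormatInfo.InvariantInfo));
                builder.Append(';');
            }
            else
            {
                builder.Append(ch);
            }
        }
        return builder.ToString();
    }
}
```

With this sketch, EncodeLatin1Range("ü ¥ €") returns "&#252; &#165; €": ü (U+00FC) and ¥ (U+00A5) fall inside the range, while € (U+20AC) lies above it and passes through unchanged. So the escaping depends only on the code point, not on the response encoding.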
Here is the full code of HtmlEncode:
    public static unsafe void HtmlEncode(string value, TextWriter output)
    {
        if (value != null)
        {
            if (output == null)
            {
                throw new ArgumentNullException("output");
            }
            int num = IndexOfHtmlEncodingChars(value, 0);
            if (num == -1)
            {
                output.Write(value);
            }
            else
            {
                int num2 = value.Length - num;
                fixed (char* str = ((char*) value))
                {
                    char* chPtr = str;
                    char* chPtr2 = chPtr;
                    while (num-- > 0)
                    {
                        output.Write(chPtr2[0]);
                        chPtr2++;
                    }
                    while (num2-- > 0)
                    {
                        char ch = chPtr2[0];
                        if (ch <= '>')
                        {
                            switch (ch)
                            {
                                case '&':
                                {
                                    output.Write("&amp;");
                                    chPtr2++;
                                    continue;
                                }
                                case '\'':
                                {
                                    output.Write("&#39;");
                                    chPtr2++;
                                    continue;
                                }
                                case '"':
                                {
                                    output.Write("&quot;");
                                    chPtr2++;
                                    continue;
                                }
                                case '<':
                                {
                                    output.Write("&lt;");
                                    chPtr2++;
                                    continue;
                                }
                                case '>':
                                {
                                    output.Write("&gt;");
                                    chPtr2++;
                                    continue;
                                }
                            }
                            output.Write(ch);
                            chPtr2++;
                            continue;
                        }
                        // Here is the point: characters from U+00A0 up to
                        // (but not including) U+0100 become numeric entities.
                        if ((ch >= '\x00a0') && (ch < '\u0100'))
                        {
                            output.Write("&#");
                            output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
                            output.Write(';');
                        }
                        else
                        {
                            output.Write(ch);
                        }
                        chPtr2++;
                    }
                }
            }
        }
    }
A possible solution is to write your own custom HtmlEncode, or to use the Anti-Cross Site Scripting (AntiXSS) Library from Microsoft:
http://msdn.microsoft.com/en-us/security/aa973814
As Aristos suggested we could use the AntiXSS library from Microsoft. It contains a UnicodeCharacterEncoder that behaves as you would expect.
However, we chose to implement our own very basic HTML encoder instead. You can find the code below. Please feel free to adapt, comment on, or improve it if you see any issues.
    // Requires: using System.Collections.Generic; using System.Text; using System.Web;
    public static class HtmlEncoder
    {
        // Maps each character that must be escaped to its entity name (without '&' and ';').
        private static IDictionary<char, string> toEscape = new Dictionary<char, string>()
        {
            { '<', "lt" },
            { '>', "gt" },
            { '"', "quot" },
            { '&', "amp" },
            { '\'', "#39" },
        };

        /// <summary>
        /// HTML-Encodes the provided value
        /// </summary>
        /// <param name="value">object to encode</param>
        /// <returns>An HTML-encoded string representing the provided value.</returns>
        public static string Encode(object value)
        {
            if (value == null)
                return string.Empty;

            // If value is bare HTML, we expect it to be encoded already
            var html = value as IHtmlString;
            if (html != null)
                return html.ToHtmlString();

            string toEncode = value.ToString();

            // Init capacity to length of string to encode
            var builder = new StringBuilder(toEncode.Length);
            foreach (char c in toEncode)
            {
                string result;
                bool success = toEscape.TryGetValue(c, out result);
                string character = success
                    ? "&" + result + ";"
                    : c.ToString();
                builder.Append(character);
            }
            return builder.ToString();
        }
    }
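Building on Thomas's answer, here is a version with some improvements to the handling of spaces, tabs, and newlines, since they can break the structure of the HTML: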
    // Intended to live in the HtmlEncoder class above, since it uses its toEscape dictionary.
    public static string HtmlEncode(string value, bool removeNewLineAndTabs)
    {
        if (value == null)
            return string.Empty;

        // Init capacity to length of string to encode
        var builder = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            string result;
            bool success = toEscape.TryGetValue(c, out result);
            // toEscape stores bare entity names, so wrap them in '&' and ';'
            string character = success ? "&" + result + ";" : c.ToString();
            builder.Append(character);
        }
        string retVal = builder.ToString();
        if (removeNewLineAndTabs)
        {
            // Replace "\r\n" first so a Windows line break becomes one space, not two
            retVal = retVal.Replace("\r\n", " ");
            retVal = retVal.Replace("\r", " ");
            retVal = retVal.Replace("\n", " ");
            retVal = retVal.Replace("\t", " ");
        }
        return retVal;
    }