Encoding lost when attribute read from input field

2018-06-14 19:54:51

I'm using JavaScript to pull a value out of a hidden field and display it in a textbox. The value in the hidden field is encoded.

For example,

<input id='hiddenId' type='hidden' value='chalk &amp; cheese' />

gets pulled into

<input type='text' value='chalk &amp; cheese' />

via some jQuery that gets the value from the hidden field (this is the point where I lose the encoding):

$('#hiddenId').attr('value')

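The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I want the encoding to remain, so that " and ' stay escaped.

Is there a JavaScript library or a jQuery method that will HTML-encode a string?

I use these functions: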

function htmlEncode(value){
  // Create an in-memory div, set its inner text (which jQuery automatically encodes)
  // Then grab the encoded contents back out. The div never exists on the page.
  return $('<div/>').text(value).html();
}

function htmlDecode(value){
  return $('<div/>').html(value).text();
}

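Basically, a div element is created in memory, but it is never appended to the document.

In the htmlEncode function I set the element's innerText and retrieve the encoded innerHTML; in the htmlDecode function I set the element's innerHTML and retrieve its innerText.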

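The jQuery trick doesn't encode quote marks, and in IE it will strip your whitespace.

Based on the escape template tag in Django, which I guess is already heavily used and tested, I made this function, which does what's needed.

It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue, and it encodes quote marks, which is essential if you're going to use the result inside an attribute value, for example.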

function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');
}
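
The order of the replacements matters: htmlEscape has to handle & first, otherwise the ampersands of the entities it has just produced would be encoded a second time, and htmlUnescape has to handle &amp; last for the mirror-image reason. A minimal sketch of a round trip, repeating the two functions so it runs on its own; the sample string and the variable names are made up for illustration:

```javascript
// Copies of the two functions above so this snippet is self-contained.
function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

function htmlUnescape(str) {
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');
}

// Hypothetical sample value containing every character the functions handle.
var raw = 'chalk & "cheese" <it\'s>';
var escaped = htmlEscape(raw);
console.log(escaped);
// chalk &amp; &quot;cheese&quot; &lt;it&#39;s&gt;

// Both quote styles are encoded, so the value cannot close the attribute early.
var markup = "<input type='text' value='" + escaped + "' />";
console.log(markup);

// The round trip restores the original string.
console.log(htmlUnescape(escaped) === raw); // true

// Text that already looks like an entity survives too, because &amp; is decoded last.
console.log(htmlUnescape(htmlEscape('&lt;'))); // &lt;
```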

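Update 2013-06-17:
In the search for the fastest escaping I found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25

It gives an identical result string to the built-in replace chain above. I'd be very happy if someone could explain why it's faster!?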
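
For comparison, another variant that is easy to benchmark against the replace chain is a single regex pass with a lookup table. This is not the replaceAll implementation linked above, only a sketch of a different technique; the names ENTITY_MAP and htmlEscapeSinglePass are made up here, and whether it is actually faster depends on the engine and the input, so it would need measuring.

```javascript
// Lookup table for the five characters handled by the replace chain above.
var ENTITY_MAP = {
    '&': '&amp;',
    '"': '&quot;',
    "'": '&#39;',
    '<': '&lt;',
    '>': '&gt;'
};

// Hypothetical name; one regex pass instead of five chained replace calls.
function htmlEscapeSinglePass(str) {
    return str.replace(/[&"'<>]/g, function (ch) {
        return ENTITY_MAP[ch];
    });
}

console.log(htmlEscapeSinglePass('a < b && "c"'));
// a &lt; b &amp;&amp; &quot;c&quot;
```

Because every character is visited once, there is no ordering concern between & and the other replacements in this form.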

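Update 2015-03-04:
I just noticed that AngularJS is using exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435

They add a couple of refinements: they appear to be handling an obscure Unicode issue as well as converting all non-alphanumeric characters to entities. I was under the impression that the latter is not necessary as long as you have a UTF8 charset specified for your document.

I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44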

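Update 2016-04-06:
You may also wish to escape the forward slash /. This is not required for correct HTML encoding, but OWASP recommends it as an anti-XSS safety measure. (Thanks to @JNF for suggesting this in the comments.)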

        .replace(/\//g, '&#x2F;');
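
Put together with the earlier chain, the result might look like the sketch below. The name htmlEscapeWithSlash is made up for illustration. Note that the slash inside the regular expression literal has to be backslash-escaped, because a bare // would start a line comment.

```javascript
// Hypothetical name; the same chain as htmlEscape plus the forward slash.
function htmlEscapeWithSlash(str) {
    return str
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/\//g, '&#x2F;');
}

console.log(htmlEscapeWithSlash('</script>'));
// &lt;&#x2F;script&gt;
```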

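Here's a non-jQuery version that is considerably faster than both the jQuery .html() version and the .replace() version. It preserves all whitespace but, like the jQuery version, doesn't handle quotes.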

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
};

Speed: http://jsperf.com/htmlencoderegex/17

Script:

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
};

function htmlDecode( html ) {
    var a = document.createElement( 'a' ); a.innerHTML = html;
    return a.textContent;
};

document.getElementById( 'text' ).value = htmlEncode( document.getElementById( 'hidden' ).value );

//sanity check
var html = '<div>   &amp; hello</div>';
document.getElementById( 'same' ).textContent = 
      'html === htmlDecode( htmlEncode( html ) ): ' 
    + ( html === htmlDecode( htmlEncode( html ) ) );

HTML:

<input id="hidden" type="hidden" value="chalk    &amp; cheese" />
<input id="text" value="" />
<div id="same"></div>
