Convert HTML to valid XML tag

2018-06-27 11:49:33

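I need help writing a regex function that converts an HTML string into a valid XML tag name. For example, it takes a string and does the following:

If a letter or underscore occurs in the string, it is kept.

If any other character occurs, it is removed from the output string.

If any other character occurs between words or letters, it is replaced with an underscore.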

Ex:
Input: Date Created
Output: Date_Created

Input: Date<br/>Created
Output: Date_Created

Input: Date\nCreated
Output: Date_Created

Input: Date    1 2 3 Created
Output: Date_Created

Basically, the regex function should convert the HTML string into a valid XML tag.

A bit of regex combined with a couple of standard functions:

function mystrip($s)
{
        // add spaces around angle brackets to separate tag-like parts
        // e.g. "<br />" becomes " <br /> "
        // then let strip_tags take care of removing html tags
        $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

        // any sequence of characters that are not alphabet or underscore
        // gets replaced by a single underscore
        return preg_replace('/[^a-z_]+/i', '_', $s);
}
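The function above handles the four sample inputs, but a tag or space at the very start or end of the string still leaves an underscore at the edge, while the question asks for such characters to be removed. Below is a minimal sketch of one way to handle that. It assumes edge underscores can be dropped even if they were present in the input; the name `mystrip_trimmed` is hypothetical.

```php
<?php
// Hypothetical variant of mystrip() that also trims edge underscores.
function mystrip_trimmed($s)
{
    // Pad angle brackets with spaces, then let strip_tags remove the tags
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // Collapse every run of non-letter, non-underscore characters to "_"
    $s = preg_replace('/[^a-z_]+/i', '_', $s);

    // Assumption: underscores at either end are unwanted
    return trim($s, '_');
}

echo mystrip_trimmed("<b>Date Created</b>"), "\n"; // Date_Created
```

Note that `trim` also removes underscores that were literally present at the edges of the input, which may or may not be what is wanted.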

Try this:

$result = preg_replace('/([\d\s]|<[^<>]+>)/', '_', $subject);

Explanation

"
(               # Match the regular expression below and capture its match into backreference number 1
                   # Match either the regular expression below (attempting the next alternative only if this one fails)
      [\d\s]          # Match a single character present in the list below
                         # A single digit 0..9
                         # A whitespace character (spaces, tabs, and line breaks)
   |               # Or match regular expression number 2 below (the entire group fails if this one fails to match)
      <               # Match the character “<” literally
      [^<>]           # Match a single character NOT present in the list “<>”
         +               # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      >               # Match the character “>” literally
)
"
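One thing to watch with a per-character alternation like this: every digit, whitespace character, or tag is replaced on its own, so a run such as "    1 2 3 " produces a run of underscores rather than a single one. Below is a minimal sketch that collapses consecutive matches, assuming that is the desired behaviour; the name `collapse_runs` is hypothetical.

```php
<?php
// Hypothetical helper: the same alternation (digit, whitespace, or tag),
// wrapped in a non-capturing group with "+" so that a whole run of
// consecutive matches is replaced by one underscore.
function collapse_runs($subject)
{
    return preg_replace('/(?:[\d\s]|<[^<>]+>)+/', '_', $subject);
}

echo collapse_runs("Date    1 2 3 Created"), "\n"; // Date_Created
echo collapse_runs("Date<br/>Created"), "\n";      // Date_Created
```

Punctuation other than tags, digits, and whitespace is left untouched by this pattern, so it only covers part of the rules in the question.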

This should work:

$text = preg_replace( '/(?<=[a-zA-Z])[^a-zA-Z_]+(?=[a-zA-Z])/', '_', $text );

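It checks that there is a letter before and after, and replaces any run of non-alphabetic, non-underscore characters in between with an underscore.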
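Because the lookarounds only test for a letter on each side, the letters inside a tag name count too: applied directly to "Date<br/>Created", the pattern yields "Date_br_Created". It also leaves non-letters at the start or end of the string in place. Below is a sketch that removes tags first and cleans up the edges afterwards, assuming a simple `<...>` pattern is good enough to recognise tags; the name `to_tag_name` is hypothetical.

```php
<?php
// Hypothetical wrapper around the lookaround pattern.
function to_tag_name($text)
{
    // Replace each tag with a space so neighbouring words stay separated
    $text = preg_replace('/<[^<>]+>/', ' ', $text);

    // Between two letters: turn a run of non-letter, non-underscore
    // characters into a single underscore
    $text = preg_replace('/(?<=[a-zA-Z])[^a-zA-Z_]+(?=[a-zA-Z])/', '_', $text);

    // Remove any remaining non-letter, non-underscore characters
    return preg_replace('/[^a-zA-Z_]+/', '', $text);
}

echo to_tag_name("Date<br/>Created"), "\n";      // Date_Created
echo to_tag_name(" Date 1 2 3 Created!"), "\n";  // Date_Created
```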
