Convert HTML to valid XML tag
I need help writing a regex function that converts an HTML string to a valid XML tag name. It takes a string and does the following:
Examples:
Input: Date Created
Output: Date_Created
Input: Date<br/>Created
Output: Date_Created
Input: Date\nCreated
Output: Date_Created
Input: Date 1 2 3 Created
Output: Date_Created
Basically the regex function should convert the HTML string to a valid XML tag.
A bit of regex and a few standard functions:
function mystrip($s)
{
// add spaces around angle brackets to separate tag-like parts
// e.g. "<br />" becomes " <br /> "
// then let strip_tags take care of removing html tags
$s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));
// any sequence of characters that are not alphabet or underscore
// gets replaced by a single underscore
return preg_replace('/[^a-z_]+/i', '_', $s);
}
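A quick way to check this against the inputs from the question is sketched below. The function body is copied from above so the snippet runs on its own; the $cases array is only an illustration, and the third input is assumed to contain a real newline character.

```php
<?php
// Copy of mystrip() from the answer above, repeated so this snippet runs by itself.
function mystrip($s)
{
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));
    return preg_replace('/[^a-z_]+/i', '_', $s);
}

// Sample inputs from the question (hypothetical test harness, not part of the answer).
$cases = array(
    "Date Created",
    "Date<br/>Created",
    "Date\nCreated",        // assumes the question meant a newline character
    "Date 1 2 3 Created",
);

foreach ($cases as $input) {
    // Each of these prints Date_Created
    echo mystrip($input), "\n";
}
```

Note that leading or trailing non-letters also turn into an underscore (for example " Date " gives "_Date_"), so trimming the input first may be worthwhile if that matters.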
Try this
$result = preg_replace('/([\d\s]|<[^<>]+>)/', '_', $subject);
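One thing to watch for with this pattern: it replaces each matched digit or whitespace character separately, so a run such as " 1 2 3 " becomes several underscores rather than one. A possible adjustment, assuming the character class is meant to be [\d\s] and that a single underscore per run is wanted, is to wrap the alternation in a repeated non-capturing group. The variable names here are only illustrative.

```php
<?php
$subject = "Date 1 2 3 Created";

// Pattern from the answer: one underscore per matched character or tag.
$perChar = preg_replace('/([\d\s]|<[^<>]+>)/', '_', $subject);
echo $perChar, "\n"; // Date_______Created (seven underscores)

// Variant (an assumption, not the original answer): the trailing + on the
// non-capturing group collapses a whole run into a single underscore.
$collapsed = preg_replace('/(?:[\d\s]|<[^<>]+>)+/', '_', $subject);
echo $collapsed, "\n"; // Date_Created

$tagged = preg_replace('/(?:[\d\s]|<[^<>]+>)+/', '_', "Date<br/>Created");
echo $tagged, "\n"; // Date_Created
```

Unlike the first answer, both forms only touch digits, whitespace, and tags; other punctuation passes through unchanged.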
Explanation
"
( # Match the regular expression below and capture its match into backreference number 1
# Match either the regular expression below (attempting the next alternative only if this one fails)
[\d\s] # Match a single character present in the list below
# A single digit 0..9
# A whitespace character (spaces, tabs, and line breaks)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
< # Match the character “<” literally
[^<>] # Match a single character NOT present in the list “<>”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
> # Match the character “>” literally
)
"
Should be able to use:
$text = preg_replace( '/(?<=[a-zA-Z])[^a-zA-Z_]+(?=[a-zA-Z])/', '_', $text );
The lookarounds check that there is a letter immediately before and after the match, and the pattern replaces any run of non-letter, non-underscore characters between them with a single underscore.
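A short sketch of how this behaves, including one caveat: the pattern alone does not know about HTML, so the letters inside a tag such as <br/> survive. Stripping tags first, padded with spaces as in the first answer, is one way around that. The helper name to_tag_name is hypothetical.

```php
<?php
$pattern = '/(?<=[a-zA-Z])[^a-zA-Z_]+(?=[a-zA-Z])/';

echo preg_replace($pattern, '_', "Date 1 2 3 Created"), "\n"; // Date_Created
echo preg_replace($pattern, '_', "Date<br/>Created"), "\n";   // Date_br_Created

// Hypothetical helper: remove tags first, then apply the lookaround pattern.
function to_tag_name($text)
{
    // Pad angle brackets with spaces so words on either side of a tag stay separated.
    $text = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $text));
    // The lookarounds never match at the edges, so trim stray whitespace there.
    $text = trim($text);
    return preg_replace('/(?<=[a-zA-Z])[^a-zA-Z_]+(?=[a-zA-Z])/', '_', $text);
}

echo to_tag_name(" Date<br/>Created "), "\n"; // Date_Created
```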