Regex to strip tags, retain CDATA
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
Hi all,
I know how everyone loves a regex question, so here is mine. I have an XML tree within which some nodes contain CDATA. How do I return just a string containing the data?
Let's see an example:
<xml>
<node>I'm plain text.</node>
<node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>
Would return
I'm plain text. I'm text in cdata... and may contain html, yikes!
I've read about not parsing an irregular language with a regular one, but I'm sure this is doable. What do you reckon, guys?
Thanks, Kevin
EDIT: This was a problem that needed a quick and dirty solution to deal with a few lines of XML. I was surprised at the initial flat refusal, but from further reading (in particular from links provided later on) I see that experienced programmers know it's something that should be avoided wherever possible. Live and learn. Thanks.
Don't use regex, use an XML/HTML parser.
This issue has been beaten to death.
Take a look at the examples of how difficult it is to solve this problem.
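For what it's worth, here is a minimal sketch of the parser route, assuming Python and only its standard library (the question never names a language, so that choice is an assumption). An XML parser hands back CDATA content as ordinary node text, and a second pass through an HTML parser drops any markup embedded in it. The names `TextOnly`, `strip_markup` and `node_texts` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag it sees."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(fragment):
    """Return the text of an HTML fragment with its tags removed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def node_texts(xml_source, tag="node"):
    """Join the tag-free text of every <tag> element with single spaces."""
    root = ET.fromstring(xml_source)
    # The XML parser has already unwrapped CDATA; node.text is a plain string.
    return " ".join(strip_markup(node.text or "") for node in root.iter(tag))


SAMPLE = """<xml>
<node>I'm plain text.</node>
<node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>"""

print(node_texts(SAMPLE))
```

Run against the XML from the question, this prints the string the asker wanted. It only reads the direct text of each `node` element, so nested child elements would need extra handling.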