Regex to strip tags, retain CDATA

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

Hi all,

I know how everyone loves a regex question, so here is mine. I have an XML tree within which some nodes contain CDATA. How do I return just a string containing the data?

Let's see an example:

<xml>
  <node>I'm plain text.</node>
  <node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>

This would return:

I'm plain text. I'm text in cdata... and may contain html, yikes!

I've read about not parsing an irregular language with a regular one, but I'm sure this is doable. What do you reckon, guys?

Thanks, Kevin

EDIT: This was a problem that needed a quick and dirty solution to deal with a few lines of XML. I was surprised at the initial flat refusal, but from further reading (in particular from links provided later on) I see that experienced programmers know it's something that should be avoided wherever possible. Live and learn. Thanks.


Don't use regex; use an XML/HTML parser.

This issue has been beaten to death.


Take a look at an example of how difficult this problem is to solve.
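
For what it's worth, here is a minimal sketch of the parser route, assuming Python and only its standard library (the question doesn't name a language, so that is my assumption). The XML parser hands back the CDATA content as ordinary text, and a small `HTMLParser` subclass then drops any markup inside it. The names `strip_html`, `extract_text`, and `_TextCollector` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text content of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the fragment with all HTML tags removed."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


def extract_text(xml_source):
    """Join the text of every <node> element, with embedded HTML stripped."""
    root = ET.fromstring(xml_source)
    texts = []
    for node in root.iter("node"):
        # itertext() yields plain text and CDATA content alike
        raw = "".join(node.itertext())
        texts.append(strip_html(raw))
    return " ".join(texts)


SAMPLE = """<xml>
  <node>I'm plain text.</node>
  <node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>"""

print(extract_text(SAMPLE))
```

On the sample from the question this should print `I'm plain text. I'm text in cdata... and may contain html, yikes!`. Treat it as a starting point only: badly broken HTML inside the CDATA, or nodes with other names, would need more care.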
