Nexpose XML report version 2.0: how to remove HTML from XML?

I wrote a PHP parser for the Nexpose XML report version 2.0. It worked fine until recently, but now the parser fails.

The problem seems to be that the XML I'm trying to parse has HTML between the XML elements without CDATA sections, so the unescaped HTML makes the document not well-formed. As a result, the XML can't be parsed with the libraries I'm using, XMLReader and SimpleXML.

This is an example of the kind of line that is invalid for these PHP XML libraries:

<Paragraph preformat="true">98: 99: <BODY scroll="AUTO" bgColor="#FFFFFF" text="#000000" onload="setFo... 100: <FORM action="/exchweb/bin/auth/owaauth.dll" method="POST" name="... 101: 98: <INPUT type="hidden" name="destination" value="
http://www.rapid7.com"...</Paragraph>

Any idea how to detect all lines like this one and delete them?

Right now the only pattern I've found for detecting these lines is that each piece of HTML code is preceded by a number acting as an identifier, in the following form:

<number>:<html-code>

Thanks in advance for your help guys.

Kind Regards


You should try this:

<Paragraph.+[0-9]:.+</Paragraph>
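
As a rough sketch of how a pattern like this could be applied in PHP before handing the report to SimpleXML: the helper name stripHtmlParagraphs is hypothetical, and the code assumes that the offending elements always open with exactly <Paragraph preformat="true"> as in the sample, that they contain no nested Paragraph elements, and that the report fits in memory as a string. It matches each element lazily instead of using a greedy .+, so one match cannot swallow several paragraphs, and it removes an element only when its body contains "<number>:" followed by a tag.

```php
<?php
// Hypothetical helper for illustration; not part of any Nexpose or PHP API.
function stripHtmlParagraphs(string $xml): string
{
    // Lazy match of one preformatted paragraph at a time; the "s" flag lets
    // "." cross the line breaks that appear inside the sample element.
    $pattern = '~<Paragraph preformat="true">(.*?)</Paragraph>~s';

    $clean = preg_replace_callback($pattern, function (array $m): string {
        // Remove the element only if its body has "<number>:" followed by a tag.
        return preg_match('~\d+:\s*<[A-Za-z]~', $m[1]) ? '' : $m[0];
    }, $xml);

    // preg_replace_callback() returns null on failure (e.g. backtrack limit).
    if ($clean === null) {
        throw new RuntimeException('Regex failed, code ' . preg_last_error());
    }
    return $clean;
}

// Shortened, made-up input modelled on the sample from the question.
$raw = '<Report>'
     . '<Paragraph preformat="true">98: 99: <BODY scroll="AUTO" ... 100: <FORM method="POST" name="...</Paragraph>'
     . '<Paragraph>Plain text stays.</Paragraph>'
     . '</Report>';

$doc = simplexml_load_string(stripHtmlParagraphs($raw));
echo $doc->Paragraph, PHP_EOL;
```

If the paragraph text is still needed, the callback could return the element with its body passed through htmlspecialchars() instead of returning an empty string. For reports too large to load at once, the same filter would have to be applied to the file in chunks before XMLReader sees it, which this sketch does not attempt.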