Regexp for HTML

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I have the following string:

$str = " 
<li>r</li>  
<li>a</li>  
<li>n</li>  
<li>d</li>  
...
<li>om</li>  
";

How do I get the HTML for the first n <li> tags?

Example: n = 3; result = "<li>r<...>n</li>";

I would like a regexp if possible.


Like this.

$dom = new DOMDocument();
@$dom->loadHTML($str); // @ silences warnings about the HTML fragment
$x = new DOMXPath($dom);

// We want the 4th <li> node.
foreach ($x->query("//li[4]") as $node)
{
  echo $node->c14n(); // canonicalized markup of the node
}
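The question actually asks for the first n items rather than the 4th one. A minimal sketch of that with the same DOMDocument/XPath approach, assuming all the <li> elements are siblings as in the question's string: `position() <= $n` replaces the fixed index `[4]`, and `firstListItems()` is a hypothetical helper name, not a built-in.

```php
<?php
// Sketch: markup of the first $n <li> elements via DOMDocument + XPath.
// firstListItems() is a hypothetical helper, not a built-in.
function firstListItems(string $html, int $n): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings (e.g. about the fragment) instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($dom);
    $out = '';
    // position() counts per parent, so with nested lists this would pick
    // up to $n items from each list; the question's items share one parent.
    foreach ($xpath->query("//li[position() <= $n]") as $node) {
        $out .= $dom->saveHTML($node); // outer HTML of this <li>
    }
    return $out;
}

$str = "<li>r</li><li>a</li><li>n</li><li>d</li><li>om</li>";
echo firstListItems($str, 3); // the first three items: r, a, n
```

Passing the node to `saveHTML()` returns the outer HTML of just that element, which is what the question wants, rather than the whole document.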

Oh, and learn XPath; it will save you a lot of trouble in the future.


@Byron's solution, but with SimpleXML:

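// SimpleXML needs a single root element, so wrap the fragment in <ul>.
$xml = simplexml_load_string("<ul>$str</ul>");

foreach ($xml->xpath("//li[4]") as $node) {
  echo $node; // casting to string gives the <li>'s text content
  // echo $node->asXML(); // would print the <li> markup instead
}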

EDIT: Another reason I really like SimpleXML is how easy it is to inspect a node's content: you can just call print_r($xml) to print the object with its child nodes.
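The same first-n idea with SimpleXML, as a sketch: it assumes the input is well-formed XML (SimpleXML will not accept sloppy HTML) and wraps the fragment in a `<ul>` so there is a single root. `firstListItemsXml()` is a hypothetical name; `asXML()` returns the element's markup rather than just its text.

```php
<?php
// Sketch: first $n <li> elements with SimpleXML.
// Assumes the fragment is well-formed XML; firstListItemsXml() is hypothetical.
function firstListItemsXml(string $fragment, int $n): string
{
    $prev = libxml_use_internal_errors(true); // don't print parse warnings
    $xml = simplexml_load_string("<ul>$fragment</ul>"); // single root required
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    if ($xml === false) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    $out = '';
    foreach ($xml->xpath("/ul/li[position() <= $n]") as $node) {
        $out .= $node->asXML(); // markup of the <li> itself, not just its text
    }
    return $out;
}

$str = "<li>r</li><li>a</li><li>n</li><li>d</li><li>om</li>";
echo firstListItemsXml($str, 3);
```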


As I'm sure you are aware, it is not a good idea to parse HTML with regular expressions unless you "tidy" it first.
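For markup you control and know to be tidy, like the flat list in the question, a regex can do the job. This is a sketch under that assumption only (no nested lists, no `</li>` inside attributes or comments); `firstListItemsRegex()` is a hypothetical helper.

```php
<?php
// Sketch only: assumes flat, tidy markup with no nested <li> elements.
// firstListItemsRegex() is a hypothetical helper.
function firstListItemsRegex(string $html, int $n): string
{
    // Non-greedy match of each <li ...>...</li>; "s" lets "." span newlines,
    // "i" makes the tag names case-insensitive.
    preg_match_all('~<li\b[^>]*>.*?</li>~is', $html, $matches);
    return implode('', array_slice($matches[0], 0, $n));
}

$str = "<li>r</li>\n<li>a</li>\n<li>n</li>\n<li>d</li>\n<li>om</li>";
echo firstListItemsRegex($str, 3); // <li>r</li><li>a</li><li>n</li>
```

On arbitrary HTML (nested lists, unclosed tags) this breaks quickly, which is why a parser is the safer choice.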

A viable solution in PHP is to navigate the HTML structure with SimpleXML (http://php.net/manual/en/book.simplexml.php) or DOMDocument (http://php.net/manual/en/class.domdocument.php).
