Get a specific content block of an element using a URL in PHP

Possible Duplicate:
How to parse and process HTML with PHP?

I know about the file_get_contents(url) function. What I want is to first use file_get_contents(url) to pull the contents of a page, and then extract a certain block from those contents. Is there a method or function that can do that? Here's a sample:

So the code would look like this:

$pageContent = file_get_contents('http://www.pullcontentshere.com/');

And this would be the contents of $pageContent:

<html> <body>
    <div id="myContent">
        <ul>    
            <li></li>
            <li></li>
            <li></li>
        </ul>
    </div> 
</body> </html>

Do you have any suggestions for how to extract specifically the <div id="myContent"> element together with all of its children?

So it will be something like this:

$content = function_here($pageContent);

So the output would look like this:

        <div id="myContent">
            <ul>    
                <li></li>
                <li></li>
                <li></li>
            </ul>
        </div> 

Answers are greatly appreciated!


Another way would be to use regex.

<?php

$string = '<html> <body> 
    <div id="myContent"> 
        <ul>     
            <li></li> 
            <li></li> 
            <li></li> 
        </ul> 
    </div>  
</body> </html>';

// '#' is used as the delimiter so the '/' in </div> does not end the pattern early
if ( preg_match ( '#<div id="myContent">(.*?)</div>#s', $string, $matches ) )
{
    foreach ( $matches as $key => $match )
    {
        echo $key . ' => ' . htmlentities ( $match ) . '<br /><br />';
    }
}
else
{
    echo 'No match';
}

?>

Live example: http://codepad.viper-7.com/WSoWCh
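The pattern above is hard-coded to one id. As a hedged sketch, it can be wrapped in a helper that accepts any id; `extract_div_by_id_regex()` is a hypothetical name, and the sketch assumes `id` is the first attribute of the target `<div>`, written with double quotes. It also makes the main weakness of the regex approach visible: the lazy `.*?` stops at the first `</div>`, so a nested div cuts the result short.

```php
<?php
// Hypothetical helper wrapping the regex approach in a reusable function.
// Assumes id is the first attribute of the <div> and uses double quotes.
function extract_div_by_id_regex(string $html, string $id): ?string
{
    // preg_quote() escapes regex metacharacters (and the '#' delimiter) in the id.
    $pattern = '#<div\s+id="' . preg_quote($id, '#') . '"[^>]*>.*?</div>#s';
    // $m[0] holds the whole match: from the opening tag to the FIRST </div>.
    return preg_match($pattern, $html, $m) ? $m[0] : null;
}

$page = '<html><body><div id="myContent"><ul><li></li></ul></div></body></html>';
echo extract_div_by_id_regex($page, 'myContent'), "\n";

// Limitation: with a nested div, the match ends at the inner </div>,
// so "<p>lost</p>" is not part of the result.
$nested = '<div id="myContent"><div>inner</div><p>lost</p></div>';
echo extract_div_by_id_regex($nested, 'myContent'), "\n";
```

For markup that may contain nested `<div>` elements, an HTML parser (as in the other answers) is the more reliable choice.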


You can use the built-in SimpleXMLElement as explained in nullpointr's answer, or you can use regular expressions. Another solution that I usually find pretty simple is the PHP Simple HTML DOM Parser, which lets you use jQuery-style selectors. A simple example with your code would look like this:

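// Requires the PHP Simple HTML DOM Parser library
include 'simple_html_dom.php';
// Create DOM from URL
$html = file_get_html('http://www.pullcontentshere.com/');
// '#' selects by id ('.' would select by class); the second argument (0) returns the first match instead of an array
$myContent = $html->find('div#myContent', 0)->outertext; // outertext keeps the tag and all children; plaintext would return only the text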
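If you'd rather not depend on a third-party library, PHP's bundled DOM extension (`DOMDocument` plus `DOMXPath`) can do the same selection. This is a swapped-in technique, not part of Simple HTML DOM. The function name `get_element_html_by_id()` is hypothetical, and the sketch assumes the id contains no double quotes, since it is pasted straight into the XPath expression.

```php
<?php
// Hypothetical helper built on PHP's bundled DOM extension (no third-party library).
// Assumes $id contains no double quotes, since it is inserted directly into the XPath.
function get_element_html_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath equivalent of the CSS selector div#myContent.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Passing a node to saveHTML() serializes just that node, children included.
    $out = $doc->saveHTML($nodes->item(0));
    return $out === false ? null : $out;
}

$page = '<html><body><div id="myContent"><ul><li></li><li></li></ul></div></body></html>';
echo get_element_html_by_id($page, 'myContent'), "\n";
```

With the page from the question, the call would be `get_element_html_by_id(file_get_contents('http://www.pullcontentshere.com/'), 'myContent')`. Unlike the regex, this returns the whole element even when it contains nested `<div>`s.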

You need to use XML parsing to solve your problem. I would recommend SimpleXML, which is already part of PHP. Here's an example:

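// SimpleXML only accepts well-formed XML, so this works for XHTML-like markup
$sitecontent = '<html>
   <body>
      <div id="myContent">
         <ul>
            <li></li>
            <li></li>
            <li></li>
         </ul>
      </div>
   </body>
</html>';

$xml = new SimpleXMLElement($sitecontent);
// xpath() returns an array of matching SimpleXMLElement objects
$result = $xml->xpath('//div[@id="myContent"]');

// Print the first match, including all of its children, as markup
echo $result[0]->asXML();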
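One caveat: `new SimpleXMLElement()` throws an exception when the page is not well-formed XML, which is common for real HTML (void tags like `<br>`, unescaped `&`). A hedged sketch of a workaround, assuming you still want SimpleXML's `xpath()`, is to parse leniently with `DOMDocument::loadHTML()` and convert the result with `simplexml_import_dom()`. The helper name `find_divs_by_id()` is hypothetical.

```php
<?php
// Hypothetical helper: lenient HTML parsing with DOMDocument, then querying via SimpleXML.
// Assumes $id contains no double quotes, since it is inserted directly into the XPath.
function find_divs_by_id(string $html, string $id): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml parse warnings (e.g. for <br>) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Convert the DOM tree to SimpleXML (rooted at <html>) to keep using xpath().
    $xml = simplexml_import_dom($doc);
    $result = $xml->xpath('//div[@id="' . $id . '"]');
    // xpath() returns an array of SimpleXMLElement objects, or false/null on error.
    return is_array($result) ? $result : [];
}

// The unclosed <br> would make new SimpleXMLElement() throw.
$html = '<html><body><div id="myContent"><ul><li>one<br></li></ul></div></body></html>';
foreach (find_divs_by_id($html, 'myContent') as $div) {
    echo $div->asXML(), "\n";
}
```

Note that `asXML()` serializes as XML, so empty or void elements come out in self-closing form (for example `<br/>`).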