Get a specific content block of element using URL in PHP
Possible Duplicate:
How to parse and process HTML with PHP?
I know the file_get_contents(url) method, but what I want is this: after using file_get_contents(url) to pull the contents of a page, is there a method or function that can extract a specific block from the contents it returns? Here's a sample.
The code would look like this:
$pageContent = file_get_contents('http://www.pullcontentshere.com/');
and this would be the value of $pageContent:
<html> <body>
<div id="myContent">
<ul>
<li></li>
<li></li>
<li></li>
</ul>
</div>
</body> </html>
Can you suggest a way to extract specifically the <div id="myContent"> and all of its children?
It would be something like this:
$content = function_here($pageContent);
so that the output would be:
<div id="myContent">
<ul>
<li></li>
<li></li>
<li></li>
</ul>
</div>
Answers are greatly appreciated!
Another way would be to use regex.
<?php
$string = '<html> <body>
<div id="myContent">
<ul>
<li></li>
<li></li>
<li></li>
</ul>
</div>
</body> </html>';
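// "#" is the pattern delimiter here: with "/" as delimiter, the unescaped "/" in "</div>" ends the pattern early and preg_match fails
if ( preg_match ( '#<div id="myContent"(.*?)</div>#s', $string, $matches ) )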
{
foreach ( $matches as $key => $match )
{
echo $key . ' => ' . htmlentities ( $match ) . '<br /><br />';
}
}
else
{
echo 'No match';
}
?>
Live example: http://codepad.viper-7.com/WSoWCh
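One caveat with the pattern above is worth illustrating: the lazy `.*?` stops at the first `</div>` it meets, so a nested `<div>` inside the block cuts the match short. The following is a minimal sketch of that behaviour; the sample markup in `$html` is made up for the demonstration and does not come from the question.

```php
<?php
// Sketch: a lazy regex ends at the first closing </div>, even if it belongs to a nested element.
// The markup below is a hypothetical sample, not the page from the question.
$html = '<div id="myContent"><div class="inner">a</div><p>b</p></div>';

// "#" is used as the delimiter so the "/" in "</div>" needs no escaping.
preg_match('#<div id="myContent".*?</div>#s', $html, $matches);

// $matches[0] holds the full match; here it ends at the inner div's closing tag.
$matched = $matches[0];
echo htmlentities($matched), PHP_EOL;
```

In this sample the `<p>b</p>` part is left out of the match, so the regex route is probably only safe when the target block is known to contain no nested `<div>` elements.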
You can use the built-in SimpleXMLElement, as explained in nullpointr's answer, or regular expressions. Another solution that I usually find pretty simple is PHP Simple HTML DOM Parser, a library that lets you use jQuery-style selectors. A simple example with your code would look like this:
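// Load the library (assumes simple_html_dom.php sits next to this script)
include 'simple_html_dom.php';
// Create DOM from url
$html = file_get_html('http://www.pullcontentshere.com');
// Select the div by id ('#' matches an id, '.' would match a class);
// index 0 returns the first match instead of an array, and outertext includes the div itself
$myContent = $html->find('div#myContent', 0)->outertext;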
You need to parse the markup to solve your problem. I would recommend SimpleXML, which is already part of PHP (it expects well-formed markup). Here's an example:
$sitecontent = "
<html>
<body>
<div>
<ul>
<li></li>
<li></li>
<li></li>
</ul>
</div>
</body>
</html>";
$xml = new SimpleXMLElement($sitecontent);
$xpath = $xml->xpath('//div');
print_r($xpath);
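The example above selects every `<div>` and needs well-formed input. As a sketch of how the same XPath idea could target the block by its id and return it as a string, here is a variant that swaps in PHP's built-in DOMDocument and DOMXPath classes, which tolerate typical HTML. `extractById()` is a hypothetical helper name chosen for illustration, and the sample markup is the one from the question.

```php
<?php
// Sketch: extract one element (and all of its children) by id using DOM + XPath.
// extractById() is a hypothetical helper name, not a built-in function.
function extractById(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Real pages are rarely valid XML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The id is interpolated into the query, so this assumes it contains no double quotes.
    $nodes = $xpath->query('//*[@id="' . $id . '"]');

    if ($nodes === false || $nodes->length === 0) {
        return null; // no element with that id
    }

    // saveHTML($node) serializes the node itself plus its descendants.
    return $doc->saveHTML($nodes->item(0));
}

$pageContent = '<html> <body>
<div id="myContent">
<ul>
<li></li>
<li></li>
<li></li>
</ul>
</div>
</body> </html>';

$content = extractById($pageContent, 'myContent');
echo $content === null ? 'No match' : $content;
```

In a real script `$pageContent` would come from file_get_contents(url) as in the question. Returning null for a missing id keeps the "No match" case explicit for the caller.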