Get Website Title From User Site Input

2018-06-20 11:56:54

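I am trying to fetch the title of a website that a user enters.

Text input: the website link entered by the user is sent to the server via AJAX. The user can enter anything: a link that actually exists, a single word, or something odd like 'po392#*@8'.

Here is part of my PHP script: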

        // Make sure the url is on another host
        if(substr($url, 0, 7) !== "http://" AND substr($url, 0, 8) !== "https://") {
            $url = "http://".$url;
        }

        // Extra confirmation for security
        if (filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_HOST_REQUIRED)) {
            $urlIsValid = "1";
        } else {
            $urlIsValid = "0";
        }

        // Make sure there is a dot in the url
        if (strpos($url, '.') !== false) {
            $urlIsValid = "1";
        } else {
            $urlIsValid = "0";
        }

        // Retrieve title if no title is entered
        if($title == "" AND $urlIsValid == "1") {

            function get_http_response_code($theURL) {
                $headers = get_headers($theURL);
                if($headers) {
                    return substr($headers[0], 9, 3);
                } else {
                    return 'error';
                }
            }

            if(get_http_response_code($url) != "200") {

                $urlIsValid = "0";

            } else {

                $file = file_get_contents($url);

                $res = preg_match("/<title>(.*)<\/title>/siU", $file, $title_matches);

                if($res === 1) {
                    $title = preg_replace('/\s+/', ' ', $title_matches[1]);
                    $title = trim($title);

                    $title = addslashes($title);
                }

                // If title is still empty, make title the url
                if($title == "") {
                    $title = $url;
                }

            }
        }

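However, there are still errors in this script.

It works perfectly when an existing URL such as 'https://www.youtube.com/watch?v=eB1HfI-nIRg' is entered, and also when a non-existent page such as 'https://www.youtube.com/watch?v=nonexistent' is entered. It does not work when the user enters something like 'twitter.com' (without http) or a single word like 'surprising'.

I have tried literally everything: cURL, DOMDocument ...

The problem is that when an invalid link is entered, the AJAX call never completes (it keeps loading), whereas it should set $urlIsValid = "0" whenever an error occurs.

I hope someone can help; it would be much appreciated.

Nathan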

You have a relatively simple problem, but your solution is too complicated and also buggy.

These are the problems I identified in your code:

// Make sure the url is on another host
if(substr($url, 0, 7) !== "http://" AND substr($url, 0, 8) !== "https://") {
     $url = "http://".$url;
}

This does not ensure that the possible URL is on another host (it could be localhost). You should remove this code.

// Make sure there is a dot in the url
if (strpos($url, '.') !== false) {
        $urlIsValid = "1";
} else {
        $urlIsValid = "0";
}

This code overwrites the result of the code above it, where you validate that the string is indeed a valid URL, so remove it.
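If you do want to keep a dot rule, the two checks have to be combined rather than assigned one after the other. A rough sketch of that idea follows; the function name `isPlausibleUrl` is hypothetical and not part of the question's code.

```php
<?php
// Hypothetical helper: both checks must pass, so the second
// check can no longer overwrite the result of the first.
function isPlausibleUrl(string $url): bool
{
    // filter_var() returns the URL on success and false on failure.
    $isValidSyntax = filter_var($url, FILTER_VALIDATE_URL) !== false;

    // Extra rule taken from the question: require a dot somewhere.
    $hasDot = strpos($url, '.') !== false;

    return $isValidSyntax && $hasDot;
}

var_dump(isPlausibleUrl('http://twitter.com')); // bool(true)
var_dump(isPlausibleUrl('twitter.com'));        // bool(false): no scheme
var_dump(isPlausibleUrl('http://localhost'));   // bool(false): no dot
```

Note that the dot rule also rejects legitimate single-label hosts, which is one more reason to drop it.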

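Defining the additional function get_http_response_code is pointless. You can simply use file_get_contents to fetch the HTML of the remote page and compare the result against false to detect an error.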
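A minimal sketch of that check, assuming `allow_url_fopen` is enabled. The name `fetchHtml` and the 5-second default are illustrative choices, not from the question. The timeout is an addition: without one, a slow or unreachable host may keep the request (and the AJAX call waiting on it) open until PHP's default socket timeout expires.

```php
<?php
// Hypothetical helper: returns the page body, or null on any failure.
function fetchHtml(string $url, int $timeoutSeconds = 5): ?string
{
    // The 'http' context options are also used for https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // @ suppresses the warning raised on failure; we test for false instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}

// Intended use (not executed here, since it needs network access):
//   $html = fetchHtml($url);
//   if ($html === null) { $urlIsValid = "0"; }
```

Returning null on failure lets the caller set `$urlIsValid = "0"` and still send a response back to the browser.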

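Also, from your code I conclude that the external fetch only happens when the variable $title (defined outside this snippet) is empty, so why not check that first?

To sum up, your code should look something like this: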

if('' === $title && filter_var($url, FILTER_VALIDATE_URL))
{
    // getContentsFromUrl() uses @ to suppress warnings from file_get_contents,
    // as we won't need them (error_reporting(0) or a similar side-effect
    // method would also work); a failure is detected by checking for false
    $html = getContentsFromUrl($url);

    if(false !== $html && preg_match("/<title>(.*)<\/title>/siU", $html, $title_matches))
    {
        // Collapse runs of whitespace inside the title into single spaces
        $title = preg_replace('/\s+/', ' ', $title_matches[1]);
        $title = trim($title);
        $title = addslashes($title);
    }

    // If title is still empty, make title the url
    if($title == "") {
        $title = $url;
    }
}

function getContentsFromUrl($url)
{
   //if not full/complete url
   if(!preg_match('#^https?://#ims', $url))
   {
       $completeUrl = 'http://' . $url;
       $result = @file_get_contents($completeUrl);
       if(false !== $result)
       {
           return $result;
       }

       //we try with https://
       $url = 'https://' . $url;
   }

   return @file_get_contents($url);
}
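One thing to be aware of: `FILTER_VALIDATE_URL` rejects input without a scheme, so a value like 'twitter.com' only passes validation if the scheme is added first. The following is a sketch of one way to order the steps. The helper names `normalizeUrl` and `extractTitle` are hypothetical, and the title pattern is a simplified variant of the one used above.

```php
<?php
// Hypothetical helper: prepend http:// when no scheme is present,
// then validate. Returns the usable URL, or null if it is invalid.
function normalizeUrl(string $input): ?string
{
    $url = trim($input);
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

// Hypothetical helper: return the text of the first <title> element,
// or null when there is none or it is empty.
function extractTitle(string $html): ?string
{
    if (!preg_match('#<title[^>]*>(.*?)</title>#is', $html, $matches)) {
        return null;
    }
    // Collapse runs of whitespace into single spaces.
    $title = trim(preg_replace('/\s+/', ' ', $matches[1]));
    return $title === '' ? null : $title;
}

var_dump(normalizeUrl('twitter.com')); // string(18) "http://twitter.com"
var_dump(normalizeUrl('not a url'));   // NULL
var_dump(extractTitle("<html><title>\n  My  Page </title></html>")); // string(7) "My Page"
```

A single word such as 'surprising' still becomes a syntactically valid URL ('http://surprising'), so it is only rejected when the fetch itself fails. That is why the fetch result must always be checked against false.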
