Error using cfhttp to retrieve page contents from bitly url

I am using cfhttp (Lucee Server) to scrape page contents from a url in the following manner:

<cfhttp url="#libs.originalAdPage#" method="GET" />

I then place this content in a div on my page.

This code has been working for a long time.

I need to report on the URLs that have been scraped for their content, and that information is entered into a form on another website that is not under my control. I decided to convert the URLs to shortened bitly URLs, so I built a step into the page that creates a bitly link and returns it to replace the existing URL.

If I use the page with a shortened URL from LinkedIn, the page is scraped and displayed correctly in the div.

<cfhttp url="http://bit.ly/1NPhPgc" method="GET" />

But if I make an identical cfhttp call to an Indeed.com page shortened to a bitly URL, I get a connection failure error.

<cfhttp url="http://bit.ly/1RQvlim" method="GET" />

[Screenshot: cfdump of connection failure]

If I open this URL directly in a browser, the page is displayed correctly.

Any ideas would be greatly appreciated.

Thanks,

Michael


I don't have access to a Lucee server to test with; however, cfhttp on a ColdFusion server works fine for me with both of those bitly URLs. cfhttp follows the redirect, and FileContent contains the indeed.com page, as expected.

Have you checked what happens with the bitly Indeed URL if you prevent cfhttp from automatically following redirects, so that you can debug and follow them manually? For example:

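<cftry>
    <!--- First request: do not follow the redirect, just inspect the response headers --->
    <cfhttp url="http://bit.ly/1RQvlim" method="GET" redirect="no" />
    <cfdump var="#cfhttp.responseHeader#" />
    <!--- Second request: fetch the redirect target reported in the Location header --->
    <cfhttp url="#cfhttp.responseHeader.Location#" method="GET" />
    <cfdump var="#cfhttp#" label="cfhttp2" />
<cfcatch>
    <!--- Dump any exception, e.g. if the Location header is missing --->
    <cfdump var="#cfcatch#" label="cfcatch" />
</cfcatch>
</cftry>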
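
If there is more than one redirect in the chain, the same idea can be extended into a small loop. This is only a rough sketch: the variable names `nextUrl`, `hop` and `hopResult` and the limit of 5 hops are my own assumptions, and it assumes every Location header holds an absolute URL.

```cfml
<!--- Sketch: follow redirects one hop at a time and print each status --->
<cfset nextUrl = "http://bit.ly/1RQvlim" />
<cfloop from="1" to="5" index="hop">
    <!--- redirect="no" stops cfhttp from following the redirect itself --->
    <cfhttp url="#nextUrl#" method="GET" redirect="no" result="hopResult" />
    <cfoutput>#hop#: #encodeForHTML(nextUrl)# -> #encodeForHTML(hopResult.statusCode)#<br></cfoutput>
    <!--- No Location header means this is the last response in the chain --->
    <cfif NOT structKeyExists(hopResult.responseHeader, "Location")>
        <cfbreak />
    </cfif>
    <cfset nextUrl = hopResult.responseHeader.Location />
</cfloop>
<cfdump var="#hopResult#" label="last response" />
```

The hop on which the status changes to a connection failure should show which URL in the chain is actually being refused.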

Indeed.com does pay attention to crawlers and user agents; see their robots.txt for evidence of this.

Do you have access to a different server to test with, in case the problem is specific to Lucee's cfhttp implementation or to your IP address (e.g. it has been blacklisted because of all the scraping)?

Have you tried tweaking the cfhttp useragent and/or any other headers, as per the question "How to emulate a real http request via cfhttp"?
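
As a starting point, something along these lines sends a browser-like user agent and a couple of typical browser headers. Treat it as a sketch rather than a known fix: the user agent string, the header values and the `pageResult` variable name are illustrative assumptions, and I have not confirmed which headers (if any) Indeed.com actually checks.

```cfml
<!--- Sketch: make the request look more like one from a real browser --->
<cfhttp url="http://bit.ly/1RQvlim"
        method="GET"
        result="pageResult"
        timeout="30"
        useragent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36">
    <!--- Illustrative header values; adjust to match what your browser sends --->
    <cfhttpparam type="header" name="Accept" value="text/html,application/xhtml+xml" />
    <cfhttpparam type="header" name="Accept-Language" value="en-US,en;q=0.8" />
</cfhttp>
<cfdump var="#pageResult.statusCode#" label="status" />
```

Comparing the headers in this request with what your browser's developer tools show for the same URL is a reasonable way to decide which ones to add or change.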
