CFHTTP Encoding Problem
I am trying to fetch a page with cfhttp so that I can parse information out of it. The response headers of the page I am requesting are:
Content-Encoding: gzip
Connection: Keep-Alive
Content-Length: 19066
Server: IBM_HTTP_Server
Vary: Accept-Encoding, User-Agent
Content-Language: en-US
Cache-Control: no-cache="set-cookie, set-cookie2"
Content-Type: text/html;charset=ISO-8859-1
I set the charset to ISO-8859-1; however, I am getting the following in the FileContent (only a small sample is shown below, but I think it gets the point across):
EðÑq·Oã?·ÌZóL¯þ´Vú5ðbä£ÿæ¾_HÉÒñQãOÇþãë85ÁÜ à±°ùÖ}&bßý?,u?2SùQyk5g?UÛ3Ѹfã×ARÃi_iûRã _ òCA¿-ß. "b /¯ßíWÝÆ´}w~,°iøÜCáÇþ@ÃZ5¤ïsÁ8½°ì* ZÜéjOÝK/Ë4§ÈG5×ä*¬6ÚwÇ0]ã:àÑþé¬G"ÅÁl/t° jlá»5¶&¯lìYìºØ'yDð½|#ý<ñìTé%¾ï¬ùƪx¶}«±o9»ë¼ÂÆÒï'w8Y?
÷ðxsllû 6íqüGÞsÜóÀx·ªk®XºàåZ{íÁ½åo÷mbq¥ÝÃ8M
I tried other charsets and suspect that the gzip encoding is causing the problem, but I am unsure how to test whether that is the issue. Any suggestions or help would be greatly appreciated.
Below is my code:
<cfhttp
    METHOD="get"
    throwonerror="yes"
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10">
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>
<cfset listings = cfhttp.FileContent>
<cfoutput>
    #listings#
</cfoutput>
I have also tried the headers:
<cfhttpparam type="Header" name="Accept-Encoding" value="*">
<cfhttpparam type="Header" name="TE" value="deflate;q=0">
I also tried removing the 'Accept-Encoding' header and leaving only the TE header.
UPDATE: I still haven't figured it out, but I found something that might help someone help me. When I ran file_get_contents on the same page from a test PHP server of mine, it worked fine. When I then ran the same cfhttp code against that PHP page (which in turn fetches the page I need), it also worked fine. Thanks for the suggestions so far.
The issue with cars.com seems to be that they're gzipping the output twice (based on this thread).
So, we need to unzip the content... again...
First, we need to get the content as binary, so the CFHTTP call needs to include:
getasbinary="yes"
Then, we need to unzip it.
We can use java.util.zip to do it. The gunzip function below is a modified version of this cflib.org function:
<cfhttp
    getasbinary="yes"
    METHOD="get"
    throwonerror="yes"
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10">
    <cfhttpparam type="Header" name="Accept" value="application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5">
    <cfhttpparam type="Header" name="User-Agent" value="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41">
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate">
    <cfhttpparam type="Header" name="TE" value="deflate, chunked, identity, trailers">
</cfhttp>
<cfset unzippedHTML = gunzip(cfhttp.FileContent)>
<cfoutput>
    #unzippedHTML#
</cfoutput>
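<cfscript>
// Decompresses a gzip-compressed byte array and returns the result as a string.
// Returns an empty string if the input cannot be decompressed.
function gunzip(inBytes) {
    var gzInStream = createObject('java','java.util.zip.GZIPInputStream');
    var outStream = createObject('java','java.io.ByteArrayOutputStream');
    var inStream = createObject('java','java.io.ByteArrayInputStream');
    // 1024-byte scratch buffer for the decompressed data
    var buffer = repeatString(" ",1024).getBytes();
    var length = 0;
    var rv = "";
    try {
        inStream.init(inBytes);
        gzInStream.init(inStream);
        outStream.init();
        // read() returns -1 once the end of the stream is reached
        do {
            length = gzInStream.read(buffer,0,1024);
            if (length neq -1) outStream.write(buffer,0,length);
        } while (length neq -1);
        // toString() with no argument decodes using the JVM's default charset
        rv = outStream.toString();
        outStream.close();
        gzInStream.close();
        inStream.close();
    }
    catch (any e) {
        // Decompression failed: return an empty string and clean up
        rv = "";
        try {
            outStream.close();
        } catch (any e) { }
        try {
            // Closing the gzip stream also closes the underlying input stream
            gzInStream.close();
        } catch (any e) {
            // Only close the input stream directly if the gzip stream could not be closed
            try {
                inStream.close();
            } catch (any e) {}
        }
    }
    return rv;
}
</cfscript>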
Be sure to double-check the var scoping of the function. I might have missed something.
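Because gunzip swallows errors and returns an empty string, it can help to check whether the bytes are actually gzip data before calling it. The following is only a sketch: isGzipped is a hypothetical helper (it is not part of the cflib.org function), and it assumes your ColdFusion engine lets you index a Java byte array with 1-based array notation. It looks for the gzip magic number, the bytes 0x1F 0x8B, at the start of the content.

```cfml
<cfscript>
// Hypothetical helper, not from the original answer: returns true when the
// binary value starts with the gzip magic number 0x1F 0x8B.
function isGzipped(inBytes) {
    if (not isBinary(inBytes) or len(inBytes) lt 2) return false;
    // Java bytes are signed, so mask each one to 0-255 before comparing
    // (0x1F is 31, 0x8B is 139).
    return bitAnd(inBytes[1], 255) eq 31 and bitAnd(inBytes[2], 255) eq 139;
}
</cfscript>

<!--- Assumes the cfhttp call above was made with getasbinary="yes" --->
<cfif isGzipped(cfhttp.FileContent)>
    <cfset unzippedHTML = gunzip(cfhttp.FileContent)>
<cfelse>
    <!--- Not gzip data: decode the bytes with the charset the page declares --->
    <cfset unzippedHTML = toString(cfhttp.FileContent, "ISO-8859-1")>
</cfif>
```

With a check like this, the same code should keep working if the site ever stops compressing the response twice.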
Per the header, what you are seeing is the gzipped contents of the file. It will need to be uncompressed before it is useful to you. I assume you can do this with cfzip, but I have not had any experience doing it.
This post seems to indicate that you can add a header in your request to have it unzipped/deflated before being returned:
<cfhttp ...>
<cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
<cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>
The first thing I would do is make sure that it's not the source content/server that's the problem by trying your same code against other pages. If they work fine, then it's likely the server/content that you're trying to consume. If they have the same problem, then the issue is in your code. It would also be helpful if you posted your code.
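One way to run that comparison is to point the same kind of cfhttp call at a page you know is served as plain HTML and look at what comes back. This is a minimal sketch: the control URL (http://www.example.com/) and the result variable name control are placeholders chosen for illustration, not anything from the original post.

```cfml
<!--- Placeholder control URL: substitute any page you know returns plain HTML --->
<cfhttp method="get" url="http://www.example.com/" charset="ISO-8859-1" throwonerror="yes" result="control">
</cfhttp>

<!--- Look for Content-Encoding in the response headers --->
<cfdump var="#control.responseHeader#" label="Control response headers">

<!--- Show the first 200 characters of the body; readable HTML here points at the other server --->
<cfoutput>#htmlEditFormat(left(control.FileContent, 200))#</cfoutput>
```

If the control page comes back readable while the cars.com page does not, the difference is most likely in how that server encodes its response rather than in the cfhttp code.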