Scrape Facebook in Python

I'm interested in getting the number of friends each of my friends on Facebook has. Apparently the official Facebook API does not allow getting the friends of friends, so I need to get around this (somehwhat sensible) limitation somehow. I tried the following:

import sys
import urllib, urllib2, cookielib

username = 'me@example.com'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'email' : username, 'pass' : password})
request = urllib2.Request('https://login.facebook.com/login.php')
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc14 Firefox/3.6.12')
opener.open(request, login_data)
resp = opener.open('http://facebook.com')
print resp.read()

but I only end up with a captcha page. Any idea how FB is detecting that the request is not from a "normal" browser? I could add an extra step and solve the captcha but that would add unnecessary complexity to the program so I would rather avoid it. When I use a web browser with the same User-Agent string I don't get a captcha.

Alternatively, does anyone have any saner ideas on how to accomplish my goal, ie get a list of friends of friends?


Have you tried tracing and comparing HTTP transactions with Fiddler2 or Wireshark? Fiddler can even trace https, as long as your client code can be made to work with bogus certs.


I did try a lot of ways to scrape facebook and the only way that worked for me is :

To install selenium , the firefox plugin, the server and the python client library. Then with the firefox plugin, you can record the actions you do to login and export as a python script, you use this as a base for your work and it will work. Basically I added to this script a request to my webserver to fectch a list of things to inspect on FB and then at the end of the script I send the results back to my server.

I could NOT find a way to do it directly from my server with a browser simulator like mechanize or else ! I guess It needs to be done from a client browser.

链接地址: http://www.djcxy.com/p/3888.html

上一篇: 给我一些令人敬畏的Web2py提示和技巧,让你更高效

下一篇: 在Python中刮脸Facebook