Fetch a Wikipedia article with Python
I am trying to fetch a Wikipedia article with Python's urllib:
import urllib
f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
s = f.read()
f.close()
However, instead of the HTML page, I get the following response: Error - Wikimedia Foundation:
Request: GET http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes, from 192.35.17.11 via knsq1.knams.wikimedia.org (squid/2.6.STABLE21) to ()
Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 23 Sep 2008 09:09:08 GMT
Wikipedia seems to block requests that do not come from a standard browser.
Does anybody know how to work around this?
You need to use urllib2, which supersedes urllib in the Python standard library, in order to change the user agent.
Straight from the examples:
import urllib2
# Build an opener so the default headers can be overridden
opener = urllib2.build_opener()
# Replace the default Python-urllib user agent
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
infile = opener.open('http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes')
page = infile.read()
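On Python 3, urllib2 no longer exists as a separate module; its functionality lives in urllib.request. A minimal sketch of the same idea there might look like the following. The names build_request, fetch, and USER_AGENT are made up for illustration, and the user agent string is a placeholder; Wikimedia's user agent policy asks for a descriptive string with contact information rather than a spoofed browser name.

```python
import urllib.request

# Hypothetical identifying string; replace with your own tool name and contact.
USER_AGENT = "ExampleWikiFetcher/0.1 (contact: you@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Download the URL and return the raw response body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (performs a network request):
# page = fetch("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
```

Separating build_request from fetch is only a convenience here: it lets you inspect the headers that will be sent without touching the network.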
This is not a solution to the specific problem, but it might be interesting for you to use the mwclient library (http://botwiki.sno.cc/wiki/Python:Mwclient) instead. That would be much easier, especially since you get the article contents directly, which removes the need to parse the HTML.
I have used it myself for two projects, and it works very well.
Instead of trying to trick Wikipedia, you should consider using their high-level API.
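As a rough sketch of what using the API could look like, the snippet below asks the MediaWiki Action API (api.php) for the wikitext of a page via action=parse and decodes the JSON reply. It is written for Python 3 with only the standard library. The helper names build_api_url and fetch_parsed and the user agent string are assumptions for illustration, and the exact layout of the returned JSON should be checked against the API documentation before relying on specific keys.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifying string; replace with your own tool name and contact.
USER_AGENT = "ExampleWikiFetcher/0.1 (contact: you@example.org)"


def build_api_url(title):
    """Build an action=parse query URL that requests the page's wikitext as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_parsed(title, timeout=10):
    """Call the API and return the decoded JSON response as a Python dict."""
    request = urllib.request.Request(
        build_api_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network request):
# data = fetch_parsed("Albert Einstein")
```

Compared with downloading the printable HTML page, this returns structured data, so there is no need to scrape the markup.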