Python3 urllib.request will not close connections immediately

I've got the following code, which runs a continuous loop to fetch some content from a website:

from http.cookiejar import CookieJar
from urllib import request

cj = CookieJar()
cp = request.HTTPCookieProcessor(cj)
hh = request.HTTPHandler()
opener = request.build_opener(cp, hh)

while True:
    # build url
    req = request.Request(url=url)
    p = opener.open(req)
    c = p.read()
    # process c
    p.close()
    # check for abort condition, or continue

The contents are read correctly, but for some reason the TCP connections won't close. I'm observing the active connection count from a dd-wrt router interface, and it climbs steadily. If the script continues to run, it exhausts the router's 4096-connection limit. When that happens, the script simply enters a waiting state (the router won't allow new connections, but the timeout hasn't hit yet). After a couple of minutes, those connections are closed and the script can resume.

I was able to observe the state of those hanging connections from the router. They share the same state: TIME_WAIT.
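
(The same thing can be confirmed locally, without the router; a minimal sketch that counts TIME_WAIT sockets, assuming macOS/Linux-style netstat output:)

import subprocess

# Count local sockets currently sitting in TIME_WAIT.
out = subprocess.check_output(['netstat', '-an']).decode()
print(sum(1 for line in out.splitlines() if 'TIME_WAIT' in line))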

I'm expecting this script to use no more than one TCP connection at a time. What am I doing wrong?

I'm using Python 3.4.2 on Mac OS X 10.10.


Through some research, I discovered the cause of this problem: the design of the TCP protocol. In a nutshell, when you disconnect, the connection isn't dropped immediately; it enters the TIME_WAIT state and only times out after about 4 minutes. Contrary to what I was expecting, the connection doesn't immediately disappear.
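
(To illustrate: the side that performs the active close is the one left in TIME_WAIT, and that's what p.close() does on every loop iteration. A minimal demonstration with a plain socket; example.com is a placeholder host:)

import socket

# Open a TCP connection, make one keep-alive request, then close it
# from our side. Because we initiate the close, our end of the socket
# lingers in TIME_WAIT for roughly 2*MSL before the OS releases it.
s = socket.create_connection(('example.com', 80))  # placeholder host
s.sendall(b'HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\n\r\n')
s.recv(4096)
s.close()
# Running netstat right after this shows the socket in TIME_WAIT.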

According to this question, it's also not possible to forcefully drop a connection (without restarting the network stack).

It turns out that in my particular case, as this question stated, the better option is to use a persistent connection, aka HTTP keep-alive. Since I'm always querying the same server, this works.
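
(For the record, here is roughly what the loop looks like rewritten on top of the standard library's http.client, whose HTTPConnection speaks HTTP/1.1 and reuses one TCP connection across requests. The host and path below are placeholders, and note that http.client does not manage cookies the way HTTPCookieProcessor does, so cookie handling would need to be done by hand if it matters:)

from http.client import HTTPConnection

# A single connection object, reused for every request (HTTP/1.1 keep-alive).
conn = HTTPConnection('example.com')  # placeholder host

while True:
    # build the path/query for this iteration
    conn.request('GET', '/')  # placeholder path
    resp = conn.getresponse()
    c = resp.read()  # the body must be fully read before the next request
    # process c
    # check for abort condition, or continue
    break  # stand-in for the real abort condition

conn.close()  # the one connection is torn down once, at the end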
