urllib for python 3

This Python 3 code gives unexpected output:

import urllib.request
fhand=urllib.request.urlopen('http://www.py4inf.com/code/romeo.txt')
print(fhand.read())

Its output is:

b'But soft what light through yonder window breaks'
b'It is the east and Juliet is the sun'
b'Arise fair sun and kill the envious moon'
b'Who is already sick and pale with grief'

Why did I get b'...'? What can I do to get the right output?

The expected text is:

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

The b'...' marks a bytes object: a sequence of bytes, not a text string (str).

To convert it to a str, decode it:

fhand.read().decode()

This uses the default encoding, UTF-8. To use a different encoding, pass its name; for example, for ASCII:

fhand.read().decode("ASCII")
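A minimal offline sketch of the conversion. The bytes literal below is an assumption standing in for the result of fhand.read(), so the example runs without network access.

```python
# Stand-in for what fhand.read() returns: a bytes object, not a str.
data = b"But soft what light through yonder window breaks\nIt is the east and Juliet is the sun\n"
print(type(data))   # <class 'bytes'>

# With no argument, decode() uses UTF-8.
text = data.decode()
print(type(text))   # <class 'str'>
print(text)         # printed as plain text, without the b'...' wrapper

# This sample is pure ASCII, so decoding it as ASCII yields the same str.
print(data.decode("ascii") == text)   # True
```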


As the documentation says, urlopen returns an object whose read method gives you a sequence of bytes, not a sequence of characters. In order to convert the bytes to printable characters, which is what you want, you will need to apply the decode method, using the encoding that the bytes are in.
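To illustrate why the encoding has to match the bytes, here is a small sketch using a made-up non-ASCII word (the word "café" is an assumption chosen for illustration, not from the question):

```python
# "café" encoded as UTF-8: the é becomes the two bytes 0xC3 0xA9.
raw = "café".encode("utf-8")
print(raw)                     # b'caf\xc3\xa9'

# Decoding with the encoding the bytes are really in recovers the text.
print(raw.decode("utf-8"))     # café

# Decoding with the wrong single-byte encoding silently garbles the text.
print(raw.decode("latin-1"))   # cafÃ©

# Decoding as ASCII fails outright, because 0xC3 is not an ASCII byte.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii failed at byte", err.start)   # ascii failed at byte 3
```

For plain ASCII text like the Romeo file, all three decodings would agree, which is why the mismatch is easy to miss.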

The result looks readable because printing a bytes object shows its repr: bytes in the printable ASCII range are displayed as their ASCII characters, and everything else is shown as a \xNN escape. Since this text is plain ASCII, the raw bytes happen to display as the right characters.
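A quick sketch of that display rule; the non-ASCII word is an illustrative assumption, not part of the question's data:

```python
# print() on a bytes object shows its repr: printable ASCII bytes appear as
# characters, other bytes as \xNN escapes, all wrapped in b'...'.
ascii_line = b"Arise fair sun and kill the envious moon"
print(repr(ascii_line))           # b'Arise fair sun and kill the envious moon'

non_ascii = "café".encode("utf-8")
print(repr(non_ascii))            # b'caf\xc3\xa9'

# Once decoded, the value is a str and prints as plain text.
print(non_ascii.decode("utf-8"))  # café
```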

To do this properly, call read().decode(encoding), where encoding is the charset from the Content-Type HTTP header, which you can get through the HTTPResponse object (fhand in your code). If there is no Content-Type header, or it doesn't specify a charset, you're reduced to guessing which encoding to use. For typical English text it doesn't matter, because UTF-8 and most other common encodings are ASCII-compatible, and in many other cases it's probably going to be UTF-8.
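A minimal sketch of reading the charset from the header. The helper names decode_body and fetch_text are hypothetical, and falling back to UTF-8 when no charset is given is an assumption, not something the server guarantees. It relies on fhand.headers being an http.client.HTTPMessage, which subclasses email.message.Message and therefore provides get_content_charset().

```python
import urllib.request
from email.message import Message


def decode_body(body: bytes, headers: Message, fallback: str = "utf-8") -> str:
    # get_content_charset() returns the lowercased charset parameter of the
    # Content-Type header, or `fallback` if the header or parameter is missing.
    # The UTF-8 fallback is an assumption (a guess), not part of HTTP.
    charset = headers.get_content_charset(fallback)
    return body.decode(charset)


def fetch_text(url: str, fallback: str = "utf-8") -> str:
    # Hypothetical convenience wrapper: fetch the URL and decode the body
    # using the charset advertised in the response headers.
    with urllib.request.urlopen(url) as fhand:
        return decode_body(fhand.read(), fhand.headers, fallback)


# Usage (needs network access):
# print(fetch_text('http://www.py4inf.com/code/romeo.txt'))
```

Splitting the decoding into decode_body keeps it testable without a network connection, since any email.message.Message with a Content-Type header can stand in for the real response headers.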


Python 3 distinguishes between byte sequences (bytes) and text strings (str). The "b" in front of the output tells you that urllib returned the contents as raw bytes. It is worth reading up on how Python 3 handles bytes and strings, but basically, you did get the right text back. If you don't want the result as bytes, you just need to decode it into a regular Python str.
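Iterating over the response object, rather than calling read(), yields one bytes object per line. A minimal sketch of getting str lines instead is to wrap the binary stream in io.TextIOWrapper; the helper name text_lines is hypothetical, UTF-8 is an assumed encoding, and io.BytesIO stands in for the HTTP response so the example runs offline.

```python
import io
from typing import BinaryIO, Iterator


def text_lines(binary_stream: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    # TextIOWrapper decodes the underlying byte stream incrementally,
    # so iterating over it yields str lines instead of bytes lines.
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding)
    for line in wrapper:
        yield line.rstrip("\n")  # drop the trailing newline, like strip() on each line


# Offline stand-in for the HTTP response body.
sample = io.BytesIO(b"But soft what light through yonder window breaks\n"
                    b"It is the east and Juliet is the sun\n")
for line in text_lines(sample):
    print(line)
```

With a real response, `with urllib.request.urlopen(url) as fhand:` followed by `for line in text_lines(fhand):` should work the same way, since the response is a readable binary stream; if the page is not UTF-8, pass the charset from the Content-Type header instead.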

