Python unicode string containing UTF-8 bytes?

2018-07-01 12:03:40

I'm getting back from a library what looks like an incorrect unicode string:

>>> title
u'Sopet\xc3\xb3n'

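Now, those two hex escapes are the UTF-8 encoding of U+00F3 LATIN SMALL LETTER O WITH ACUTE. As far as I understand, a unicode string in Python should hold the actual character, not the UTF-8 encoding of the character, so I think this is incorrect and presumably a bug either in the library or in my input, right?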

The question is, how do I (a) recognize that I have UTF-8 encoded text inside my unicode string, and (b) convert it to a proper unicode string?

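I'm stumped on (a), because there is nothing wrong, encoding-wise, with the original string (i.e. both are valid characters in their own right, u'\xc3\xb3' == Ã³, but they are not what is supposed to be there).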

It looks like I can achieve (b) by eval()ing the repr() output minus the leading "u" to get a str, and then decoding that str as UTF-8:

>>> eval(repr(title)[1:]).decode("utf-8")
u'Sopet\xf3n'
>>> print eval(repr(title)[1:]).decode("utf-8")
Sopetón

But that seems a bit kludgy. Is there an officially sanctioned way to get the raw data out of a unicode string and treat it as a regular string?

a) Try running it through the method below.

b)

>>> u'Sopet\xc3\xb3n'.encode('latin-1').decode('utf-8')
u'Sopet\xf3n'
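
One way to read "try it and see" for (a) is to attempt the `encode('latin-1').decode('utf-8')` round trip and check whether it succeeds. The following is a minimal sketch for Python 3, assuming the damage came from UTF-8 bytes being decoded as Latin-1. The helper name `fix_mojibake` is made up for illustration and is not part of any library.

```python
def fix_mojibake(text):
    """Return (text_after_repair, was_repaired) for a possibly mis-decoded str."""
    try:
        # Latin-1 maps code points 0-255 one-to-one onto bytes 0-255,
        # so this recovers the original byte sequence.
        raw = text.encode("latin-1")
        repaired = raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character above U+00FF is present, or the bytes are not
        # valid UTF-8; in both cases leave the string alone.
        return text, False
    # Pure ASCII survives the round trip unchanged, so report no repair.
    return repaired, repaired != text


print(fix_mojibake("Sopet\xc3\xb3n"))  # the broken title from the question
print(fix_mojibake("Sopet\xf3n"))      # an already-correct title
```

This is only a heuristic: a string that legitimately contains the two characters Ã³ would also be "repaired", so it probably makes sense to apply it only to data from a source known to be affected.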

You should use:

title.encode('raw_unicode_escape')

Python 2:

print(u'\xd0\xbf\xd1\x80\xd0\xb8'.encode('raw_unicode_escape'))

Python 3:

print(u'\xd0\xbf\xd1\x80\xd0\xb8'.encode('raw_unicode_escape').decode('utf8'))
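
For comparison, here is a small Python 3 sketch of how `raw_unicode_escape` relates to `latin-1` for this purpose. The variable names are illustrative only. As far as I can tell, the two codecs agree for code points up to U+00FF and differ only in how they treat anything above that.

```python
broken = "\xd0\xbf\xd1\x80\xd0\xb8"  # UTF-8 bytes stored as individual code points

# For code points up to U+00FF both codecs yield the same raw bytes.
via_latin1 = broken.encode("latin-1")
via_raw = broken.encode("raw_unicode_escape")
print(via_latin1 == via_raw)
print(via_raw.decode("utf-8"))

# Above U+00FF they diverge: raw_unicode_escape writes a literal
# backslash-u escape sequence, while latin-1 raises an error.
euro = "\u20ac"
print(euro.encode("raw_unicode_escape"))
try:
    euro.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 refuses:", exc.reason)
```

So if the string mixes mis-decoded bytes with genuine non-Latin-1 characters, `latin-1` fails loudly, whereas `raw_unicode_escape` silently emits escape text that then decodes as plain ASCII. Which behaviour is preferable depends on the data.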
