Python – How do convert unicode escape sequences to unicode characters in a python string


When I tried to get the content of a tag using "unicode(head.contents[3])" i get the output similar to this: "Christensen Sk\xf6ld". I want the escape sequence to be returned as string. How to do it in python?

Best Solution

Assuming Python sees the name as a normal string, you'll first have to decode it to unicode:

>>> name
'Christensen Sk\xf6ld'
>>> unicode(name, 'latin-1')
u'Christensen Sk\xf6ld'

Another way of achieving this:

>>> name.decode('latin-1')
u'Christensen Sk\xf6ld'

Note the "u" in front of the string, signalling it is uncode. If you print this, the accented letter is shown properly:

>>> print name.decode('latin-1')
Christensen Sk├Âld

BTW: when necessary, you can use de "encode" method to turn the unicode into e.g. a UTF-8 string:

>>> name.decode('latin-1').encode('utf-8')
'Christensen Sk\xc3\xb6ld'