I have a list of Unicode character codes I need to convert into chars on python 2.7.
U+0021
U+0022
U+0023
.......
U+0024
How to do that?
This regular expression will replace all U+nnnn sequences with the corresponding Unicode character:
import re
s = u'''\
U+0021
U+0022
U+0023
.......
U+0024
'''
s = re.sub(ur'U\+([0-9A-F]{4})',lambda m: unichr(int(m.group(1),16)),s)
print(s)
Output:
!
"
#
.......
$
Explanation:
unichr gives the character of a codepoint, e.g. unichr(0x21) == u'!'.int('0021',16) converts a hexadecimal string to an integer.lambda(m): expression is an anonymous function that receives the regex match.def func(m): return expression but inline.re.sub matches a pattern and sends each match to a function that returns the replacement. In this case, the pattern is U+hhhh where h is a hexadecimal digit, and the replacement function converts the hexadecimal digit string into a Unicode character.In case anyone using Python 3 and above wonders, how to do this effectively, I'll leave this post here for reference, since I didn't realize the author was asking about Python 2.7...
Just use the built-in python function chr():
char = chr(0x2474)
print(char)
Output:
⑴
Remember that the four digits in Unicode codenames U+WXYZ stand for a hexadecimal number WXYZ, which in python should be written as 0xWXYZ.
a = 'U+aaa'
a.encode('ascii','ignore')
'aaa'
This will convert for unicode to Ascii which i think is what you want.
str in python 2 but return a byte in python3