I have a list of hex strings that I would like to transform into a list of Unicode characters. Everything here is done with Python 3.5.
If I do print(bytes.fromhex('hex_number').decode('utf-8')) it works. But it does not work if, after the conversion, I store the characters back in a list:
a = ['0063']  # which I take to be the hex equivalent of the character 'c'
b = [bytes.fromhex(h).decode('utf-8') for h in a]
print(b)
will print
['\x00c']
instead of
['c']
while the code
a = ['0063']
for h in a:
    print(bytes.fromhex(h).decode('utf-8'))
prints, as expected:
c
Can someone explain how I can convert the list ['0063'] into the list ['c'], and why I get this (to me) strange behavior?
To see what the hex value 0063 corresponds to, look here.
0063, decoded as UTF-8, ever produce 'c'? And why would 030C map to a space (which encodes to 20 in UTF-8 hex)?
0063 in hex corresponds to the 'c' in UTF-8 (would be U+0063). This is easy to see if you just use the code above. The 030C corresponds to the COMBINING CARON, as you said. As I said in the question, this is shown as a space in my shell (probably because my shell is not able to map it to something). Honestly, I do not understand what is wrong with my question. I did not pay much attention to the COMBINING CARON just because it was not really important to answer the question. But if you think so, I can write something different that can be easily mapped by my shell.
63 in UTF-8, while U+030C COMBINING CARON is CC8C. Unicode codepoints != UTF-8. Perhaps you are thinking of UTF-16 (big endian order) instead?
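To make the point in the comments concrete, here is a minimal sketch. It assumes the strings in the list are Unicode code points written in hex, which is what the last comment suggests. Under that assumption the two outputs in the question show the same two-character string: printing a list shows the repr of each element, so the leading NUL is written as \x00, while printing the string directly sends the NUL to the terminal, which typically displays nothing for it. The variable names below are only illustrative.

```python
hex_codes = ['0063', '030C']  # assumed to be hex code points, as in the question

# Decoding the raw bytes 00 63 as UTF-8 yields TWO characters: NUL and 'c'.
decoded_utf8 = bytes.fromhex('0063').decode('utf-8')
print(len(decoded_utf8))   # 2
print(repr(decoded_utf8))  # '\x00c'

# Option 1: treat each string as a code point number.
chars = [chr(int(h, 16)) for h in hex_codes]

# Option 2: treat each string as UTF-16 big-endian bytes, as the comment hints.
chars_utf16 = [bytes.fromhex(h).decode('utf-16-be') for h in hex_codes]

print(chars == chars_utf16)  # True
print(chars[:1])             # ['c']
```

The two options agree for four-digit code points like these. For code points above U+FFFF, chr(int(h, 16)) would still work, whereas the UTF-16 route would need surrogate pairs in the input.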