I have this:
>>> su = u'"/\"'
In Python, how can I convert this to a representation that shows the Unicode code points? For the string above, that would be:
u'\u0022\u002F\u005C\u0022'
Your original string is not four characters but three because \" is an escape code for a double quote:
>>> su = u'"/\"'
>>> len(su)
3
Here's how to display it as escape codes:
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u0022'
Use a raw Unicode string, or double the backslash to escape it, to get four characters:
>>> su = ur'"/\"' # Raw version
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u005C\\u0022'
>>> su = u'"/\\"' # Escaped version
>>> ''.join(u'\\u{:04X}'.format(ord(c)) for c in su)
u'\\u0022\\u002F\\u005C\\u0022'
Note the doubled backslashes in the result. Each pair indicates a single literal backslash. With only one backslash, they would be escape codes, so the string would be no different from your original string:
>>> ur'"/\"' == u'\u0022\u002F\u005C\u0022'
True
Printing it shows the content of the strings:
>>> print u'\u0022\u002F\u005C\u0022'
"/\"
>>> print(''.join(u'\\u{:04X}'.format(ord(c)) for c in su))
\u0022\u002F\u005C\u0022
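The join expression above can be wrapped in a small helper. This is a minimal sketch, assuming Python 3 (where the u prefix is optional and print is a function); the name to_u_escapes is hypothetical and not part of the original answer. It only produces valid escapes for code points up to U+FFFF, since wider code points would need more than four hex digits.

```python
def to_u_escapes(text):
    # Hypothetical helper: format each character's code point
    # as a 4-digit uppercase hex \uXXXX escape.
    return ''.join('\\u{:04X}'.format(ord(ch)) for ch in text)

three_chars = '"/\"'   # \" is an escape for ", so no backslash is stored
four_chars = r'"/\"'   # raw string literal keeps the backslash

print(len(three_chars), to_u_escapes(three_chars))  # 3 \u0022\u002F\u0022
print(len(four_chars), to_u_escapes(four_chars))    # 4 \u0022\u002F\u005C\u0022
```

Printing the result shows single backslashes, while the repr of the same string would show them doubled.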
Still a bit of a confusing example, however.
Putting a \ character in an example string, especially before a character that can be escaped (such as "), without clarifying the intention results in this confusion.

To support the full Unicode range, you could use the unicode-escape codec to get the text representation. To also represent characters in the ASCII range as Unicode escapes, and to force the \u00xx representation even for u'\xff', you could use a regex:
#!/usr/bin/env python2
import re
su = u'"/"\U000af600'
assert u'\ud800' not in su # no lone surrogate
print re.sub(ur'[\x00-\xff]', lambda m: u"\ud800u%04x" % ord(m.group()), su,
             flags=re.U).encode('unicode-escape').replace('\\ud800', '\\')
A lone surrogate (U+D800) is used as a placeholder to avoid escaping the backslash twice. Output:
\u0022\u002f\u0022\U000af600
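For comparison, the same output can be produced without the regex and the surrogate placeholder. This is a sketch under the assumption of Python 3, where iterating over a str yields whole code points; the function name escape_all is hypothetical. It writes \uXXXX for code points up to U+FFFF and \UXXXXXXXX for anything above.

```python
def escape_all(text):
    # Hypothetical helper: escape every character, choosing the
    # 4-digit or 8-digit form depending on the code point's size.
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            parts.append('\\U{:08x}'.format(cp))  # outside the BMP
        else:
            parts.append('\\u{:04x}'.format(cp))  # BMP, including ASCII
    return ''.join(parts)

su = '"/"\U000af600'
escaped = escape_all(su)
print(escaped)  # \u0022\u002f\u0022\U000af600

# Decoding the escapes with the unicode_escape codec restores the original.
restored = escaped.encode('ascii').decode('unicode_escape')
```

Because every character is escaped explicitly, there is no second pass that could escape a backslash twice, so no placeholder is needed.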