Don't reinvent the wheel here. Python has your back. Besides, handling escape syntax correctly, is harder than it looks.
The correct way to handle this
In Python 2, use the str-to-str string_escape codec:
string.decode('string_escape')
This interprets any Python-recognized string escape sequences for you, including \n and \t.
Demo:
>>> string = '\\n this is a docstring for\\n the main function.\\n a,\\n b,\\n c\\n '
>>> string.decode('string_escape')
'\n this is a docstring for\n the main function.\n a,\n b,\n c\n '
>>> print string.decode('string_escape')
this is a docstring for
the main function.
a,
b,
c
>>> '\\t\\n\\r\\xa0\\040'.decode('string_escape')
'\t\n\r\xa0 '
In Python 3, you'd have to use the codecs.decode() and the unicode_escape codec:
codecs.decode(string, 'unicode_escape')
as there is no str.decode() method and this is not a str -> bytes conversion.
Demo:
>>> import codecs
>>> string = '\\n this is a docstring for\\n the main function.\\n a,\\n b,\\n c\\n '
>>> codecs.decode(string, 'unicode_escape')
'\n this is a docstring for\n the main function.\n a,\n b,\n c\n '
>>> print(codecs.decode(string, 'unicode_escape'))
this is a docstring for
the main function.
a,
b,
c
>>> codecs.decode('\\t\\n\\r\\xa0\\040', 'unicode_escape')
'\t\n\r\xa0 '
Why straightforward str.replace() won't cut it
You could try to do this yourself with str.replace(), but then you also need to implement proper escape parsing; take \\\\n for example; this is \\n, escaped. If you naively apply str.replace() in sequence, you end up with \n or \\\n instead:
>>> '\\\\n'.decode('string_escape')
'\\n'
>>> '\\\\n'.replace('\\n', '\n').replace('\\\\', '\\')
'\\\n'
>>> '\\\\n'.replace('\\\\', '\\').replace('\\n', '\n')
'\n'
The \\ pair should be replaced by just one \ characters, leaving the n uninterpreted. But the replace option either will end up replacing the trailing \ together with the n with a newline character, or you end up with \\ replaced by \, and then the \ and the n are replaced by a newline. Either way, you end up with the wrong output.
The slow way to handle this, manually
You'll have to process the characters one by one instead, pulling in more characters as needed:
_map = {
'\\\\': '\\',
"\\'": "'",
'\\"': '"',
'\\a': '\a',
'\\b': '\b',
'\\f': '\f',
'\\n': '\n',
'\\r': '\r',
'\\t': '\t',
}
def unescape_string(s):
output = []
i = 0
while i < len(s):
c = s[i]
i += 1
if c != '\\':
output.append(c)
continue
c += s[i]
i += 1
if c in _map:
output.append(_map[c])
continue
if c == '\\x' and i < len(s) - 2: # hex escape
point = int(s[i] + s[i + 1], 16)
i += 2
output.append(chr(point))
continue
if c == '\\0': # octal escape
while len(c) < 4 and i < len(s) and s[i].isdigit():
c += s[i]
i += 1
point = int(c[1:], 8)
output.append(chr(point))
return ''.join(output)
This now can handle the \xhh and the standard 1-letter escapes, but not the \0.. octal escape sequences, or \uhhhh Unicode code points, or \N{name} unicode name references, nor does it handle malformed escapes in quite the same way as Python would.
But it does handle the escaped escape properly:
>>> unescape_string(string)
'\n this is a docstring for\n the main function.\n a,\n b,\n c\n '
>>> unescape_string('\\\\n')
'\\n'
Do know this is far slower than using the built-in codec.
str.replace.'\\n'? Or is it appearing in some escaped form?