How do I decode a Unicode string like this:
what%2527s%2bthe%2btime%252c%2bnow%253f
into ASCII like this:
what's+the+time+now
In your case, the string was encoded twice, so we need to unquote it twice to get it back:
In [1]: import urllib
In [2]: urllib.unquote(urllib.unquote("what%2527s%2bthe%2btime%252c%2bnow%253f"))
Out[2]: "what's+the+time,+now?"
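The session above is Python 2. For anyone on Python 3, the same idea might look like the sketch below. It assumes Python 3, where `unquote` lives in `urllib.parse`; the helper name `unquote_twice` is made up for illustration and is not part of any library.

```python
from urllib.parse import unquote

def unquote_twice(s):
    """Undo two layers of percent-encoding (hypothetical helper name)."""
    # The first pass turns each %25 back into '%', exposing the inner escapes.
    # The second pass decodes those inner escapes (%27, %2b, %2c, %3f).
    return unquote(unquote(s))

encoded = "what%2527s%2bthe%2btime%252c%2bnow%253f"
print(unquote(encoded))        # the string after one pass
print(unquote_twice(encoded))  # the string after two passes
```

Printing the intermediate value is a quick way to check how many layers of encoding there really are before deciding how many times to unquote.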
unquote probably wants to be unquote_plus instead; I'm guessing those +s were originally spaces, submitted as an HTML form (which handles + slightly differently from regular URL-encoding). But, yeah, the double-encoded string is a red flag for “someone's done something wrong here...”
Something like this?
title = u"what%2527s%2bthe%2btime%252c%2bnow%253f"
# Note: encode() only drops non-ASCII characters; the %-escapes are left as they are
print title.encode('ascii','ignore')
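To see what the ASCII encode does and does not do, here is a rough Python 3 comparison (Python 3 is an assumption; the snippet above is Python 2). Every character in the input is already ASCII, so encoding with 'ignore' should leave it unchanged, while the percent-escapes still need `unquote`. The second part follows the unquote_plus suggestion from the comment earlier in the thread, on the guess that the + signs were originally spaces.

```python
from urllib.parse import unquote, unquote_plus

title = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# Every character here is already ASCII, so nothing is dropped or decoded.
ascii_only = title.encode("ascii", "ignore").decode("ascii")

# Decode the outer layer first, then decode the inner layer
# while treating '+' as a space (form-style encoding).
with_spaces = unquote_plus(unquote(title))

print(ascii_only)
print(with_spaces)
```

Whether `unquote` or `unquote_plus` is right for the last step depends on where the string came from, which the question does not say.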
You could convert the %(hex) escaped chars with something like this:
import re

def my_decode(s):
    # Replace each %-escape (2 to 4 hex digits) with the matching Unicode character
    return re.sub('%([0-9a-fA-F]{2,4})', lambda x: unichr(int(x.group(1), 16)), s)

s = u'what%2527s%2bthe%2btime%252c%2bnow%253f'
print my_decode(s)
results in the unicode string
u'what\u2527s+the+time\u252c+now\u253f'
Not sure how you'd know to convert \u2527 to a single quote, or drop the \u253f and \u252c chars when converting to ASCII.
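The \u2527, \u252c and \u253f characters come from the pattern greedily taking four hex digits. In percent-encoding an escape is always exactly two hex digits, and %25 is simply an escaped '%'. A regex-based sketch that matches two digits and runs twice might look like this. It assumes Python 3 (so `chr` rather than `unichr`), the helper name `decode_once` is made up, and it only handles ASCII-range escapes correctly; multi-byte UTF-8 escapes are better left to `urllib.parse.unquote`.

```python
import re

# Exactly two hex digits per escape, as in standard percent-encoding.
ESCAPE = re.compile(r"%([0-9a-fA-F]{2})")

def decode_once(s):
    """Decode one layer of %XX escapes (hypothetical helper, ASCII range only)."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)

s = "what%2527s%2bthe%2btime%252c%2bnow%253f"
once = decode_once(s)      # %25 becomes '%', leaving %27, %2c, %3f behind
twice = decode_once(once)  # the remaining escapes become ', , and ?
print(once)
print(twice)
```

Because `re.sub` does not rescan its own replacements, each call peels off exactly one layer, which is why two calls are needed here.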