-2

How do I decode a unicode string like this:

what%2527s%2bthe%2btime%252c%2bnow%253f

into ascii like this:

what's+the+time+now

3
  • stackoverflow.com/questions/275174/… Commented Sep 23, 2011 at 14:35
  • The string you start with is not in unicode. Commented Sep 23, 2011 at 15:28
  • "ascii" vs "unicode" is a completely different issue from the one you're having. It could hardly be more different, really. Commented Sep 23, 2011 at 16:09

3 Answers

6

In your case, the string was URL-encoded twice, so we need to unquote it twice to get it back:

In [1]: import urllib
In [2]: urllib.unquote(urllib.unquote("what%2527s%2bthe%2btime%252c%2bnow%253f"))
Out[2]: "what's+the+time,+now?"
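The session above is Python 2. On Python 3 the same function lives in `urllib.parse`, so a minimal sketch of the two passes, with the intermediate value shown, might look like this (the variable names are only illustrative):

```python
from urllib.parse import unquote

encoded = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# First pass: "%25" is the escape for "%" itself, so "%2527" becomes "%27".
once = unquote(encoded)

# Second pass: the remaining "%27" and "%2c" and "%3f" become ' , ?
twice = unquote(once)

print(once)   # what%27s+the+time%2c+now%3f
print(twice)  # what's+the+time,+now?
```

Printing the intermediate string makes the double encoding visible: after one pass it is still an ordinary URL-encoded string.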

1 Comment

At least the outer unquote probably wants to be unquote_plus instead; I'm guessing those +s were originally spaces, submitted as an HTML form (which has a slightly different handling of + than regular URL-encoding). But, yeah, the double-encoded string is a red flag for “someone's done something wrong here...”
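If the guess in this comment is right and the `+` signs were originally spaces from a form submission, a sketch of that variant on Python 3 could use `unquote_plus` for the outer (second) pass. Whether the `+` signs really stood for spaces is an assumption about where the data came from:

```python
from urllib.parse import unquote, unquote_plus

encoded = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# Inner pass: undo the extra layer of encoding ("%2b" becomes "+").
inner = unquote(encoded)

# Outer pass: unquote_plus also turns "+" into a space, as form decoding does.
as_form_text = unquote_plus(inner)

print(as_form_text)  # what's the time, now?
```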
0

Something like this?

title = u"what%2527s%2bthe%2btime%252c%2bnow%253f"
print title.encode('ascii','ignore')

Also, take a look at this


0

You could convert the %(hex) escaped chars with something like this:

import re

def my_decode(s):
    # {2,4} greedily takes up to four hex digits, so %2527 is read as U+2527
    return re.sub('%([0-9a-fA-F]{2,4})', lambda x: unichr(int(x.group(1), 16)), s)

s = u'what%2527s%2bthe%2btime%252c%2bnow%253f'
print my_decode(s)

results in the unicode string

u'what\u2527s+the+time\u252c+now\u253f'

Not sure how you'd know to convert \u2527 to a single quote, or to drop the \u253f and \u252c chars when converting to ascii.
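One possible explanation for the odd characters: a percent escape is always exactly two hex digits, and `%25` is the escape for `%` itself, so `%2527` is really `%25` followed by `27`. A rough Python 3 sketch of the same regex idea, restricted to two digits and applied twice, is below. The function name `decode_percent_escapes` is made up for illustration, and it only handles ASCII escapes correctly:

```python
import re

def decode_percent_escapes(s):
    """Replace each %XX (exactly two hex digits) with the character it encodes.

    Only meant for ASCII escapes; multi-byte UTF-8 sequences would need
    urllib.parse.unquote instead.
    """
    return re.sub(r'%([0-9a-fA-F]{2})', lambda m: chr(int(m.group(1), 16)), s)

s = 'what%2527s%2bthe%2btime%252c%2bnow%253f'

once = decode_percent_escapes(s)      # "%25" -> "%", leaving "%27", "%2c", "%3f"
twice = decode_percent_escapes(once)  # those become ' , ?

print(once)   # what%27s+the+time%2c+now%3f
print(twice)  # what's+the+time,+now?
```

In practice the library function is the safer choice; the regex version is only here to show why matching four hex digits goes wrong.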

