
I have a string as follows:

str1 = "heylisten\uff08there is something\uff09to say \uffa9"

I need to put a space on either side of each Unicode value that my regex matches.

Desired output string:

out = "heylisten \uff08 there is something \uff09 to say  \uffa9 "

I used re.findall to get all the matches and then replaced each of them. It looks like this:

p1 = re.findall(r'\uff[0-9a-e][0-9]', str1, flags=re.U)
out = str1
for item in p1:
    print item
    print out
    out = re.sub(item, " " + item + " ", out)

And this outputs:

'heylisten\\ uff08 there is something\\ uff09 to say \\ uffa9 ' 

What is wrong with the code above? It prints an extra "\" and also separates it from "uff". I also tried re.search, but it only seems to separate \uff08. Is there a better way?
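The output above can be reproduced with a short Python 3 sketch. It rests on two assumptions, both consistent with that output: str1 holds literal backslashes (a Python 2 bytestring does not process \u escapes, so a raw literal stands in for it here), and Python 2's re read the unknown escape \u in the pattern as a plain "u", so the pattern is written out as uff[0-9a-e][0-9]. Only the text after each backslash is matched, which is why the padding lands between the backslash and "uff":

```python
import re

# Raw literal: each \uXXXX stays as a backslash plus five ASCII characters,
# standing in for the Python 2 bytestring from the question.
str1 = r"heylisten\uff08there is something\uff09to say \uffa9"

# Assumption: this is what the Python 2 pattern effectively matched,
# because Python 2's re treated the unknown escape \u as a plain "u".
matches = re.findall(r"uff[0-9a-e][0-9]", str1)
print(matches)  # ['uff08', 'uff09', 'uffa9']

# Padding each match leaves the backslash stranded before the new space.
out = re.sub(r"(uff[0-9a-e][0-9])", r" \1 ", str1)
print(repr(out))  # 'heylisten\\ uff08 there is something\\ uff09 to say \\ uffa9 '
```

The doubled backslash in the printed result is only how repr() displays a single backslash character.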

  • But it doesn't seem that you replaced anything?! Commented Nov 5, 2014 at 8:57
  • I don't follow. I want a space on either side of each match, but the \ seems to get separated. Commented Nov 5, 2014 at 8:59

2 Answers

> I have a string as follows:
>
> str1 = "heylisten\uff08there is something\uff09to say \uffa9"
>
> I need to replace the unicode values ...

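You don't have any Unicode characters. You have a bytestring: without the u prefix, Python 2 does not treat \uff08 as an escape, so the string contains a literal backslash followed by the five characters uff08.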
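In Python 3 every str is Unicode, so the closest analogue of that bytestring is a raw literal, in which \uff08 is not processed as an escape. A short sketch comparing the two forms:

```python
# With escape processing, \uff08 becomes a single character, U+FF08.
escaped = "\uff08"
# In a raw literal (like the Python 2 bytestring), it stays six characters.
literal = r"\uff08"

print(len(escaped), len(literal))  # 1 6
print(hex(ord(escaped)))           # 0xff08
print(list(literal))               # ['\\', 'u', 'f', 'f', '0', '8']
```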

str1 = u"heylisten\uff08there is something\uff09to say \uffa9"
 ...
p1 = re.sub(ur'([\uff00-\uffe9])', r' \1 ', str1)
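In Python 3 the u prefix is unnecessary because string literals are Unicode by default, and ur'' literals are a syntax error. The re module has understood \uXXXX escapes in patterns since Python 3.3, so a plain raw pattern works. A minimal Python 3 sketch of the same substitution, keeping the character range from this answer:

```python
import re

str1 = "heylisten\uff08there is something\uff09to say \uffa9"

# The raw pattern hands \uff00 and \uffe9 to re, which decodes them itself.
# \1 refers to the captured character, so each match gets a space on both sides.
out = re.sub(r"([\uff00-\uffe9])", r" \1 ", str1)
print(ascii(out))  # 'heylisten \uff08 there is something \uff09 to say  \uffa9 '
```

ascii() is used only so the full-width characters print as escapes; the string itself contains the real characters.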

2 Comments

It isn't working. It outputs 'heylisten\\uff08there is something\\uff09to say \\uffa9'
Yes, I read it, and I guess I framed the example wrong. It works with the u prefix outside the quotes.

print re.sub(r"(\\uff[0-9a-e][0-9])", r" \1 ", x)

You can use this re.sub directly, where x is your original string. See the demo:

http://regex101.com/r/sU3fA2/67

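import re
# Matches a literal backslash followed by uff and two more characters
p = re.compile(ur'(\\uff[0-9a-e][0-9])', re.UNICODE)
# Raw bytestring, so \uff08 stays as literal text (as in the question)
test_str = r"heylisten\uff08there is something\uff09to say \uffa9"
# Raw string: in u" \1 " the \1 would be the control character U+0001
subst = r" \1 "

result = re.sub(p, subst, test_str)
print result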

Output:

heylisten \uff08 there is something \uff09 to say  \uffa9
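This approach only applies when the string really contains the escape text (a backslash followed by uff08), as the bytestring in the question did. A Python 3 sketch, using a raw literal to build such a string:

```python
import re

# Raw literal: the backslashes are real characters, as in the question's bytestring.
x = r"heylisten\uff08there is something\uff09to say \uffa9"

# In the raw pattern, \\ matches one literal backslash.
out = re.sub(r"(\\uff[0-9a-e][0-9])", r" \1 ", x)
print(out)
```

The class [0-9a-e][0-9] is taken from the question and does not cover every \uffXX escape (for example \uff0a); a pattern such as r"(\\u[0-9a-fA-F]{4})" would match any four-digit escape, if that is what is wanted.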

2 Comments

Your import re code outputs this: u'heylisten\uff08there is something\uff09to say \uffa9'
@Swordy Use print re.sub(r"(\\uff[0-9a-e][0-9])", r" \1 ", x) directly, where x is your string.
