I am trying to split a string in the following manner. Here is a sample string:

"Hello this is a string.-2.34 This is an example1 string."

Please note that "" is a U+F8FF unicode character and the type of the string is Unicode.

I want to break the string as:

"Hello this is a string.","-2.34"," This is an example1 string."

I have written a regex to split the string, but with it I cannot get the numeric part that I want (-2.34 in the sample string).

My code:

import re
import os
from django.utils.encoding import smart_str, smart_unicode

text = open(r"C:\data.txt").read()
text = text.decode('utf-8')
print(smart_str(text))

pat = re.compile(u"\uf8ff-*\d+\.*\d+")
newpart = pat.split(text)
firstpart = newpart[::1]

print ("first part of the string ----")
for f in firstpart:
    f = smart_str(f)
    print ("-----")
    print f

1 Answer

You need to put parentheses around -*\d+\.*\d+ if you want to keep it in the result of re.split. When the pattern contains a capturing group, re.split also returns the text matched by that group:

import re
text = u"Hello this is a string.\uf8ff-2.34 This is an example1 string."
print(re.split(u'\uf8ff(-*\d+\.*\d+)', text))

yields

[u'Hello this is a string.', u'-2.34', u' This is an example1 string.']
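
For anyone on Python 3, here is a minimal sketch of the same idea. It is not the code from the question: the names MARKER, NUMBER_AFTER_MARKER and split_keeping_numbers are made up for illustration, and the stricter number pattern -?\d+(?:\.\d+)? is my assumption about what the numbers look like. The original -*\d+\.*\d+ needs at least two digits, so it would miss a single-digit number such as 5, and it would also accept things like --2..34.

```python
import re

# U+F8FF is the private-use character that precedes each number in the text.
MARKER = "\uf8ff"

# Assumed number format: optional minus sign, digits, optional fractional part.
# Only the number is inside the capturing group, so re.split keeps the number
# and drops the marker. The inner (?:...) group is non-capturing on purpose,
# otherwise the fractional part would show up as an extra list item.
NUMBER_AFTER_MARKER = re.compile(MARKER + r"(-?\d+(?:\.\d+)?)")


def split_keeping_numbers(text):
    """Split text at each marker+number pair and keep the number in the result."""
    return NUMBER_AFTER_MARKER.split(text)


text = "Hello this is a string.\uf8ff-2.34 This is an example1 string."
parts = split_keeping_numbers(text)
print(parts)
# ['Hello this is a string.', '-2.34', ' This is an example1 string.']
```

The 1 in example1 is left alone because it is not preceded by the marker character.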