
I am reading in a file and want to replace every regex match with the same text minus its white space. For example, the regex that correctly matches what I want in my document is '([0-9]+\s(st|nd|rd|th))', so that anything in the document of the form...

1 st, 2 nd, 33 rd, 134 th etc. will be matched.

What I want is to simply write a new file with each of those occurrences in the original file replaced with the white space removed.

I have played with a few things like re.findall and re.sub, but I can't quite figure out how to write out the full document with just the matched substrings replaced by their white-space-free versions.

Thanks for the help.

  • What does the "etc." mean? Commented Jul 8, 2014 at 14:21

3 Answers


If I understand correctly, you could use re.sub to achieve this.

Instead of placing a capturing group around your entire pattern, place one around the digits and another around the suffix, leaving the whitespace outside both groups. The replacement then joins the two groups back together.

>>> import re
>>> text = 'foo bar 1 st, 2 nd, 33 rd, 134 th baz quz'
>>> re.sub(r'([0-9]+)\s+(st|nd|rd|th)\b', '\\1\\2', text)

Another way would be to use lookarounds.

>>> re.sub(r'(?<=[0-9])\s+(?=(?:st|nd|rd|th)\b)', '', text)

Output (the same for both calls)

foo bar 1st, 2nd, 33rd, 134th baz quz
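Since the question is about rewriting a whole file, here is a minimal sketch of how the substitution above might be applied end to end. The names `strip_ordinal_spaces` and `rewrite_file` and the two path parameters are hypothetical, and the sketch assumes the file is UTF-8 text small enough to read into memory at once.

```python
import re

# Whitespace that sits between a digit and an ordinal suffix.
# The lookarounds keep the digit and suffix out of the match,
# so only the gap itself is replaced.
ORDINAL_GAP = re.compile(r'(?<=[0-9])\s+(?=(?:st|nd|rd|th)\b)')


def strip_ordinal_spaces(text):
    """Return text with '1 st', '33 rd', ... rewritten as '1st', '33rd', ..."""
    return ORDINAL_GAP.sub('', text)


def rewrite_file(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path with the gaps removed."""
    # Assumption: UTF-8 text that fits in memory.
    with open(src_path, encoding='utf-8') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(strip_ordinal_spaces(text))
```

Because re.sub returns the whole input string with only the matches replaced, nothing else is needed to keep the rest of the document intact. Note that \s also matches newlines, so a number at the end of one line followed by "st" at the start of the next would be joined as well.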


replaced with the white space removed.

Try using non-capturing groups.

(?:\d+)\s+(?:(st|nd|rd|th))


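The above regex matches one or more digits, the spaces that follow them, and one of st, nd, rd, or th. Note that the match covers the digits and the suffix as well as the spaces, so replacing the whole match with an empty string would delete the number too. To remove only the spaces, either capture the digits and the suffix and put them back in the replacement, or use lookarounds so that only the spaces are matched.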
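As a rough illustration of why the replacement matters with this pattern, the sketch below first substitutes an empty string for the whole match and then uses a capture-group variant that keeps the digits and suffix. The variable names and the `join_ordinals` helper are made up for the example, and the added \b is an assumption borrowed from the other answers.

```python
import re

text = '1 st, 2 nd'

# The pattern from this answer consumes digits, spaces, and suffix,
# so an empty replacement removes the whole token, not just the gap.
whole_match_removed = re.sub(r'(?:\d+)\s+(?:(st|nd|rd|th))', '', text)
# whole_match_removed == ', '

# Capturing the digits and the suffix lets the replacement put them back.
ORDINAL = re.compile(r'(\d+)\s+(st|nd|rd|th)\b')


def join_ordinals(s):
    """Hypothetical helper: '1 st' -> '1st', leaving other text untouched."""
    return ORDINAL.sub(r'\1\2', s)


joined = join_ordinals(text)
# joined == '1st, 2nd'
```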



Another trick, without capturing groups: use lookarounds so that only the spaces between the digits and st, nd, rd, or th are matched, and add a word boundary so that longer words such as rddfa are left alone. re.sub then replaces the matched spaces with an empty string, which removes them.

>>> import re
>>> text = 'foo 1 st, 2 nd, 33 rddfa,33 rd,bar 134 th'
>>> re.sub(r'(?<=\d)\s+(?=(?:st|nd|rd|th)\b)', r'', text)
'foo 1st, 2nd, 33 rddfa,33rd,bar 134th'
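The two constructs in that pattern do different jobs: (?=...) is a lookahead that checks what follows without consuming it, while (?:...) only groups the alternation so that \b applies to every alternative. A small sketch of the difference, using made-up variable names:

```python
import re

sample = '33 rd'

# Lookbehind + lookahead: the digit and the suffix are checked but not
# consumed, so the match is just the single space at index 2.
gap = re.search(r'(?<=\d)\s+(?=(?:st|nd|rd|th)\b)', sample)
# gap.group() == ' ' and gap.span() == (2, 3)

# Without lookarounds the digit and suffix become part of the match,
# so replacing the match with '' would delete them as well.
whole = re.search(r'\d\s+(?:st|nd|rd|th)\b', sample)
# whole.group() == '3 rd'
```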


1 Comment

Why did you use both (?=...) and (?:...)?
