I am looking for a regex pattern that matches

"whatever whatever 1. [document 1] This is a document dealing with"

and replaces it with

"whatever whatever 1. This is a document dealing with"

but only when both numbers are the same.

In general, the input looks like:

"whatever whatever N. [document N] This is a document dealing with"

If it helps, N has to be between 1 and 1000 (i.e. max three characters).

import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[document )(*****)',r'\1\2\3\4',mystr)
                 ^^^^^^^^                            ^^^^^^

In place of ***** I need to refer back to the first group.

I could use:

mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[document )([1-9]+)',r'\1\2\3\4',mystr)

but of course that will also match cases where the numbers differ, like: "whatever whatever 56. [document 877] This is a document dealing with"

I checked a bunch of answers with no success:

Regex: How to match a string that contains repeated pattern?
Capture repeated groups in python regex
Capturing repeating subpatterns in Python regex
Regex with repeating groups
python regular expression repeating group matches

1 Answer

You can use a group and a backreference to the number:

As I am not sure of your full condition to match, I am providing a minimal example here assuming the only match is a number with up to 3 digits followed by a reference of the form [document {number}]:

import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'((\d{1,3})\.)\s*\[document \2\]', r'\1', mystr)

output: 'whatever whatever 1. This is a document dealing with'

NB. In the example above the backreference to use is \2 (the inner group that captures the number); you will have to update it carefully if you add more capturing groups.
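One way to avoid renumbering backreferences is a named group. The following is only a sketch of that idea, not part of the original answer: the helper name strip_matching_reference, the \b word boundary, and the case-insensitive flag are all assumptions added for illustration.

```python
import re

# Assumption: a named group "num" stands in for the positional \2, so adding
# more groups later does not shift the backreference.
# \b stops the match from starting in the middle of a longer number.
PATTERN = re.compile(
    r'\b(?P<num>\d{1,3})\.\s*\[document (?P=num)\]',
    flags=re.IGNORECASE,
)

def strip_matching_reference(text):
    # Keep "N." and drop the "[document N]" tag that repeats the same N.
    return PATTERN.sub(r'\g<num>.', text)

same = "whatever whatever 1. [document 1] This is a document dealing with"
different = "whatever whatever 56. [document 877] This is a document dealing with"

print(strip_matching_reference(same))       # tag removed
print(strip_matching_reference(different))  # left unchanged
```

Without the \b, a string such as "56. [document 6]" would match on the trailing 6 of 56, which is probably not what is wanted here.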

3 Comments

To match a number from 1 to 999: \b(([1-9][0-9]{0,2})\.)\s+\[[Dd]ocument \2]
@Thefourthbird thanks, but I was only providing a minimal example, the important point being IMO the backreference ;) (actually I hesitated just to put \d+)
Yes ([1-9]\d{0,2}) would also work of course
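As a rough check of the range-restricted pattern from the comments, here is a small sketch. The UP_TO_1000 variant is an assumption added for illustration, covering the case where the upper bound of 1000 mentioned in the question is meant to be inclusive; the sample strings are made up.

```python
import re

# Pattern from the comment above: a number 1-999 with no leading zero.
UP_TO_999 = re.compile(r'\b(([1-9][0-9]{0,2})\.)\s+\[[Dd]ocument \2]')

# Assumed variant: also accept exactly 1000.
UP_TO_1000 = re.compile(r'\b((1000|[1-9][0-9]{0,2})\.)\s+\[[Dd]ocument \2]')

samples = [
    "a 999. [Document 999] b",
    "a 0. [document 0] b",
    "a 1000. [document 1000] b",
]

for s in samples:
    # Show what each pattern does with the same input.
    print(UP_TO_999.sub(r'\1', s), '|', UP_TO_1000.sub(r'\1', s))
```

Both patterns leave the "0" sample alone, and only the assumed variant strips the tag for 1000.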
