Okay, usually I don't ask this sort of question.
Using re.sub to find and replace normal strings is straightforward, but how do regular expressions in the replacement part (rather than the matching part) work?
In particular, I'm asking about Brian Okken's web page, which purports to explain exactly this. It provides code to replicate the sort of functionality he was used to in Perl but had struggled to develop in Python.
import fileinput
import re
for line in fileinput.input():
    line = re.sub(r'\* \[(.*)\]\(#(.*)\)', r'<h2 id="\2">\1</h2>', line.rstrip())
    print(line)
This sub is meant to match
* [the label](#the_anchor)
and replace it with
<h2 id="the_anchor">the label</h2>
It works, but how does the script know exactly what the label and anchor are? Presumably \1 and \2 are meant to stand for the desired text, but how does the script know this and not think, perhaps, that the leading * refers to \1?
\1 in the substitution refers to whatever matched the first pair of parens (i.e. the first (.*)) in the regex. The leading \* is an escaped literal asterisk, and \[, \], \( and \) are likewise escaped literal characters, so none of them create groups; only unescaped parentheses do.
\1 and \2 are the first and second matched groups from the pattern to be replaced. Groups are the parts of the pattern in parentheses.
A reference of the form \N (a backslash followed by a group number) refers to a group captured from the matched text. If you do not know what groups are, I suggest looking into those. In this case, \1 and \2 are references to groups 1 and 2, in other words the text matched inside the first and second pairs of () parentheses, respectively.
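If it helps to see the groups directly, here is a minimal sketch that runs the pattern from the question against the sample line and prints what each group captured. The variable names (pattern, sample, named_pattern) are only for illustration, and the named-group variant at the end is an optional alternative, not something from the original page.

```python
import re

# Pattern from the question: \*, \[, \], \( and \) are escaped literals,
# so only the two unescaped (.*) parts are capturing groups.
pattern = r'\* \[(.*)\]\(#(.*)\)'
sample = '* [the label](#the_anchor)'

match = re.search(pattern, sample)
print(match.group(0))  # the whole matched text
print(match.group(1))  # first (.*)  -> the label
print(match.group(2))  # second (.*) -> the_anchor

# In the replacement string, \1 and \2 are filled in from those groups.
result = re.sub(pattern, r'<h2 id="\2">\1</h2>', sample)
print(result)

# Optional variant (an illustration, not from the original page):
# named groups make the replacement easier to read.
named_pattern = r'\* \[(?P<label>.*)\]\(#(?P<anchor>.*)\)'
named_result = re.sub(named_pattern, r'<h2 id="\g<anchor>">\g<label></h2>', sample)
print(named_result)
```

Groups are numbered by the position of their opening parenthesis, counting from the left, which is why the label is \1 and the anchor is \2.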