Replace spaces between specific characters only using regex

Question

I am trying to replace whitespace in the LaTeX contained in a Markdown document with \\; using regex.
In the Markdown package I'm using, all LaTeX is wrapped in either $ or $$.

I would like to change the following from

"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"

to this

"dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$"

I have managed to do it using the messy function below, but I would like to use regex for a cleaner approach. Any help would be appreciated.

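import re

vals = r"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"

def cleanlatex(vals):
    # Doubling every space means that swapping "  " for "\;" (also two characters)
    # keeps the string length, and therefore the indices below, unchanged.
    vals = vals.replace(" ", "  ")
    char1 = r"\$\$"
    char2 = r"\$"
    # Positions of every $$, then of every single $ (with $$ masked out as ~~)
    indices = [i.start() for i in re.finditer(char1, vals)]
    indices += [i.start() for i in re.finditer(char2, vals.replace("$$", "~~"))]

    indices.sort()
    print(indices)
    # Only proceed if the number of $ and $$ delimiters is even
    if len(indices) % 2 == 0:
        while indices:
            start = indices.pop(0)
            finish = indices.pop(0)
            vals = vals[:start] + vals[start:finish].replace("  ", r"\;") + vals[finish:]

    # Restore the remaining doubled spaces to single spaces
    vals = vals.replace("  ", " ")
    return vals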

print(cleanlatex(vals))

Output:

[18, 39, 60, 78]   
dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$

trincot · Accepted Answer · 2022-08-18 06:08:25Z

2

With regex I would still do it in two steps:

1. Identify the parts between dollars (or double dollars) using regex.
2. Within those parts, replace spaces with a simple replace call.

import re

def cleanlatex(vals):
    return re.sub(r"(\$\$?)(.*?)\1", lambda m: m[0].replace(" ", r"\;"), vals)
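As a quick check, the function can be run on the sample string from the question. The sketch below repeats the definition so that it is self-contained; the names `sample` and `result` are introduced here only for illustration.

```python
import re

def cleanlatex(vals):
    # Match each $...$ or $$...$$ span; \1 requires the closing
    # delimiter to be the same text as the opening one.
    return re.sub(r"(\$\$?)(.*?)\1", lambda m: m[0].replace(" ", r"\;"), vals)

sample = r"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"
result = cleanlatex(sample)
print(result)
# dont edit this $result=\;\frac{1}{4}$ dont edit this $$some\;result=123$$
```

Text outside the dollar-delimited spans is left untouched, and `print` shows the inserted `\;` with a single backslash.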

If the dollars don't match up, this will still make replacements for as long as a matching pair of dollars can be found. This differs from your code, where nothing is replaced when the dollars don't match up.
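The difference is easiest to see on an unbalanced input. The following is a minimal sketch, not part of the original answer: `cleanlatex_strict` is a hypothetical wrapper that assumes the all-or-nothing behaviour is wanted and that "$$" should count as a single delimiter.

```python
import re

PATTERN = r"(\$\$?)(.*?)\1"

def cleanlatex(vals):
    return re.sub(PATTERN, lambda m: m[0].replace(" ", r"\;"), vals)

def cleanlatex_strict(vals):
    # Hypothetical helper: count delimiters, treating "$$" as one token.
    delimiters = re.findall(r"\$\$|\$", vals)
    if len(delimiters) % 2 != 0:
        return vals  # odd number of delimiters: leave the text untouched
    return cleanlatex(vals)

unbalanced = "$a b$ and $c d"
print(cleanlatex(unbalanced))         # $a\;b$ and $c d
print(cleanlatex_strict(unbalanced))  # $a b$ and $c d
```

An even delimiter count does not guarantee that the delimiters pair up as intended, so the strict variant is only a rough guard.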

When dollars are "nested", as in "$$nested $ here$$", the inner dollar is not regarded as a delimiter in this solution. And if a double dollar happens to follow a single dollar, the double one is interpreted as two single dollars that just happen to be adjacent. So "$part one$$part two$" is identified as two parts, each delimited with single dollars.
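Both cases can be verified by listing what the pattern actually matches. In this small sketch, `spans` is a helper name made up for the illustration.

```python
import re

PATTERN = r"(\$\$?)(.*?)\1"

def spans(text):
    # Return the full text of every delimited span the pattern finds.
    return [m[0] for m in re.finditer(PATTERN, text)]

print(spans("$$nested $ here$$"))     # ['$$nested $ here$$']
print(spans("$part one$$part two$"))  # ['$part one$', '$part two$']
```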

Your question didn't give any such boundary conditions (there are quite a few of them), so the solution may need some adaptations.
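As one example of such an adaptation, here is a sketch that makes two assumptions the question does not state: that a backslash-escaped dollar (\$) is a literal dollar sign rather than a delimiter, and that a math span may run over several lines. The name `cleanlatex_adapted` is hypothetical.

```python
import re

# Assumed conventions (not stated in the question): "\$" is a literal
# dollar sign, and math spans may contain newlines.
# (?<!\\) rejects a delimiter preceded by a backslash;
# re.DOTALL lets "." match newlines as well.
PATTERN = re.compile(r"(?<!\\)(\$\$?)(.*?)(?<!\\)\1", re.DOTALL)

def cleanlatex_adapted(vals):
    return PATTERN.sub(lambda m: m[0].replace(" ", r"\;"), vals)

print(cleanlatex_adapted(r"cost \$5 and $a b$"))  # cost \$5 and $a\;b$
# Spaces on both lines are replaced; the newline itself is kept.
print(cleanlatex_adapted("$$a b\nc d$$"))
```

Whether these conventions apply depends on the Markdown package in use, so they would need checking against its documentation.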

edited Aug 18, 2022 at 6:08

answered Aug 18, 2022 at 6:02


1 Comment

Dara O h Over a year ago

Thank you! This is better than expected!

Dara O h · Accepted Answer · 2022-08-19 06:38:30Z

0

I never thought of lambda! Thank you @trincot, your answer covers things I didn't even know were possible with regex. I am trying to decipher the pattern and would love some clarification, if you can. I'd really appreciate it, as I've had a look at the re docs but am still confused by the following:

Is there a reason to use (\$\$?) over (\$+)?
\1: is this just a way to keep the pattern tidy, and if I used \2 would it replicate the second capture group?
Does the ? in (.*?) make it find the shortest string that matches the pattern?
Why m[0], i.e. why index at 0?

Thanks again for the reply
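The four points above can be explored with a short experiment. This is only a sketch based on how Python's re module behaves; `text`, `parts`, `lazy` and `greedy` are names made up for the illustration.

```python
import re

pattern = r"(\$\$?)(.*?)\1"
text = "$a b$ and $$c d$$"

# m[0] is the whole match, m[1] the opening delimiter, m[2] the content.
# \1 then requires the same delimiter text again to close the span.
parts = [(m[0], m[1], m[2]) for m in re.finditer(pattern, text)]
print(parts)
# [('$a b$', '$', 'a b'), ('$$c d$$', '$$', 'c d')]

# Lazy .*? stops at the first possible closing delimiter; greedy .* runs to the last.
lazy = re.findall(r"\$.*?\$", "$a$ x $b$")
greedy = re.findall(r"\$.*\$", "$a$ x $b$")
print(lazy)    # ['$a$', '$b$']
print(greedy)  # ['$a$ x $b$']

# \$\$? accepts one or two dollars; \$+ would accept a run of any length.
print(re.fullmatch(r"\$\$?", "$$$"))            # None
print(re.fullmatch(r"\$+", "$$$") is not None)  # True
```

So \1 is more than tidiness: it ties the closing delimiter to whatever the opening one was, whereas \2 would repeat the text captured by the second group. m[0] is shorthand for m.group(0), the entire match including both delimiters, which is why the replace call can work on it directly.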

