I am trying to use regex to replace whitespace with \\; inside LaTeX that is embedded in a Markdown document.
In the Markdown package I'm using, all LaTeX is wrapped in either $ or $$.

I would like to change the following from

"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"

to this

"dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$"

I have managed to do it with the messy function below, but I would like a cleaner regex-based approach. Any help would be appreciated.

import re
vals = r"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"
def cleanlatex(vals):
    vals = vals.replace(" ", "  ")
    char1 = r"\$\$"
    char2 = r"\$"
    indices = [i.start() for i in re.finditer(char1, vals)]
    indices += [i.start() for i in re.finditer(char2, vals.replace("$$","~~"))]

    indices.sort()
    print(indices)
    # only proceed if the number of $ / $$ delimiters is even
    if len(indices) % 2 == 0:
        while indices:
            start = indices.pop(0)
            finish = indices.pop(0)
            # raw string avoids the invalid "\;" escape-sequence warning
            vals = vals[:start] + vals[start:finish].replace('  ', r'\;') + vals[finish:]
    
    vals = vals.replace("  ", " ")
    return vals

print(cleanlatex(vals))

Output:

[18, 39, 60, 78]   
dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$

2 Answers

With regex I would still do it in two steps:

  • Identify the parts between dollars (or double dollars) using regex
  • Within those parts, replace spaces with a simple replace call

import re

def cleanlatex(vals):
    return re.sub(r"(\$\$?)(.*?)\1", lambda m: m[0].replace(" ", r"\;"), vals)
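The same two steps can be spelled out with a named callback instead of a lambda, which may be easier to read. This is only a sketch: the names `LATEX_SPAN` and `escape_spaces` are made up for illustration, and the input is the sample string from the question.

```python
import re

# Step 1: find $...$ or $$...$$ spans. \1 requires the closing
# delimiter to be the same text as the opening one.
LATEX_SPAN = re.compile(r"(\$\$?)(.*?)\1")

def escape_spaces(match):
    # Step 2: match[0] is the whole span, delimiters included.
    return match[0].replace(" ", r"\;")

def cleanlatex(vals):
    return LATEX_SPAN.sub(escape_spaces, vals)

sample = r"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"
print(cleanlatex(sample))
# dont edit this $result=\;\frac{1}{4}$ dont edit this $$some\;result=123$$
```

Because the replacement is a function, its return value is inserted literally, so the backslash in `\;` needs no extra escaping.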

If the dollars don't pair up, this still makes replacements in every matching pair it finds and leaves the remainder untouched. Your code behaves differently: it replaces nothing at all when the number of delimiters is odd.
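If the all-or-nothing behaviour of the original function is wanted, one possible adaptation is to count the delimiters first. This is a sketch under the assumption that "balanced" simply means an even number of `$` / `$$` tokens, as in the question's check; `cleanlatex_strict` is a hypothetical name.

```python
import re

def cleanlatex_strict(vals):
    # Count delimiters, trying "$$" before a single "$" so that a
    # double dollar is counted as one token.
    delimiters = re.findall(r"\$\$|\$", vals)
    if len(delimiters) % 2 != 0:
        return vals  # unbalanced: leave the text untouched
    return re.sub(r"(\$\$?)(.*?)\1", lambda m: m[0].replace(" ", r"\;"), vals)

print(cleanlatex_strict("$a b$ and $c d"))   # $a b$ and $c d
print(cleanlatex_strict("$a b$ and $c d$"))  # $a\;b$ and $c\;d$
```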

When dollars are "nested", as in "$$nested $ here$$", the inner dollar is not treated as a delimiter in this solution. And when a double dollar directly follows a span opened by a single dollar, it is read as two single dollars that happen to be adjacent. So "$part one$$part two$" yields two parts, each delimited by single dollars.
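Both cases can be checked by listing the spans the pattern finds. A small sketch; the helper name `spans` is made up for illustration.

```python
import re

PATTERN = re.compile(r"(\$\$?)(.*?)\1")

def spans(text):
    # Return every delimited span the pattern finds, delimiters included.
    return [m[0] for m in PATTERN.finditer(text)]

print(spans("$$nested $ here$$"))     # ['$$nested $ here$$']
print(spans("$part one$$part two$"))  # ['$part one$', '$part two$']
```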

Your question doesn't say how such boundary cases should be handled (and there are quite a few of them), so the solution may need some adaptation.

1 Comment

Thank you! This is better than expected!
I never thought of lambda! Thank you @trincot, your answer covers things I didn't even know were possible with regex. I am trying to decipher the pattern and would love some clarification, if you can. I'd really appreciate it: I've had a look at the re docs but am still confused by the following:

  1. Is there a reason to use (\$\$?) over (\$+)?
  2. \1: is this just a way to keep the pattern tidy, and if I used \2 would it replicate the second capture group?
  3. Does the ? in (.*?) make it find the shortest string that matches the pattern?
  4. Why m[0], i.e. why index at 0?

Thanks again for the reply
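
For what it's worth, the four points can be probed with a short experiment. This is only a sketch, assuming the pattern in the answer is `(\$\$?)(.*?)\1`. The test strings and variable names are made up for illustration.

```python
import re

pattern = re.compile(r"(\$\$?)(.*?)\1")

# 1. \$\$? matches at most two dollars; \$+ swallows a run of any length.
one_or_two = re.match(r"\$\$?", "$$$")[0]   # "$$"
any_run = re.match(r"\$+", "$$$")[0]        # "$$$"

# 2. \1 is a backreference: the closing delimiter must repeat the text
#    captured by group 1, so "$$" cannot be closed by a lone "$".
closer = re.compile(r"(\$\$?)x\1")
same_delims = closer.fullmatch("$$x$$") is not None   # True
mixed_delims = closer.fullmatch("$$x$") is not None   # False

# 3. .*? is lazy (stops at the first possible closer); .* is greedy.
lazy = re.search(r"\$(.*?)\$", "$a$ b $c$")[0]    # "$a$"
greedy = re.search(r"\$(.*)\$", "$a$ b $c$")[0]   # "$a$ b $c$"

# 4. m[0] is the whole match; m[1] and m[2] are the capture groups.
m = pattern.search("x $a b$ y")
whole, delim, inner = m[0], m[1], m[2]   # "$a b$", "$", "a b"
```

So \1 does more than tidy the pattern: it ties the closing delimiter to the opening one. A \2 in its place would instead demand that the captured content be repeated. m[0] is used in the lambda because the replacement has to return the whole span, delimiters included.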
