Python Regular Expression Clobbering Text Between Multiple Latex Expression Matches

Question

I am trying to clean conversational text from a StackExchange corpus whose sentences may contain LaTeX expressions. LaTeX expressions are delimited by the $ sign, for instance $y = ax + b$

Here is an example line from the data containing multiple LaTeX expressions:

@Gruber - this is another example, when applied like so: $\mathrm{Var} \left(X^2\right) = 4 X^2 \mathrm{Var} (X)$ doesn't make any sense, on the left side you have a constant and on the right a random variable. Did you mean $4E(X)^2 Var(X)$ bless those that take the road less travelled. Another exception in your theory is $4E(X)^2 Var(X)$. What were you thinking? :)

Here is what I have so far. It clobbers the text between the LaTeX expressions and returns one huge match, which is incorrect:

([\$](.*)[\$]){1,3}?
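The behaviour described above can be reproduced with a short script. This is only a sketch: the sample string is a shortened stand-in for the example line (an assumption made for brevity), and the names text, attempt and matches are illustrative.

```python
import re

# Shortened stand-in for the example line (assumption, for brevity).
text = r"like so: $\mathrm{Var}(X)$ doesn't make sense. Did you mean $4E(X)^2 Var(X)$ bless those. :)"

# The pattern from the question, unchanged.
attempt = re.compile(r"([\$](.*)[\$]){1,3}?")

matches = [m.group(0) for m in attempt.finditer(text)]

# One match only: it starts at the first $ and ends at the last $,
# so the plain text between the two expressions is swallowed as well.
print(len(matches))  # prints 1
print(matches[0])
```

The trailing ? only makes the {1,3} repetition count lazy; the inner .* stays greedy.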

Unfortunately it doesn't seem to work; I tried with [^$] and [^\$].
– Matt G, Commented Mar 12, 2020 at 3:44

Alex Sveshnikov · Accepted Answer · 2020-03-12 06:23:07Z

2

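I don't understand why you put {1,3} at the end; what were you trying to achieve? In any case, the problem is not [\$], which in Python matches just a dollar sign (the backslash inside the set is only an escape). The problem is (.*): it is greedy, so it runs from the first $ to the last $ on the line and swallows all the text in between. I suggest you use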

\$([^$]*)\$

and replace each match with an empty string.
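A minimal sketch of that suggestion in Python, assuming the standard re module and a shortened stand-in for the example line. The names text, latex, cleaned and tidy are illustrative, and the final whitespace clean-up is an optional extra that is not part of the answer.

```python
import re

# Shortened stand-in for the example line (assumption, for brevity).
text = r"like so: $\mathrm{Var}(X)$ doesn't make sense. Did you mean $4E(X)^2 Var(X)$ bless those. :)"

# \$ matches a literal dollar; [^$]* matches anything except a dollar,
# so a match can never run past the closing delimiter.
latex = re.compile(r"\$([^$]*)\$")

# Each expression is found separately (group 1 is the content between the $ signs).
print(latex.findall(text))

# Replace every expression with an empty string.
cleaned = latex.sub("", text)

# Optional extra (not in the answer): collapse the double spaces left behind.
tidy = re.sub(r" {2,}", " ", cleaned)
print(tidy)
```

Because [^$] also matches newlines, this pattern would pair up dollar signs across line breaks if the input contained any.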

1 Comment

Matt G Over a year ago

Hey Alex, perfect answer, thank you. I wanted to try ungreedy evaluation, hence the experimenting with {1,3}. I appreciate your help very much, thanks.
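For reference, ungreedy matching can also be applied to the inner quantifier directly. The sketch below is one possible way to write it, not something taken from the answer; it uses a shortened stand-in for the example line, and the names lazy and negated are illustrative.

```python
import re

# Shortened stand-in for the example line (assumption, for brevity).
text = r"like so: $\mathrm{Var}(X)$ doesn't make sense. Did you mean $4E(X)^2 Var(X)$ bless those. :)"

# .*? is the lazy form of .* and stops at the first closing $ it can reach.
lazy = re.compile(r"\$(.*?)\$")

# The negated-class pattern, for comparison.
negated = re.compile(r"\$([^$]*)\$")

print(lazy.findall(text))
print(lazy.findall(text) == negated.findall(text))  # prints True
```

On single-line input like this the two patterns agree. They differ on multi-line input: . does not match a newline unless re.DOTALL is passed, whereas [^$] does.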
