
I am trying to clean conversational text from a StackExchange corpus whose sentences may contain LaTeX expressions. LaTeX expressions are delimited by the $ sign, for instance $y = ax + b$.

Here is an example line from the data containing multiple LaTeX expressions:

@Gruber - this is another example, when applied like so: $\mathrm{Var} \left(X^2\right) = 4 X^2 \mathrm{Var} (X)$ doesn't make any sense, on the left side you have a constant and on the right a random variable. Did you mean $4E(X)^2 Var(X)$ bless those that take the road less travelled. Another exception in your theory is $4E(X)^2 Var(X)$. What were you thinking? :)

Here is what I have so far. Instead of matching each LaTeX expression separately, it swallows the text between the expressions and returns one huge match, which is incorrect.

([\$](.*)[\$]){1,3}?
  • Try replacing the . with [^$]. Commented Mar 12, 2020 at 3:30
  • Unfortunately it doesn't seem to work; I tried both [^$] and [^\$]. Commented Mar 12, 2020 at 3:44

1 Answer


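I don't understand why you put {1,3} at the end; what were you trying to achieve? Anyway, the main problem is that .* is greedy, so it runs from the first $ to the last one and swallows everything in between. Also note that [\$] does not need the backslash: in POSIX-style bracket expressions it is a set of two characters, a backslash and a dollar, while most other flavors treat it as just a dollar. I suggest you use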

\$([^$]*)\$

and replace each match with an empty string.
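
To see the difference, here is a minimal sketch in Python. The question does not say which regex engine is used, so Python's re module is an assumption here, and the sample text is a shortened, made-up variant of the line in the question. It compares a greedy .* with the negated class and then strips the expressions:

```python
import re

# Hypothetical sample, shortened from the example line in the question.
text = (
    r"Did you mean $4E(X)^2 Var(X)$ bless those that take the road. "
    r"Another exception is $y = ax + b$. What were you thinking?"
)

# Greedy: .* runs from the first $ to the last $, swallowing the prose in between.
greedy = re.compile(r"\$(.*)\$")

# Negated class: [^$]* cannot cross a $, so each expression is matched on its own.
per_expr = re.compile(r"\$([^$]*)\$")

greedy_matches = greedy.findall(text)   # one huge match
expr_matches = per_expr.findall(text)   # one match per LaTeX expression
cleaned = per_expr.sub("", text)        # text with the expressions removed

print(greedy_matches)
print(expr_matches)
print(cleaned)
```

Removing a match leaves the spaces that surrounded it, so you may want to collapse repeated whitespace afterwards.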


1 Comment

Hey Alex, perfect answer, thank you. I wanted to try ungreedy evaluation, hence the experimenting with {1,3}. I appreciate your help very much, thanks.
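
For reference, ungreedy (lazy) matching is normally written by putting ? directly after the quantifier, as in .*?, rather than after a group with {1,3}. A rough sketch, again assuming Python's re module and a made-up sample string:

```python
import re

# Hypothetical sample text with two LaTeX expressions.
text = r"First $a + b$ then some prose, then $c^2$ at the end."

# .*? is lazy: it grows one character at a time and stops at the nearest closing $.
lazy = re.compile(r"\$(.*?)\$")

lazy_matches = lazy.findall(text)
lazy_cleaned = lazy.sub("", text)

print(lazy_matches)
print(lazy_cleaned)
```

One difference from [^$]*: in Python, . does not match a newline unless re.DOTALL is set, so the lazy version would miss an expression that spans several lines, while the negated class would still match it.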
