I'm trying to remove the following string from a field in my MySQL database:

<p><span style="font-size:16px"><strong>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</span></strong></p>
  1. The string will always start with <p> and end with </p>.
  2. The string will always contain these words in the same order: The, quick, brown. They may be separated by something else (a space, &nbsp; or other HTML tags).
  3. The string is part of a field that contains more text and nested HTML tags, so the solution must not match the higher-level <p></p> tags.
  4. There are more than 20k matches, so no manual-edit solutions please :)

I have already tried doing it with RegExp but I can't filter for multiple keywords (AND operator).

I can export my DB to an SQL file, so I can use any solution you would recommend (Windows/Linux, text editor, JS script, etc.), but I would appreciate the simplest and most elegant one.

  • Which SQL database? Are The, quick and brown always in the same order? Do you want to remove the record, or just delete that text from the field? Commented May 15, 2015 at 9:38
  • @Amadan that is MySQL Commented May 15, 2015 at 9:39
  • Can you show us how you used Regexp ? Commented May 15, 2015 at 9:42
  • @Elyasin /<p.+The.+quick.+brown.+\/p>/; but since there are nested HTML tags, this captures everything up to the highest-level <p></p> and so removes more than necessary. Commented May 15, 2015 at 9:46
  • @FlorinC. Clarify this in your question. You are getting wrong answers this way. Commented May 15, 2015 at 9:49
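
The over-matching described in the comments can be reproduced with a short Python sketch. The sample field is made up for illustration, and Python's re module stands in for whatever regex engine is actually used.

```python
import re

# Hypothetical field value: the target paragraph sits between two other paragraphs.
field = (
    '<p>Intro text</p>'
    '<div><p><strong>The quick brown &nbsp;</strong>fox jumps.</p></div>'
    '<p>Closing paragraph</p>'
)

# The pattern from the comment, without the JavaScript-style / delimiters.
greedy = re.compile(r'<p.+The.+quick.+brown.+/p>')

match = greedy.search(field)
# Greedy .+ runs from the first "<p" to the last "/p>",
# so the match spans the whole field rather than one paragraph.
print(match.group(0) == field)
```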

3 Answers

1

I think you have to replace .* with the less efficient but more precise (?:(?!<\/?p[^<]*>).)*, which cannot cross an opening or closing <p> tag and therefore forces all the words to match inside a single <p> element:

(?i)<p>(?:(?!<\/?p[^<]*>).)*the(?:(?!<\/?p[^<]*>).)*?quick(?:(?!<\/?p[^<]*>).)*?brown(?:(?!<\/?p[^<]*>).)*?<\/p>
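
As a rough sketch of how this pattern could be applied, the Python snippet below removes the matching paragraph from a field value with re.sub. The sample field and the helper name remove_target_paragraphs are invented for illustration, and the re.DOTALL flag is an added assumption so that . also matches newlines inside a paragraph; the pattern itself is the one given above.

```python
import re

# The pattern from the answer, split across lines for readability.
# re.DOTALL is an addition (assumption) so multi-line paragraphs still match.
TARGET = re.compile(
    r'(?i)<p>(?:(?!<\/?p[^<]*>).)*the'
    r'(?:(?!<\/?p[^<]*>).)*?quick'
    r'(?:(?!<\/?p[^<]*>).)*?brown'
    r'(?:(?!<\/?p[^<]*>).)*?<\/p>',
    re.DOTALL,
)


def remove_target_paragraphs(text: str) -> str:
    """Delete every <p>...</p> that contains the, quick, brown in that order."""
    return TARGET.sub('', text)


# Hypothetical field value with the target paragraph nested in other markup.
field = (
    '<p>Intro text</p>'
    '<div><p><span style="font-size:16px"><strong>The quick brown &nbsp;</strong></span>'
    '<strong><span style="font-size:16px">fox jumps.</span></strong></p></div>'
    '<p>Closing paragraph</p>'
)

cleaned = remove_target_paragraphs(field)
print(cleaned)
```

Two limits of the pattern as written are worth keeping in mind: the literal <p> does not match an opening tag with attributes such as <p class="x">, and the lookahead treats any tag whose name starts with p (for example <pre>) as a paragraph boundary.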

0

The expression ^<p>.*The.*quick.*brown.*</p>$ worked for me with grep (the $ is escaped as \$ inside the double-quoted shell string):

 [root@fedora ~]# grep "^<p>.*The.*quick.*brown.*</p>\$" test1.txt
<p><span style="font-size:16px"><strong>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</span></strong></p>
<p><strong>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</span></strong></p>
<p>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</p>
[root@fedora ~]#

1 Comment

Unfortunately the string is part of a longer text with nested HTML tags, so your solution matches my entire field text, up to the outermost <p></p> tags.
0

You can use the following in any editor (say Notepad++), in JavaScript, or in any PCRE engine, with the g, m and i modifiers, to match:

^<p>.*?the.*?quick.*?brown.*?<\/p>$

I used .* instead of .+ because of your statement that the words MIGHT be separated by something else.

Then replace each match with '' (the empty string).
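
To see what the ^ and $ anchors imply, here is a small Python sketch (Python's re is used as a stand-in for the engines mentioned in this answer, and the sample strings are invented). With the m and i modifiers the pattern removes the paragraph cleanly only when it occupies a line by itself; when other paragraphs share the line, the anchors stretch the match over the whole line.

```python
import re

# The pattern from the answer with the i and m modifiers.
# re.sub replaces every match, which plays the role of the g modifier.
pattern = re.compile(
    r'^<p>.*?the.*?quick.*?brown.*?<\/p>$',
    re.IGNORECASE | re.MULTILINE,
)

# Case 1: the paragraph is alone on its line, so only that line's content goes.
own_line = 'intro\n<p><strong>The quick</strong> brown fox</p>\noutro'
print(repr(pattern.sub('', own_line)))

# Case 2: several paragraphs share one line, so the anchored match covers them all.
shared_line = '<p>Intro</p><p>The quick brown fox</p><p>End</p>'
print(repr(pattern.sub('', shared_line)))
```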

1 Comment

Unfortunately the string is part of a longer text with nested HTML tags, so your solution matches my entire field text, up to the outermost <p></p> tags.
