I have the following content:

<div class="TEST-TEXT">hi</span>
<a href=\"https://en.wikipedia.org/wiki/TEST-TEXT\">first young CEO's TEST-TEXT</a>
<span class="test">hello</span>

I am trying to match the TEST-TEXT string so that I can replace its value, but only when it appears as text content and not inside an attribute value.

I have looked into look-ahead and look-behind in regex, but the problem is that they need a fixed width for the match. The question regex-match-all-characters-between-two-html-tags shows a very similar case, except that it relies on a span with a class to create the match. I also checked regex-match-attribute-in-a-html-code.

Here are the two regular expressions I have tried:

  1. \"([^"]*)\"
  2. (?s)(?<=<([^{]*)>)(.+?)(?=</.>)

Neither works for me. You can try them at https://regex101.com/r/ApbUEW/2

Expected behavior: it matches the string only when it is text content. Current behavior: it matches in both cases.

Edit: I want the text to be dynamic and not specific to TEST-TEXT.

7
  • What is the expected output? Commented May 20, 2019 at 12:11
  • 4
    Regex isn't powerful enough to parse HTML stackoverflow.com/questions/590747/… Commented May 20, 2019 at 12:11
  • @TheScientificMethod To match the third TEST-TEXT, the one that is the inner text between two tags. Commented May 20, 2019 at 12:17
  • @PushpeshKumarRajwanshi What else would you suggest using? Commented May 20, 2019 at 12:30
  • try: TEST-TEXT(?=<\/a>) Commented May 20, 2019 at 12:32

5 Answers

1

A regex that matches the string between any two HTML tags:

(?![^<>]*>)(TEST\-TEXT)

Here, assuming that you have valid HTML, the negative lookahead makes sure that we are not inside a tag definition, where all its attributes are defined. It does that by ensuring that the next angle bracket that appears is not >, which would indicate that we are inside a tag definition.

Note that the following regex would also achieve the same outcome:

(?=[^<>]*<)(TEST\-TEXT)

Or even

(TEST\-TEXT)(?=[^<>]*<)
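Since the question asks for the text to be dynamic, here is a minimal sketch of how the negative lookahead above could be built from an arbitrary search string. The helper names `escapeRegExp` and `replaceOutsideTags` are made up for illustration, and the sample uses a plain `href="..."` without the escaped quotes. It assumes valid HTML with no stray `<` or `>` in text or attribute values.

```javascript
// Escape regex metacharacters so the search string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace `search` only where the next angle bracket
// is not `>`, i.e. where we are not inside a tag definition.
function replaceOutsideTags(html, search, replacement) {
  const pattern = new RegExp('(?![^<>]*>)' + escapeRegExp(search), 'g');
  // A function replacement keeps `$` in `replacement` from being interpreted.
  return html.replace(pattern, () => replacement);
}

const sample =
  '<div class="TEST-TEXT">hi</span>\n' +
  '<a href="https://en.wikipedia.org/wiki/TEST-TEXT">first young CEO\'s TEST-TEXT</a>\n' +
  '<span class="test">hello</span>';

const replaced = replaceOutsideTags(sample, 'TEST-TEXT', 'NEW-TEXT');
console.log(replaced);
```

Only the occurrence before `</a>` is replaced; the `class` and `href` values are left alone. This is still a pattern-based workaround, not an HTML parser, so it can misfire on comments, scripts, or attribute values that contain angle brackets.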

0

Something like this should help:

\>([^"<]*)\<

EDIT:

Without open and close tags included:

(?<=\>)([^"<]*)(?=\<)
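As the comments on this answer note, the pattern matches whole text segments rather than one specific string. One possible way to bridge that gap, sketched here as an assumption rather than as what the answer intended, is to run a literal replacement inside each matched segment. The helper name `replaceInTextSegments` is hypothetical, and the sample uses a plain `href="..."`. Lookbehind needs a reasonably modern JavaScript engine.

```javascript
// Hypothetical helper: find text segments with the lookbehind/lookahead
// pattern from this answer, then replace `search` only inside them.
function replaceInTextSegments(html, search, replacement) {
  const textSegment = /(?<=>)([^"<]*)(?=<)/g;
  return html.replace(textSegment, (segment) =>
    // split/join performs a literal, non-regex replacement.
    segment.split(search).join(replacement)
  );
}

const html =
  '<div class="TEST-TEXT">hi</span>\n' +
  '<a href="https://en.wikipedia.org/wiki/TEST-TEXT">first young CEO\'s TEST-TEXT</a>\n' +
  '<span class="test">hello</span>';

const updated = replaceInTextSegments(html, 'TEST-TEXT', 'NEW-TEXT');
console.log(updated);
```

Because the character class excludes `"`, any text segment containing a double quote is skipped, and text before the first tag or after the last tag is never matched.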

2 Comments

That would also include the opening and closing tags; I want it to match only the string inside.
That would match any text in between tags; I want it to match a specific string.
0

Try TEST-TEXT(?=<\/a>)

TEST-TEXT matches the literal text TEST-TEXT.

(?=<\/a>) is a lookahead that checks that the closing tag </a> follows, without including it in the match.

see at regex101
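To illustrate the difference between anchoring on `</a>` and on any closing tag, here is a small sketch. The `[^>]+` variant and the sample string are assumptions made for this example, not something taken from the answer.

```javascript
// Sample with one match before </a>, one before </b>, and one in an attribute.
const sampleText =
  'first young CEO\'s TEST-TEXT</a> and <b>TEST-TEXT</b> and <i class="TEST-TEXT">x</i>';

// Lookahead tied to one specific closing tag, as in the answer.
const beforeClosingA = /TEST-TEXT(?=<\/a>)/g;

// Assumed variant: lookahead that accepts any closing tag.
const beforeAnyClosingTag = /TEST-TEXT(?=<\/[^>]+>)/g;

const countBeforeA = (sampleText.match(beforeClosingA) || []).length;
const countBeforeAny = (sampleText.match(beforeAnyClosingTag) || []).length;

console.log(countBeforeA, countBeforeAny); // 1 2
```

Both patterns only see text that sits directly in front of a closing tag, so text followed by more text or by an opening tag is not matched.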

3 Comments

TEST-TEXT(?=<\/.*>) would match before any closing tag, but I still want the text to be dynamic.
To make it dynamic, use TEST-TEXT(?=<\.*>)
That won't match anything.
0

Here, we could add a soft boundary to the right of the desired output, as you are already doing, and use a character class to capture the desired output. After that, we can make the replacement using capturing groups (). Maybe something like this:

([A-Z-]+)(<\/)


Demo

This snippet is just to show that the expression might be valid:

const regex = /([A-Z-]+)(<\/)/gm;
const str = `<div class="TEST-TEXT">hi</span><a href=\\"https://en.wikipedia.org/wiki/TEST-TEXT\\">first young CEO's
TEST-TEXT</a><span class="test">hello</span><div class="TEST-TEXT">hi</span><a href=\\"https://en.wikipedia.org/wiki/TEST-TEXT\\">first young CEO's
TEST-TEXT</a><span class="test">hello</span>`;
const subst = `NEW-TEXT$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

RegEx

If this expression isn't what you want, it can be modified or changed at regex101.com.

RegEx Circuit

jex.im also helps to visualize the expressions.


1 Comment

That will only work when there is a closing tag right after the string. What I am looking for is a match for the string in all cases where it is text and not an attribute value.
0

Maybe this will help?

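    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String html = "<div class=\"TEST-TEXT\">hi</span>\n" +
                    "<a href=\\\"https://en.wikipedia.org/wiki/TEST-TEXT\\\">first young CEO's TEST-TEXT</a>\n" +
                    "<span class=\"test\">hello</span>";

            // Group 5 is a TEST-TEXT that comes after a '>' and before a closing tag on the same line.
            Pattern pattern = Pattern.compile("(<)(.*)(>)(.*)(TEST-TEXT)(.*)</.*>");
            Matcher matcher = pattern.matcher(html);
            while (matcher.find()) {
                System.out.println(matcher.group(5));
            }
        }
    }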

1 Comment

That will only match it if it is the only string between the two tags.
