
I have a simple piece of HTML code.

<tr>
OtherElement
</tr>
<tr>
HelloWorld
</tr>

I need to match the <tr></tr> element containing HelloWorld. I am using the regular expression below, but it matches the first element as well.

<tr[\s\S]*?HelloWorld[\s\S]*?<\/tr>

I am using Node.js, so I cannot use lookbehind.

  • Obligatory: stackoverflow.com/a/1732454/616443 Commented Jan 8, 2016 at 16:54
  • Do you really need (better too much than too little) to parse broken HTML with regexes? And what about rows that span multiple lines or contain child elements? Commented Jan 8, 2016 at 16:56

3 Answers

I assume you receive the HTML fragment as a string. In that case, parse it with a DOM parser and keep only those tr elements that contain (rather than equal) the string HelloWorld. The tr tags must first be renamed to a custom tag name, because the HTML parser discards tr tags that appear outside a table.

// Sample input: the second row holds HelloWorld among other text
var $txt = "<tr>\nOtherElement\n</tr>\n<tr>Initial text\nHelloWorld\nSome other text</tr>";
var $el = document.createElement('body');
// Rename tr tags to a custom "tablerows" tag so the parser keeps them
$el.innerHTML = $txt.replace(/<(\/?)tr\b([^<]*)>/g, "<$1tablerows$2>");
var $arr = [];
[].forEach.call($el.getElementsByTagName("tablerows"), function (v) {
    // Keep rows whose text contains (not equals) HelloWorld
    if (v.textContent.indexOf("HelloWorld") > -1) {
        $arr.push(v.textContent);
    }
});
document.write(JSON.stringify($arr, null, 4));

A regex solution is nasty and fragile, but possible:

<tr\b[^<]*>[^<]*(?:<(?!tr\b)[^<]*)*HelloWorld[^<]*(?:<(?!\/tr>)[^<]*)*<\/tr>

The regex uses the unroll-the-loop technique to match the closest subpatterns.

  • <tr\b[^<]*> - matches an opening tr tag, including any attributes
  • [^<]*(?:<(?!tr\b)[^<]*)* - matches any text that does not start another <tr tag, up to
  • HelloWorld - the literal sequence
  • [^<]*(?:<(?!\/tr>)[^<]*)* - matches any text up to, but not including, the closing </tr>
  • <\/tr> - matches the closing tr tag
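
As a rough sketch of how the pattern above could be applied in Node.js, the snippet below builds it with the g flag and collects every matching row. The findRows helper and the html sample are illustrative names, not part of the question, and the sketch assumes the search string contains no regex metacharacters and that rows are not nested.

```javascript
// Hypothetical helper: returns every <tr>...</tr> block whose content includes `needle`.
// Assumes `needle` has no regex metacharacters and that rows are not nested.
function findRows(html, needle) {
  const pattern = new RegExp(
    "<tr\\b[^<]*>[^<]*(?:<(?!tr\\b)[^<]*)*" +
      needle +
      "[^<]*(?:<(?!\\/tr>)[^<]*)*<\\/tr>",
    "g" // global flag: collect all matching rows, not only the first
  );
  // String.prototype.match returns null when nothing matches
  return html.match(pattern) || [];
}

// Sample input taken from the question
const html = "<tr>\nOtherElement\n</tr>\n<tr>\nHelloWorld\n</tr>";
const rows = findRows(html, "HelloWorld");
console.log(rows);
```

Only the second row is returned here, because the (?!tr\b) lookahead stops the first row from extending past the next opening tr tag.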

2 Comments

I am using Node.js so I cannot use a DOM parser, but your regex solution works like a charm. Thanks
Not sure if that answer is still relevant, but it says you can use the npm modules jsdom and htmlparser to create and parse a DOM in Node.js.

Don't parse HTML with regexps. Instead, use DOM routines and properties:

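function find_hello_world() {
  var trs = document.querySelectorAll('tr');

  for (var i = 0; i < trs.length; i++) {
    // textContent includes surrounding whitespace, so test for containment
    if (trs[i].textContent.indexOf("HelloWorld") !== -1) return trs[i];
  }
  return null; // no matching row
}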

1 Comment

I cannot use the DOM since I am not in the browser but in a Node.js environment.

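There's an error in your regular expression. This construct is too permissive: [\s\S]*? It matches any character, including newlines and tag delimiters, so the match can begin at the first <tr and run past its closing </tr> until HelloWorld is found.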

Try the following:

<tr>\s*HelloWorld\s*<\/tr>

\s* means 0 or more whitespace characters and nothing else.
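
To make the difference concrete, here is a small Node.js sketch that runs both patterns against the markup from the question. The variable names are illustrative only.

```javascript
// Sample input taken from the question
const html = "<tr>\nOtherElement\n</tr>\n<tr>\nHelloWorld\n</tr>";

// Original pattern: [\s\S]*? may cross the first </tr>, so the match starts at the first <tr
const loose = /<tr[\s\S]*?HelloWorld[\s\S]*?<\/tr>/;
// Stricter pattern: only whitespace is allowed around HelloWorld
const strict = /<tr>\s*HelloWorld\s*<\/tr>/;

const looseMatch = html.match(loose)[0];
const strictMatch = html.match(strict)[0];

console.log(looseMatch === html); // true: the loose pattern swallows both rows
console.log(JSON.stringify(strictMatch));
```

Note that the stricter pattern only fits rows with no attributes on the tr tag and nothing but whitespace around HelloWorld, so it suits markup as simple as the sample.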

And you may want to examine why you're using RegEx to parse HTML. This can be a useful approach for working with string snippets of known HTML, such as from a database, but in JavaScript you're probably better off using an XML parser or the DOM query selector methods.

2 Comments

How is [\s] different from \s?
@torazaburo it's not... That's what I get for modifying somebody else's RegEx instead of starting from scratch! Thanks for the correction, I've edited my answer.
