js Regex not working as expected. Newline not getting detected [duplicate]

Question

I have a string as follows:

<abc name = "foo">
  <child>bar</child>
</abc>
<xyz>1</xyz>

<abc name = "foo2">
  <child>bar2</child>
</abc>
<xyz>5</xyz>

I have created a regex as follows:

var array1 = [];
var resApi;
var regexapi = /<abc\s*name\s*=\s*"(.*?)"[\s\S]*?<\/abc>\n*<xyz>/gim;
while ((resApi = regexapi.exec(data)) !== null) {
    array1.push(resApi[0]);
}
console.log(array1[0]);

Now, if the <xyz>1</xyz> tag is missing, printing array1[0] should show undefined, but instead it prints the following:

<abc name = "foo">
  <child>bar</child>
</abc>

<abc name = "foo2">
  <child>bar2</child>
</abc>
<xyz>

I think the problem is in \n*, since I'm using the multiline flag, but I'm not sure about this. Note that this output is without the <xyz>1</xyz> tag. I want it to print undefined. Thanks.
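
A minimal reproduction of the behaviour described above, assuming the input is held in a variable named `data` (the name used in the snippet) with the first `<xyz>` line removed; the `matches` array is an illustrative name. It suggests the cause is neither `\n*` nor the `m` flag: the lazy `[\s\S]*?` is still allowed to run past the first `</abc>` when no `<xyz>` follows it, so it keeps expanding until the `</abc>` of the second block.

```javascript
// Sample input from the question, with the first <xyz>1</xyz> line removed.
const data = [
  '<abc name = "foo">',
  '  <child>bar</child>',
  '</abc>',
  '',
  '<abc name = "foo2">',
  '  <child>bar2</child>',
  '</abc>',
  '<xyz>5</xyz>',
].join('\n');

// The regex exactly as posted in the question.
const regexapi = /<abc\s*name\s*=\s*"(.*?)"[\s\S]*?<\/abc>\n*<xyz>/gim;

const matches = [];
let resApi;
while ((resApi = regexapi.exec(data)) !== null) {
  matches.push(resApi[0]);
}

// Only one match is found, and it starts at the first <abc> but
// ends at the <xyz> that belongs to the second block.
console.log(matches.length);              // 1
console.log(matches[0].includes('foo2')); // true
```

A lazy quantifier only prefers the shortest match; it does not forbid a longer one when the shortest attempt fails.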

What are you actually trying to do here? Also, regex isn't necessarily the best tool for parsing HTML. Actually, JavaScript is an HTML parser, so you might do better using it for this question.
– Tim Biegeleisen, Commented Apr 27, 2018 at 2:28
I'm taking an XML file as input and I want to store the value in <xyz>, which may or may not be present after the <abc> tag. If it is not present, I want to store the value as undefined.
– Rogmier, Commented Apr 27, 2018 at 2:32
As @TimBiegeleisen said, using an XML parser such as github.com/Leonidas-from-XIV/node-xml2js would be easier than regex.
– Sanketh Katta, Commented Apr 27, 2018 at 2:34
You can also use Cheerio (github.com/cheeriojs/cheerio) and query your data in a jQuery-like way.
– Diego ZoracKy, Commented Apr 27, 2018 at 2:46
Don't parse XML with regex; use a real XML parser. See duplicate link (and many other posts here and across the web) for explanations.
– kjhughes, Commented Apr 27, 2018 at 12:09

Matt.G · Accepted Answer · 2018-04-27 03:10:00Z

0

Regex:

<\/abc>\n(?:<xyz>(.*)(?=<\/xyz))*

Regex Demo

js Demo

Matches a </abc> followed by an <xyz> tag and its value. If the <xyz> tag is missing, array[0] will return an empty string (not undefined).
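
A minimal sketch of how this pattern could be used in an `exec` loop. The function name `collectXyz` and the sample string are illustrative and are not taken from the linked demos. When the capture group is read directly, JavaScript leaves it `undefined` if the `<xyz>` part takes no part in the match, so whether a missing tag surfaces as an empty string or as `undefined` depends on how the result is collected.

```javascript
// Hypothetical helper: collects capture group 1 for every </abc> found.
function collectXyz(data) {
  const pattern = /<\/abc>\n(?:<xyz>(.*)(?=<\/xyz))*/g;
  const values = [];
  let match;
  while ((match = pattern.exec(data)) !== null) {
    // match[1] is undefined when the <xyz> group matched zero times.
    values.push(match[1]);
  }
  return values;
}

// Sample input: the first block has no <xyz>, the second one does.
const sample =
  '<abc name = "foo">\n  <child>bar</child>\n</abc>\n\n' +
  '<abc name = "foo2">\n  <child>bar2</child>\n</abc>\n<xyz>5</xyz>';

const values = collectXyz(sample);
console.log(values); // first entry is undefined, second is "5"
```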

answered Apr 27, 2018 at 3:10

Matt.G


1 Comment

Michael Kay Over a year ago

Like all attempts to process XML using regular expressions, it is of course wrong. For example, it doesn't allow for whitespace to appear in places where XML allows whitespace.

Tim Biegeleisen · Accepted Answer · 2018-04-27 03:21:56Z

0

You would be better off using an XML parser here. If you insist on using regex, here is one option:

var input = "<abc name = \"foo\">\n\t<child>bar</child>\n</abc>\n<xyz>\n\n<abc name = \"foo2\">\t\n<child>bar2</child>\n</abc>\n<xyz>35</xyz>";
// Each negative lookahead is checked before every character, so neither part can run past </abc> or into another <xyz>.
var regex = /<abc[^>]*>(?:(?!<\/abc>)[\s\S])*<\/abc>\s*<xyz>((?:(?!<xyz>)[\s\S])*?)<\/xyz>/g;
var match = regex.exec(input);
console.log(match[1]); // 35

This matches an <abc>...</abc> block followed by optional whitespace and then immediately by an <xyz> element. If that element is empty, the first capture group, match[1], captures nothing (an empty string).
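
For the case described in the question, where a missing <xyz> should yield undefined, one possible variation is to make the <xyz> part optional rather than required. This is a sketch under stated assumptions: the helper name `extractXyzValues` is hypothetical, the input is assumed to be as regular as the sample, and <xyz> is assumed to contain plain text only. It inherits the fragility the comments warn about, so it is not a substitute for an XML parser.

```javascript
// Hypothetical helper: returns one entry per <abc>...</abc> block,
// holding the text of the <xyz> that directly follows it, or undefined.
function extractXyzValues(data) {
  // (?:(?!<\/abc>)[\s\S])* consumes the block body one character at a
  // time and can never step over a closing </abc>.
  // The trailing (?:<xyz>...<\/xyz>)? is optional, so a block without
  // a following <xyz> still matches and leaves group 1 undefined.
  const pattern =
    /<abc\b[^>]*>(?:(?!<\/abc>)[\s\S])*<\/abc>\s*(?:<xyz>([^<]*)<\/xyz>)?/g;
  const values = [];
  let match;
  while ((match = pattern.exec(data)) !== null) {
    values.push(match[1]);
  }
  return values;
}

// Sample input: the first block has no <xyz>, the second one does.
const sample =
  '<abc name = "foo">\n  <child>bar</child>\n</abc>\n\n' +
  '<abc name = "foo2">\n  <child>bar2</child>\n</abc>\n<xyz>5</xyz>';

const result = extractXyzValues(sample);
console.log(result); // first entry is undefined, second is "5"
```

Because every match has to start with a literal <abc, no match can be empty, so the exec loop always advances.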

edited Apr 27, 2018 at 3:21

answered Apr 27, 2018 at 2:42

Tim Biegeleisen


5 Comments

Rogmier Over a year ago

Tried this. But then if the tag is empty it is capturing the value in the next <xyz> tag

Tim Biegeleisen Over a year ago

@starkVT Check my updated answer. To get it to work, I needed to add another negative lookahead to make sure it doesn't match across <xyz> tags from different HTML blocks. Hopefully you can see why regex is starting to not look so attractive right now.

kjhughes Over a year ago

You were very generous with your time to try, but the best answer is really that regex is intrinsically the wrong tool for the job rather than reinforce OP's (and future readers') misconception by providing a partial, brittle solution.

Tim Biegeleisen Over a year ago

@kjhughes So am I deleting this? I could counter your comment by saying that sometimes someone may not have access to an XML parser.

kjhughes Over a year ago

I've added node.js and browser XML parsing solutions to the duplicate link list. You've taken two steps into the quagmire of XML parsing via regex. It's your call, but if it were me, I'd stop here. Sometimes it's better to say "stay out of the swamp" than to try to address what to do when an endless progression of monsters appears.
