1

I have a string as follows:

<abc name = "foo">
  <child>bar</child>
</abc>
<xyz>1</xyz>

<abc name = "foo2">
  <child>bar2</child>
</abc>
<xyz>5</xyz>

I have created a regex as follows:

// data holds the string shown above
var array1 = [];
var resApi;
var regexapi = /<abc\s*name\s*=\s*"(.*?)"[\s\S]*?<\/abc>\n*<xyz>/gim;
while ((resApi = regexapi.exec(data)) !== null) {
    array1.push(resApi[0]);
}
console.log(array1[0]);

Now, if the tag <xyz>1</xyz> is missing, printing array1[0] should show undefined, but instead it prints the following:

<abc name = "foo">
  <child>bar</child>
</abc>

<abc name = "foo2">
  <child>bar2</child>
</abc>
<xyz>

I think the problem is in \n*, since I'm using the multiline flag, but I'm not sure about this. Note that this is without the <xyz>1</xyz> tag. I want it to print undefined. Thanks.
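
For reference, a minimal reproduction of the behaviour described above, assuming data is the string from the question with <xyz>1</xyz> removed (the variable names first and spansBothBlocks are only illustrative). It suggests that the lazy [\s\S]*?, rather than \n* or the m flag (which only changes how ^ and $ behave), is what stretches the match across both blocks:

```javascript
// Input from the question with the first <xyz>1</xyz> removed.
var data = '<abc name = "foo">\n  <child>bar</child>\n</abc>\n\n' +
           '<abc name = "foo2">\n  <child>bar2</child>\n</abc>\n<xyz>5</xyz>';

// The pattern from the question, unchanged.
var regexapi = /<abc\s*name\s*=\s*"(.*?)"[\s\S]*?<\/abc>\n*<xyz>/gim;
var first = regexapi.exec(data);

// [\s\S]*? is lazy, but it can still grow past the first </abc> until
// some later </abc> is followed by <xyz>, so one match covers both blocks.
var spansBothBlocks = first !== null && first[0].indexOf('foo2') !== -1;
console.log(spansBothBlocks);
console.log(first[1]); // name attribute captured from the first <abc> tag
```

A lazy quantifier only prefers the shortest match; it does not stop the engine from extending it when the rest of the pattern fails at the first candidate.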

5
  • What are you actually trying to do here? Also, regex isn't necessarily the best tool for parsing HTML. Actually, JavaScript is an HTML parser, so you might do better using it for this question. Commented Apr 27, 2018 at 2:28
  • I'm taking an xml file as an input and I want to store the value in <xyz> which may or may not be present after the <abc> tag. If not present I want to store the value as undefined Commented Apr 27, 2018 at 2:32
  • As @TimBiegeleisen said, using an XML parser such as github.com/Leonidas-from-XIV/node-xml2js would be easier than regex. Commented Apr 27, 2018 at 2:34
  • You can also use Cheerio (github.com/cheeriojs/cheerio) and query your data in a jQuery-like way. Commented Apr 27, 2018 at 2:46
  • Don't parse XML with regex; use a real XML parser. See duplicate link (and many other posts here and across the web) for explanations. Commented Apr 27, 2018 at 12:09

2 Answers

0

Regex:

<\/abc>\n(?:<xyz>(.*)(?=<\/xyz))*

Regex Demo

js Demo

Matches a </abc> followed by an <xyz> tag and its value. If the <xyz> tag is missing, array[0] will return an empty string (not undefined).


1 Comment

Like all attempts to process XML using regular expressions, it is of course wrong. For example, it doesn't allow for whitespace to appear in places where XML allows whitespace.
0

You would be better off using an XML parser here. If you insist on using regex, here is one option:

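// The first <abc> block is followed by an unclosed <xyz>; the second by <xyz>35</xyz>.
var input = "<abc name = \"foo\">\n\t<child>bar</child>\n</abc>\n<xyz>\n\n<abc name = \"foo2\">\t\n<child>bar2</child>\n</abc>\n<xyz>35</xyz>";
// Each negative lookahead is repeated per character, so the match cannot
// run past the first </abc> or across another <xyz>.
var regex = /<abc[^>]*>(?:(?!<\/abc>)[\s\S])*<\/abc>\s*<xyz>((?:(?!<xyz>)[\s\S])*)<\/xyz>/g;
var match = regex.exec(input);
console.log(match[1]); // 35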

This matches an <abc> tag followed by optional whitespace, and then immediately by an <xyz> tag. Should that tag be empty, nothing would be captured in the first capture group, match[1].
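
If the goal is one entry per <abc> block, with undefined where no <xyz> follows, one possible sketch is to make the whole <xyz> part an optional group. The function name collectXyzValues and its pattern are assumptions for illustration, not part of the original answer, and this remains as brittle as any other regex applied to XML:

```javascript
// Hypothetical helper: returns one entry per <abc> block, holding the text of
// the <xyz> element that directly follows it, or undefined if there is none.
function collectXyzValues(data) {
  // The (?:...)? group makes the trailing <xyz>...</xyz> optional. In
  // JavaScript a capture group that did not take part in the match is undefined.
  var blockRe = /<abc\b[^>]*>(?:(?!<\/abc>)[\s\S])*<\/abc>(?:\s*<xyz>((?:(?!<\/?xyz>)[\s\S])*)<\/xyz>)?/g;
  var values = [];
  var m;
  while ((m = blockRe.exec(data)) !== null) {
    values.push(m[1]);
  }
  return values;
}

// The question's input without the first <xyz>1</xyz>.
var withoutFirst = '<abc name = "foo">\n  <child>bar</child>\n</abc>\n\n' +
                   '<abc name = "foo2">\n  <child>bar2</child>\n</abc>\n<xyz>5</xyz>';
console.log(collectXyzValues(withoutFirst)); // [ undefined, '5' ]
```

Because each <abc> block is matched on its own, a missing <xyz> after one block no longer pulls in the <xyz> that belongs to the next block.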

5 Comments

Tried this. But then if the tag is empty it is capturing the value in the next <xyz> tag
@starkVT Check my updated answer. To get it to work, I needed to add another negative lookahead to make sure it doesn't match across <xyz> tags from different HTML blocks. Hopefully you can see why regex is starting to not look so attractive right now.
You were very generous with your time to try, but the best answer is really that regex is intrinsically the wrong tool for the job rather than reinforce OP's (and future readers') misconception by providing a partial, brittle solution.
@kjhughes So am I deleting this? I could counter your comment by saying that sometimes someone may not have access to an XML parser.
I've added node.js and browser XML parsing solutions to the duplicate link list. You've taken two steps into the quagmire of XML parsing via regex. It's your call, but if it were me, I'd stop here. Sometimes it's better to say "stay out of the swamp" than to try to address what to do when an endless progression of monsters appear.
