I'm trying to write a simple regex that captures some information from an XML document.
However, my regex matches across several tags and returns a very long result. For example, if I have something like:
<item>
<title>bla</title>
...
<description>bla</description>
</item>
<item>
<title>bla2</title>
....
<description>bla2, keyword here are blablabla</description>
</item>
and I use a regex like:
<item><title>([\\p{L}\\p{N}\\W \\.\\,]*?)</title>.*?<description>[\\p{L}\\p{N} \\.\\,]*keyword[\\p{L}\\p{N} \\.\\,]*</description>
There are other tags between title and description. When I use that regex, the match starts at the first item and runs through every tag until the first description that contains the word "keyword". So the problem is this part:
</title>.*?<description>
How can I tell my regex that, if the first description tag it finds doesn't contain the keyword, it should move on to the next item and return the result from the second item tag? Or, alternatively, that it should not match everything between the title tag and the description tag if there is a closing item tag between the two?
I hope I'm explaining myself clearly. Please ask for clarification if needed.
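For reference, here is a minimal Java sketch that reproduces the behaviour. It makes a few assumptions: the XML is flattened onto one line (the pattern has no whitespace between <item> and <title>, and . does not match line breaks by default), a made-up <link> tag stands in for the elided tags, and the character class before keyword carries a * so that more than one character may precede the keyword. The class and method names are only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyOvershootDemo {
    // Hypothetical sample: the two items from the question on one line,
    // with a made-up <link> tag in place of the elided tags.
    static final String XML =
        "<item><title>bla</title><link>x</link>"
      + "<description>bla</description></item>"
      + "<item><title>bla2</title><link>y</link>"
      + "<description>bla2, keyword here are blablabla</description></item>";

    // The pattern from the question; the lazy .*? is the part under suspicion.
    static final Pattern LAZY = Pattern.compile(
        "<item><title>([\\p{L}\\p{N}\\W \\.\\,]*?)</title>.*?"
      + "<description>[\\p{L}\\p{N} \\.\\,]*keyword[\\p{L}\\p{N} \\.\\,]*</description>");

    // Captured title of the first match, or null if nothing matches.
    static String firstTitle(String xml) {
        Matcher m = LAZY.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // Full text of the first match, or null if nothing matches.
    static String firstMatch(String xml) {
        Matcher m = LAZY.matcher(xml);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The title comes from the first item even though the keyword is in the second.
        System.out.println(firstTitle(XML));
        // The match runs over the item boundary.
        System.out.println(firstMatch(XML).contains("</item><item>"));
    }
}
```

The lazy quantifier only means "as few characters as needed for the rest to match". It still starts at the first <item> and keeps growing until some description with the keyword is reached, so it crosses the item boundary.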
Edit:
An alternative solution:
<item><title>([\\p{L}\\p{N}\\W \\.\\,]*?)</title>(?:(?!<item>).)*?<description>[\\p{L}\\p{N} \\.\\,]*keyword[\\p{L}\\p{N} \\.\\,]*</description>
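This uses (?:(?!<item>).)*? in place of .*?. The negative lookahead (?!<item>) is checked before each character is consumed, so the part between the title and the description cannot run into a new item.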
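A small Java sketch of this lookahead approach follows, under the same assumptions as a flattened one-line sample (the <link> tag and all class and method names are invented for illustration, and the class before keyword is written with a * so any amount of text may precede it). One caveat I noticed while testing: the title class contains \W, which also matches <, / and >, so when the first item has no keyword the title group itself can grow across tags into the next item. The TIGHTENED variant is my own assumption, not part of the pattern above: it swaps the title class for [^<]*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedItemDemo {
    // Hypothetical sample: two items on one line, keyword only in the second.
    static final String XML =
        "<item><title>bla</title><link>x</link>"
      + "<description>bla</description></item>"
      + "<item><title>bla2</title><link>y</link>"
      + "<description>bla2, keyword here are blablabla</description></item>";

    // Shared tail: the lookahead stops the gap from running into a new <item>.
    static final String TAIL =
        "</title>(?:(?!<item>).)*?"
      + "<description>[\\p{L}\\p{N} \\.\\,]*keyword[\\p{L}\\p{N} \\.\\,]*</description>";

    // Title class as posted: \W lets the group match '<', '/' and '>'.
    static final Pattern AS_POSTED =
        Pattern.compile("<item><title>([\\p{L}\\p{N}\\W \\.\\,]*?)" + TAIL);

    // Assumed tightening: the title may not contain '<'.
    static final Pattern TIGHTENED =
        Pattern.compile("<item><title>([^<]*)" + TAIL);

    // Captured title of the first match, or null if nothing matches.
    static String firstTitle(Pattern p, String xml) {
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Here the title group swallows the first item's tags.
        System.out.println(firstTitle(AS_POSTED, XML));
        // Here only the second item's title is captured.
        System.out.println(firstTitle(TIGHTENED, XML));
    }
}
```

On this sample the tightened variant returns bla2, because the attempt starting at the first item fails outright and the engine retries from the second <item>.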