Implementing a Negative Lookahead in Regex to exclude a block of code if it contains a certain string

Question

This is a follow-up to an original question I posted here, but I would appreciate help expanding its capabilities a bit. I have the following string (let's call it output) that I am trying to capture from:

ltm pool TEST_POOL { 
    Some strings
    above headers
    records { 
        baz:1 {
            ANY STRING
            HERE
            session-status enabled
        } 
        foobar:23 { 
            ALSO ANY
            STRING HERE
            session-status enabled
        }
    }
    members {
        qux:45 {
            ALSO ANY
            STRINGS HERE
            session-status enabled
        }
        bash:2 {
            AND ANY
            STRING HERE
            session-status user-disabled
        }
        topaz:789 {
            AND ANY
            STRING HERE
            session-status enabled
        }        
    }
    Some strings
    below headers
}

Assume the lines of output are separated by ordinary line breaks. For the sake of this question, let's refer to records and members as "titles" and to baz, foobar, qux, bash, and topaz as "headers". I am trying to write a regex in Java that captures all headers between the brackets of a given title EXCEPT those that contain the string session-status user-disabled between their own header brackets, like bash above. For example, say we want to find all headers of the title members with this code:

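import java.util.regex.Matcher;
import java.util.regex.Pattern;

// output holds the text shown above
String regex = "(?:\\bmembers\\s*\\{|(?<!^)\\G[^{]+\\{[^}]+\\})\\s*?\\n\\s*([^:{}]+)(?=:\\d)";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(output);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // group 1 is the header name
}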
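To reproduce the starting point, here is a self-contained harness around the regex above. It is only a sketch: the class name OriginalRegexDemo, the helper headers and the shortened SAMPLE text are illustrative and not part of the original post. The \G branch skips the rest of the previous header's block before capturing the next name, but nothing looks inside a block yet, so bash is still returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness for the question's regex on a shortened copy of the sample.
class OriginalRegexDemo {
    static final String SAMPLE = String.join("\n",
        "ltm pool TEST_POOL {",
        "    records {",
        "        baz:1 {",
        "            session-status enabled",
        "        }",
        "        foobar:23 {",
        "            session-status enabled",
        "        }",
        "    }",
        "    members {",
        "        qux:45 {",
        "            ALSO ANY STRINGS HERE",
        "            session-status enabled",
        "        }",
        "        bash:2 {",
        "            session-status user-disabled",
        "        }",
        "        topaz:789 {",
        "            session-status enabled",
        "        }",
        "    }",
        "}");

    // First alternative anchors on "members {"; the \G alternative resumes where
    // the previous match ended and consumes the rest of that header's block.
    static final Pattern PATTERN = Pattern.compile(
        "(?:\\bmembers\\s*\\{|(?<!^)\\G[^{]+\\{[^}]+\\})\\s*?\\n\\s*([^:{}]+)(?=:\\d)");

    static List<String> headers(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1)); // header name captured in group 1
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(headers(SAMPLE)); // [qux, bash, topaz]
    }
}
```

Run as is, it prints [qux, bash, topaz]: the records headers are already excluded by the members anchor, but bash is not.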

The output should be just:

qux
topaz

That is, it should exclude the bash header because it has session-status user-disabled between its brackets. I'm having trouble implementing a negative lookahead in the regex I'm using to accomplish this. In addition, baz and foobar should not match because they are contained within the brackets of a different "title" altogether. There can be any number of titles and any number of headers. I would much appreciate help modifying my regex to include a negative lookahead that solves this problem.

Are you sure you still want to use a regex rather than writing a real parser for this? – Robert, Dec 17, 2015 at 16:16
@Robert Eh, the thing is, it's pretty unlikely I'll need to break this data apart in any other way in the near future. Any particular tools you would recommend for parsing this if I decide to go in that direction? – user2150250, Dec 17, 2015 at 20:53
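As a point of comparison for Robert's suggestion, a small hand-written scanner that tracks brace depth line by line can do the same job without a regex. This is a minimal sketch, not a recommendation of any particular parsing tool. It assumes the layout of the sample (each { or } ends its own line, titles sit one level inside the outer block, headers one level inside a title), and the names BraceScannerDemo and enabledHeaders are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical line-based alternative to the regex. Assumes each "{" or "}"
// ends its own line and that header blocks contain no nested braces.
class BraceScannerDemo {
    static final String SAMPLE = String.join("\n",
        "ltm pool TEST_POOL {",
        "    records {",
        "        baz:1 {",
        "            session-status enabled",
        "        }",
        "        foobar:23 {",
        "            session-status enabled",
        "        }",
        "    }",
        "    members {",
        "        qux:45 {",
        "            ALSO ANY STRINGS HERE",
        "            session-status enabled",
        "        }",
        "        bash:2 {",
        "            session-status user-disabled",
        "        }",
        "        topaz:789 {",
        "            session-status enabled",
        "        }",
        "    }",
        "}");

    static List<String> enabledHeaders(String input, String title) {
        List<String> result = new ArrayList<>();
        int depth = 0;            // brace depth before the current line
        boolean inTitle = false;  // true while inside "title { ... }"
        String header = null;     // header whose block we are in, if any
        boolean disabled = false; // current block has "session-status user-disabled"
        for (String raw : input.split("\n")) {
            String line = raw.trim();
            if (line.endsWith("{")) {
                String name = line.substring(0, line.length() - 1).trim();
                if (depth == 1 && name.equals(title)) {
                    inTitle = true;
                } else if (depth == 2 && inTitle) {
                    int colon = name.indexOf(':');
                    header = colon >= 0 ? name.substring(0, colon) : name;
                    disabled = false;
                }
                depth++;
            } else if (line.equals("}")) {
                depth--;
                if (depth == 2 && header != null) {
                    // Closing brace of a header block: keep it unless disabled.
                    if (!disabled) {
                        result.add(header);
                    }
                    header = null;
                } else if (depth == 1) {
                    inTitle = false; // closing brace of a title
                }
            } else if (header != null && line.equals("session-status user-disabled")) {
                disabled = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(enabledHeaders(SAMPLE, "members")); // [qux, topaz]
    }
}
```

The design choice here is to let the brace depth, rather than the pattern, decide which title a header belongs to; enabledHeaders(SAMPLE, "records") returns [baz, foobar].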

Josh Crozier · Accepted Answer · 2015-12-17 15:59:10Z


I built on your previous expression and added an alternation. Its first branch is a non-capturing group that matches any "header" block containing the string session-status user-disabled; those "headers" are consumed without being captured, so they are effectively excluded. Its second branch captures the header name as before, so only "headers" whose blocks do not contain session-status user-disabled (in the sample, the ones with session-status enabled) end up in group 1.


(?:\bmembers\s*\{|(?<!^)\G)\s*?\n\s*(?:(?:[^{]*\{[^}]*?session-status user-disabled[^}]*\})|([^:{}]+)(?=:\d)[^{]*\{[^}]*\})
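Below is a sketch of how the regex above might be used from Java, reusing the loop from the question; the class name AcceptedRegexDemo, the helper headers and the shortened SAMPLE are illustrative. One detail matters: a block such as bash is still matched, by the non-capturing first branch, so matcher.find() returns true for it while matcher.group(1) returns null. Printing group(1) unconditionally, as in the question's loop, would therefore print null between qux and topaz, so the sketch skips null groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AcceptedRegexDemo {
    // Shortened copy of the question's sample (illustrative).
    static final String SAMPLE = String.join("\n",
        "ltm pool TEST_POOL {",
        "    records {",
        "        baz:1 {",
        "            session-status enabled",
        "        }",
        "        foobar:23 {",
        "            session-status enabled",
        "        }",
        "    }",
        "    members {",
        "        qux:45 {",
        "            ALSO ANY STRINGS HERE",
        "            session-status enabled",
        "        }",
        "        bash:2 {",
        "            session-status user-disabled",
        "        }",
        "        topaz:789 {",
        "            session-status enabled",
        "        }",
        "    }",
        "}");

    // The accepted answer's regex, escaped for a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
        // Start after "members {" or where the previous match ended (\G).
        "(?:\\bmembers\\s*\\{|(?<!^)\\G)\\s*?\\n\\s*"
        // Branch 1: consume, without capturing, a block containing
        // "session-status user-disabled".
        + "(?:(?:[^{]*\\{[^}]*?session-status user-disabled[^}]*\\})"
        // Branch 2: capture the header name, then consume its block.
        + "|([^:{}]+)(?=:\\d)[^{]*\\{[^}]*\\})");

    static List<String> headers(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            // Matches from branch 1 succeed but leave group 1 unset (null).
            String header = matcher.group(1);
            if (header != null) {
                result.add(header);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(headers(SAMPLE)); // [qux, topaz]
    }
}
```

A caveat worth checking against real data: the first branch uses [^{]*, which can also match }. If another title follows members and that title's first header is user-disabled, a match can run past the closing brace of members, and the following title's enabled headers are then captured too. Using [^{}]* in the first branch keeps each match inside the current title.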

answered Dec 17, 2015 at 15:49 by Josh Crozier, edited Dec 17, 2015 at 15:59
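Since the question asks specifically about a negative lookahead, here is a sketch of a variant that uses one. It is not the accepted answer's approach, and it assumes the sample's layout (no nested braces inside header blocks); the class name LookaheadDemo, the helper names, and the use of Pattern.quote to make the title a parameter are illustrative. Disabled header blocks are consumed by a repeated non-capturing group placed before the capture, and the negative lookahead (?![^{}]*\{[^}]*?session-status user-disabled) stops backtracking from capturing a header whose block was just skipped. Every successful match therefore has a non-null group 1, so the question's loop works unchanged. The text before a header's { is matched with [^{}]*, so a match cannot cross the closing brace of the title.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    static final String SAMPLE = String.join("\n",
        "ltm pool TEST_POOL {",
        "    records {",
        "        baz:1 {",
        "            session-status enabled",
        "        }",
        "        foobar:23 {",
        "            session-status enabled",
        "        }",
        "    }",
        "    members {",
        "        qux:45 {",
        "            ALSO ANY STRINGS HERE",
        "            session-status enabled",
        "        }",
        "        bash:2 {",
        "            session-status user-disabled",
        "        }",
        "        topaz:789 {",
        "            session-status enabled",
        "        }",
        "    }",
        "}");

    // Builds the pattern for one title; Pattern.quote treats the title literally.
    static Pattern patternFor(String title) {
        return Pattern.compile(
            // Start at "title {" or, on later matches, right after the previous
            // header name; in that case first consume the rest of its block.
            "(?:\\b" + Pattern.quote(title) + "\\s*\\{|(?<!^)\\G[^{}]*\\{[^}]*\\})"
            // Skip zero or more consecutive user-disabled header blocks.
            + "(?:\\s*?\\n\\s*[^{}]*\\{[^}]*?session-status user-disabled[^}]*\\})*"
            + "\\s*?\\n\\s*"
            // Negative lookahead: the next header's block must not be disabled.
            + "(?![^{}]*\\{[^}]*?session-status user-disabled)"
            // Capture the header name, which must be followed by ":<digit>".
            + "([^:{}]+)(?=:\\d)");
    }

    static List<String> headers(String input, String title) {
        List<String> result = new ArrayList<>();
        Matcher matcher = patternFor(title).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1)); // never null with this pattern
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(headers(SAMPLE, "members")); // [qux, topaz]
    }
}
```

With the same sample, headers(SAMPLE, "records") returns [baz, foobar].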
