I want to parse a multiline text, so I wrote something like this:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";
String regex = "\\[(.*)\\] (.*) - (.*)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

What I want to get is this:

G1: timestamp1
G2: INFO
G3: Message1

G1: timestamp2
G2: ERROR
G3: Message2

G1: timestamp3
G2: INFO
G3: Message3
    Message3_details1.........
    Message3_details2 .........

But what I get is like this:

G1: timestamp1] INFO - Message1
    [timestamp2] ERROR - Message2
    [timestamp3
G2: INFO
G3: Message3
    Message3_details1........
    Message3_details2........

I'm not able to solve that even with Google's help.

2 Answers

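You have used a greedy quantifier in your regex. Because DOTALL lets . match line terminators as well, the .* in \[(.*)\] consumes everything up to the last ] that still allows the rest of the pattern to match. You need a reluctant quantifier instead: add a ? after each .*.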
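The difference is easy to see on a shortened input. The following is a minimal sketch, not part of the original code; the class name GreedyVsReluctant, the helper firstGroup, and the two-record sample string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "[t1] INFO - a\r\n[t2] ERROR - b";

        // Greedy: .* runs to the end of the input, then backtracks to the LAST ']'.
        System.out.println(firstGroup("\\[(.*)\\]", input));

        // Reluctant: .*? grows one character at a time and stops at the FIRST ']'.
        System.out.println(firstGroup("\\[(.*?)\\]", input)); // prints t1
    }
}
```

With the greedy pattern, group 1 spans from t1 across the line break to t2, which is the same effect seen in the question's output.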

Also, the last .* needs a look-ahead so that it stops before the next [.

The following code would work:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

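// Reluctant (.*?) groups; the look-ahead ends group 3 before the next '[' or at the end of the input.
String regex = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";

// DOTALL lets . match line terminators, so group 3 can span several lines.
Pattern p = Pattern.compile(regex, Pattern.DOTALL);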
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

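The last part of the regex, (.*?)(?=\\[|$), matches everything up to the next [ or, if there is none, up to the end of the input ($). The $ alternative is required so that the last two lines are captured in group 3 of the last match; without it, the final record has no [ after it and does not match at all.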

Output:

G1: timestamp1
G2: INFO
G3: Message1 


G1: timestamp2
G2: ERROR
G3: Message2 


G1: timestamp3
G2: INFO
G3: Message3 
Message3_details1......... 
Message3_details2 ......... 
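
To see why the $ alternative in the look-ahead matters, the matches can be counted with and without it. This is only a sketch to illustrate the point; the class name LookaheadEndAnchor and the helper countMatches are invented here, and the text is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadEndAnchor {
    public static final String TEXT = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // Hypothetical helper: counts how many records the given regex finds in TEXT.
    public static int countMatches(String regex) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(TEXT);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Look-ahead for '[' only: the last record is never followed by '[', so it is lost.
        System.out.println(countMatches("\\[(.*?)\\] (.*?) - (.*?)(?=\\[)"));   // prints 2

        // Look-ahead for '[' or end of input: the last record is found as well.
        System.out.println(countMatches("\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)")); // prints 3
    }
}
```

As the output above shows, group 3 keeps the trailing space and line break of each record; calling trim() on m.group(3) removes them if they are not wanted.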

2 Comments

Thanks a lot. But what if Message3_details contains text in square brackets? The match will stop there.
@yataodev Yeah it will stop there. In that case, you would have to modify the look-ahead a little bit.
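
One possible way to modify the look-ahead, offered as a sketch rather than as the answerer's intended fix: only treat a [ as the start of the next record when it directly follows a line break, and use \z for the end of the input. The class name BracketSafeLogParser, the helper messages, and the sample text are made up for illustration. The sketch still assumes that a continuation line never begins with [; a stricter look-ahead would have to match the real timestamp format, which the question does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketSafeLogParser {
    // Group 3 ends before a line break that is followed by '[', or at the very end of the input.
    public static final Pattern RECORD = Pattern.compile(
            "\\[(.*?)\\] (.*?) - (.*?)(?=\\r?\\n\\[|\\z)", Pattern.DOTALL);

    // Hypothetical helper: returns the trimmed message (group 3) of every record.
    public static List<String> messages(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            result.add(m.group(3).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "[timestamp2] ERROR - Message2 \r\n"
                + "[timestamp3] INFO - Message3 \r\n"
                + "details with [brackets] inside \r\n";
        for (String message : messages(text)) {
            System.out.println("G3: " + message);
            System.out.println();
        }
    }
}
```

Here the bracketed text in the middle of the details line no longer ends the match, because it is not at the start of a line.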

Try "\\[(.*?)\\] (.*?) - (.*?) \\r\\n"
