I'm trying to capture key-value pairs from strings that have the following form:

a0=d235 a1=2314 com1="abcd" com2="a b c d"

Using help from this post, I was able to write the following regex that captures the key-value pairs:

Pattern.compile("(\\w*)=(\"[^\"]*\"|[^\\s]*)");

The problem is that the second group in this pattern also captures the quotation marks, as follows:

a0=d235
a1=2314
com1="abcd"
com2="a b c d"

How do I exclude the quotation marks? I want something like this:

a0=d235
a1=2314
com1=abcd
com2=a b c d

EDIT:

It is possible to achieve the above by capturing the value in different groups depending on whether there are quotation marks or not. I'm writing this code for a parser so for performance reasons I'm trying to come up with a regex that can return the value in the same group number.

2 Answers

How about this? The idea is to split the last group into 2 groups.

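import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The (?:...) wrapper keeps "key=" in front of both alternatives.
Pattern p = Pattern.compile("(\\w+)=(?:\"([^\"]+)\"|([^\\s]+))");

String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
Matcher m = p.matcher(test);

while (m.find()) {
    System.out.print(m.group(1));
    System.out.print("=");
    // Group 2 holds a quoted value, group 3 an unquoted one.
    System.out.print(m.group(2) == null ? m.group(3) : m.group(2));
    System.out.println();
}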

Update

Here is a new solution in response to the updated question. This regex uses a positive look-behind and a positive look-ahead to check that the value is surrounded by quotes without including them in the captured group. This way, groups 2 and 3 above can be merged into a single group (group 2 below). There is no way to exclude the quotes while returning group 0, however.

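// Group 1 is the key; group 2 is the value, with or without surrounding quotes in the input.
Pattern p = Pattern.compile("(\\w+)=\"*((?<=\")[^\"]+(?=\")|([^\\s]+))\"*");

String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
Matcher m = p.matcher(test);

while (m.find()) {
    System.out.print(m.group(1));
    System.out.print("=");
    System.out.println(m.group(2));
}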

Output

a0=d235
a1=2314
com1=abcd
com2=a b c d
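
For anyone who wants to run the updated pattern as-is, here is a minimal self-contained sketch. The class name `KeyValueParser` and the `parse` helper are hypothetical names chosen for illustration; only the regex comes from the answer above. It assumes keys are unique, since later duplicates would overwrite earlier ones in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {
    // Same pattern as in the answer: group 1 is the key,
    // group 2 is the value with any surrounding quotes left out.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=\"*((?<=\")[^\"]+(?=\")|([^\\s]+))\"*");

    // Hypothetical helper: collects the pairs in order of appearance.
    public static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
        parse(test).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters for the parser use case in the question. Note that an empty quoted value such as k="" does not seem to be handled cleanly by this pattern, because `[^\"]+` requires at least one character between the quotes.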

5 Comments

This is similar to @burning_LEGION's answer. I've just made an edit to my question; is it possible to capture them in the same group?
No, not all in one expression. You would have to get rid of the quotation marks in every one of the right-side groups. See here: stackoverflow.com/questions/277547/…
@Dawood It is possible to capture quoted and unquoted strings in a single group while excluding the quotes but there is no way to capture everything (group 0) while excluding quotes.
@user845279: this works... thanks! The lookahead and lookbehind constructs are pretty useful but I haven't quite gotten the hang of them yet.
Wow, I didn't think it was possible, but your new update works really well! However, you do want to add a non-capturing clause, because right now you're keeping 3 groups. Here's an update on yours: Pattern.compile("(\\w+)=\"*((?<=\")[^\"]+(?=\")|(?:[^\\s]+))\"*");
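
The effect of the non-capturing group suggested in the last comment can be checked with `Matcher.groupCount()`, which reports how many capturing groups a pattern defines. The sketch below is only an illustration; the class name `GroupCountDemo` and the `firstValue` helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Pattern from the answer: the unquoted alternative is a capturing group (group 3).
    static final Pattern CAPTURING =
            Pattern.compile("(\\w+)=\"*((?<=\")[^\"]+(?=\")|([^\\s]+))\"*");
    // Pattern from the comment: the unquoted alternative is non-capturing.
    static final Pattern NON_CAPTURING =
            Pattern.compile("(\\w+)=\"*((?<=\")[^\"]+(?=\")|(?:[^\\s]+))\"*");

    // Hypothetical helper: returns the value (group 2) of the first pair found, or null.
    static String firstValue(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String test = "com2=\"a b c d\"";
        System.out.println(CAPTURING.matcher(test).groupCount());     // prints 3
        System.out.println(NON_CAPTURING.matcher(test).groupCount()); // prints 2
        // Both patterns return the same value in group 2.
        System.out.println(firstValue(CAPTURING, test));
        System.out.println(firstValue(NON_CAPTURING, test));
    }
}
```

Both variants expose the value in group 2; the non-capturing form simply avoids tracking a third group that the caller never reads.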
Use this regex: (\w+)=(("(.+?)")|(.+?)(?=\s|$)). The key is in group 1; the value is in group 4 when it is quoted and in group 5 otherwise.
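
A rough sketch of how this regex might be used from Java follows. The class name `TwoGroupDemo` and the `render` helper are hypothetical; the regex is the one above with Java string escaping applied. The value still arrives in one of two groups, so the caller has to check which one matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoGroupDemo {
    // Regex from this answer: quoted values land in group 4, unquoted ones in group 5.
    static final Pattern PAIR =
            Pattern.compile("(\\w+)=((\"(.+?)\")|(.+?)(?=\\s|$))");

    // Hypothetical helper: renders each pair as key=value, one per line.
    static String render(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Only one of the two value groups participates in any given match.
            String value = m.group(4) != null ? m.group(4) : m.group(5);
            out.append(m.group(1)).append('=').append(value).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
        System.out.print(render(test));
    }
}
```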

3 Comments

I tried something similar but since I'm writing this code for a parser, I'm trying to avoid checking groups separately since it will affect performance. Your code will store the value in different groups depending on whether there were quotation marks or not. Is there a way to store it in the same group?
Could you explain the meaning of (.+?)?
@LiSeeLeiCow-Q__Q It lazily matches every character up to the next ", the same as ([^"]+)" would.
