Consider that you have the following string:

id: 1 name: Joe age: 27 id: 2 name: Mary age:22

I want to extract every token after "age:" but not the string "age:" itself.

So I want my Matcher's group() to return 27 and 22, not "age: 27" and "age:22".

Is there a way to specify this in Java's regex syntax, which seems quite different from Perl's, where I learned my regex basics?

This is my code:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegExTest 
{
    public static void main(String[] args) 
    {
        Pattern namePtrn = Pattern.compile("age: *\\w*");

        String data = "id: 1 name: Joe age:27 id: 2 name: Mary age:22";

        Matcher nameMtchr = namePtrn.matcher(data);

        while(nameMtchr.find())
        {
            String find = nameMtchr.group();

            System.out.println ("\t" + find);
        }
    }
}

In Perl I can use {} to limit the portion of the pattern that I want extracted

while($text =~ m/(age:{\w+})/g)
{
      my $find = $1;

      if($find)
      {
          print "\nFIND = ".$find;
      }
}

would return

FIND = 27
FIND = 22

and if I put {} around age like

while($text =~ m/({age:\w+})/g)

it would return

FIND = age: 27
FIND = age:22

So I am looking for something like Perl's {} but in Java.

  • Standard capture groups (keyword) is all you get; compare with: m/age:(\w+)/g .. Commented Oct 8, 2012 at 19:01
  • (Please read the fine manual for how to access capture groups - keyword! - in Java. Just as with Perl, there is a special way to access a specific group: e.g. $1 vs. $&.) Commented Oct 8, 2012 at 19:07
  • What!? Perl uses curly braces for capture groups? Commented Oct 8, 2012 at 19:10
  • Perl uses curly braces as quantifiers; not for capture groups. Commented Oct 8, 2012 at 19:13
  • Maybe it's because you didn't properly read more than half of the answers. Commented Oct 8, 2012 at 19:20

3 Answers

7

If you use Matcher.group(1) instead of Matcher.group() you can capture the pattern minus 'age:':

String data = "id: 1 name: Joe age:27 id: 2 name: Mary age:22";
Pattern namePtrn = Pattern.compile("age:(\\w+)");
Matcher nameMtchr = namePtrn.matcher(data);

while (nameMtchr.find()) {
   String find = nameMtchr.group(1);
   System.out.println("\t" + find);
}

1 Comment

Just as an additional note: \w matches any word character, and the backslash itself has to be escaped inside a Java string literal, which is why it is written \\w. The + indicates one or more occurrences. The parentheses mark their contents as a capture group.
1

Try:

age:\s*(\d+)

Matches "age:" followed by any amount of whitespace, followed by one or more digits. The digits (the numeric value) are captured in the first group.
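As a rough sketch of how this pattern could be used from Java (the class name AgeDigits and the helper extractAges are made up for illustration, and the input is the string from the question), note that each backslash is doubled in the string literal and that the value is read with group(1) rather than group():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgeDigits {
    // The regex age:\s*(\d+) as a Java string literal: every backslash is doubled.
    private static final Pattern AGE = Pattern.compile("age:\\s*(\\d+)");

    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> extractAges(String data) {
        List<String> ages = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            // group(1) holds only the digits; group() would also include "age:".
            ages.add(m.group(1));
        }
        return ages;
    }

    public static void main(String[] args) {
        String data = "id: 1 name: Joe age: 27 id: 2 name: Mary age:22";
        System.out.println(extractAges(data)); // prints [27, 22]
    }
}
```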

If you want to support negative ages (e.g. -1 for "age unknown" or something), you can use:

age:\s*(-?\d+)

This matches "age:" followed by any amount of whitespace, then an optional minus sign, then one or more digits. The digits and the optional minus sign (the numeric value) are captured in the first group.
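For the signed variant, one possible sketch is to parse the captured text straight into an int. The class name SignedAges, the helper parseAges, and the sample input are assumptions for illustration only; Integer.parseInt would throw a NumberFormatException for a digit run too long to fit in an int, which this sketch does not handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SignedAges {
    // The regex age:\s*(-?\d+) as a Java string literal.
    private static final Pattern AGE = Pattern.compile("age:\\s*(-?\\d+)");

    // Hypothetical helper: converts capture group 1 of every match to an int.
    static List<Integer> parseAges(String data) {
        List<Integer> ages = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            // Group 1 is the optional minus sign plus the digits.
            ages.add(Integer.parseInt(m.group(1)));
        }
        return ages;
    }

    public static void main(String[] args) {
        String data = "name: Joe age: 27 name: Unknown age:-1";
        System.out.println(parseAges(data)); // prints [27, -1]
    }
}
```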

If you aren't sure how to get capture groups to work, consult this question which has a few examples.

4 Comments

no, that did not do. BTW, you need to use double \, otherwise Java won't compile
@foampile The example posted shows the regular expression value, not the string literal (there are no "s). Also "did not do" is a near-useless statement. A better one might be: "I still don't know how to access the capture group."
"did not do" means i put it in my code and it did not return what i was asking for in the OP
@foampile: You have to get the groups, not the entire matched pattern.
0

Use unescaped parentheses:

Pattern namePtrn = Pattern.compile("age: *(\\w*)");

This puts the value after "age:" in the first capture group of the Matcher, which you read with group(1) rather than group().
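To make the difference between the whole match and the first group concrete, here is a minimal sketch using the pattern above. The class name GroupVersusWholeMatch and the helper collect are hypothetical; group(0) is the same as group(), so passing 0 reproduces the output the asker was seeing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupVersusWholeMatch {
    private static final Pattern AGE = Pattern.compile("age: *(\\w*)");

    // Hypothetical helper: collects the requested group of every match.
    // Group 0 is the whole match (same as group()); group 1 is the parenthesized part.
    static List<String> collect(String data, int group) {
        List<String> results = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        String data = "id: 1 name: Joe age: 27 id: 2 name: Mary age:22";
        System.out.println(collect(data, 0)); // prints [age: 27, age:22]
        System.out.println(collect(data, 1)); // prints [27, 22]
    }
}
```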

5 Comments

that did not do either. output "age:" before the value
@foampile ..because the capture group is still not being used - presumably the entire match is still being used, not a specific capture group. Granted, neither post shows how to get to a specific capture group.
name: should be age: in the regex?
the OP stipulates clearly what the output should be: 27 and 22, not age: 27 and age:22
@foampile Again, the regex is valid and shows the use of a capture group (even with the typo), the usage of it is incorrect.
