I have a pattern where a user specifies:

1998-2010:Make:model:trim:engine

trim and engine are optional. If present, I should capture them; if not, the matcher should at least validate the year, make, and model (YMM).

([0-9]+-*[0-9]+):(.*):(.*):(.*):(.*)

This matches if all five fields are there, but how do I make the last two, and only the last two, fields optional?

  • Try adding ? to each group to indicate that it is optional: ([0-9]+-*[0-9]+)(:.*)?(:.*)?(:.*)?(:.*)? Commented Jan 21, 2014 at 20:07
  • I somehow doubt that your regex will work as it should, especially because of .* part. Maybe try posting some examples of input and expected result so we could help you correct/rewrite your regex. Commented Jan 21, 2014 at 20:10
  • @k-mera What do you mean by non-regex here? If you are thinking about split then surprisingly it also uses regex as parameter :) Commented Jan 21, 2014 at 20:13
  • but split(":") is much easier to understand than ([0-9]+-*[0-9]+):(.*):(.*):(.*):(.*) Commented Jan 21, 2014 at 20:15
  • @k-mera Agreed, and I like your approach. But the fact is that you are still using a regex here, so it is not non-regex but regex-less (or something) :) Commented Jan 21, 2014 at 20:17

2 Answers

Using a regular expression and ?, the “zero or one quantifier”

You can use ? to match zero or one of something, which is what you want to do with the last two fields. However, your pattern needs a bit of modification: each field should be [^:]* rather than .*, so that a field cannot swallow a colon. Some sample code and its output follow. The regular expression I ended up with was:

([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?
|-----| |-----| |-----|    |-----|      |-----|
   a       a       a          a            a

                       |-----------||-----------|
                             b            b

Each a matches a sequence of non-colon characters (although you'd want to modify the first one to match years), and each b is a non-capturing group (so it starts with ?:) that matches zero or one time (because it has the final ? quantifier). This means that the fourth and fifth fields are optional. The sample code shows that this pattern matches when three, four, or five fields are present, and does not match if there are more than five fields or fewer than three.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        final Pattern p = Pattern.compile( "([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?" );
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            final Matcher m = p.matcher( string );
            if ( m.matches() ) {
                System.out.println( "\n=== Matches for: "+string+" ===" );
                final int count = m.groupCount();
                for ( int j = 0; j <= count; j++ ) {
                    System.out.println( j + ": "+ m.group( j ));
                }
            }
            else {
                System.out.println( "\n=== No matches for: "+string+" ===" );
            }
        }
    }
}

=== No matches for: a ===

=== No matches for: a:b ===

=== Matches for: a:b:c ===
0: a:b:c
1: a
2: b
3: c
4: null
5: null

=== Matches for: a:b:c:d ===
0: a:b:c:d
1: a
2: b
3: c
4: d
5: null

=== Matches for: a:b:c:d:e ===
0: a:b:c:d:e
1: a
2: b
3: c
4: d
5: e

=== No matches for: a:b:c:d:e:f ===

=== No matches for: a:b:c:d:e:f:g ===

=== No matches for: a:b:c:d:e:f:g:h ===
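
As a rough sketch of how the same idea might look for the year:make:model input from the question, the pattern below tightens the first field to a year or year range. The field rules here are assumptions rather than anything stated in the question (four-digit years, non-empty fields), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VehicleSpecMatcher {
    // Assumed rules (not from the question): a four-digit year or year range,
    // followed by non-empty fields that contain no colon.
    static final Pattern SPEC = Pattern.compile(
            "(\\d{4}(?:-\\d{4})?)"   // group 1: year or year range
            + ":([^:]+)"             // group 2: make
            + ":([^:]+)"             // group 3: model
            + "(?::([^:]+))?"        // group 4: trim (optional)
            + "(?::([^:]+))?" );     // group 5: engine (optional)

    /** Returns the five fields (null for an absent optional field), or null if the input does not match. */
    static String[] parse( String input ) {
        final Matcher m = SPEC.matcher( input );
        if ( !m.matches() ) {
            return null;
        }
        final String[] fields = new String[5];
        for ( int i = 0; i < fields.length; i++ ) {
            fields[i] = m.group( i + 1 );
        }
        return fields;
    }

    public static void main( String[] args ) {
        final String[] samples = {
            "1998-2010:Make:model",
            "1998-2010:Make:model:trim",
            "1998-2010:Make:model:trim:engine",
            "1998-2010:Make"
        };
        for ( String sample : samples ) {
            System.out.println( sample + " -> " + Arrays.toString( parse( sample )));
        }
    }
}
```

Because the optional groups simply come back as null when absent, the caller can tell trim and engine apart from the required fields without counting colons.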

While it's certainly possible to match this kind of string by using a regular expression, it does seem like it might be easier to just split the string on : and check how many values you get back. That doesn't necessarily do other kinds of checking (e.g., characters in each field), so maybe splitting isn't quite so useful in whatever non-minimal situation is motivating this.

Using String.split and a limit parameter

I noticed your comment on another post that recommended using String.split(String):

Yes I know this function, but it work for me cause I have a string which is a:b:c:d:e:f:g:h.. but I just want to group the data as a:b:c:d:e if any as one and the rest of the string as another group

It's worth noting that there's a version of split that takes one more parameter, String.split(String,int). The second parameter is a limit, described as:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

This means that you could use split and the limit 6 to get up to five fields from your input, and you'd have the remaining input as the last string. You'd still have to check whether you had at least 3 elements, to make sure that there was enough input, but all in all, this seems like it might be a bit simpler.

import java.util.Arrays;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            System.out.println( "\n== Splits for "+string+" ===" );
            System.out.println( Arrays.toString( string.split( ":", 6 )));
        }
    }
}

== Splits for a ===
[a]

== Splits for a:b ===
[a, b]

== Splits for a:b:c ===
[a, b, c]

== Splits for a:b:c:d ===
[a, b, c, d]

== Splits for a:b:c:d:e ===
[a, b, c, d, e]

== Splits for a:b:c:d:e:f ===
[a, b, c, d, e, f]

== Splits for a:b:c:d:e:f:g ===
[a, b, c, d, e, f:g]

== Splits for a:b:c:d:e:f:g:h ===
[a, b, c, d, e, f:g:h]
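
The length check mentioned above could look something like the following sketch. The class, constants, and method names are hypothetical; the only library call involved is String.split(String,int) with a limit of 6.

```java
import java.util.Arrays;

class SplitWithLimit {
    static final int MIN_FIELDS = 3;  // year, make, and model are required
    static final int MAX_FIELDS = 5;  // trim and engine are optional

    /** Returns the leading fields (three to five of them), or null when fewer than three are present. */
    static String[] fields( String input ) {
        // A limit of MAX_FIELDS + 1 leaves everything after the fifth field in the last slot.
        final String[] parts = input.split( ":", MAX_FIELDS + 1 );
        if ( parts.length < MIN_FIELDS ) {
            return null;
        }
        return Arrays.copyOf( parts, Math.min( parts.length, MAX_FIELDS ));
    }

    /** Returns whatever follows the fifth field, or null if there is nothing after it. */
    static String remainder( String input ) {
        final String[] parts = input.split( ":", MAX_FIELDS + 1 );
        return parts.length > MAX_FIELDS ? parts[MAX_FIELDS] : null;
    }

    public static void main( String[] args ) {
        final String input = "a:b:c:d:e:f:g:h";
        System.out.println( Arrays.toString( fields( input )));
        System.out.println( remainder( input ));
    }
}
```

Unlike the regular expression, this does not reject input with more than five fields; it just hands the extra text back separately, which seems to be what the comment quoted above asks for.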

3 Comments

@user2192900 For one thing, the final groups in your pattern are (:?:[^:]*)?(:?:[^:]*)?, but each should begin with (?:: as in my code, not (:?: as in yours.
@user2192900 The pattern I provided doesn't match a:b:c:d:e:f:g. If you're trying to use it, I have to assume there's a mistake in the copy and paste (perhaps the one I already mentioned in the previous comment). I didn't realize, however, that you wanted to match 4 fields as well as 3 and 5.
@user2192900 I noticed you put a comment on another answer about wanting to match up to five fields, and getting the rest in another string. You can actually do that a bit more simply by using String.split with a limit argument. I've updated my answer to show how you can do that.

Why not skip the regex and use split(":")? It seems straightforward. From the length of the resulting array you will then know whether or not trim and engine were provided.

String str = "1998-2010:Make:model:trim:engine";
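String[] parts = str.split(":");
// parts[0] == year range (Y)
// parts[1] == make (M)
// parts[2] == model (M)
// parts[3], parts[4] == trim, engine (present only if parts.length is 4 or 5)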

Edit: As others have mentioned, String.split uses a regex pattern too. In my opinion that doesn't really matter, though. For a truly regex-less solution, use StringUtils.split from Apache Commons (which does not use a regex at all) :)
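
If pulling in a library is not an option, a regex-free split can also be hand-rolled with indexOf. The sketch below is only a stand-in to show the idea; it is not the Apache Commons implementation, and its handling of empty fields may differ from that library's. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class PlainSplit {
    /** Splits on a single delimiter character using indexOf only; empty fields are kept. */
    static List<String> split( String input, char delimiter ) {
        final List<String> parts = new ArrayList<>();
        int start = 0;
        int end;
        // Walk from delimiter to delimiter, cutting out the text in between.
        while (( end = input.indexOf( delimiter, start )) >= 0 ) {
            parts.add( input.substring( start, end ));
            start = end + 1;
        }
        // Whatever follows the last delimiter is the final field.
        parts.add( input.substring( start ));
        return parts;
    }

    public static void main( String[] args ) {
        final List<String> parts = split( "1998-2010:Make:model:trim:engine", ':' );
        System.out.println( parts.size() + " fields: " + parts );
    }
}
```

As with split(":"), the size of the returned list tells you whether trim and engine were supplied.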

1 Comment

Yes I know this function, but it work for me cause I have a string which is a:b:c:d:e:f:g:h.. but I just want to group the data as a:b:c:d:e if any as one and the rest of the string as another group
