I am currently working in Java, and I have an issue matching multiple date formats in a JSON string using a Regex.

JSON:

{"x": "02/23/2019", "y": "02-27-2019"}

Regex:

[0-9]{1,2}(/|-)[0-9]{1,2}(/|-)[0-9]{4}

In a Regex tester, this regex matches both dates. But in the Java code, I only get one date from the group. The second group is just a "\".

Java Code:

private static void findDates() {
    String regex = "[0-9]{1,2}(/|-)[0-9]{1,2}(/|-)[0-9]{4}";
    Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
    String json =
            "{\"x\":\"02/23/2019\",\n" +
             "\"y\":\"02-27-2019\"}";
    Matcher matcher = pattern.matcher(json);
    if (matcher.find()) {
        for (int i = 0; i < matcher.groupCount(); i++) {
            String dateMatch = matcher.group(i);
            System.out.println(dateMatch);
        }
        System.out.println(json);
    }
}

I need to be able to capture all occurrences of dates that match the format specified by the regex. So if the JSON contains three dates in MM/dd/yyyy or MM-dd-yyyy format, I should get all three when I iterate over the groups, and likewise for two dates, five dates, and so on.

  • Why you have \1 here in String regex = "[0-9]{1,2}(/|-)[0-9]{1,2}(/|-)[0-9]{4}\1+ ? Commented Feb 25, 2019 at 17:48
  • My mistake, was trying something in test and forgot to remove. Nice catch. Commented Feb 25, 2019 at 17:49

2 Answers

Your code is not quite right. To find all the matches, you need to loop with while (matcher.find()) rather than calling find() once. You can also write (/|-) more simply as [/-]. Here is the corrected Java code:

String regex = "[0-9]{1,2}([/-])[0-9]{1,2}\\1[0-9]{4}";
Pattern pattern = Pattern.compile(regex);
String json = "{\"x\":\"02/23/2019\",\n" + "\"y\":\"02-27-2019\"}";

Matcher matcher = pattern.matcher(json);
while (matcher.find()) {
    System.out.println(matcher.group());
}

This prints both dates:

02/23/2019
02-27-2019

Notice that I wrote \\1 instead of a second ([/-]) before the year part of the regex. The backreference must repeat the separator captured by the first group, so the regex rejects mixed formats such as 02-23/2019 or 02/23-2019 and matches only 02-23-2019 and 02/23/2019.
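To make the effect of the backreference concrete, here is a minimal sketch that checks four strings against the same pattern. The class name and the isConsistentDate helper are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Pattern;

public class SeparatorBackreferenceDemo {
    // Same pattern as above: \\1 must repeat whatever separator group 1 captured.
    static final Pattern DATE =
            Pattern.compile("[0-9]{1,2}([/-])[0-9]{1,2}\\1[0-9]{4}");

    // Hypothetical helper: true only if the whole string matches the pattern.
    static boolean isConsistentDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConsistentDate("02/23/2019")); // true
        System.out.println(isConsistentDate("02-23-2019")); // true
        System.out.println(isConsistentDate("02-23/2019")); // false, separators differ
        System.out.println(isConsistentDate("02/23-2019")); // false, separators differ
    }
}
```

Note that the pattern only checks the shape of the text; it would still accept something like 99/99/2019, so it does not validate that the date exists.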

Also, in your code, if (matcher.find()) performs only the first find, so the matcher never reaches later matches even if many exist in the string. And matcher.groupCount() returns the number of capturing groups in the pattern, not the number of matches; looping over it prints the captures within a single match (the whole date, then a separator), which is not what you intended.
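If you want the dates in a collection rather than printed, the same loop can fill a list. This is only a sketch: the class name and the findDates helper are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDatesDemo {
    private static final Pattern DATE =
            Pattern.compile("[0-9]{1,2}([/-])[0-9]{1,2}\\1[0-9]{4}");

    // Hypothetical helper: returns every date-shaped substring, in order of appearance.
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        // Each call to find() advances to the next match; the loop ends when none remain.
        while (matcher.find()) {
            // group() is group(0), the whole match; group(1) would be just the separator.
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String json = "{\"x\":\"02/23/2019\",\n" + "\"y\":\"02-27-2019\"}";
        System.out.println(findDates(json)); // [02/23/2019, 02-27-2019]
    }
}
```

The list grows with the number of matches, so it works the same for two, three, or five dates in the input.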


Regex is overkill

If you have a limited number of non-ambiguous formats in play, simply attempt parsing with the LocalDate & DateTimeFormatter classes. That is what they were built for.

Define formatting patterns to match your expected inputs.

List < String > inputs = List.of( "02/23/2019" , "02-27-2019" , "07|07|2022" );
List < DateTimeFormatter > formatters =
        List.of(
                DateTimeFormatter.ofPattern( "MM/dd/uuuu" ) ,
                DateTimeFormatter.ofPattern( "MM-dd-uuuu" )
        );

Collect the results, along with bad (unexpected) inputs.

List < LocalDate > results = new ArrayList <>( inputs.size() );
List < String > faultyInputs = new ArrayList <>();

Loop the inputs. For each string, loop your defined formatters. If one formatter succeeds (matches your input’s format and successfully parses), collect the result. Else if no formatters match the input, collect the faulty input.

for ( String input : inputs )
{
    LocalDate ld = null;
    for ( DateTimeFormatter formatter : formatters )
    {
        try
        {
            ld = LocalDate.parse( input , formatter );
            results.add( ld );
            break; // Bail-out of looping the formatters. If a format matched, no need to try others.
        } catch ( DateTimeParseException e )
        {
            // Swallow exception. No code needed here.
        }
    }
    if ( Objects.isNull( ld ) ) // If we tried all the expected formats but none matched our input…
    {
        faultyInputs.add( input );
    }
}

Dump to console.

System.out.println( "results:" );
System.out.println( results );
System.out.println( "faultyInputs:" );
System.out.println( faultyInputs );

results:

[2019-02-23, 2019-02-27]

faultyInputs:

[07|07|2022]

ISO 8601

Tip: Educate whoever produces such data about the joys of ISO 8601. Exchanging date-time values textually using localized or invented formats is poor practice.
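As a rough illustration of why ISO 8601 is convenient, the sketch below converts one of the inputs above to ISO 8601 text and reads it back. LocalDate produces and parses the ISO 8601 form (uuuu-MM-dd) by default, so no formatter is needed for that direction. The class name and the toIso helper are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class IsoDateDemo {
    // Hypothetical helper: parses a MM/dd/uuuu string and returns its ISO 8601 form.
    static String toIso(String usDate) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("MM/dd/uuuu");
        // LocalDate.toString() generates ISO 8601 text, such as 2019-02-23.
        return LocalDate.parse(usDate, f).toString();
    }

    public static void main(String[] args) {
        String iso = toIso("02/23/2019");
        System.out.println(iso); // 2019-02-23

        // ISO 8601 input parses with no formatter at all.
        LocalDate roundTrip = LocalDate.parse(iso);
        System.out.println(roundTrip.getMonth()); // FEBRUARY
    }
}
```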


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.
