I've been using the following Regex to extract a zip code from a bunch of text:

    "\\d{5}\\-?[1-9]?[1-9]?[1-9]?[1-9]?"

I made the last four [1-9] optional (using ?) so that the regex could extract both 5-digit zip codes and ZIP+4 codes such as 11001-1010.

However, it only matches the first two digits of the last four, even though I put four digit classes at the end.

For example, in the zip code 11001-1010 it would match 11001-10.

Anyone know why?

  • Why not simply make a group? "\\d{5}(?:\\-\\d{4})?". Commented Sep 10, 2015 at 2:44
  • For zip code 11001-1010 your regex would only match 11001-1, because each of the 4 optional digits after the - is [1-9] and so cannot match 0. Commented Sep 10, 2015 at 3:00
  • @HopefullyHelpful: the regex engine is greedy by default. It tries to match as much as it can (there are no lazy quantifiers in this case). Also, x{0,4} is exactly the same as x?x?x?x? Commented Sep 10, 2015 at 3:12

3 Answers

Simple answer to the question: for zip code 11001-1010 your regex would only match 11001-1, because each of the 4 optional digits after the - is [1-9] and so cannot match 0.
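
A minimal sketch of that behaviour, assuming Java (which the doubled backslashes in the question suggest). The class name ZipBugDemo and the helper firstMatch are hypothetical names introduced here for illustration, not something from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the question.
class ZipBugDemo {
    // The question's pattern: each optional suffix digit is limited to 1-9.
    static final Pattern ORIGINAL =
            Pattern.compile("\\d{5}\\-?[1-9]?[1-9]?[1-9]?[1-9]?");

    // Returns the first match found in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The second suffix digit is 0, which [1-9] cannot match,
        // so the remaining optional digits all match nothing.
        System.out.println(firstMatch(ORIGINAL, "11001-1010")); // 11001-1
    }
}
```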

For the unstated question of how to fix that, it depends on whether you only want to match an optional +4, or you want to also match +3, +2, +1, and +0, like your expression would.

Matching Zip5 with optional +4, e.g. matching 11001-1010 and 11001:

"\\d{5}(?:-\\d{4})?"

Matching Zip5 with optional +N, e.g. matching 11001-1010, 11001-101, 11001-10, 11001-1, 11001-, and 11001:

"\\d{5}(?:-\\d{0,4})?"

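The difference between the two patterns above can be checked with a short sketch, again assuming Java. The names ZipPlusFourDemo, ZIP_PLUS_4, ZIP_PLUS_N and firstMatch are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns from this answer.
class ZipPlusFourDemo {
    // Zip5 with an optional, complete +4 suffix.
    static final Pattern ZIP_PLUS_4 = Pattern.compile("\\d{5}(?:-\\d{4})?");
    // Zip5 with an optional dash followed by 0 to 4 digits.
    static final Pattern ZIP_PLUS_N = Pattern.compile("\\d{5}(?:-\\d{0,4})?");

    // Returns the first match found in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ZIP_PLUS_4, "11001-1010")); // 11001-1010
        // A partial suffix is not a full +4, so the whole group is skipped.
        System.out.println(firstMatch(ZIP_PLUS_4, "11001-10"));   // 11001
        // The +N variant keeps whatever digits follow the dash, up to four.
        System.out.println(firstMatch(ZIP_PLUS_N, "11001-10"));   // 11001-10
        System.out.println(firstMatch(ZIP_PLUS_N, "11001-"));     // 11001-
    }
}
```
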
Update

Now, if you want to make sure it doesn't match the 56789-1234 of 123456789-123456789 or abcd56789-1234qwerty, you can add a word-boundary check:

"\\b\\d{5}(?:-\\d{4})?\\b"

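As a rough check of the word-boundary version, here is a small sketch, assuming Java. The names ZipBoundaryDemo, WORD_BOUNDED and firstMatch are hypothetical; the sample strings are the ones mentioned in the update.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the word-boundary pattern from the update.
class ZipBoundaryDemo {
    static final Pattern WORD_BOUNDED =
            Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    // Returns the first match found in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Digits embedded in longer runs of digits or letters are rejected.
        System.out.println(firstMatch(WORD_BOUNDED, "123456789-123456789"));  // null
        System.out.println(firstMatch(WORD_BOUNDED, "abcd56789-1234qwerty")); // null
        // A zip code standing on its own is still found.
        System.out.println(firstMatch(WORD_BOUNDED, "ship to 11001-1010 today")); // 11001-1010
    }
}
```

One thing to be aware of: because - is not a word character, an input such as 56789-12345 still yields 56789, since there is a word boundary between the 9 and the dash.
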

It's stopping at the first 0 in the suffix, because [1-9] in "\\d{5}\\-?[1-9]?[1-9]?[1-9]?[1-9]?" does not match 0. So in your example, it only matches up to 11001-1. Does "\\d{5}\\-?[0-9]?[0-9]?[0-9]?[0-9]?" work OK? The other answers are probably cleaner, but that is the bug.

Looks ok per this

You can use \\d{5}\\-\\d{0,4} which allows you to match 0 to 4 digits after -.

EDIT

From the comment: "But then the - won't be optional."

For that, you can use \\d{5}(\\-\\d{0,4})? which makes the whole group (the - and the digits after it) optional.

1 Comment

But then the - won’t be optional.
