I am trying to match the following sample:

ZU2A ZS6D-9 ZT0ER-7 ZR6PJH-12

It is a combination of letters and numbers (alphanumeric). Here is an explanation:

  1. It will always start with a capital (uppercase) Z
  2. Followed always by only ONE(1) of R,S,T or U "[R|S|T|U]"
  3. Followed always by only ONE(1) number "[0-9]"
  4. Followed always by a minimum of ONE(1) and optionally a maximum of THREE(3) capital (uppercase) letters like this [A-Z]{1,3}
  5. Optionally followed by "-" and a minimum of ONE(1) and a maximum of TWO(2) numbers

At the moment I have this:

Z[R|S|T|U][0-9][A-Z]{1,}(\-)?([0-9]{1,3})

But that does not seem to catch all the samples.

EDIT: Here is a sample of a complete string:

ZU0D>APT314,ZT1ER,WIDE1,ZS3PJ-2,ZR5STU-12*/V:/021414z2610.07S/02814.02Ek067/019/A=005475!w%<!

Any help would be appreciated.

Thank You

Danny

1 Answer

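Your main problem is that only the hyphen is optional in your pattern; the digits after it are still required, so a sample without a suffix (such as ZU2A) can never match. The whole optional part should be surrounded by one set of parentheses marked with ? (= optional). In addition, {1,} puts no upper limit on the number of letters. All in all, you want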

Z[RSTU][0-9][A-Z]{1,3}(?:-[0-9]{1,2})?
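
To see the difference concretely, here is a minimal sketch that runs the pattern from the question and the corrected pattern against the four samples. The variable names (old, new, samples) are only for illustration, and fullmatch is used here so that each sample has to match in its entirety:

```python
import re

# The pattern from the question and the corrected pattern from this answer.
old = re.compile(r'Z[R|S|T|U][0-9][A-Z]{1,}(\-)?([0-9]{1,3})')
new = re.compile(r'Z[RSTU][0-9][A-Z]{1,3}(?:-[0-9]{1,2})?')

samples = ['ZU2A', 'ZS6D-9', 'ZT0ER-7', 'ZR6PJH-12']

# True where the whole sample matches, False otherwise.
old_results = [bool(old.fullmatch(s)) for s in samples]
new_results = [bool(new.fullmatch(s)) for s in samples]

print(old_results)  # [False, True, True, True]
print(new_results)  # [True, True, True, True]
```

The only sample the original pattern misses is ZU2A, the one without a "-" suffix, because its trailing digits are mandatory.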

A couple of extra notes:

  • In a character class, you simply list the characters; | has no special meaning there, so [R|S|T|U] also matches a literal |. For rule 2 you want either [RSTU] or (?:R|S|T|U).
  • A group written as (?:example) instead of (example) is non-capturing: the sub-expression is not returned as a separate group. It has no effect on which inputs are matched.
  • You don't need to escape - with a backslash outside of a character class.
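
The first two notes can be checked with a short sketch like the following. The variable names are made up for illustration; the behaviour shown is that of Python's re module, and other regex implementations may report unmatched groups differently:

```python
import re

# Inside a character class, | is a literal character, not alternation.
pipe_in_class = re.fullmatch(r'Z[R|S|T|U]', 'Z|') is not None
pipe_in_list = re.fullmatch(r'Z[RSTU]', 'Z|') is not None
print(pipe_in_class, pipe_in_list)  # True False

text = 'ZS6D-9 ZU2A'

# With a capturing group, findall returns the group's content
# instead of the whole match.
capturing = re.findall(r'Z[RSTU][0-9][A-Z]{1,3}(-[0-9]{1,2})?', text)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r'Z[RSTU][0-9][A-Z]{1,3}(?:-[0-9]{1,2})?', text)

print(capturing)      # ['-9', '']
print(non_capturing)  # ['ZS6D-9', 'ZU2A']
```

Both patterns match the same inputs; only what is reported back differs, which is why the non-capturing form is the more convenient one with findall.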

Here's an example test case script in Python:

import re

s = r'Z[RSTU][0-9][A-Z]{1,3}(?:-[0-9]{1,2})?'

rex = re.compile(s)
for test in ('ZU2A', 'ZS6D-9', 'ZT0ER-7', 'ZR6PJH-12'):
    assert rex.match(test), test

long_test = 'ZU0D>APT314,ZT1ER,WIDE1,ZS3PJ-2,ZR5STU-12*/V:/021414z2610.07S/02814.02Ek067/019/A=005475!w%<!'
found = rex.findall(long_test)
assert found == ['ZU0D', 'ZT1ER', 'ZS3PJ-2', 'ZR5STU-12'], found
8 Comments

Last count in the RE should be {1,2}.
Why the closing ? ? How would greedy and non-greedy behavior make a difference here?
@user1016274 Thanks, replaced 3 by 2 at the end. ? only indicates greediness if it follows a * or +. Without any of those preceding, ? is a short way to write {0,1}.
Not including the optional trailing part is IMHO not a requirement, and would in fact return only partial matches (as you've noted). So rather ()? instead of (?:)?, right?
I'm not sure what you mean by return only partial matches. The resulting object of a successful match will have a group 0 that matches everything. If we were to use your suggestion ^Z[RSTU][0-9][A-Z]{1,3}(-[0-9]{1,3})?$, group 1 of the result would sometimes contain a string and sometimes not (depending on the regexp implementation, it may also not be present in some cases). This means additional overhead for the allocation and copying of the partially matched string, and the chance that somebody uses group 1 by accident. It's faster and cleaner not to match stuff nobody wanted to match.