I have this regex pattern: (1 Abc ([A-Z]+))|(Abc ([A-Z]+))|(1 ([A-Z]+)) that works as follows:

1 Abc TEST
Abc TEST
1 TEST
TEST

It matches TEST in the first three cases, and does not match TEST in the last case.

The pattern looks a bit long. I want to make it shorter while keeping the same matching.
My attempt was (1 |Abc )([A-Z]+), but this pattern does not capture TEST in the first string.
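To see why the shortened attempt misses the first string, here is a minimal sketch using Python's `re` module (the thread does not say which regex engine the demo used, so Python is an assumption). The prefix group can match only once, so on '1 Abc TEST' it consumes '1 ' and `[A-Z]+` then stops after the single uppercase 'A' of 'Abc'.

```python
import re

# The shortened attempt from the question: one prefix, then the uppercase word.
attempt = re.compile(r'(1 |Abc )([A-Z]+)')

# Map each sample string to what group 2 captures (None when nothing matches).
results = {}
for s in ['1 Abc TEST', 'Abc TEST', '1 TEST', 'TEST']:
    m = attempt.search(s)
    results[s] = m.group(2) if m else None
    print(repr(s), '->', results[s])
```

On the first string the capture is 'A' rather than 'TEST', which is the mismatch described above.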

Any suggestions how to simplify the first pattern and keep the same matching?

EDIT:
To avoid any confusion: all I want to capture is TEST when it is preceded by '1 ', 'Abc ' or '1 Abc '.

12
  • 1
    You might be looking for ^(?:\w+ +)+([A-Z]+)$ but it really depends on your actual input strings. As it stands, your question is not easily answerable. Commented Jan 30, 2020 at 13:46
  • 1
    You use multiple capturing groups. Each pair of parentheses is assigned a number. This is also true for the ([A-Z]+) part, which is repeated several times. Depending on the prefix, the number of the capturing group that holds the word changes. Thus you cannot really change anything and keep exactly the same behaviour. Commented Jan 30, 2020 at 13:56
  • 2
    Do you really want to match 1 and/or Abc or any digit and/or any string? What about 2 DEF TEST? Commented Jan 30, 2020 at 13:56
  • 1
    This is pretty short: (?<= )\S+$. It matches TEST in the first three cases, and does not match TEST in the last case. Commented Jan 30, 2020 at 13:56
  • 1
    Is this what you want? Commented Jan 30, 2020 at 14:04
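The point in the comments about capturing-group numbers can be illustrated with a short sketch. It assumes Python's `re` module, and the helper name `captured_group` is made up for this example. With the original pattern, the uppercase word lands in group 2, 4 or 6 depending on which alternative matched.

```python
import re

# The original pattern from the question, unchanged.
original = re.compile(r'(1 Abc ([A-Z]+))|(Abc ([A-Z]+))|(1 ([A-Z]+))')

def captured_group(s):
    """Return (group_number, text) for the inner group holding the word, or None."""
    m = original.search(s)
    if not m:
        return None
    # Groups 2, 4 and 6 are the three copies of ([A-Z]+).
    for n in (2, 4, 6):
        if m.group(n) is not None:
            return n, m.group(n)
    return None

for s in ['1 Abc TEST', 'Abc TEST', '1 TEST', 'TEST']:
    print(repr(s), '->', captured_group(s))
```

Any simplification that uses a single ([A-Z]+) therefore changes the group numbering, even if it accepts the same strings.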

2 Answers

2
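import re

# Sample inputs; only the first three should be accepted.
samples = [
    '1 Abc TEST',
    'Abc TEST',
    '1 TEST',
    'TEST',
    '2 TEST',
    'XYZ TEST',
    'Abc 1 TEST',
]

# One of the three allowed prefixes, a single space, then the captured uppercase word.
pattern = re.compile(r'^(?:1 Abc|1|Abc) ([A-Z]+)$')

for s in samples:
    if pattern.match(s):
        print('OK ' + s)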

Output:

OK 1 Abc TEST
OK Abc TEST
OK 1 TEST



1

You were very close. Your first group occurs twice in the first test case, which is why it was not matching properly. You simply need to add a + (or {1,2} if you absolutely want this group to occur one or two times) after the alternation group.

^(1 |Abc ){1,2}([A-Z]+)$


I've also added the start-of-string and end-of-string anchors (^ and $) for more accurate results.
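One thing worth checking with the repeated group: it accepts the prefixes in any order and also accepts the same prefix twice, which the original pattern does not. The sketch below assumes Python's `re` module. The `strict` pattern is not from this thread; it is a hypothetical alternative that allows only '1 ', 'Abc ' or '1 Abc ' as the prefix.

```python
import re

# The pattern from this answer: the prefix group may repeat once or twice.
repeated = re.compile(r'^(1 |Abc ){1,2}([A-Z]+)$')

# Hypothetical stricter variant (an assumption, not from the thread):
# '1 ' optionally followed by 'Abc ', or 'Abc ' alone.
strict = re.compile(r'^(?:1 (?:Abc )?|Abc )([A-Z]+)$')

samples = ['1 Abc TEST', 'Abc TEST', '1 TEST', 'TEST', 'Abc 1 TEST', '1 1 TEST']

# Record whether each pattern accepts each sample.
accepted = {s: (bool(repeated.match(s)), bool(strict.match(s))) for s in samples}

for s, (rep_ok, strict_ok) in accepted.items():
    print(f'{s!r}: repeated={rep_ok}, strict={strict_ok}')
```

If inputs such as 'Abc 1 TEST' can never occur, the repeated form is shorter and behaves the same on the four sample strings.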

Depending on your criteria, you could also replace (1 |Abc ) with something a little more generic.

^((\d{1}|\w{3}) ){1,2}([A-Z]+)$
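A quick way to check what the generic version accepts, sketched with Python's `re` module (an assumption, as the thread names no engine; `word_after_prefix` is a made-up helper name). The word ends up in group 3 here. Since `\w` also matches digits and underscores, a prefix such as '123 ' is accepted too.

```python
import re

# The generic pattern from this answer, unchanged.
generic = re.compile(r'^((\d{1}|\w{3}) ){1,2}([A-Z]+)$')

def word_after_prefix(s):
    """Return the uppercase word captured in group 3, or None if s is rejected."""
    m = generic.match(s)
    return m.group(3) if m else None

for s in ['1 Abc TEST', '2 DEF TEST', '123 TEST', 'Abcd TEST', 'TEST']:
    print(repr(s), '->', word_after_prefix(s))
```

'Abcd TEST' is rejected because the prefix must be exactly one digit or exactly three word characters.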

