I have two kinds of strings, of the form:

Beta_Gambus_teta_some_changeable_string_2017.02.1276 and
Beta_Gambus_teta_some_changeable_string_update_2017.02.1276

Example:

 Beta_Gambus_teta_wqtr_2017.02.1276.ctr
 Beta_Gambus_teta_wqtr_update_2017.02.1277.ctr
 Beta_Gambus_teta_tpsedr_2017.02.1276.ctr
 Beta_Gambus_teta_tpesdr_update_2017.02.1277.ctr
 Beta_Gambus_teta_cnmsr_2018.02.1279.ctr 
 Beta_Gambus_teta_cnms_update_2018.02.1279.ctr

I need a regex that separates the strings containing 'update' from the ones that don't.

I'm using ^.+_(.+)\.ctr$ but it is too broad.

3
  • Do all of the strings start with Beta_Gambus_teta? Commented Jul 30, 2019 at 9:57
  • Why not do "update" in s? Commented Jul 30, 2019 at 9:58
  • @TimBiegeleisen yes Commented Jul 30, 2019 at 10:01

4 Answers

2

Unless you're not telling us something, regex is not at all needed here...

strings = ["Beta_Gambus_teta_wqtr_2017.02.1276.ctr",
           "Beta_Gambus_teta_wqtr_update_2017.02.1277.ctr",
           "Beta_Gambus_teta_tpsedr_2017.02.1276.ctr",
           "Beta_Gambus_teta_tpesdr_update_2017.02.1277.ctr",
           "Beta_Gambus_teta_cnmsr_2018.02.1279.ctr",
           "Beta_Gambus_teta_cnms_update_2018.02.1279.ctr"]

with_update = []
no_update = []
for s in strings:
    if "update" in s:
        with_update.append(s)
    else:
        no_update.append(s)

You can even get rid of the if, since a bool indexes a tuple as 0 or 1:

res = ([], [])

for s in strings:
    res["update" in s].append(s)

no_update, with_update = res

Both versions give:

>>> print(with_update)
['Beta_Gambus_teta_wqtr_update_2017.02.1277.ctr', 'Beta_Gambus_teta_tpesdr_update_2017.02.1277.ctr', 'Beta_Gambus_teta_cnms_update_2018.02.1279.ctr']
>>> print(no_update)
['Beta_Gambus_teta_wqtr_2017.02.1276.ctr', 'Beta_Gambus_teta_tpsedr_2017.02.1276.ctr', 'Beta_Gambus_teta_cnmsr_2018.02.1279.ctr']

1

You may try using the following pattern for the update match:

Beta_Gambus_teta_[^_]+_update_\d{4}\.\d{2}\.\d{4}\.ctr

and use this pattern for the non update match:

Beta_Gambus_teta_[^_]+_\d{4}\.\d{2}\.\d{4}\.ctr

Sample script:

import re

path = "Beta_Gambus_teta_wqtr_update_2017.02.1277.ctr"
# re.search returns a match object (truthy) when the pattern is found
if re.search(r'Beta_Gambus_teta_[^_]+_update_\d{4}\.\d{2}\.\d{4}\.ctr', path):
    print("MATCH")
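
To apply both patterns to the whole list from the question, a minimal sketch could look like the following. The names UPDATE_RE, PLAIN_RE and classify are made up for illustration, and re.fullmatch is used instead of re.search on the assumption that the entire file name should fit the pattern:

```python
import re

# The two patterns from this answer; the constant names are illustrative only.
UPDATE_RE = re.compile(r'Beta_Gambus_teta_[^_]+_update_\d{4}\.\d{2}\.\d{4}\.ctr')
PLAIN_RE = re.compile(r'Beta_Gambus_teta_[^_]+_\d{4}\.\d{2}\.\d{4}\.ctr')


def classify(names):
    """Return (with_update, no_update); names fitting neither pattern are skipped."""
    with_update, no_update = [], []
    for name in names:
        if UPDATE_RE.fullmatch(name):
            with_update.append(name)
        elif PLAIN_RE.fullmatch(name):
            no_update.append(name)
    return with_update, no_update


strings = ["Beta_Gambus_teta_wqtr_2017.02.1276.ctr",
           "Beta_Gambus_teta_wqtr_update_2017.02.1277.ctr",
           "Beta_Gambus_teta_tpsedr_2017.02.1276.ctr",
           "Beta_Gambus_teta_tpesdr_update_2017.02.1277.ctr",
           "Beta_Gambus_teta_cnmsr_2018.02.1279.ctr",
           "Beta_Gambus_teta_cnms_update_2018.02.1279.ctr"]

with_update, no_update = classify(strings)
print(with_update)
print(no_update)
```

Because [^_]+ cannot cross an underscore, this only works if the changeable part contains no underscores of its own.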


1

To match strings with _update_ use:

^Beta_Gambus_teta_.*_update_\d{4}\.\d{2}\.\d{4}\.ctr$

and to match strings without _update_:

^Beta_Gambus_teta_(?!.*_update_).*_\d{4}\.\d{2}\.\d{4}\.ctr$

Here (?!.*_update_) is a negative lookahead assertion that fails the match if _update_ is found anywhere after the leading Beta_Gambus_teta_ part.
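
As a rough sketch of how the two patterns could be used in Python, the snippet below runs them over a short list. The last two file names are hypothetical, added only to show a changeable part that contains underscores; the variable names are illustrative as well:

```python
import re

# The two patterns from this answer, compiled once.
WITH_UPDATE = re.compile(r'^Beta_Gambus_teta_.*_update_\d{4}\.\d{2}\.\d{4}\.ctr$')
WITHOUT_UPDATE = re.compile(r'^Beta_Gambus_teta_(?!.*_update_).*_\d{4}\.\d{2}\.\d{4}\.ctr$')

names = [
    "Beta_Gambus_teta_wqtr_2017.02.1276.ctr",
    "Beta_Gambus_teta_wqtr_update_2017.02.1277.ctr",
    # Hypothetical names whose changeable part itself contains underscores.
    "Beta_Gambus_teta_some_long_name_2017.02.1276.ctr",
    "Beta_Gambus_teta_some_long_name_update_2017.02.1277.ctr",
]

with_update = [n for n in names if WITH_UPDATE.match(n)]
without_update = [n for n in names if WITHOUT_UPDATE.match(n)]

print(with_update)
print(without_update)
```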

2 Comments

The relationship between the two types of paths appears to simply be that update_ is removed from the positive case.
Since the OP used the term some_changeable_string, I assume it can be anything that may or may not contain _
0

Have you tried the following?

.+update.+
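
A minimal sketch of how this pattern might be applied with re.search, using two of the file names from the question. Note that it only requires at least one character on each side of "update", so it is close to a plain substring check; the variable names are illustrative:

```python
import re

# Matches any string that has "update" with at least one character before and after it.
pattern = re.compile(r'.+update.+')

names = ["Beta_Gambus_teta_cnmsr_2018.02.1279.ctr",
         "Beta_Gambus_teta_cnms_update_2018.02.1279.ctr"]

with_update = [n for n in names if pattern.search(n)]
no_update = [n for n in names if not pattern.search(n)]

print(with_update)
print(no_update)
```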
