I am struggling with a regex in python. I've spent several hours trying to figure out what is wrong. Here is my content:

Some Title - Description (Gold Edition)
Some Title - Description

I need to match "Some Title" and, optionally, the word "Gold" inside the parentheses.

I've tried the following regex (https://regex101.com/r/9MNYZl/1):

(.*)\-.*(?:\((.*)[Ee]dition\))*?

But it doesn't capture the word before Edition.

Interestingly, I tried the same pattern in PHP and it worked fine.

I have no idea what is wrong. Please help me solve the issue.

Many thanks.

1 Answer

The first .* in your pattern initially matches to the end of the string, then backtracks until the - can be matched. The second .* then matches the rest of the string, again up to the end.

Because this part of the pattern, (?:\((.*)[Ee]dition\))*?, is optional (and lazy), the pattern is already satisfied at the end of the string with zero repetitions of the group, so group 2 never captures anything.
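The behaviour described above can be reproduced with a short sketch. It assumes Python's built-in re module and re.match; the variable names are only illustrative.

```python
import re

# The pattern from the question, unchanged.
original = r"(.*)\-.*(?:\((.*)[Ee]dition\))*?"

m = re.match(original, "Some Title - Description (Gold Edition)")

# Group 1 stops at the hyphen; the second .* has already consumed
# "(Gold Edition)", so the optional group matches zero times.
title = m.group(1)
edition = m.group(2)

print(repr(title))    # 'Some Title '
print(repr(edition))  # None
```

Note that group 1 keeps the space before the hyphen, and group 2 is None rather than an empty string because the group never took part in the match.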

You could use a negated character class with an optional non-capturing group.

To match the first word after the opening parenthesis, you could match 1+ word chars with \w+, or use a broader match with \S+.

^([^-]+)-[^()]+(?:\((\S+) [Ee]dition\))?

In parts

  • ^ Start of string
  • ( Capture group 1
    • [^-]+ Match 1+ times any char except -
  • )- Close group 1 and match -
  • [^()]+ Match 1+ times any char except ( or )
  • (?: Non capturing group
    • \( Match (
    • (\S+) Capture group 2, match 1+ times a non whitespace char
    • [Ee]dition (preceded by a space) Match a space, then Edition or edition
    • \) Match )
  • )? Close non capturing group and make it optional
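As a minimal sketch of how the pattern broken down above could be used from Python, assuming the built-in re module: the helper name parse is hypothetical, and the strip() call is an addition of mine to drop the space that group 1 captures before the hyphen.

```python
import re

# ^([^-]+)-[^()]+(?:\((\S+) [Ee]dition\))?
pattern = re.compile(r"^([^-]+)-[^()]+(?:\((\S+) [Ee]dition\))?")

def parse(line):
    """Return (title, edition) or None if the line does not match.

    edition is None when the "(... Edition)" part is absent.
    """
    m = pattern.match(line)
    if m is None:
        return None
    # Group 1 includes the space before "-", so trim it.
    return m.group(1).strip(), m.group(2)

with_edition = parse("Some Title - Description (Gold Edition)")
without_edition = parse("Some Title - Description")

print(with_edition)     # ('Some Title', 'Gold')
print(without_edition)  # ('Some Title', None)
```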

To capture all until the edition in group 2 instead of a single word:

^([^-]+)-[^()]+(?:\(([^()]+) [Ee]dition\))?
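The difference between the single-word and the broader group 2 shows up when the edition name has more than one word. The sketch below assumes Python's re module; the sample string "Gold Plated Edition" is made up for illustration and is not from the question.

```python
import re

# Single-word variant: group 2 is \S+
single_word = re.compile(r"^([^-]+)-[^()]+(?:\((\S+) [Ee]dition\))?")
# Broader variant: group 2 is [^()]+
multi_word = re.compile(r"^([^-]+)-[^()]+(?:\(([^()]+) [Ee]dition\))?")

line = "Some Title - Description (Gold Plated Edition)"  # hypothetical input

# \S+ cannot cross the space in "Gold Plated", so the optional group
# is skipped and group 2 is None.
single_result = single_word.match(line).group(2)

# [^()]+ can include spaces, so it backtracks to "Gold Plated".
multi_result = multi_word.match(line).group(2)

print(single_result)  # None
print(multi_result)   # Gold Plated
```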
