3

Given the following string:

FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA

I'm trying to match every substring that contains the letters C, A, B, D, A in that order (not necessarily adjacent) with the following regex:

C.*?A.*?B.*?D.*?A

The only thing I find then is

CFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDA

That is not wrong in itself, but I should also be finding CSAYBRHXQQGUDA.

What am I missing?

You can test it here if you'd like

Any help is appreciated.

4 Comments
  • This is how regexes are supposed to work. They look for the first match, not the shortest. Laziness does not alter that behavior; it simply aims to match the shortest of all strings that start there. Commented Dec 20, 2015 at 15:44
  • @CommuSoft I tried with groups as well. Shouldn't I be getting all matches then? Commented Dec 20, 2015 at 15:46
  • @Nilzone- you will if you use lookahead. Commented Dec 20, 2015 at 15:48
  • @Nilzone-, matching would "consume" the character at that position. That is why you don't get all substrings if they intersect with one another. Each position can be used in only one match. Commented Dec 20, 2015 at 15:55

3 Answers

2

A lazy quantifier does not make the regex look for the shortest possible substring. It only means that the quantifier first tries to match as few characters as it can and backtracks towards more, as opposed to a greedy quantifier, which matches as many characters as it can and backtracks towards fewer.

The starting position is found the same way in both cases: it is the first one, from left to right, at which a match is possible. For example:

x+?y

when matched against:

xxxy

will still match xxxy and not just xy, because the match can start at the first x and the lazy quantifier then expands to take more x's until the y matches.
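To make this concrete, here is a small sketch using Python's `re` module (the choice of Python is an assumption; the question does not name a language). It shows that the lazy and greedy forms pick the same leftmost start, and that laziness only matters when a shorter match exists from that same start.

```python
import re

# Both quantifiers start at the leftmost position where a match is possible.
lazy = re.search(r"x+?y", "xxxy")
greedy = re.search(r"x+y", "xxxy")

print(lazy.group(), lazy.start())      # xxxy 0
print(greedy.group(), greedy.start())  # xxxy 0

# Laziness only shows when a shorter match exists from the SAME start.
lazy_only_x = re.search(r"x+?", "xxxy")
greedy_only_x = re.search(r"x+", "xxxy")

print(lazy_only_x.group())    # x
print(greedy_only_x.group())  # xxx
```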

1

You can use this regex based on negated character classes:

/C[^C]*?A[^A]*?B[^B]*?D[^D]*?A/

This finds CSAYBRHXQQGUDA in your given input.
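As a quick check, the following sketch runs both the original pattern and the negated-class pattern over the string from the question. It assumes Python's `re` module; the variable names are illustrative only.

```python
import re

# The input string from the question.
text = "FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA"

# Original pattern: the match starts at the first C in the string.
original = re.search(r"C.*?A.*?B.*?D.*?A", text)
print(original.start() == text.index("C"))  # True

# Negated classes stop each gap from running past the next occurrence of
# the letter just matched, so the earlier C's cannot produce a match.
negated = re.findall(r"C[^C]*?A[^A]*?B[^B]*?D[^D]*?A", text)
print(negated)  # ['CSAYBRHXQQGUDA']
```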

0
(?=(C.*?A.*?B.*?D.*?A))

Put your expression inside a lookahead with a capturing group to get all matches, including overlapping ones. See demo:

https://regex101.com/r/fM9lY3/46
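A minimal sketch of the lookahead approach, assuming Python's `re` module and a short made-up string (`sample` is not from the question). Because the lookahead consumes no characters, the engine can report one match per starting position, and the shortest one can then be picked from that list.

```python
import re

sample = "CxCAyBzDA"  # hypothetical input, chosen to contain two overlapping matches

# Without lookahead, the first match consumes its characters,
# so the overlapping match that starts at the second C is never reported.
plain = re.findall(r"C.*?A.*?B.*?D.*?A", sample)
print(plain)  # ['CxCAyBzDA']

# With the pattern captured inside a zero-width lookahead,
# every starting position is tried independently.
overlapping = re.findall(r"(?=(C.*?A.*?B.*?D.*?A))", sample)
print(overlapping)  # ['CxCAyBzDA', 'CAyBzDA']

# Pick the shortest candidate afterwards.
shortest = min(overlapping, key=len)
print(shortest)  # CAyBzDA
```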

If you want to find only the shortest match, you can use:

C(?:(?!C|A|B|D).)*A(?:(?!C|A|B|D).)*B(?:(?!C|A|B|D).)*D(?:(?!C|A|B|D).)*A
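The pattern above can be tried against the string from the question with a short sketch like this one, assuming Python's `re` module. For comparison it also runs a plain negated character class, `[^CABD]*`, in place of each `(?:(?!C|A|B|D).)*` group; on input without newlines the two forms accept the same characters (by default `.` does not match a newline, while `[^CABD]` does).

```python
import re

# The input string from the question.
text = "FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA"

# Each gap may contain any character except C, A, B or D.
tempered = r"C(?:(?!C|A|B|D).)*A(?:(?!C|A|B|D).)*B(?:(?!C|A|B|D).)*D(?:(?!C|A|B|D).)*A"
char_class = r"C[^CABD]*A[^CABD]*B[^CABD]*D[^CABD]*A"

tempered_matches = re.findall(tempered, text)
class_matches = re.findall(char_class, text)

print(tempered_matches)  # ['CSAYBRHXQQGUDA']
print(class_matches)     # ['CSAYBRHXQQGUDA']
```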

3 Comments

Your regex is terrible. (?:(?!C|A|B|D).)* can be rewritten as [^CABD]*.
@nhahtdh I guessed OP must be having some complex problem...like some strings instead of simple ABCD
I forgot to mention this earlier - the second regex is also wrong for input such as CAADBA. It only happens to work with the input in the question.
