Folks, here is another question on "regex: match everything, but not ...", but so far none of the existing ones seems to fit my simple question.

I need to program my Excel function to separate strings from their preceding enumerators (similar to what is done here: VBA regex: extract multiple strings between strings within Excel cell with custom function).

My first simple string is: "1 Rome; 2 London; 3 Wembley Stadium"

My second string looks like: "1.1 Winner; 2.1 Looser; 3.3 Penalties (always loose, dam)"

And I need to extract only the names but not the ranks (e.g. "Rome; London; Wembley Stadium" and "Winner; Looser; Penalties (always loose, dam)").

Using a regex tester (https://extendsclass.com/regex-tester.html), I can simply match the opposite by:

([0-9]+\s*) and it gives me:

"1 Rome, 2 London, 3 Wembley Stadium".

But how do I reverse it? I tried something like:

[^0-9 |;]+[^0-9 |;], but it also excludes the white spaces that I want to keep (e.g. after the comma and between Wembley and Stadium, ... "1 Rome, 2 London, 3 Wembley Stadium"). I guess the "0-9 " needs to be treated somehow as one continuous string. I tried various brackets, quotation marks, and \s*, but nothing has worked yet.

Note: I'm working in a Visual Basic environment, which does not allow lookbehinds!
Note: My solution needs to be compatible across Excel versions as far as possible!

  • If you want to end up with a list of individual names then splitting on ; & looping removing leading spaces/digits would be a simple way. If you want the names in a single string together then just match the digit part (\d*(\.?\d+)\s+) and RegEx.Replace it with "". Commented Jul 23, 2021 at 14:45
  • You should simply add (?:\.\d+)* to match zero or more occurrences of a . and one or more digits, \d+(?:\.\d+)*\s*(.*?)(?=;\s*\d+(?:\.\d+)*\s|$) Commented Jul 23, 2021 at 16:22
  • Did this solution solve the problem? Commented Jul 24, 2021 at 10:15
  • @Wiktor: Somehow it does not, even though it looks logical to me. It also includes the numerical prefix in my VBA function. No idea why. Commented Jul 26, 2021 at 13:49
  • Again, use match.Submatches(0) only. Of course the number will land in the whole match. Commented Jul 26, 2021 at 14:48
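The two approaches from the comments can be sketched as follows. This is a minimal illustration in Python rather than VBA, under the assumption that the patterns behave the same in the VBA regex engine (they use a lookahead but no lookbehind). The function names `strip_enumerators` and `extract_names` are hypothetical; in VBA the corresponding steps would be the `Replace` call and reading `match.SubMatches(0)` mentioned in the comments.

```python
import re

# Enumerator: one or more digits, optionally followed by ".digits" groups
# (covers "1" as well as "1.1"), as suggested in the comments.
ENUMERATOR = r"\d+(?:\.\d+)*"


def strip_enumerators(text):
    """Replace approach: delete every enumerator plus its trailing whitespace."""
    return re.sub(ENUMERATOR + r"\s+", "", text)


def extract_names(text):
    """Capture approach: keep only group 1, i.e. the text after each enumerator.

    The lookahead stops the lazy group before the next "; <enumerator> "
    or at the end of the string. The whole match still contains the
    number; only the captured group is free of it.
    """
    pattern = ENUMERATOR + r"\s*(.*?)(?=;\s*" + ENUMERATOR + r"\s|$)"
    return re.findall(pattern, text)  # one capturing group -> list of group 1


first = "1 Rome; 2 London; 3 Wembley Stadium"
second = "1.1 Winner; 2.1 Looser; 3.3 Penalties (always loose, dam)"

print(strip_enumerators(first))           # Rome; London; Wembley Stadium
print("; ".join(extract_names(second)))   # Winner; Looser; Penalties (always loose, dam)
```

Note that the replace variant also removes any number followed by whitespace inside a name (for example the "10 " in a name like "Top 10 list"), so the capture variant is the safer choice when names may contain digits.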

1 Answer

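I tried negating the digits and the period with a character class, ([^\d|\.]). Note that a character class matches one character at a time rather than "one continuous string", and that the | inside it is a literal pipe, not an alternation. This leaves two consecutive spaces in some places. Test regex 1

The explanation, group by group: regexr explanation

To remove these double spaces, try ([^\d|\.])(?<!; ). Here I'm just adding a negative lookbehind, which is not supported by all regex engines (and, according to the question, not by the VBA one).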

test regex 2

Negative lookbehind explanation and warning
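To illustrate where the double spaces come from, and one way to get rid of them without a lookbehind, here is a small Python sketch. It is an assumption-laden illustration, not the answer's own method: the helper names `keep_non_digits` and `collapse_spaces` are hypothetical, and the second step is a plain follow-up replace instead of the lookbehind.

```python
import re


def keep_non_digits(text):
    """Keep every character that is not a digit, a pipe, or a period.

    Inside [...] the | is a literal pipe, so [^\\d|.] excludes pipes too.
    The spaces that followed the removed numbers stay behind.
    """
    return "".join(re.findall(r"[^\d|.]", text))


def collapse_spaces(text):
    """Lookbehind-free cleanup: squeeze runs of spaces, trim the ends."""
    return re.sub(r" {2,}", " ", text).strip()


raw = keep_non_digits("1 Rome; 2 London; 3 Wembley Stadium")
print(repr(raw))              # ' Rome;  London;  Wembley Stadium'
print(collapse_spaces(raw))   # Rome; London; Wembley Stadium
```

The first step shows that removing only the digits leaves a leading space and a doubled space after each semicolon; the second step normalises them with an ordinary replace, which avoids the lookbehind entirely.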
