
I've got an application that decides, given a Perl regex, whether it should display a dropdown menu or a simple input field. To do that, I have to check the regex pattern for an "outer form" and extract its substrings. I came up with several solutions for this.

Take the input pattern "^(100|500|1000)$", which should result in a dropdown menu with three entries: 100, 500 and 1000. I need one regex that parses the entire pattern to determine whether it is a valid list, and one regex that does the actual substring matching, since I don't know how to match one substring multiple times. This is my regex pattern:

^\^\((?:((?:[^\|]|\\\|)+)(?:\||(?:\)\$$)))+

A simplified version, since that regex is a bit hard to read:

^\^\((?:([\w\d]+)(?:\||(?:\)\$$)))+

This works, but it only stores the last substring (1000 in the given case) and throws the rest away; I tested this with both PCRE and online regex tools. To get the actual substrings, i.e. the dropdown menu entries, I have:

(?:\^\()?((?:[^\|]|\\\|)+)(?:\||(?:\)\$$))

Simplification again:

(?:\^\()?([\w\d]+)(?:\||(?:\)\$$))

This matches the substrings, but unlike the first regex it doesn't check the overall dropdown menu syntax (for example, it also matches "^(100|" with substring "100"). My question is: is there a way to combine these regular expressions into a single pattern that matches 1) the entire pattern syntax and 2) the actual substrings?

Thanks in advance,

Jeremy

P.S.: sorry if this is obvious, but I'm a bit tangled up in all these regular expressions today.

Sample data:

Input regex: ^(100|500|1000)$
Syntax OK!
Matched substrings: 100, 500, 1000
=> show dropdown menu

Input regex: ^[0-9a-fA-F]+$
Syntax is wrong!
=> show regular input field

Input regex: ^(foo|bar)$
Syntax OK!
Matched substrings: "foo", "bar"
=> show dropdown menu

Input regex: ^(foo|bar)[0-9]+$
Syntax is wrong!
=> show regular input field

  • I'm not sure I understand your question. Can you post sample data showing what you have and what you need as output? Commented Aug 18, 2014 at 19:58
  • Thanks for your help! I added some sample data. Commented Aug 18, 2014 at 20:11
  • I've updated my answer with the data you provided Commented Aug 18, 2014 at 20:20

2 Answers


You can achieve what you need by using two steps.

You could use this regex to validate the format:

\^\(\w+(?:\|\w+)*\)\$


Once you have validated the string, you can extract the items with a function like this (PHP):

<?php
$str = "^(100|500|1000|2000|3000)$";
// Split on every run of non-word characters (^, (, |, ), $) and drop empty pieces.
$arr = preg_split("/\W+/", $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($arr);

Output:

Array
(
    [0] => 100
    [1] => 500
    [2] => 1000
    [3] => 2000
    [4] => 3000
)

5 Comments

Thanks! This simplifies my syntax matching regex, but doesn't capture substrings ("100", "500", "1000" in this case). Can you help me with this, too?
I modified your regex to "\^((\w+)(?:\|(\w+))*)\$". This captures two \w+ groups, but the last one is overwritten when there are more than two items (which is the case for the first test string).
@Jeremy You can use my regex \^((\w+)(?:\|(\w+))*)\$ in addition to Lucas's answer. Meanwhile I'll look into this.
@Jeremy I think you should use two steps. First use my regex to validate the pattern; then, once it is validated, you can use preg_split("/\W+/", string $subject).
Thanks again for your help. Since my application is written in C, I can't apply your code snippets. But I will simplify my regular expressions. As Alan stated, PCRE is not capable of intermediate captures.

Looks like you're using PCRE.

You can leverage the PCRE_DUPNAMES option, or alternatively put the (?J) option at the start of the pattern.

This option makes PCRE remember every value a capturing group matches, not just the last one. (This is wrong; see the comments.)

Unfortunately, as far as I know it's not supported by the online testing tools. I don't know which language you use, but the calling code also needs to support this feature.

From the PCRE docs:

If you want to get full details of all captured substrings for a given name, you must use the pcre_get_stringtable_entries() function.

5 Comments

My application is written in C and I indeed use pcre. Thanks for your hint, I will try this.
I admit I'm a bit confused over this after taking a look at the PCRE docs. I assumed it's similar to .NET's multiple captures but now I'm not so sure anymore. Please let me know.
PCRE_DUPNAMES allows you to assign the same name to two or more groups, even if those groups have different numbers. When the match is finished, there will still be only one value associated with that name, same as with group numbers. PCRE still doesn't provide a way to retrieve intermediate captures of repeated groups like .NET does.
@Lucas: I tried your hints but they didn't work - PCRE still doesn't return all matches, just as Alan stated. Therefore, I will simplify my regular expressions as Fede suggested.
@AlanMoore I clearly misunderstood the docs about this. Too bad, thanks for the clarification. Jeremy: sorry for the misleading answer.
