I am writing a regular expression in PHP that will need to extract data from strings that look like:

Naujasis Salemas, Šiaurės Dakota
Jungtinės Valstijos (Centras, Šiaurės Dakota)

I would like to extract:

Naujasis Salemas
Centras

For the first case, I have written [^-]*(?=,), which works quite well. I would like to modify the expression so that, if the string contains parentheses ( and ), it searches between them and extracts everything before the comma.

Is it possible to do this with a single expression? If so, how can I make it search within the parentheses when they exist?

3 Answers

A conditional might help you here:

$stra = 'Naujasis Salemas, Šiaurės Dakota';
$strb = 'Jungtinės Valstijos (Centras, Šiaurės Dakota)';

$regex = '
  /^                    # Anchor at start of string.
    (?(?=.*\(.+,.*\))   # Condition to check for: presence of text in parentheses.
        .*\(([^,]+)     # If condition matches, match inside parentheses to first comma.
      | ([^,]+)         # Else match start of string to first comma.
    )
  /x
';
preg_match($regex, $stra, $matches) and print_r($matches);

/*
Array
(
    [0] => Naujasis Salemas
    [1] => 
    [2] => Naujasis Salemas
)
*/

preg_match($regex, $strb, $matches) and print_r($matches);

/*
Array
(
    [0] => Jungtinės Valstijos (Centras
    [1] => Centras
)
*/

Note that the captured text lands at a different index in $matches in each case (index 2 for the first string, index 1 for the second), but you might be able to work around that using named subpatterns.
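
One way to keep the result under a fixed key is sketched below. It is a variant built on assumptions, not the exact pattern from this answer: the conditional is swapped for an optional prefix, so that a single named subpattern holds the result in both cases. The helper name extractPlace() and the group name "place" are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper; the names extractPlace and "place" are illustrative
// and do not come from the original answer.
function extractPlace(string $subject): ?string
{
    $regex = '/^
        (?: .*\( (?=[^()]*,[^()]*\)) )?   # Optionally skip to a "(" whose contents include a comma.
        (?<place> [^,(]+ )                # Capture up to the first comma or "(".
    /x';

    // preg_match() returns 1 on a match; the named group is then always set.
    return preg_match($regex, $subject, $matches) === 1 ? $matches['place'] : null;
}

echo extractPlace('Naujasis Salemas, Šiaurės Dakota'), "\n";
echo extractPlace('Jungtinės Valstijos (Centras, Šiaurės Dakota)'), "\n";
```

With the two sample strings this should print Naujasis Salemas and Centras, each read from $matches['place'] rather than from a numeric index that shifts between cases.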

2 Comments

Nice :) (you could use the x modifier to comment in the regex directly ;) )
Ah, thanks! I could've sworn I saw regex comments somewhere else on SO, and it was bugging me that I couldn't get it to work for mine (:

I think this one could do it:

[^-(]+(?=,)

This is the same regex as yours, except that it does not allow an opening parenthesis in the matched string. It still matches on the first subject, and on the second the match starts just after the opening parenthesis.

Try it here: http://ideone.com/Crhzz
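
For a quick local check, a minimal sketch that runs this pattern against the two sample strings from the question could look like the following. The variable names are assumptions made for the example.

```php
<?php
// The two sample strings from the question.
$subjects = [
    'Naujasis Salemas, Šiaurės Dakota',
    'Jungtinės Valstijos (Centras, Šiaurės Dakota)',
];

$results = [];
foreach ($subjects as $subject) {
    // [^-(]+ cannot cross "(" or "-"; (?=,) requires a comma right after the match.
    if (preg_match('/[^-(]+(?=,)/', $subject, $matches) === 1) {
        $results[] = $matches[0];
    }
}

print_r($results);
```

The result sits in $matches[0] for both inputs, so no capture group index needs to be tracked.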

You could use

[^(),]+(?=,)

That would match a run of characters containing no commas or parentheses, immediately followed by a comma.
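
As a rough sketch of how this behaves, the snippet below applies the pattern to the two strings from the question plus a third, made-up input with two commas. The third subject and the variable names are assumptions added for illustration only.

```php
<?php
$pattern = '/[^(),]+(?=,)/';

// The first two subjects come from the question; the third is a made-up
// input with two commas, added only to show where the match stops.
$subjects = [
    'Naujasis Salemas, Šiaurės Dakota',
    'Jungtinės Valstijos (Centras, Šiaurės Dakota)',
    'Centras, Šiaurės Dakota, JAV',
];

$found = [];
foreach ($subjects as $subject) {
    // The class excludes ",", so a match can never run past the first comma.
    if (preg_match($pattern, $subject, $matches) === 1) {
        $found[] = $matches[0];
    }
}

print_r($found);
```

Because the character class excludes commas, the match ends at the first comma even when the subject contains several.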
