99

What's the best/most efficient way to extract text set between parentheses? Say I wanted to get the string "text" from the string "ignore everything except this (text)" in the most efficient manner possible.

So far, the best I've come up with is this:

$fullString = "ignore everything except this (text)";
$start = strpos('(', $fullString);
$end = strlen($fullString) - strpos(')', $fullString);

$shortString = substr($fullString, $start, $end);

Is there a better way to do this? I know in general using regex tends to be less efficient, but unless I can reduce the number of function calls, perhaps this would be the best approach? Thoughts?

10 Answers

176

I'd just use a regex and get it over with. Unless you are doing enough iterations that it becomes a huge performance issue, it's just easier to code (and to understand when you look back on it).

$text = 'ignore everything except this (text)';
preg_match('#\((.*?)\)#', $text, $match);
print $match[1];

7 Comments

No, it isn't: . only matches a single character.
Not necessarily: ? makes the match lazy. Without it, for a string like 'ignore (everything) except this (text)', the match would end up being 'everything) except this (text'.
Good to know. Should avoid all those squared nots. E.g. /src="([^"]*)"/ now replaced with /src="(.*?)"/ :D
It's good that you can "understand when you look back on it". Failing that, you've got some Stack Overflow comments to clarify it.
the /src="([^"]*)"/ is more efficient than /src="(.*?)"/
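To make the greedy/lazy/negated-class distinction from these comments concrete, here is a minimal sketch. The input string is the one from the comment above; the variable names are only illustrative.

```php
<?php
// Input with two parenthesised groups, as in the comment above.
$text = 'ignore (everything) except this (text)';

// Greedy: .* runs to the last ")" in the string.
preg_match('#\((.*)\)#', $text, $greedy);

// Lazy: .*? stops at the first ")" it can reach.
preg_match('#\((.*?)\)#', $text, $lazy);

// Negated class: [^)]* can never cross a ")" in the first place.
preg_match('#\(([^)]*)\)#', $text, $negated);

echo $greedy[1], "\n";  // everything) except this (text
echo $lazy[1], "\n";    // everything
echo $negated[1], "\n"; // everything
```

The lazy and negated-class patterns give the same result here; the negated class simply leaves the regex engine less backtracking to do.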
18

So, actually, the code you posted doesn't work: substr()'s parameters are $string, $start and $length, and strpos()'s parameters are $haystack, $needle. Slightly modified:

$str = "ignore everything except this (text)";
$start  = strpos($str, '(');
$end    = strpos($str, ')', $start + 1);
$length = $end - $start;
$result = substr($str, $start + 1, $length - 1);

Some subtleties: I used $start + 1 in the offset parameter in order to help PHP out while doing the strpos() search on the second parenthesis; we increment $start one and reduce $length to exclude the parentheses from the match.

Also, there's no error checking in this code: you'll want to make sure that neither $start nor $end is false (compare with ===) before performing the substr.
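One way the error checking could look, as a rough sketch: the function name extractFirstParenthesized and the choice to return null on failure are my own assumptions, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text between the first "(" and the
// next ")" after it, or null when either parenthesis is missing.
function extractFirstParenthesized(string $str): ?string
{
    $start = strpos($str, '(');
    if ($start === false) {
        return null; // no opening parenthesis
    }

    $end = strpos($str, ')', $start + 1);
    if ($end === false) {
        return null; // no closing parenthesis after the opening one
    }

    // Skip the "(" itself and stop just before the ")".
    return substr($str, $start + 1, $end - $start - 1);
}

var_dump(extractFirstParenthesized('ignore everything except this (text)')); // string(4) "text"
var_dump(extractFirstParenthesized('no parentheses here'));                  // NULL
```

The strict === comparison matters because strpos() returns 0 when the match is at the very start of the string, and 0 == false under loose comparison.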

As for using strpos/substr versus regex; performance-wise, this code will beat a regular expression hands down. It's a little wordier though. I eat and breathe strpos/substr, so I don't mind this too much, but someone else may prefer the compactness of a regex.
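If you want to check the performance claim on your own setup, a crude timing loop along these lines should do. The iteration count is arbitrary and the numbers will vary by PHP version and machine, so treat it as a sketch rather than a proper benchmark.

```php
<?php
$str = 'ignore everything except this (text)';
$iterations = 100000; // arbitrary; adjust for your machine

// Time the strpos/substr approach.
$t0 = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $start = strpos($str, '(');
    $end   = strpos($str, ')', $start + 1);
    $viaStrpos = substr($str, $start + 1, $end - $start - 1);
}
$strposSeconds = microtime(true) - $t0;

// Time the regex approach.
$t0 = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    preg_match('#\((.*?)\)#', $str, $m);
    $viaRegex = $m[1];
}
$regexSeconds = microtime(true) - $t0;

printf("strpos/substr: %.4f s\n", $strposSeconds);
printf("preg_match:    %.4f s\n", $regexSeconds);
```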

1 Comment

Note that if you modify this code to use strrpos (which searches from the back of the string) for $end, it will correctly handle cases where there are parentheses within the match, like (well this is (very) nice).
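A quick sketch of that strrpos variant, using the example string from this comment (the guard on $start and $end is my addition):

```php
<?php
$str = 'like (well this is (very) nice).';
$result = null;

$start = strpos($str, '(');  // position of the first "("
$end   = strrpos($str, ')'); // position of the last ")"

if ($start !== false && $end !== false && $end > $start) {
    $result = substr($str, $start + 1, $end - $start - 1);
}

echo $result, "\n"; // well this is (very) nice
```

The trade-off is that two separate groups get merged: for '(a) and (b)' this returns 'a) and (b'.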
14

Use a regular expression:

if( preg_match( '!\(([^\)]+)\)!', $text, $match ) )
    $text = $match[1];

9

this is just another way, short and easy to read.

$string = 'ignore everything except this (text)';
$string = explode(')', (explode('(', $string)[1]))[0];
echo $string;
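One caveat: the one-liner raises an undefined-offset warning when the string contains no "(". A guarded variant might look like the sketch below; the name extractViaExplode and the null return value are assumptions of mine, not part of the original answer.

```php
<?php
// Hypothetical guarded version of the explode() approach.
function extractViaExplode(string $string): ?string
{
    // Split once on the first "("; fewer than two pieces means no "(".
    $afterOpen = explode('(', $string, 2);
    if (count($afterOpen) < 2) {
        return null;
    }

    // Split the remainder once on the first ")".
    $beforeClose = explode(')', $afterOpen[1], 2);
    if (count($beforeClose) < 2) {
        return null;
    }

    return $beforeClose[0];
}

var_dump(extractViaExplode('ignore everything except this (text)')); // string(4) "text"
var_dump(extractViaExplode('nothing to see'));                       // NULL
```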

8

The regex solutions already posted, \((.*?)\) and \(([^\)]+)\), do not return the innermost strings between an opening and a closing bracket. If the string is Text (abc(xyz 123), both return (abc(xyz 123) as the whole match, not (xyz 123).

The pattern that matches substrings (use with preg_match to fetch the first and preg_match_all to fetch all occurrences) in parentheses without other open and close parentheses in between is, if the match should include parentheses:

\([^()]*\)

Or, if you want to get the values without the parentheses:

\(([^()]*)\)        // get Group 1 values after a successful call to preg_match_all, see code below
\(\K[^()]*(?=\))    // this and the one below get the values without parentheses as whole matches 
(?<=\()[^()]*(?=\)) // less efficient, not recommended

Replace * with + if there must be at least 1 char between ( and ).

Details:

  • \( - an opening round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class)
  • [^()]* - zero or more characters other than ( and ) (note these ( and ) do not have to be escaped inside a character class as inside it, ( and ) cannot be used to specify a grouping and are treated as literal parentheses)
  • \) - a closing round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class).

The \(\K part in the alternative regex matches ( and then drops it from the match value (via the \K match reset operator). (?<=\() is a positive lookbehind that requires a ( immediately to the left of the current location; the ( is not added to the match value because lookaround patterns do not consume text. (?=\)) is a positive lookahead that requires a ) immediately to the right of the current location.
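As a small illustration of the \K and lookaround variants, the sketch below runs both over the same kind of input and reads the whole matches from index 0. The variable names are just placeholders.

```php
<?php
$s = 'ignore everything except this (text) and (that (text here))';

// \K variant: "(" is matched, then discarded from the reported match.
preg_match_all('~\(\K[^()]*(?=\))~', $s, $viaK);

// Lookbehind variant: "(" is only asserted, never consumed.
preg_match_all('~(?<=\()[^()]*(?=\))~', $s, $viaLookbehind);

print_r($viaK[0]);          // ['text', 'text here']
print_r($viaLookbehind[0]); // ['text', 'text here']
```

Neither pattern needs a capturing group, so the values come back in $matches[0] rather than $matches[1].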

PHP code:

$fullString = 'ignore everything except this (text) and (that (text here))';
if (preg_match_all('~\(([^()]*)\)~', $fullString, $matches)) {
    print_r($matches[0]); // Get whole match values
    print_r($matches[1]); // Get Group 1 values
}

Output:

Array ( [0] => (text)  [1] => (text here) )
Array ( [0] => text    [1] => text here   )

3

This is sample code that extracts all the text between '[' and ']' and stores it in two separate arrays (the text inside the brackets in one array, the text outside them in another).

   function extract_text($string)
   {
    $text_outside=array();
    $text_inside=array();
    $t="";
    for($i=0;$i<strlen($string);$i++)
    {
        if($string[$i]=='[')
        {
            $text_outside[]=$t;
            $t="";
            $t1="";
            $i++;
            while($i<strlen($string) && $string[$i]!=']') // stop at end of string if ']' is missing
            {
                $t1.=$string[$i];
                $i++;
            }
            $text_inside[] = $t1;

        }
        else {
            if($string[$i]!=']')
            $t.=$string[$i];
            else {
                continue;
            }

        }
    }
    if($t!="")
    $text_outside[]=$t;

    var_dump($text_outside);
    echo "\n\n";
    var_dump($text_inside);
  }

Output: extract_text("hello how are you?"); will produce:

array(1) {
  [0]=>
  string(18) "hello how are you?"
}

array(0) {
}

extract_text("hello [http://www.google.com/test.mp3] how are you?"); will produce

array(2) {
  [0]=>
  string(6) "hello "
  [1]=>
  string(13) " how are you?"
}


array(1) {
  [0]=>
  string(30) "http://www.google.com/test.mp3"
}

1 Comment

+1, but how would I do the same for [* and *]? A plain [ ] may already be used in HTML, for example.
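For multi-character delimiters such as [* and *], one possible approach (a sketch, not the method used in the answer above) is to switch to a regex and escape the delimiters with preg_quote(). The function name extractAllBetween is made up for this example.

```php
<?php
// Hypothetical helper: collects every substring enclosed by two
// arbitrary (possibly multi-character) delimiters.
function extractAllBetween(string $string, string $open, string $close): array
{
    // preg_quote() escapes regex metacharacters such as "[" and "*";
    // its second argument also escapes the "~" pattern delimiter.
    $pattern = '~' . preg_quote($open, '~') . '(.*?)' . preg_quote($close, '~') . '~s';
    preg_match_all($pattern, $string, $matches);

    return $matches[1]; // contents of capture group 1, one entry per match
}

print_r(extractAllBetween('hello [*one*] and [two] and [*three*]', '[*', '*]'));
// ['one', 'three']
```

The plain [two] is skipped because it does not start with [*.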
3

This function may be useful.

    // Declared as a plain function so it can be called directly as below;
    // add "public static" back if you place it inside a class.
    function getStringBetween($str, $from, $to, $withFromAndTo = false)
    {
        // Everything after the first occurrence of $from
        $sub = substr($str, strpos($str, $from) + strlen($from));
        // Cut at the last occurrence of $to
        $inner = substr($sub, 0, strrpos($sub, $to));
        return $withFromAndTo ? $from . $inner . $to : $inner;
    }

    $inputString = "ignore everything except this (text)";
    $outputString = getStringBetween($inputString, '(', ')');
    echo $outputString;
    // output will be text

    $outputString = getStringBetween($inputString, '(', ')', true);
    echo $outputString;
    // output will be (text)

strpos() finds the position of the first occurrence of a substring in a string.

strrpos() finds the position of the last occurrence of a substring in a string.
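To see the two functions side by side (strpos() searches from the front, strrpos() from the back), here is a tiny sketch on a made-up string:

```php
<?php
$str = 'a (b) c (d)';

$first = strpos($str, ')');  // first ")" in the string
$last  = strrpos($str, ')'); // last ")" in the string

var_dump($first); // int(4)
var_dump($last);  // int(10)
```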

0
function getStringsBetween($str, $start = '[', $end = ']', $with_from_to = true) {
    $arr = [];
    $last_pos = strpos($str, $start);
    while ($last_pos !== false) {
        $t = strpos($str, $end, $last_pos);
        if ($t === false) {
            break; // no closing delimiter left
        }
        // The +1 / -1 offsets assume single-character delimiters.
        $arr[] = ($with_from_to ? $start : '') . substr($str, $last_pos + 1, $t - $last_pos - 1) . ($with_from_to ? $end : '');
        $last_pos = strpos($str, $start, $last_pos + 1);
    }
    return $arr;
}

This is a small improvement on the previous answer: it returns all matches as an array.

For example, getStringsBetween('[T]his[] is [test] string [pattern]') returns ['[T]', '[]', '[test]', '[pattern]'].

0
function getAllStrings($inString, $inStart, $inEnd, $inBetween = FALSE)
{
   $list      = array();
   $last_pos  = strpos($inString, $inStart, 0);

   $len_start = strlen($inStart);
   $len_end   = strlen($inEnd);

   while ($last_pos !== FALSE)
   {
      $end_pos  = strpos($inString, $inEnd, $last_pos + $len_start);
      if ($end_pos === FALSE)
      {
         break; // opening delimiter without a closing one: stop instead of looping forever
      }
      $list[]   = ($inBetween ? $inStart : '').substr($inString, $last_pos + $len_start, $end_pos - ($last_pos + $len_start)).($inBetween ? $inEnd : '');
      $last_pos = strpos($inString, $inStart, $end_pos + $len_end);
   }

   return $list;
}

This function:

  • accepts same delimiters - ex. a single quote ('test').
  • accepts long delimiters - ex. a combination (!@#test!@#)
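A short usage sketch for both cases follows. The function is repeated so the snippet runs on its own; the early break on a missing closing delimiter is my assumption about how unterminated input should be handled, and the sample strings are made up.

```php
<?php
// Same logic as getAllStrings() in this answer, plus a guard (an added
// assumption) that stops when an opening delimiter has no closing one.
function getAllStrings($inString, $inStart, $inEnd, $inBetween = false)
{
    $list      = array();
    $len_start = strlen($inStart);
    $len_end   = strlen($inEnd);
    $last_pos  = strpos($inString, $inStart);

    while ($last_pos !== false) {
        $end_pos = strpos($inString, $inEnd, $last_pos + $len_start);
        if ($end_pos === false) {
            break; // unterminated group
        }
        $inner  = substr($inString, $last_pos + $len_start, $end_pos - ($last_pos + $len_start));
        $list[] = $inBetween ? $inStart . $inner . $inEnd : $inner;
        // Continue searching after the closing delimiter.
        $last_pos = strpos($inString, $inStart, $end_pos + $len_end);
    }

    return $list;
}

print_r(getAllStrings("'a' and 'b'", "'", "'"));                   // ['a', 'b']
print_r(getAllStrings('!@#test!@# and !@#more!@#', '!@#', '!@#')); // ['test', 'more']
print_r(getAllStrings('(x) (y)', '(', ')', true));                 // ['(x)', '(y)']
```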
