1

I've been trying to extract something from inside a string. I have the following string:

*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george

I want to split on commas outside parentheses, and it is supposed to give this:

[
    "*",
    "bob",
    "DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd'))",
    "george"
]

I've tried explode, but it also cuts inside ( and ), which is logical given what the function does.

So I came up with this: [^(,\s]+|\([^)]+\) but it still cuts when a comma is found inside the brackets.
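To make the problem concrete, here is a minimal sketch of what explode does with the sample string: it cuts at every comma, including the ones inside DATE(...). The variable names are only illustrative.

```php
<?php
$input = "*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george";

// explode() knows nothing about parentheses: it cuts at every comma,
// so the DATE(...) call ends up spread over three separate pieces.
$parts = array_map('trim', explode(',', $input));

print_r($parts);
```

This yields six pieces instead of the four wanted, with DATE('gdfgfd' as the third one.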

Does anyone know how to do what I mean?

EDIT :

OK, to be very clear and direct.

I have this: SELECT MyField, Field2, Blabla, Function(param), etc FROM table Blabla

I already have the string MyField, Field2, Blabla, Function(param), etc because the query is built through several methods of a class, like $DB->Select('MyField, Field2, Blabla, Function(param), etc'); but now I want to parse everything between the commas so that MyField, Field2, Blabla, Function(param), etc becomes this:

  • MyField
  • Field2
  • Blabla
  • Function(param)
  • etc
  • 3
    SQL is an irregular language; matching/analyzing it with a regular expression is the incorrect way to go about this. It's the wrong tool for the job. (That's not to say that it won't work, it'll probably end up biting you in the rear later on, though.) Commented May 25, 2012 at 15:20
  • @SpikeX How do you suggest splitting a statement then? SELECT (this) FROM Commented May 25, 2012 at 15:21
  • Some form of string parsing logic written in PHP, since with PHP you have much greater control and a much wider selection of parsing tools available at your disposal (things like conditionals, loops, etc). Commented May 25, 2012 at 15:22
  • 1
    @DavidBélanger This will not answer your question, but it is relevant to your interest and I completely agree with the accepted answer. Commented May 25, 2012 at 15:28
  • 1
    @DavidBélanger That Regex is going to come back to haunt you, I promise. You're trying to fasten a screw with a hammer. Commented May 25, 2012 at 15:42

6 Answers

4

Posting this as an answer since it's probably better than anything else:

http://code.google.com/p/php-sql-parser/

Use that project to parse your SQL statements. The results come back as an array, including the bits in between SELECT and FROM as individual elements, just as you want. This will work far better than any regular expression solution you use.


4 Comments

I agree this is the way to go when parsing an SQL statement. But this is not what I was looking for.
Why not? You need to parse a SQL string and retrieve the elements of the SELECT portion of the string. Why does this not do what you want?
Because that is not what it is intended for. Why use a Civic instead of a Ferrari? A huge class will take way more horsepower than a simple function / regex. I don't need to parse complicated strings, only simple ones like the one you saw.
I think you're overestimating the overhead associated with that project. If it's all class-based, only the classes you invoke take up memory, not the entire project.
2

Here's what I cooked up; it doesn't support multibyte characters:

Edit: added string awareness

<?php


$stack = array();
$stuff = array();

$escaping = false;
$input = "*, bob, [], DATE('g()d\\'f,gfd', ('Fd()sf)ds'), ('fdsfd\"\"()fsd')), ',(),() (,,'";
$len = strlen( $input );
$i = 0;
$curstr = "";
$char = "";

while( $i < $len ) {
    $char = $input[$i++];

    if( $escaping ) {
        $curstr .= $char;
        $escaping = false;
        continue;
    }

    switch( $char ) {

        case "\\":
            $escaping = true;
            break;

        case '"':
            $top = end( $stack );
            if( $top === '"' ) {
                array_pop( $stack );
            }
            else if( $top !== "'" ){
                $stack[] = '"';
            }

            $curstr .= $char;
            break;

        case "'":
            $top = end( $stack );
            if( $top === "'" ) {
                array_pop( $stack );
            }
            else if( $top !== '"' ) {
                $stack[] = "'";
            }

            $curstr .= $char;           
            break;

        case ",":
            if( count( $stack ) ) {
                $curstr .= $char;
            }
            else {
                $stuff[] = trim($curstr);
                $curstr = "";
            }
            break;

        case "(":
            $top = end( $stack );
            if( $top !== "'" && $top !== '"' ) {
                $stack[] = "(";                   
            }

            $curstr .= $char;
            break;

        case ")":
            $top = end( $stack );

            if( $top !== "'" && $top !== '"' ) {
                if( end($stack) !== "(" ) {
                    die( "Unbalanced parentheses" );
                }
                array_pop( $stack );
            }

            $curstr .= $char;


            break;

        default:
            $curstr .= $char;
            break;

    }
}

if( count( $stack ) ) {
    die( "Unbalanced ".end($stack) );
}

$stuff[] = trim( $curstr );

print_r( $stuff );

/*
    Array
(
    [0] => *
    [1] => bob
    [2] => []
    [3] => DATE('g()d'f,gfd', ('Fd()sf)ds'), ('fdsfd""()fsd'))
    [4] => ',(),() (,,'
)

*/

17 Comments

As with my solution it won't work if there are brackets within quoted data.
@diolemo what do you mean? brackets are treated as any normal character here (not comma, ( or ) ). Didn't see op wanted them to be treated specially?
Consider this example. 'test(1', 'test2' We have to allow for brackets within quoted strings (those should not be considered). Take a look at my solution too as it is shorter (but it suffers the same problem).
But why would I put 'test(1', 'test2' inside my query? Starting with a quote and one parenthesis?
@DavidBélanger it depends whether any part of your query is constructed from user provided data. If not then you are fine to use this or my solution.
0

You stated in your comments that you're prepared to use recursion because you have nested lists. However, a regular expression in the strict sense cannot do recursion. This is because it cannot "count" anything indefinitely. Since it has no way of counting open/close parentheses, it can't know how many levels in it is, or how many levels out it must go.

You can write horrendously complex regex to handle N levels of depth (see anubhava's answer), but as soon as you run across an expression with N+1 levels of depth your regex will fail. This is why we use programming languages to parse irregular languages because they can count recursion (see diolemo's answer). Within this recursion, we can use small bits of regex.
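To illustrate the counting argument, here is a rough sketch that puts a pattern with one hard-coded level of nesting next to a loop that simply counts. The function names fixed_depth_ok and is_balanced are made up for this example.

```php
<?php
// Hypothetical helper: a pattern with the nesting depth hard-coded.
// It accepts "name(...)" with at most ONE extra level of parentheses inside.
function fixed_depth_ok($s)
{
    return preg_match('/^\w+\((?:[^()]|\([^()]*\))*\)$/', $s) === 1;
}

// Hypothetical helper: a plain loop can count to any depth.
function is_balanced($s)
{
    $depth = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if ($s[$i] === '(') {
            $depth++;
        } elseif ($s[$i] === ')') {
            if (--$depth < 0) {
                return false; // closed more than was opened
            }
        }
    }
    return $depth === 0;
}

var_dump(fixed_depth_ok("f(a, (b))"));    // bool(true): one inner level
var_dump(fixed_depth_ok("f(a, ((b)))"));  // bool(false): one level too deep
var_dump(is_balanced("f(a, ((b)))"));     // bool(true): the counter does not care
```

Supporting another level in the pattern means writing another nested alternative by hand, while the loop needs no change at all.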

2 Comments

@DavidBélanger Notice how little regex there is in that solution. The regex isn't doing any more than character validation - it's not doing any of the parsing. You can use regex in recursion, but not recursion in regex.
0

This will work (for the most part). It will fail if you have brackets within quotes (part of the data). You can extend the code to handle quoted brackets if you want (but then you have to consider escaped quotes and everything like that). A regular expression will never work well.

Edit: Better to use the PHP SQL Parser as answered by SpikeX.

function unreliable_comma_explode($str)
{
   $last_split = 0;
   $len = strlen($str);
   $brackets = 0;
   $parts = array();

   for ($i = 0; $i < $len; $i++)
   {
      if ($str[$i] == '(') 
      {
         $brackets++;
         continue;
      }

      if ($str[$i] == ')')
      {
         if (--$brackets == -1) $brackets = 0;
         continue;
      }

      if ($str[$i] == ',' && $brackets == 0)
      {
         $parts[] = substr($str, $last_split, ($i-$last_split));
         $last_split = $i + 1;
      }
   }

   if (($len-$last_split) > 0)
      $parts[] = substr($str, $last_split, ($len-$last_split));

   return $parts;
}
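
As a rough idea of what such an extension could look like, here is a sketch that also tracks whether the scanner is inside '...' or "...". It assumes a backslash escapes the next character inside quotes, it trims each piece, and the function name quote_aware_comma_explode is made up for this example. It is not a substitute for a real SQL parser.

```php
<?php
// Hypothetical extension of the bracket counter above: brackets and commas
// inside '...' or "..." are ignored, and a backslash inside quotes escapes
// the character that follows it.
function quote_aware_comma_explode($str)
{
    $parts = array();
    $current = '';
    $depth = 0;      // open parentheses seen outside quotes
    $quote = null;   // the quote character we are inside, or null
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];

        if ($quote !== null) {
            $current .= $ch;
            if ($ch === '\\' && $i + 1 < $len) {
                $current .= $str[++$i];   // keep the escaped character verbatim
            } elseif ($ch === $quote) {
                $quote = null;            // closing quote
            }
            continue;
        }

        if ($ch === "'" || $ch === '"') {
            $quote = $ch;
        } elseif ($ch === '(') {
            $depth++;
        } elseif ($ch === ')' && $depth > 0) {
            $depth--;
        } elseif ($ch === ',' && $depth === 0) {
            $parts[] = trim($current);    // top-level comma: close this piece
            $current = '';
            continue;
        }
        $current .= $ch;
    }

    $parts[] = trim($current);
    return $parts;
}

print_r(quote_aware_comma_explode("'test(1', 'test2', DATE('a,b', (1, 2))"));
```

With that input the sketch returns three pieces: 'test(1', then 'test2', then DATE('a,b', (1, 2)).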

Comments

0

You can use this regex-based code to get the split result the way you want:

$str = "*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george";
$arr = preg_split('/([^,]*(?:\([^)]*\))[^,]*)+|,/', $str, -1,
                      PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Update:

Although my original answer worked for the example the OP posted, because of the concerns raised by some members I am posting a solution that also works with nested parentheses, as long as the brackets are balanced:

$str = "*, bob, DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]," .
       "john, MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd')), george";
$arr = preg_split('/\s*( [^,()]* \( ( [^()]* | (?R) )* \) [^,()]* ) ,?\s* | \s*,\s*/x',
                  $str, -1 , PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);

OUTPUT:

Array
(
    [0] => *
    [1] => bob
    [2] => DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]
    [3] => john
    [4] => MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd'))
    [5] => george
)

Caution: Even though this recursion-based regex pattern now works with deeply nested brackets, that doesn't mean it cannot be broken in some edge cases (like unbalanced brackets).
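
For anyone wondering what (?R) does: in PCRE it re-enters the whole pattern at that point, which is what lets a pattern follow nested brackets. Here is a minimal sketch, separate from the split pattern above, that uses it only to check whether a chunk is one balanced (...) group. The helper name is_balanced_group is made up, and brackets inside quoted strings would still fool it.

```php
<?php
// Hypothetical helper: true if $s is exactly one balanced (...) group.
// (?R) recurses into the whole pattern, so every inner "(" must itself be
// followed by a complete, balanced group before the outer ")" can match.
function is_balanced_group($s)
{
    return preg_match('/\((?:[^()]++|(?R))*\)/', $s, $m) === 1 && $m[0] === $s;
}

var_dump(is_balanced_group("('a', (('b')))"));  // bool(true)
var_dump(is_balanced_group("('a', (('b'))"));   // bool(false): one ")" is missing
```

A check like this could be run on the input first, so that unbalanced strings are rejected before the split pattern ever sees them.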

8 Comments

Breaks with: "*, bob, DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd')), george"
Just be careful - this will break if you have any loops nested further than this.
I'm very curious how this actually works... I can't replicate your results. What does (?R) do?
@DanRasmussen: Here is the code Working Demo: ideone.com/4Eqhj and here is a great tutorial on Recursive Regex in PHP
@anubhava +2 for teaching me something completely new about regex. -1 for shattering my perception of "regular" expressions...
-1

I'm not really sure what you want to do here. But if you just want to extract strings, you can just use implode.

$array = array("*", "bob", "DATE('gdfgfd', 'Fdsfds', '(\"fdsfdfsd\"))", "george");
echo $test = implode(",", $array);

1 Comment

He said explode didn't work because he wanted everything inside DATE(...) as a single entity.
