I have a string like the one below.

$str = "ENGINE=InnoDB 
        DEFAULT CHARSET=utf8 
        COLLATE=utf8_unicode_ci 
        COMMENT='Table comment'";

I need to parse the key/value pairs from the string into an array like the one below:

$arr = array(
    "ENGINE" => "InnoDB",
    "DEFAULT CHARSET" => "utf8",
    "COLLATE" => "utf8_unicode_ci",
    "COMMENT" => "'Table comment'"
);

The order of the parts in the string can vary.

Example:

$str = "ENGINE=InnoDB
        COMMENT='Table comment'
        COLLATE=utf8_unicode_ci
        DEFAULT CHARSET=utf8";
  • What are you actually asking? You can use explode() to turn a string into an array. Commented Oct 22, 2015 at 2:17
  • How do you determine to break between InnoDB and DEFAULT, but not between DEFAULT and CHARSET? Commented Oct 22, 2015 at 2:21
  • You could try preg_replace with this regex: regex101.com/r/hQ5tD5/1 Commented Oct 22, 2015 at 2:33
  • If you are trying to create a config file, I have found that an array is the best way to go. You can have a config file that returns an array, use include/require to store that array in a variable like $config, and access it as $config["ENGINE"], $config["CHARSET"]. Commented Oct 22, 2015 at 4:06

3 Answers

You can use preg_match_all() and have PHP build the output from the matches in whatever format you'd like. Here's a working example in PHP, followed by an explanation of the regex.

<?php
    $str = "ENGINE=InnoDB COMMENT='Table comment' COLLATE=utf8_unicode_ci DEFAULT CHARSET=utf8";
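    // Backslashes are doubled because this is a double-quoted PHP string:
    // "\\\\." reaches the regex engine as \\. (a backslash followed by any character).
    preg_match_all("/([\w ]+)=(\w+|'(?:[^'\\\\]|\\\\.)+')\s*/",$str,$matches,PREG_SET_ORDER);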
    $out = [];
    foreach($matches as $match) {
        $out[$match[1]] = $match[2];
    }
    var_dump($out);
?>

And the result:

array(4) {
  ["ENGINE"]=>
  string(6) "InnoDB"
  ["COMMENT"]=>
  string(15) "'Table comment'"
  ["COLLATE"]=>
  string(15) "utf8_unicode_ci"
  ["DEFAULT CHARSET"]=>
  string(4) "utf8"
}

Explanation of regex

([\w ]+) // match one or more word characters (letters, digits, underscore) or spaces: the key
= // match equals sign
  (
      \w+ // match one or more word characters (an unquoted value)
   | // or
      ' // match the opening single quote
      (?:[^'\\]|\\.)+ // match any character that is not a quote or backslash, or any backslash-escaped character (such as \')
      ' // match the closing single quote
   )
\s* // match any amount of whitespace before the next key
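
As a rough sketch of how the pattern explained above could be wrapped for reuse: the function name parse_table_options() is hypothetical, and the pattern is written as a nowdoc so that it reads exactly as in the explanation, with no extra escaping for PHP string quoting. It assumes quotes inside a quoted value are escaped with a backslash.

```php
<?php
// Hypothetical helper, not part of the original answer.
function parse_table_options(string $str): array
{
    // Nowdoc: backslashes stay literal, so this is the regex exactly as explained above.
    $pattern = <<<'REGEX'
/([\w ]+)=(\w+|'(?:[^'\\]|\\.)+')\s*/
REGEX;

    preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

    $out = [];
    foreach ($matches as $match) {
        // $match[1] is the key, $match[2] the value (quotes are kept for quoted values).
        $out[trim($match[1])] = $match[2];
    }
    return $out;
}

// Multi-line input with a backslash-escaped quote inside the comment.
$options = parse_table_options("ENGINE=InnoDB\n    DEFAULT CHARSET=utf8\n    COMMENT='It\\'s a table'");
print_r($options);
```

Because the trailing \s* also consumes newlines, the same function should handle both the single-line and the multi-line form of the string from the question.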

1 Comment

@zedfoxus added an explanation for you

The string looks like an INI file, so you can use parse_ini_string():

$str = "ENGINE=InnoDB DEFAULT 
        CHARSET=utf8 
        COLLATE=utf8_unicode_ci 
        COMMENT='Table comment'";

$data = parse_ini_string($str);
var_dump($data);

array(4) {
   ["ENGINE"]=>
   string(14) "InnoDB DEFAULT"
   ["CHARSET"]=>
   string(4) "utf8"
   ["COLLATE"]=>
   string(15) "utf8_unicode_ci"
   ["COMMENT"]=>
   string(13) "Table comment"
}

3 Comments

It doesn't work with space-separated keys like DEFAULT CHARSET
@David Boskovic did you try it with space-separated keys like DEFAULT CHARSET? Of course, it works.
Ah, never mind, that's an issue with your example, not the code itself. I don't think the asker has the option to add new lines since the string appears to be directly from SQL. But, I like the approach in theory! I totally forgot that method exists in PHP.

Here's a verbose, inelegant way of parsing the data (with plenty of explanatory comments). This could be adapted for a differently structured string.

<?php

$str = "ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='Table comment'";
$keys = array('ENGINE', 'DEFAULT CHARSET', 'COLLATE', 'COMMENT');

$str_array = explode('=', $str); 
/* result of the above will be
[0] => ENGINE
[1] => InnoDB DEFAULT CHARSET
[2] => utf8 COLLATE
[3] => utf8_unicode_ci COMMENT
[4] => 'Table comment' 
*/

$output = array();
$lastkey = '';

// loop through each split item
foreach ($str_array as $item) {

    // if the item is entirely one of the keys, just remember it as the last key
    if (in_array($item, $keys)) {
        $lastkey = $item;
        continue;
    }

    // check if item like InnoDB DEFAULT CHARSET contains one of the keys
    // if item contains a key, the key will be returned
    // Otherwise, item will be returned
    $result = item_has_a_key($item, $keys);

    if ($result === $item) {
        // if the result is exactly the item, that means no key was found in the item
        // that means, it is the value of the previously found key
        $output[$lastkey] = $item;
    } else {    
        // if the result is not exactly the item, that means it contained one of the keys
        // strip out the key leaving only the value. Assign the value to previously found key
        $output[$lastkey] = trim(str_replace($result, '', $item));

        // remember the key that was found
        $lastkey = $result;
    }
}

print_r($output);
/*
Result:
[ENGINE] => InnoDB
[DEFAULT CHARSET] => utf8
[COLLATE] => utf8_unicode_ci
[COMMENT] => 'Table comment'
*/


// $item can be InnoDB DEFAULT CHARSET
// $keys is the array of keys you have assigned (ENGINE, DEFAULT CHARSET etc.)
// if the item contains one of the keys, the key will be returned
// if the item contains no key, the item will be returned
function item_has_a_key($item, $keys) {
    foreach ($keys as $key) {
        if (strpos($item, $key) !== false) {
            return $key;
        }
    }
    return $item;
}
?>
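
The same known-keys idea can probably be expressed more compactly with preg_split(). The sketch below is only an illustration under that assumption: the function name parse_with_known_keys() is hypothetical, and it will mis-split if a quoted value happens to contain one of the keys followed by an equals sign.

```php
<?php
// Hypothetical helper, not part of the original answer.
function parse_with_known_keys(string $str, array $keys): array
{
    // Longest keys first, so a longer key wins over a shorter key it might contain.
    usort($keys, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape each key for use inside a regex delimited by "/".
    $alternatives = implode('|', array_map(function ($key) {
        return preg_quote($key, '/');
    }, $keys));

    // Split on "KEY=" and keep the captured key in the result.
    $parts = preg_split('/\b(' . $alternatives . ')=/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = [];
    // $parts[0] is whatever precedes the first key; after that, keys and values alternate.
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $out[$parts[$i]] = trim($parts[$i + 1]);
    }
    return $out;
}

$str = "ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='Table comment'";
$keys = array('ENGINE', 'DEFAULT CHARSET', 'COLLATE', 'COMMENT');
print_r(parse_with_known_keys($str, $keys));
```

Unlike the loop above, this version does not remove the key with str_replace(), so a value that merely contains a key name without a following equals sign is left untouched.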

2 Comments

blinks This looks like you're trying to go in the direction of a hackish lexer. Lexer writing is pretty legit stuff. Check out this article if you're up for some late night fun: nitschinger.at/Writing-a-simple-lexer-in-PHP
Thank you for sharing, @DavidBoskovic!
