[I know this isn't a regexp, but since you asked for a 'suggestion to better solve such an issue', here are my 2 cents.]
How about simply parsing the code? ;)
```php
$source = file_get_contents('/tmp/test.php'); // Change this
$tokens = token_get_all($source);
$history = [];
foreach ($tokens as $token) {
    if (is_string($token)) { // simple 1-character token
        array_push($history, str_replace(["\r", "\n"], '', $token));
        $history = array_slice($history, -2);
        echo $token;
    } else {
        list($id, $text) = $token;
        switch ($id) {
            case T_STRING:
                if ($history == [T_VARIABLE, '[']) {
                    // Token sequence is [T_VARIABLE, '[', T_STRING]
                    echo "'$text'";
                }
                else {
                    echo $text;
                }
                break;
            default:
                // anything else -> output "as is"
                echo $text;
                break;
        }
        array_push($history, $id);
        $history = array_slice($history, -2);
    }
}
```
Of course, $source needs to be changed to whatever suits you. token_get_all() then splits that PHP code into a list of tokens. The list is processed item by item, and each item may be changed before being output again, according to our needs.
Single-character tokens like [ are a special case (in e.g. $myVariable[1], "[" and "]" are both tokens of their own): token_get_all() returns them as plain strings, so they have to be handled separately in the loop. Every other $token is an array holding the token's type ID and its text (plus the line number, which isn't needed here).
"Unfortunately", T_STRING is a rather general token type (it is also used for function names, constant names, property names and so on), so to pinpoint only the bare words used as array keys, we keep the 2 items preceding the current token in $history (for $myVariable[key], that is the T_VARIABLE ID of "$myVariable" and the "[").
...and that's it, really. The code is read from a file, processed and written to stdout. Everything except the "constant as array index" case is output as is.
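To see the two token shapes in action, here is a small sketch that tokenizes a one-line snippet and collects a readable name for each token via token_name(). The snippet is only an illustration:

```php
<?php
// Tokenize a tiny snippet and collect a readable name for every token:
// 1-char tokens come back as plain strings, all others as [id, text, line] arrays.
$tokens = token_get_all('<?php $myVariable[key];');

$names = [];
foreach ($tokens as $token) {
    $names[] = is_string($token) ? $token : token_name($token[0]);
}

echo implode(' ', $names), PHP_EOL;
// Prints: T_OPEN_TAG T_VARIABLE [ T_STRING ] ;
```

The T_VARIABLE, "[" pair right before the T_STRING is exactly what $history is used to detect.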
If you like, I can rewrite it as a function or something. The above should be a fairly general solution, though.
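For illustration, a minimal sketch of the same version 1 logic wrapped in a function that returns the rewritten code as a string instead of echoing it. The name quoteBareKeysV1() is made up for this example. Running it on a nested access also shows the limit of this version: only the first key directly after a variable is caught.

```php
<?php
// Hypothetical wrapper around the version 1 loop: same token logic,
// but the output is collected in a string instead of being echoed.
function quoteBareKeysV1(string $source): string
{
    $out = '';
    $history = [];
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) { // simple 1-character token
            $history[] = $token;
            $out .= $token;
        } else {
            [$id, $text] = $token;
            // Quote a bare identifier only when it directly follows "$var["
            $out .= ($id === T_STRING && $history === [T_VARIABLE, '['])
                ? "'$text'"
                : $text;
            $history[] = $id;
        }
        // Keep only the 2 most recent items (re-indexed to 0 and 1)
        $history = array_slice($history, -2);
    }
    return $out;
}

echo quoteBareKeysV1('<?php $myVariable[key] = 1;'), PHP_EOL;
// Prints: <?php $myVariable['key'] = 1;
echo quoteBareKeysV1('<?php $a[one][two] = 1;'), PHP_EOL;
// Prints: <?php $a['one'][two] = 1;
```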
Edit (version 2): added support for $myObject->myProp[key]:
```php
<?php
$source = file_get_contents('/tmp/test.php'); // Change this
$tokens = token_get_all($source);
//print_r($tokens); exit();
$history = [];
foreach ($tokens as $token) {
    if (is_string($token)) { // simple 1-character token
        array_push($history, str_replace(["\r", "\n"], '', $token));
        echo $token;
    } else {
        list($id, $text) = $token;
        switch ($id) {
            case T_STRING:
                if (array_slice($history, -2) == [T_VARIABLE, '[']) {
                    // Token sequence is [T_VARIABLE, '[', T_STRING]
                    echo "'$text'";
                }
                else if (array_slice($history, -4) == [T_VARIABLE, T_OBJECT_OPERATOR, T_STRING, '[']) {
                    // Token sequence is [T_VARIABLE, T_OBJECT_OPERATOR, T_STRING, '[', T_STRING]
                    echo "'$text'";
                }
                else {
                    echo $text;
                }
                break;
            default:
                // anything else -> output "as is"
                echo $text;
                break;
        }
        array_push($history, $id);
    }
    // This has to be at least as large as the largest chunk
    // checked anywhere above
    $history = array_slice($history, -5);
}
```
As can be seen, the tough part about introducing more cases is that $history isn't as uniform anymore: the checks now look back over windows of different lengths. At first I thought about fetching things directly from $tokens, but those items aren't sanitized (1-char tokens are plain strings, everything else is an array), so I stuck with $history. The "pruning" line at the bottom probably isn't strictly needed; it's only there to keep memory usage down. Maybe it's cleaner to skip $history, sanitize all $tokens items before the foreach() and then fetch things directly from the array (adding the index to the foreach(), of course). I think I feel a version 3 coming up ;-)
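The "sanitize first, then index" idea could look roughly like the sketch below. The function names normalizeTokens() and quoteBareKeys() are hypothetical, and the extra check is my own assumption rather than part of any version in this answer: every token becomes an [id, text] pair (1-char tokens use the character itself as the ID), and a T_STRING is quoted only when the token right before it is "[" and the one right after it is "]". That second check keeps something like $b[count($c)] from turning into $b['count']($c).

```php
<?php
// Normalize all tokens first so every item has the same [id, text] shape;
// 1-char tokens use the character itself as their "ID".
function normalizeTokens(string $source): array
{
    $normalized = [];
    foreach (token_get_all($source) as $token) {
        $normalized[] = is_string($token) ? [$token, $token] : [$token[0], $token[1]];
    }
    return $normalized;
}

// Hypothetical helper: quote a bare identifier only when it is the sole
// token between "[" and "]" (whitespace inside the brackets is not handled).
function quoteBareKeys(string $source): string
{
    $tokens = normalizeTokens($source);
    $out = '';
    foreach ($tokens as $i => [$id, $text]) {
        $prev = $tokens[$i - 1][0] ?? null;
        $next = $tokens[$i + 1][0] ?? null;
        if ($id === T_STRING && $prev === '[' && $next === ']') {
            $out .= "'$text'";
        } else {
            $out .= $text;
        }
    }
    return $out;
}

echo quoteBareKeys('<?php $a[one][two] = $b[count($c)];'), PHP_EOL;
```

One caveat that, as far as I can tell, applies to every version here: inside a double-quoted string, simple interpolation such as "$user[name]" also tokenizes as T_VARIABLE, "[", T_STRING, "]", and quoting the key there ("$user['name']") is a parse error, so such strings would need to be skipped.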
Edit (version 3):
This should be as simple as it gets: just look for an opening bracket that is directly followed by an unquoted string.
```php
$source = file_get_contents('/tmp/test.php'); // Change this
$tokens = token_get_all($source);
$history = [];
foreach ($tokens as $token) {
    if (is_string($token)) { // simple 1-character token
        array_push($history, str_replace(["\r", "\n"], '', $token));
        echo $token;
    } else {
        list($id, $text) = $token;
        switch ($id) {
            case T_STRING:
                if (array_slice($history, -1) == ['[']) {
                    echo "'$text'";
                }
                else {
                    echo $text;
                }
                break;
            default:
                // anything else -> output "as is"
                echo $text;
                break;
        }
        array_push($history, $id);
    }
}
```
Test input (/tmp/test.php):
```php
<?php
$variable[key] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST[username]);
if ($result[ECD] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot[wafer][25]) {
$object->method[key];
$variable[test] = 'one';
$variable[one][two] = 'three';
$variable->property[three]['four'] = 5;
```
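To check what version 3 does with lines like the ones in the test input, here is a sketch that applies the same rule (quote any T_STRING whose previous token is "[") and returns the result as a string. The name quoteAfterBracket() is made up, and only a few of the test lines are used:

```php
<?php
// Same rule as version 3: quote any T_STRING whose previous token is "[".
function quoteAfterBracket(string $source): string
{
    $out = '';
    $prev = null; // previous token: a 1-char string or a token ID
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            $out .= $token;
            $prev = $token;
            continue;
        }
        [$id, $text] = $token;
        $out .= ($id === T_STRING && $prev === '[') ? "'$text'" : $text;
        $prev = $id;
    }
    return $out;
}

$input = <<<'CODE'
<?php
$variable[key] = "WhatElse";
$lot[wafer][25];
$variable[one][two] = 'three';
$variable->property[three]['four'] = 5;
CODE;

echo quoteAfterBracket($input), PHP_EOL;
```

Every bare key, including the nested ones and the one after ->property, ends up quoted, while 25, 'four' and "WhatElse" are left alone because they aren't T_STRING tokens.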