I want to tokenize format strings (very roughly like printf), and I think I am only missing a small piece:
- %[number][one letter ctYymd] shall become a token²
- $1...$10 shall become a token
- all else (normal text) becomes a token.
²Update: I am now using # instead of % (less trouble with Windows command-line parameters).
I got quite far in the regex simulator. This looks like it should work:
It's not scary if you focus on the three parts, connected by pipes (as either-or), so basically it's just three alternatives. Since I want to match from start to end, I wrapped everything in /^...$/ and surrounded it with a non-capturing group (?:...) that may repeat one or more times:
$exp = '/^(?:(%\\d*[ctYymd]+)|([^$%]+)|(\\$\\d))+$/';
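To make the three alternatives easier to see, the same pattern can be laid out one per line. This is only a sketch, assuming PCRE's x (extended) modifier is acceptable here; the first sample string is one of the tests from this post and the second is a fragment of another.

```php
<?php
// Same pattern as above, spread out with the x (extended) modifier so each
// alternative sits on its own line. In this mode PCRE ignores whitespace and
// "#" comments that appear outside character classes.
$exp = '/^
    (?:
        (%\\d*[ctYymd]+)   # 1: percent sign, optional digits, format letters
      | ([^$%]+)           # 2: plain text, anything except $ and %
      | (\\$\\d)           # 3: dollar sign followed by a single digit
    )+
$/x';

// Used here only as a whole-string validity check.
var_dump(preg_match($exp, '%3d_Ball0n%02d'));   // int(1)
var_dump(preg_match($exp, '%02x'));             // int(0), "x" is not an allowed letter
```

Note that the third alternative accepts a single digit only, so `$10` would not be recognised as one token. If `#` replaces `%` in the pattern, it would need to be written as `\#` outside a character class while the x modifier is in use.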
Still, my code doesn't deliver the tokens I expect:
$exp = '/^(?:(%\\d*[ctYymd]+)|([^$%]+)|(\\$\\d))+$/';
echo "expression: $exp \n";
$tests = [
    '###%04d_Ball0n%02d$1',
    '%03d_Ball0n%02x$1%03d_Ball0n%02d$1',
    '%3d_Ball0n%02d',
];
foreach ( $tests as $test )
{
    echo "teststring: $test\n";
    if( preg_match( $exp, $test, $tokens) )
    {
        array_shift($tokens);
        foreach ( $tokens as $token )
            echo "\t\t'$token'\n";
    }
    else
        echo "not valid.";
} // foreach
I get results, but the matches are out of order. The first %[number][letter] never shows up, and other tokens appear twice:
expression: /^((%\d*[ctYymd]+)|([^$%]+)|(\$\d))+$/
teststring: ###%04d_Ball0n%02d$1
'$1'
'%02d'
'_Ball0n'
'$1'
teststring: %03d_Ball0n%02x$1%03d_Ball0n%02d$1
not valid.teststring: %3d_Ball0n%02d
'%02d'
'%02d'
'_Ball0n'
teststring: %d_foobardoo
'_foobardoo'
'%d'
'_foobardoo'
teststring: Ball0n%02dHamburg%d
'%d'
'%d'
'Hamburg'
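The output above is consistent with how PCRE treats a capture group inside a repeated group: each numbered group keeps only the text from its last iteration, so a single preg_match call cannot return every token in input order. Below is a minimal sketch of one possible alternative, assuming it is acceptable to drop the ^...$ anchors and collect the tokens with preg_match_all. The helper name tokenize is hypothetical and not part of the original code; it treats the string as valid only if the tokens, joined together, reproduce the input.

```php
<?php
// Hypothetical helper (not from the original post): returns the tokens in
// input order, or null if some part of the string is not covered by a token.
function tokenize(string $input): ?array
{
    // The same three alternatives, without anchors and without the outer repeat.
    $exp = '/%\\d*[ctYymd]+|[^$%]+|\\$\\d/';
    preg_match_all($exp, $input, $matches);
    $tokens = $matches[0];

    // If the tokens do not add up to the input, the scanner skipped something.
    return implode('', $tokens) === $input ? $tokens : null;
}

print_r(tokenize('###%04d_Ball0n%02d$1'));
// tokens in order: '###', '%04d', '_Ball0n', '%02d', '$1'

var_dump(tokenize('%03d_Ball0n%02x$1'));
// NULL, because the '%' before '02x' is not covered by any alternative
```

The join-and-compare step stands in for the ^...$ anchors: preg_match_all silently skips characters it cannot match, so without that check an invalid string would still yield a token list.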
