Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably with the associated lexer/parser (if any).
-
Does anyone see a potential problem, if I simply write a large switch statement to convert tokens back to their string representations (i.e. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution?wen– wen2011-02-21 16:44:45 +00:00Commented Feb 21, 2011 at 16:44
-
If all you want it to do is pretty print, this will sort of work. You'll discover that regenerating floating point numbers and literal strings is more sweat than you expect. But the real question is, where did you get the token string you want to print? Presumably, you are reading some existing program, and making changes to it. In that case you'll find you need lots more machinery to parse, determine symbol tables, do flow analysis, or whatever.Ira Baxter– Ira Baxter2011-03-11 23:06:27 +00:00Commented Mar 11, 2011 at 23:06
-
Yes, I realised that rather quickly. Still, it gives me a lexer, which does, well, something...wen– wen2011-03-12 13:04:16 +00:00Commented Mar 12, 2011 at 13:04
4 Answers
In the category of "other solutions", you could try PHP Parser.
The parser turns PHP source code into an abstract syntax tree....Additionally, you can convert a syntax tree back to PHP code.
Comments
From my comment:
Does anyone see a potential problem, if I simply write a large switch statement to convert tokens back to their string representations (i.e. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution?
After some looking, I found a PHP homemade solution in this question, that actually uses the PHP Tokenizer interface, as well as some PHP code formatting tools which are more configurable (but would require the solution as described above).
These could be used to quickly realize a solution. I'll post back here when I find some time to cook this up.
Solution with PHP_Beautifier
This is the quick solution I cooked up, I'll leave it here as part of the question. Note that it requires you to break open the PHP_Beautifier class, by changing everything (probably not everything, but this is easier) that is private to protected, to allow you to actually use the internal workings of PHP_Beautifier (otherwise it was impossible to reuse the functionality of PHP_Beautifier without reimplementing half their code).
An example usage of the class would be:
file: main.php
<?php
// read some PHP code (the file itself will do)
$phpCode = file_get_contents(__FILE__);
// create a new instance of PHP2PHP
$php2php = new PHP2PHP();
// tokenize the code (forwards to token_get_all)
$phpCode = $php2php->php2token($phpCode);
// print the tokens, in some way
echo join(' ', array_map(function($token) {
return (is_array($token))
? ($token[0] === T_WHITESPACE)
? ($token[1] === "\n")
? "\n"
: ''
: token_name($token[0])
: $token;
}, $phpCode));
// transform the tokens back into legible PHP code
$phpCode = $php2php->token2php($phpCode);
?>
As PHP2PHP extends PHP_Beautifier, it allows for the same fine-tuning under the same API that PHP_Beautifier uses. The class itself is:
file: PHP2PHP.php
class PHP2PHP extends PHP_Beautifier {
function php2token($phpCode) {
return token_get_all($phpCode);
}
function token2php(array $phpToken) {
// prepare properties
$this->resetProperties();
$this->aTokens = $phpToken;
$iTotal = count($this->aTokens);
$iPrevAssoc = false;
// send a signal to the filter, announcing the init of the processing of a file
foreach($this->aFilters as $oFilter)
$oFilter->preProcess();
for ($this->iCount = 0;
$this->iCount < $iTotal;
$this->iCount++) {
$aCurrentToken = $this->aTokens[$this->iCount];
if (is_string($aCurrentToken))
$aCurrentToken = array(
0 => $aCurrentToken,
1 => $aCurrentToken
);
// ArrayNested->off();
$sTextLog = PHP_Beautifier_Common::wsToString($aCurrentToken[1]);
// ArrayNested->on();
$sTokenName = (is_numeric($aCurrentToken[0])) ? token_name($aCurrentToken[0]) : '';
$this->oLog->log("Token:" . $sTokenName . "[" . $sTextLog . "]", PEAR_LOG_DEBUG);
$this->controlToken($aCurrentToken);
$iFirstOut = count($this->aOut); //5
$bError = false;
$this->aCurrentToken = $aCurrentToken;
if ($this->bBeautify) {
foreach($this->aFilters as $oFilter) {
$bError = true;
if ($oFilter->handleToken($this->aCurrentToken) !== FALSE) {
$this->oLog->log('Filter:' . $oFilter->getName() , PEAR_LOG_DEBUG);
$bError = false;
break;
}
}
} else {
$this->add($aCurrentToken[1]);
}
$this->controlTokenPost($aCurrentToken);
$iLastOut = count($this->aOut);
// set the assoc
if (($iLastOut-$iFirstOut) > 0) {
$this->aAssocs[$this->iCount] = array(
'offset' => $iFirstOut
);
if ($iPrevAssoc !== FALSE)
$this->aAssocs[$iPrevAssoc]['length'] = $iFirstOut-$this->aAssocs[$iPrevAssoc]['offset'];
$iPrevAssoc = $this->iCount;
}
if ($bError)
throw new Exception("Can'process token: " . var_dump($aCurrentToken));
} // ~for
// generate the last assoc
if (count($this->aOut) == 0)
throw new Exception("Nothing on output!");
$this->aAssocs[$iPrevAssoc]['length'] = (count($this->aOut) -1) - $this->aAssocs[$iPrevAssoc]['offset'];
// post-processing
foreach($this->aFilters as $oFilter)
$oFilter->postProcess();
return $this->get();
}
}
?>
1 Comment
If I'm not mistaken http://pear.php.net/package/PHP_Beautifier uses token_get_all() and then rewrites the stream. It uses heaps of methods like t_else and t_close_brace to output each token. Maybe you can hijack this for simplicity.
1 Comment
See our PHP Front End. It is a full PHP parser, automatically building ASTs, and a matching prettyprinter that regenerates compilable PHP code complete with the original commments. (EDIT 12/2011: See this SO answer for more details on what it takes to prettyprint from ASTs, which are just an organized version of the tokens: https://stackoverflow.com/a/5834775/120163)
The front end is built on top of our DMS Software Reengineering Toolkit, enabling the analysis and transformation of PHP ASTs (and then via the prettyprinter code).