9

Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably with the associated lexer/parser (if any).

3
  • Does anyone see a potential problem if I simply write a large switch statement to convert tokens back to their string representations (e.g. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution? Commented Feb 21, 2011 at 16:44
  • If all you want it to do is pretty print, this will sort of work. You'll discover that regenerating floating point numbers and literal strings is more sweat than you expect. But the real question is, where did you get the token string you want to print? Presumably, you are reading some existing program, and making changes to it. In that case you'll find you need lots more machinery to parse, determine symbol tables, do flow analysis, or whatever. Commented Mar 11, 2011 at 23:06
  • Yes, I realised that rather quickly. Still, it gives me a lexer, which does, well, something... Commented Mar 12, 2011 at 13:04
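
To make the concern in the comments above concrete, here is a minimal sketch. It assumes only the built-in tokenizer extension; the helper names `tokensToSource` and `tokensJoinedWithSpaces` are hypothetical and exist only for this illustration. Each array token already carries its original text in index 1, so no switch statement is needed to get the text back, and plain concatenation reproduces the input. Joining with spaces, on the other hand, changes the contents of an interpolated string, because the lexer splits such a string into several tokens.

```php
<?php
// Hypothetical helpers for illustration; not part of any library.

/** Concatenate the text of every token; this reproduces the input. */
function tokensToSource(array $tokens): string
{
    $out = '';
    foreach ($tokens as $token) {
        // Array tokens are [id, text, line]; one-character tokens are plain strings.
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

/** The naive approach: drop whitespace tokens and join the rest with spaces. */
function tokensJoinedWithSpaces(array $tokens): string
{
    $parts = [];
    foreach ($tokens as $token) {
        if (is_array($token) && $token[0] === T_WHITESPACE) {
            continue;
        }
        $parts[] = is_array($token) ? $token[1] : $token;
    }
    return implode(' ', $parts);
}

// Single quotes keep $b literal here, so the tokenizer sees an interpolated string.
$source = '<?php echo "a $b c";';
$tokens = token_get_all($source);

$exact  = tokensToSource($tokens);
$spaced = tokensJoinedWithSpaces($tokens);

echo $exact, "\n";   // same text as $source
echo $spaced, "\n";  // extra spaces now sit inside the double-quoted string
```

So a space-joined stream is only safe if whitespace-sensitive tokens (string parts, heredocs, inline HTML) are handled separately before any pretty-printing step.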

4 Answers

3

In the category of "other solutions", you could try PHP Parser.

The parser turns PHP source code into an abstract syntax tree. ... Additionally, you can convert a syntax tree back to PHP code.


2

From my comment:

Does anyone see a potential problem if I simply write a large switch statement to convert tokens back to their string representations (e.g. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution?

After some searching, I found a homemade PHP solution in this question that uses the PHP Tokenizer interface, as well as some PHP code formatting tools that are more configurable (but would still require the solution described above).

These could be used to quickly realize a solution. I'll post back here when I find some time to cook this up.


Solution with PHP_Beautifier

This is the quick solution I cooked up; I'll leave it here as part of the question. Note that it requires you to break open the PHP_Beautifier class by changing its private members to protected (probably not all of them need changing, but changing everything is easier). Otherwise it is impossible to reuse PHP_Beautifier's internals without reimplementing half of its code.

An example usage of the class would be:

file: main.php

<?php
// load the PHP2PHP class defined below
require_once 'PHP2PHP.php';

// read some PHP code (the file itself will do)
$phpCode = file_get_contents(__FILE__);

// create a new instance of PHP2PHP
$php2php = new PHP2PHP();

// tokenize the code (forwards to token_get_all)
$phpCode = $php2php->php2token($phpCode);

// print the tokens, in some way
echo join(' ', array_map(function($token) {
  return (is_array($token))
    ? ($token[0] === T_WHITESPACE)
      ? ($token[1] === "\n")
        ? "\n"
        : ''
      : token_name($token[0])
    : $token;
}, $phpCode));

// transform the tokens back into legible PHP code
$phpCode = $php2php->token2php($phpCode);
?>

As PHP2PHP extends PHP_Beautifier, it allows for the same fine-tuning under the same API that PHP_Beautifier uses. The class itself is:

file: PHP2PHP.php

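<?php
// Assumption: the modified PEAR PHP_Beautifier package is on the include path.
require_once 'PHP/Beautifier.php';

class PHP2PHP extends PHP_Beautifier {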

  function php2token($phpCode) {
    return token_get_all($phpCode);
  }

  function token2php(array $phpToken) {

    // prepare properties
    $this->resetProperties();
    $this->aTokens = $phpToken;
    $iTotal        = count($this->aTokens);
    $iPrevAssoc    = false;

    // send a signal to the filter, announcing the init of the processing of a file
    foreach($this->aFilters as $oFilter)
      $oFilter->preProcess();

    for ($this->iCount = 0;
         $this->iCount < $iTotal;
         $this->iCount++) {
      $aCurrentToken = $this->aTokens[$this->iCount];
      if (is_string($aCurrentToken))
        $aCurrentToken = array(
          0 => $aCurrentToken,
          1 => $aCurrentToken
        );

      // ArrayNested->off();
      $sTextLog = PHP_Beautifier_Common::wsToString($aCurrentToken[1]);

      // ArrayNested->on();
      $sTokenName = (is_numeric($aCurrentToken[0])) ? token_name($aCurrentToken[0]) : '';
      $this->oLog->log("Token:" . $sTokenName . "[" . $sTextLog . "]", PEAR_LOG_DEBUG);
      $this->controlToken($aCurrentToken);
      $iFirstOut           = count($this->aOut); //5
      $bError              = false;
      $this->aCurrentToken = $aCurrentToken;
      if ($this->bBeautify) {
        foreach($this->aFilters as $oFilter) {
          $bError = true;
          if ($oFilter->handleToken($this->aCurrentToken) !== FALSE) {
            $this->oLog->log('Filter:' . $oFilter->getName() , PEAR_LOG_DEBUG);
            $bError = false;
            break;
          }
        }
      } else {
        $this->add($aCurrentToken[1]);
      }
      $this->controlTokenPost($aCurrentToken);
      $iLastOut = count($this->aOut);
      // set the assoc
      if (($iLastOut-$iFirstOut) > 0) {
        $this->aAssocs[$this->iCount] = array(
          'offset' => $iFirstOut
        );
        if ($iPrevAssoc !== FALSE)
          $this->aAssocs[$iPrevAssoc]['length'] = $iFirstOut-$this->aAssocs[$iPrevAssoc]['offset'];
        $iPrevAssoc = $this->iCount;
      }
      if ($bError)
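        // var_export(..., true) returns the dump as a string; var_dump() would print it and return null
        throw new Exception("Can't process token: " . var_export($aCurrentToken, true));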
    } // ~for

    // generate the last assoc
    if (count($this->aOut) == 0)
        throw new Exception("Nothing on output!");

    $this->aAssocs[$iPrevAssoc]['length'] = (count($this->aOut) -1) - $this->aAssocs[$iPrevAssoc]['offset'];

    // post-processing
    foreach($this->aFilters as $oFilter)
      $oFilter->postProcess();
    return $this->get();
  }
}
?>

1 Comment

@Kirzilla: I'm not entirely sure what you are trying to accomplish, but if you want to work on a PHP AST, have you tried using NikiC's PHP-Parser? It gives you an entire AST (not just tokens) and is maintained by a PHP core developer. See this question: stackoverflow.com/questions/5586358/….
1

If I'm not mistaken http://pear.php.net/package/PHP_Beautifier uses token_get_all() and then rewrites the stream. It uses heaps of methods like t_else and t_close_brace to output each token. Maybe you can hijack this for simplicity.
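
To illustrate the per-token method idea, here is a rough sketch. It is not PHP_Beautifier's actual code: the class `TokenPrinter` and its handler methods are hypothetical, and only the naming pattern (`t_else`, `t_close_brace`, and so on) is borrowed from the description above. Each token is dispatched to a method named after its token name, with a fallback that emits the text unchanged.

```php
<?php
// Illustrative only: TokenPrinter and its handlers are hypothetical names,
// not part of PHP_Beautifier or any other library.
class TokenPrinter
{
    private $out = '';

    public function render(string $source): string
    {
        $this->out = '';
        foreach (token_get_all($source) as $token) {
            // Normalize one-character tokens (plain strings) to [id, text] form.
            if (is_string($token)) {
                $token = [null, $token];
            }
            // Derive a handler name such as t_else from the token name T_ELSE.
            $method = $token[0] === null
                ? 't_default'
                : strtolower(token_name($token[0]));
            if (!method_exists($this, $method)) {
                $method = 't_default';
            }
            $this->$method($token[1]);
        }
        return $this->out;
    }

    // Fallback handler: emit the token text unchanged.
    private function t_default(string $text): void
    {
        $this->out .= $text;
    }

    // Collapse every run of whitespace to a single space.
    private function t_whitespace(string $text): void
    {
        $this->out .= ' ';
    }
}

$printer = new TokenPrinter();
echo $printer->render("<?php\nif (\$a)   {\n    echo 1;\n}"), "\n";
```

Adding a formatting rule then means adding one more `t_*` method, which is presumably what makes this style convenient to hijack.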

1 Comment

I ended up doing this for a while, and it worked, though PHP_Beautifier is pretty hard to extend for this purpose, and forced me to break open some methods.
-2

See our PHP Front End. It is a full PHP parser, automatically building ASTs, and a matching prettyprinter that regenerates compilable PHP code complete with the original comments. (EDIT 12/2011: See this SO answer for more details on what it takes to prettyprint from ASTs, which are just an organized version of the tokens: https://stackoverflow.com/a/5834775/120163)

The front end is built on top of our DMS Software Reengineering Toolkit, enabling the analysis and transformation of PHP ASTs (and then, via the prettyprinter, the regeneration of code).

8 Comments

Is anything equivalent to the DMS toolkit available as open source? This is rather expensive for a toy project. ^^
@Pepijn: The closest things to DMS are Stratego/XT and TXL. They both have parsers, build ASTs, and can regenerate code. Stratego/XT may have a PHP parser, but I don't know how robust it is, and that matters because PHP is a truly badly documented language (DMS's PHP parser has run across millions of lines of PHP; it's pretty solid). I don't think TXL has a complete PHP parser. ANTLR parses and can build ASTs with additional effort; it doesn't have any specific pretty printer machinery that I know about. ...
@Pepijn: ... You should observe that you might be able to get something that begins to approximate what DMS does, but you'll likely have to replicate the part that is missing (accurate PHP grammar? PrettyPrinter? Analysis support engines? ...). By the time you do that, you'll discover IMHO that the open source versions are more expensive than DMS at least if you think your time isn't zero cost. People will accuse me of beating my own drum here, and I'll agree with them. It is hard to replicate 15 years of continuous engineering let alone the 10 years of concepts on which DMS is based.
@Ira: I fully agree with you on this, and should I ever need something like this for a business project (with a clear goal in mind) I will definitely consider DMS; however, as of yet I am still a student, and my goal in mind is learning. Thus implementing this (even a half-assed version of it) will probably bring me more in experience than just buying and toying with the DMS. Still, thank you; looks like it's a great product. (You should note that although my question was PHP-related, my interest was, in fact, cross-language.)
@Ira: Also, I feel that the accepted answer better reflects the original question as asked, and is thus more in line with the purposes of Stack Overflow.