1

First of all, I understand that both JavaScript's escape() and unescape() are deprecated. We have an ancient system that runs JS escape() on the data before storing it in the DB, so every time we read the data back we have to unescape() it on the client side before we can display it. (I know it's stupid, but it was done years ago to support Unicode characters on a non-Unicode-compliant DB.)

Is there an existing PHP implementation that simulates the JavaScript escape() and unescape() functions?

2 Answers

3

After some searching I was able to put together two PHP functions that do what I want. The code is not pretty, but it works 100% on the data we have so far, so I thought I would share it here.

/**
 *  Simulate javascript escape() function
 */
function escapejs($source) {
    $map = array(
       '~'        => '%7E'
      ,'!'        => '%21'
      ,'\''       => '%27'       // single quote
      ,'('        => '%28'
      ,')'        => '%29'
      ,'#'        => '%23'
      ,'$'        => '%24'
      ,'&'        => '%26'
      ,','        => '%2C'
      ,':'        => '%3A'
      ,';'        => '%3B'
      ,'='        => '%3D'
      ,'?'        => '%3F'
      ,' '       => '%20'       // space
      ,'"'        => '%22'       // double quote
      ,'%'        => '%25'
      ,'<'        => '%3C'
      ,'>'        => '%3E'
      ,'['        => '%5B'
      ,'\\'       => '%5C'       // backslash \
      ,']'        => '%5D'
      ,'^'        => '%5E'
      ,'{'        => '%7B'
      ,'|'        => '%7C'
      ,'}'        => '%7D'
      ,'`'        => '%60'
      ,chr(9)     => '%09'
      ,chr(10)    => '%0A'
      ,chr(13)    => '%0D'
      ,'¡'       => '%A1'
      ,'¢'       => '%A2'
      ,'£'       => '%A3'
      ,'¤'       => '%A4'
      ,'¥'       => '%A5'
      ,'¦'       => '%A6'
      ,'§'       => '%A7'
      ,'¨'       => '%A8'
      ,'©'       => '%A9'
      ,'ª'       => '%AA'
      ,'«'       => '%AB'
      ,'¬'       => '%AC'
      ,"\xC2\xAD" => '%AD'       // soft hyphen (U+00AD), written as UTF-8 bytes because it is invisible
      ,'®'       => '%AE'
      ,'¯'       => '%AF'
      ,'°'       => '%B0'
      ,'±'       => '%B1'
      ,'²'       => '%B2'
      ,'³'       => '%B3'
      ,'´'       => '%B4'
      ,'µ'       => '%B5'
      ,'¶'       => '%B6'
      ,'·'       => '%B7'
      ,'¸'       => '%B8'
      ,'¹'       => '%B9'
      ,'º'       => '%BA'
      ,'»'       => '%BB'
      ,'¼'       => '%BC'
      ,'½'       => '%BD'
      ,'¾'       => '%BE'
      ,'¿'       => '%BF'
      ,'À'       => '%C0'
      ,'Á'       => '%C1'
      ,'Â'       => '%C2'
      ,'Ã'       => '%C3'
      ,'Ä'       => '%C4'
      ,'Å'       => '%C5'
      ,'Æ'       => '%C6'
      ,'Ç'       => '%C7'
      ,'È'       => '%C8'
      ,'É'       => '%C9'
      ,'Ê'       => '%CA'
      ,'Ë'       => '%CB'
      ,'Ì'       => '%CC'
      ,'Í'       => '%CD'
      ,'Î'       => '%CE'
      ,'Ï'       => '%CF'
      ,'Ð'       => '%D0'
      ,'Ñ'       => '%D1'
      ,'Ò'       => '%D2'
      ,'Ó'       => '%D3'
      ,'Ô'       => '%D4'
      ,'Õ'       => '%D5'
      ,'Ö'       => '%D6'
      ,'×'       => '%D7'
      ,'Ø'       => '%D8'
      ,'Ù'       => '%D9'
      ,'Ú'       => '%DA'
      ,'Û'       => '%DB'
      ,'Ü'       => '%DC'
      ,'Ý'       => '%DD'
      ,'Þ'       => '%DE'
      ,'ß'       => '%DF'
      ,'à'       => '%E0'
      ,'á'       => '%E1'
      ,'â'       => '%E2'
      ,'ã'       => '%E3'
      ,'ä'       => '%E4'
      ,'å'       => '%E5'
      ,'æ'       => '%E6'
      ,'ç'       => '%E7'
      ,'è'       => '%E8'
      ,'é'       => '%E9'
      ,'ê'       => '%EA'
      ,'ë'       => '%EB'
      ,'ì'       => '%EC'
      ,'í'       => '%ED'
      ,'î'       => '%EE'
      ,'ï'       => '%EF'
      ,'ð'       => '%F0'
      ,'ñ'       => '%F1'
      ,'ò'       => '%F2'
      ,'ó'       => '%F3'
      ,'ô'       => '%F4'
      ,'õ'       => '%F5'
      ,'ö'       => '%F6'
      ,'÷'       => '%F7'
      ,'ø'       => '%F8'
      ,'ù'       => '%F9'
      ,'ú'       => '%FA'
      ,'û'       => '%FB'
      ,'ü'       => '%FC'
      ,'ý'       => '%FD'
      ,'þ'       => '%FE'
      ,'ÿ'       => '%FF'
    );

    $convmap = array(0x80, 0x10ffff, 0, 0xffffff);

    $org = $source;

    // make sure string is UTF8
    if (false === mb_check_encoding($source, 'UTF-8')) {
        if (false === ($source = iconv(mb_detect_encoding($source, mb_detect_order(), true), "UTF-8", $source))) {
          $source = $org;
        }
    }

    $chrArray = preg_split('//u', $source, -1, PREG_SPLIT_NO_EMPTY);  // split up the UTF8 string into chars
    $oChrArray = array();

    foreach ($chrArray as $index => $chr) {

      if (isset($map[$chr])) {
        $chr = $map[$chr];
      }
      // if char doesn't fall within ASCII then assume unicode, get the hex html entities
      //elseif (mb_detect_encoding($chr, 'ASCII', true) !== 'ASCII') {
      else {
        $chr = mb_encode_numericentity($chr, $convmap, "UTF-8", true);

        // since we will be converting the &#x notation to the non-standard %u for backward compatibility, make sure the code is 4 digits long by prepending 0
        if (substr($chr, 0, 3) == '&#x' && substr($chr, -1) == ';' && strlen($chr) == 7)
          $chr = '&#x0'.substr($chr, 3);
      }

      $oChrArray[] = $chr;
    }
    $decodedStr = implode('', $oChrArray);
    $decodedStr = preg_replace('/&#x([0-9A-F]{4});/', '%u$1', $decodedStr);   // we need to use the %uXXXX format to simulate results generated with js escape()
    return $decodedStr;
}

/**
 *  Simulate javascript unescape() function
 */
function unescapejs($source) {
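    $source = str_replace(array('%0B'), array(''), $source);    // strip out vertical tab
    // only rewrite well-formed escapes (hex digits), leave anything else untouched
    $s= preg_replace('/%u([0-9A-Fa-f]{4})/', '&#x$1;', $source);
    $s= preg_replace('/%([0-9A-Fa-f]{2})/', '&#x$1;', $s);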
    return html_entity_decode($s, ENT_QUOTES, 'UTF-8');
}
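
For comparison, here is a more compact sketch that follows the ECMAScript description of escape() and unescape() instead of a lookup table: it walks the string as UTF-16 code units, leaves A-Z, a-z, 0-9 and @*_+-./ alone, and writes %XX for units below 0x100 and %uXXXX for everything else. The function names js_escape_sketch and js_unescape_sketch are made up for this example, it assumes valid UTF-8 input and the mbstring extension, and it has not been run against the legacy data described in the question, so treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: approximates JS escape() by working on UTF-16 code units.
function js_escape_sketch(string $utf8): string
{
    if ($utf8 === '') {
        return '';
    }
    // JS strings are sequences of UTF-16 code units, so convert first.
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    $out = '';
    foreach (unpack('n*', $utf16) as $unit) {
        if ($unit < 0x80 && preg_match('/[A-Za-z0-9@*_+\-.\/]/', chr($unit))) {
            $out .= chr($unit);                  // characters escape() leaves untouched
        } elseif ($unit < 0x100) {
            $out .= sprintf('%%%02X', $unit);    // %XX
        } else {
            $out .= sprintf('%%u%04X', $unit);   // %uXXXX (surrogates are emitted one by one)
        }
    }
    return $out;
}

// Hypothetical helper: approximates JS unescape() for well-formed %XX / %uXXXX escapes.
function js_unescape_sketch(string $escaped): string
{
    // Decode each run of consecutive escapes together so surrogate pairs stay adjacent.
    return preg_replace_callback(
        '/(?:%u[0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2})+/',
        function (array $m): string {
            preg_match_all('/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})/', $m[0], $units, PREG_SET_ORDER);
            $utf16 = '';
            foreach ($units as $u) {
                $hex = $u[1] !== '' ? $u[1] : $u[2];
                $utf16 .= pack('n', hexdec($hex));   // one big-endian UTF-16 code unit
            }
            return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
        },
        $escaped
    );
}
```

With these definitions, js_escape_sketch('a b') gives a%20b, and js_unescape_sketch('%u4F60%E9') gives back the two characters 你é. Unlike the table-based version, characters outside the Basic Multilingual Plane come out as two %uXXXX surrogate escapes, which is what escape() does in a browser.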

-1

You're looking for urlencode(). If the output of that encoding isn't acceptable to you, you can try rawurlencode().

This has more info:

http://php.net/manual/en/function.urldecode.php

http://php.net/manual/en/function.urlencode.php
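
To make it easier to judge whether the output of these two functions is acceptable, here is a small check of what they return for a string with a space, a plus sign and a non-ASCII character. The sample string is made up for illustration.

```php
<?php
// "\u{E9}" is é written as a code point escape (PHP 7+), so the file encoding does not matter.
$input = "a b+c \u{E9}";

$viaUrlencode    = urlencode($input);     // a+b%2Bc+%C3%A9
$viaRawurlencode = rawurlencode($input);  // a%20b%2Bc%20%C3%A9

echo $viaUrlencode, "\n", $viaRawurlencode, "\n";
```

Both functions percent-encode the UTF-8 bytes of é (%C3%A9) and also encode the plus sign, whereas JavaScript's escape() would return a%20b+c%20%E9 for the same string, so the results are not interchangeable with data that was stored via escape().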

But if all you want is to make the data safe to store in a MySQL database, you can use the built-in mysqli_real_escape_string() function, which escapes the special characters in a string so that it can be used safely in an SQL statement.

See:

http://php.net/manual/en/mysqli.real-escape-string.php
