I found a function in the MediaWiki source that converts a string containing umlauts into a hex-like format.

Now I want to convert the formatted string back into the string with umlauts.

The function:

    $Umlaut = "löschen";

    $out = preg_replace_callback("/([\\xc0-\\xff][\\x80-\\xbf]*)/",'stripForSearchCallback', $Umlaut);

    function stripForSearchCallback( $matches ) {
         return 'u8' .  bin2hex( $matches[1] );
    }

    echo $out;

Output: "lu8c3b6schen"

Now I want to convert "lu8c3b6schen" back to "löschen".

How can I do this?

  • Match for hexnum tuples and use chr(hexdec()) as callback. Commented Jun 23, 2013 at 18:22
  • Is your output even valid? I thought that hex format should be pairs of [a-f0-9], but you have an uneven format: 8c3b6. Maybe I'm missing something? Also note that a regex would most likely mangle your sentence, for example if it contains numbers (löschen 65) or consecutive hex letters (acce nt). Commented Jun 23, 2013 at 18:32
  • Yes, you are right. I don't think it's hex. MediaWiki says the following: "Armor a case-folded UTF-8 string to get through MySQL's fulltext search without being mucked up by funny charset settings or anything else of the sort." But I can't find any function that converts it back. Commented Jun 23, 2013 at 18:36
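
To make the format discussed in these comments concrete: the output is not hex throughout. Judging from the function in the question, each multibyte UTF-8 character is replaced by the literal marker `u8` followed by two hex digits per byte, so `ö` (bytes 0xC3 0xB6) becomes `u8c3b6`. Below is a minimal sketch of the `chr(hexdec())` idea from the first comment. The helper name `hexPairsToBytes` is hypothetical and not part of MediaWiki.

```php
<?php
// Hypothetical helper: turn a string of hex digit pairs into raw bytes,
// one chr( hexdec() ) call per pair.
function hexPairsToBytes( $hex ) {
    $bytes = '';
    foreach ( str_split( $hex, 2 ) as $pair ) {
        $bytes .= chr( hexdec( $pair ) );
    }
    return $bytes;
}

// "ö" is the two bytes 0xC3 0xB6 in UTF-8.
echo bin2hex( "\xc3\xb6" ), "\n";       // prints c3b6
echo hexPairsToBytes( "c3b6" ), "\n";   // prints the bytes 0xC3 0xB6, i.e. "ö" on a UTF-8 terminal
```

This only decodes a run of hex pairs that has already been isolated; locating the `u8` marker and deciding how many pairs belong to it is a separate step.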

1 Answer

Try something like this:

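    $string = "lu8c3b6schen";

    // Match the "u8" marker followed by exactly four hex digits (two bytes).
    // This covers two-byte UTF-8 sequences such as the German umlauts.
    $out = preg_replace_callback("/u8([a-f0-9]{4})/", 'unstrip', $string);

    function unstrip( $matches ) {
        // hex2bin() requires PHP 5.4 or later and returns false on failure.
        $decoded = hex2bin( $matches[1] );
        // On failure, leave the matched text (including "u8") unchanged.
        return $decoded !== false ? $decoded : $matches[0];
    }

    echo $out;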
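
The pattern above is fixed at four hex digits, so it would not decode three- or four-byte characters such as "€" (`u8e282ac`). One possible extension, sketched here under the assumption that the armored input came from valid UTF-8, reads the lead byte to decide how many continuation bytes follow. The function name `unarmorSearchString` is hypothetical and not part of MediaWiki.

```php
<?php
// Sketch: reverse the "u8" + hex armoring for UTF-8 sequences of 2 to 4 bytes.
// Assumes the armored string was produced from valid UTF-8.
function unarmorSearchString( $string ) {
    $pattern = '/u8('
        . '[cd][0-9a-f](?:[89ab][0-9a-f]){1}'   // lead c0..df: 2 bytes total
        . '|e[0-9a-f](?:[89ab][0-9a-f]){2}'     // lead e0..ef: 3 bytes total
        . '|f[0-7](?:[89ab][0-9a-f]){3}'        // lead f0..f7: 4 bytes total
        . ')/';

    return preg_replace_callback( $pattern, function ( $matches ) {
        // hex2bin() requires PHP 5.4 or later.
        $decoded = hex2bin( $matches[1] );
        return $decoded !== false ? $decoded : $matches[0];
    }, $string );
}

echo unarmorSearchString( "lu8c3b6schen" ), "\n"; // prints "löschen" on a UTF-8 terminal
```

Because the length comes from the lead byte, hex-looking text directly after an armored character is left alone: `u8c3b6a0` decodes to "ö" followed by the literal `a0`. The conversion still cannot be made fully unambiguous, since ordinary text that happens to contain something like `u8c3b6` would also be decoded.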
1 Comment

sigh How did I not see the u, +1
