I have some code that parses an HTML file, and I came across a page containing this character, which broke the parsing: “

When I execute the following code, $len is assigned a value of 3.

$test = "“";
$len = strlen($test);

I suspect that this character is a multi-byte Unicode character.

For now, I'm getting around the problem by replacing the curly double quote with a standard double quote. However, I'm concerned about other files that might contain similar characters, and I don't want to write a separate replace call for each one.

How do I get PHP to treat this as a single character?

4 Answers

PHP's standard string functions are not multibyte-aware; strlen() simply counts the bytes in the string. In UTF-8, “ (U+201C) is encoded as three bytes, which is why $len is 3.

If you have the multibyte extension installed, mb_strlen() is what you are looking for.

For example, if your data is UTF-8:

$test = "“";
$len = mb_strlen($test, "UTF-8");
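To make the difference concrete, here is a minimal sketch comparing the two functions on the same character. It assumes PHP 7 or later (for the "\u{...}" escape, used so the result does not depend on how the source file is saved) and that the mbstring extension is loaded; the variable names $bytes and $chars are just for illustration.

```php
<?php
// Assumes PHP 7+ and the mbstring extension.
$test = "\u{201C}"; // left double quotation mark, U+201C

$bytes = strlen($test);             // counts bytes
$chars = mb_strlen($test, "UTF-8"); // counts characters

echo bin2hex($test), "\n"; // e2809c (the three UTF-8 bytes)
echo $bytes, "\n";         // 3
echo $chars, "\n";         // 1
```

If the data is in some other encoding, pass that encoding name instead of "UTF-8".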

For Unicode text, use the PHP functions whose names start with mb_ (multibyte). For example: http://php.net/manual/en/function.mb-strlen.php
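Since the original problem was parsing rather than just measuring length, the same idea applies to offsets and substrings. The following is a rough sketch, assuming PHP 7+ with mbstring and UTF-8 input; the sample string $text and the variable names are made up for illustration.

```php
<?php
// Assumes PHP 7+ and the mbstring extension; $text is a hypothetical sample.
$text = "\u{201C}Hi\u{201D}"; // "Hi" wrapped in curly double quotes

$byteLen = strlen($text);                     // 8: 3 + 2 + 3 bytes
$charLen = mb_strlen($text, "UTF-8");         // 4 characters
$bytePos = strpos($text, "H");                // 3: byte offset
$charPos = mb_strpos($text, "H", 0, "UTF-8"); // 1: character offset
$inner   = mb_substr($text, 1, 2, "UTF-8");   // "Hi"
$broken  = substr($text, 1, 2);               // two stray bytes from the opening quote
```

Calling mb_internal_encoding("UTF-8") once at startup should let you omit the encoding argument on each call.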

Use mb_strlen(), it will handle multibyte characters.

You need to use the multibyte versions of the string functions: http://php.net/manual/en/function.mb-strlen.php
