Issue parsing html file with php

Question

I have some code that parses an HTML file, and I came across a page containing this character, which broke the parsing: “

When I execute the following code, $len is assigned a value of 3.

$test = "“";
$len = strlen($test);

I suspect that this is a Unicode character.

For now I'm working around the problem by replacing the curly double quote with a standard double quote. However, I'm concerned about other files that might contain similar characters, and I don't want to write a replace call for each separate case.

How do I get PHP to treat this as a single character?

Pekka · Accepted Answer · 2011-05-06 15:53:53Z

1

PHP's standard string handling functions are not multibyte-aware; they naively count the number of bytes in the string. In UTF-8, the character “ (U+201C) is encoded as three bytes, which is why strlen() returns 3.

If you have the multibyte string extension (mbstring) installed, mb_strlen() is what you are looking for.
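Whether the extension is available can be checked at runtime. The following is a minimal sketch, not part of the original answer: the helper name utf8_char_count() is hypothetical, it assumes PHP 7 or later and UTF-8 input, and the regex fallback is just one possible substitute when mbstring is missing.

```php
<?php
// Hypothetical helper (an assumption for illustration): count the characters
// in a UTF-8 string, preferring mbstring when the extension is loaded.
function utf8_char_count(string $s): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: with the /u modifier, "." matches one UTF-8 character;
    // /s lets it match newlines as well. preg_match_all() returns false
    // when the input is not valid UTF-8.
    $n = preg_match_all('/./su', $s);
    if ($n === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}
```

Passing the encoding explicitly avoids depending on the configured internal encoding, which may differ between servers.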

For example, if your data is UTF-8:

$test = "“";
$len = mb_strlen($test, "UTF-8");
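To see the difference side by side, here is a small sketch comparing the byte count with the character count. It writes the quote as explicit UTF-8 bytes (an assumption made here so the result does not depend on how the source file is saved); the variable names are illustrative.

```php
<?php
// U+201C (left double quotation mark) written as its three UTF-8 bytes.
$test = "\xE2\x80\x9C";

$bytes = strlen($test);              // counts bytes
$chars = mb_strlen($test, "UTF-8");  // counts characters
$hex   = bin2hex($test);             // raw bytes as hexadecimal

echo "$bytes $chars $hex\n";         // prints "3 1 e2809c"
```

If the file being parsed is in some other encoding, the second argument to mb_strlen() would need to name that encoding instead.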

edited May 6, 2011 at 15:53

answered May 6, 2011 at 15:48

Pekka


Marcin · Accepted Answer · 2011-05-06 15:48:28Z

1

For Unicode, use the PHP functions whose names start with mb_ (multibyte). For example: http://php.net/manual/en/function.mb-strlen.php
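The same byte-versus-character difference applies to searching and slicing. A short sketch, assuming UTF-8 data and the mbstring extension; the sample string is made up for illustration:

```php
<?php
// Hypothetical sample: the word "hello" wrapped in curly quotes (UTF-8 bytes).
$text = "\xE2\x80\x9Chello\xE2\x80\x9D";

// Byte-oriented: the opening quote occupies three bytes, so "hello" starts at byte 3.
$bytePos = strpos($text, "hello");

// Multibyte-aware: the opening quote is one character, so "hello" starts at character 1.
$charPos = mb_strpos($text, "hello", 0, "UTF-8");

// Extract five characters starting after the opening quote.
$inner = mb_substr($text, 1, 5, "UTF-8");

// substr() works on bytes and here cuts the quote apart, returning only its first byte.
$firstByte = substr($text, 0, 1);

echo "$bytePos $charPos $inner\n";   // prints "3 1 hello"
```

Mixing the two families is a common source of bugs: an offset obtained from strpos() is a byte offset and should not be passed to mb_substr().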

answered May 6, 2011 at 15:48

Marcin


Kaivosukeltaja · Accepted Answer · 2011-05-06 15:48:56Z

1

Use mb_strlen(); it handles multibyte characters.

answered May 6, 2011 at 15:48

Kaivosukeltaja


morgar · Accepted Answer · 2011-05-06 15:49:45Z

1

You need to use the multibyte versions of the string functions: http://php.net/manual/en/function.mb-strlen.php

answered May 6, 2011 at 15:49

morgar
