
This code

print mb_substr('éxxx', 0, 1);

prints an empty space :(

It is supposed to print the first character, é. This, however, seems to work:

print mb_substr('éxxx', 0, 2);

But that can't be right, because (0, 2) should mean two characters...
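Here is one way to see why (0, 2) appears to work. In UTF-8, é is stored as two bytes (C3 A9). A function that counts bytes instead of characters therefore needs a "length" of 2 to return the whole é, and a length of 1 returns only the first half. The sketch below compares byte-level functions with character-level ones. It assumes the script file is saved as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8, so 'é' is the two bytes C3 A9.
$s = 'éxxx';

$bytes = strlen($s);             // counts bytes: 5
$chars = mb_strlen($s, 'UTF-8'); // counts characters: 4

$firstByte = substr($s, 0, 1);   // "\xC3" on its own is not a valid character
$firstTwo  = substr($s, 0, 2);   // "\xC3\xA9" is the complete 'é'

echo $bytes, ' bytes, ', $chars, " characters\n";
echo bin2hex($firstByte), ' ', bin2hex($firstTwo), "\n"; // c3 c3a9
```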

2 Answers

Try passing the encoding parameter to mb_substr, like this:

print mb_substr('éxxx', 0, 1, 'utf-8');

The encoding is never detected automatically.
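Here is a minimal sketch of this fix. When you pass the encoding explicitly, mb_substr counts UTF-8 characters even if the internal encoding is set to something else. The call to mb_internal_encoding('ISO-8859-1') only simulates a misconfigured default for the demonstration. The script assumes the file is saved as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8.
// Simulate a bad default; the explicit 4th argument below overrides it anyway.
mb_internal_encoding('ISO-8859-1');

$s = 'éxxx';

$first = mb_substr($s, 0, 1, 'UTF-8');    // one character: 'é'
$rest  = mb_substr($s, 1, null, 'UTF-8'); // null length means "to the end": 'xxx'

echo $first, "\n";
echo $rest, "\n";
```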

Comments

The encoding is never detected automatically; if you omit it, mb_substr just uses the internal encoding, whatever that happens to be.
Wouldn't it be a better idea to use mb_detect_encoding to actually try to detect the encoding?
@AlvinWong No. Know what encoding you're working with, there's no other way.
@Alvin Wong, that would be more correct, yes, but I could also say that using anything but utf-8 can be considered adventurous and marginal :)
OK, then how about calling mb_internal_encoding once instead of passing "utf-8" to every mb_* function, as Álvaro G. Vicario has pointed out?
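The comments above argue that you should know your encoding rather than detect it. If input genuinely arrives in more than one encoding, one hedged approach is to guess once at the input boundary, convert to UTF-8, and work in UTF-8 from then on. The helper to_utf8 and the candidate list ['UTF-8', 'ISO-8859-1'] are hypothetical and do not come from either answer. Keep in mind that mb_detect_encoding can only guess, and that every byte string is valid ISO-8859-1.

```php
<?php
// Hypothetical helper (not from either answer): normalise a string of uncertain
// origin to UTF-8 once, at the input boundary.
// Assumption: the input is either UTF-8 or ISO-8859-1. Detection is only a guess;
// since every byte string is valid ISO-8859-1, that candidate always matches when
// UTF-8 does not.
function to_utf8(string $input): string
{
    $detected = mb_detect_encoding($input, ['UTF-8', 'ISO-8859-1'], true);
    if ($detected === false) {
        throw new UnexpectedValueException('Unrecognised encoding');
    }
    if ($detected === 'UTF-8') {
        return $input; // already UTF-8, nothing to convert
    }
    return mb_convert_encoding($input, 'UTF-8', $detected);
}

// "\xE9" is é in ISO-8859-1 but is not valid UTF-8 here, so it is treated as Latin-1.
$latin1 = "\xE9xxx";
$utf8   = to_utf8($latin1);

echo mb_substr($utf8, 0, 1, 'UTF-8'), "\n";
```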

In practice I've found that, on some systems, the multibyte functions default to ISO-8859-1 as their internal encoding. That effectively ruins their ability to handle multibyte text.

Setting a good default will probably fix this and some other issues:

mb_internal_encoding('UTF-8');
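The sketch below reproduces the question's symptom and then applies this answer's fix. On PHP 5.6 and later, the internal encoding normally defaults to the default_charset ini setting, which is UTF-8 unless configured otherwise. For that reason the script forces ISO-8859-1 first, to simulate the misconfigured system described above. It assumes the file is saved as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8, so 'é' is the two bytes C3 A9.
$s = 'éxxx';

// Simulate the problematic default: every byte is treated as one character.
mb_internal_encoding('ISO-8859-1');
$broken   = mb_substr($s, 0, 1); // "\xC3", only half of 'é'
$twoBytes = mb_substr($s, 0, 2); // "\xC3\xA9", which happens to be 'é'

// Set a sensible default once, near the start of the script.
mb_internal_encoding('UTF-8');
$fixed = mb_substr($s, 0, 1);    // 'é', no encoding argument needed now

echo bin2hex($broken), ' ', $twoBytes, ' ', $fixed, "\n";
```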
