
Character encoding is always a little bit tricky in languages with special letters.

The MySQL database server uses UTF-8 Unicode (utf8),

and the collation is utf8_general_ci.

When I fetch some data from the database using mysqli:

while($row = $result->fetch_assoc()){
    foreach ($row as $field=>$value){
        print(mb_detect_encoding($value).' '.$value."<br/>");
    }
}

the strings are reported as ASCII, not UTF-8. Where does that come from?

More info: my Apache AddDefaultCharset is utf-8,

the charset declared for the HTML page is utf-8,

and I built the database with a script exported from another database, which is also utf-8.

PS: I tried mysqli_set_charset($mysqli, "utf8"), but it does not change anything.

I would really like to know when and how the data ends up encoded as ASCII.

Thank you

PS 2: this is the result I get from mb_detect_encoding:

ASCII ESSAI
ASCII 34
ASCII Bonjour
ASCII 41
UTF-8 ���������������

and this is the warning from DOMElement: Warning: DOMElement::setAttribute() [domelement.setattribute]: string is not in UTF-8

PS 3: the problem is with the non-ASCII (UTF-8) data.

In the database I have èèèèèèèèèèèèèèèèèèèèè

If I wrap the string in utf8_encode, the problem goes away and I get this result:

ASCII ESSAI
ASCII 34
ASCII Bonjour
ASCII 41
UTF-8 èèèèèèèèèèèèèèè
ASCII 43

So my string is apparently reported as UTF-8 (see mb_detect_encoding), but its value has been changed somewhere along the way.
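The symptom described above (unreadable characters that utf8_encode repairs) is what one would expect if the bytes arriving in PHP were ISO-8859-1 rather than UTF-8. That is an assumption about this setup, not something the output proves. A minimal sketch of how to check, using byte escapes so the result does not depend on how the script file itself is saved:

```php
<?php
// "èèè" as ISO-8859-1 (Latin-1): one byte, 0xE8, per character.
$latin1 = "\xE8\xE8\xE8";
// The same text as UTF-8: two bytes, 0xC3 0xA8, per character.
$utf8 = "\xC3\xA8\xC3\xA8\xC3\xA8";

// mb_check_encoding() validates the byte sequence instead of guessing.
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8');
$utf8IsValidUtf8   = mb_check_encoding($utf8, 'UTF-8');

// Converting from ISO-8859-1 to UTF-8 is what utf8_encode() does.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Hex dumps show exactly which bytes each string holds.
var_dump($latin1IsValidUtf8, $utf8IsValidUtf8);
var_dump(bin2hex($latin1), bin2hex($converted));
```

If the value fetched from the database fails the UTF-8 check and its hex dump shows single e8 bytes, the conversion happened before the string reached PHP, most likely in the connection character set.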

  • ASCII is a subset of UTF-8, so an ASCII string is also a valid UTF-8 string. A UTF-8 encoded string that uses characters only in the ASCII set is also a valid ASCII string. Commented Sep 27, 2013 at 14:38
  • @cdhowie Unless it's an Extended ASCII - which is rarely the case nowadays. Commented Sep 27, 2013 at 14:39
  • @IdanArye Well, Extended ASCII is not ASCII (it's a superset), so I stand by my statement. Commented Sep 27, 2013 at 14:40
  • I use the DOMElement class and I get a warning saying that my string is not a UTF-8 encoded string. Apparently it does not consider an ASCII string to be a UTF-8 string. Commented Sep 27, 2013 at 14:41
  • @mlwacosmos Please var_dump() the string and add it to your question. Commented Sep 27, 2013 at 14:41

2 Answers

As already said, 7-bit ASCII is a subset of UTF-8, so "Bonjour" is detected as ASCII and "café, 3€" as UTF-8 (though in your output you would only see "caf" and ", 3").

Passing a value fetched from SQL directly to DOMElement (without utf8_encode) should work.
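The subset relationship in this answer can be checked directly. The sketch below writes the two example strings as UTF-8 byte escapes (an assumption made so that the result does not depend on the encoding of the script file) and validates each one against both encodings:

```php
<?php
// Pure 7-bit text: every byte is below 0x80.
$plain = "Bonjour";
// "café, 3€" in UTF-8: é is 0xC3 0xA9, € is 0xE2 0x82 0xAC.
$accented = "caf\xC3\xA9, 3\xE2\x82\xAC";

// An ASCII-only string is valid in both encodings.
$plainIsAscii = mb_check_encoding($plain, 'ASCII');
$plainIsUtf8  = mb_check_encoding($plain, 'UTF-8');

// A string with multi-byte characters is valid UTF-8 but not ASCII.
$accentedIsAscii = mb_check_encoding($accented, 'ASCII');
$accentedIsUtf8  = mb_check_encoding($accented, 'UTF-8');

var_dump($plainIsAscii, $plainIsUtf8, $accentedIsAscii, $accentedIsUtf8);
```

So a report of "ASCII" for values such as "ESSAI" or "34" is not an error in itself: those strings are valid UTF-8 as well.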


5 Comments

You are right, this is what I thought too. But although the data is stored correctly in the database, what I get through mysqli is not the same: small squares. That is why DOMElement cannot read it.
Did you look in the database with some other tool? MySQLAdmin or such. Or simply a database dump (mysqldump); it can be limited to a single table.
Same with MySQL Workbench.
And there the texts in the database are okay? So the problem is in the reading, not the original writing. Then we have to wait for a more experienced answer.
It is as if something re-encodes the characters (in the driver, maybe...). I should try with PDO.

I replaced mysqli with PDO.

It works. The utf8 string is not changed...

So the problem is with mysqli (don't use that again).

@deceze: you can detect the charset, and it works well when everything is set up right.

2 Comments

mysqli works just as well as PDO for getting correctly encoded UTF-8 strings from the database, you were just missing some detail. Once strings are in PHP there's no difference between a string from PDO and mysqli or anywhere else.
It seems to be... at least by default because it works with PDO and not mysqli.
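The thread never identifies the missing detail. One common candidate is the connection character set, which has to be set on the same mysqli handle before any query runs. The sketch below is written under that assumption; the function name open_utf8_connection and its parameters are hypothetical, while set_charset() and character_set_name() are standard mysqli methods.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to utf8
// before any query is sent, so results arrive in PHP as UTF-8 bytes.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Throw exceptions on errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);

    // Must be called on this handle, and before the first query.
    $mysqli->set_charset('utf8');

    return $mysqli;
}
```

After connecting, `$mysqli->character_set_name()` reports the character set actually in effect for the connection, which makes it easy to confirm whether the call took hold.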
