I've inherited a MySQL database which contains a field named Description, of type text with collation latin1_swedish_ci.

The problem is that this field contains UTF-8 data with some Unicode characters, e.g. character 733. Sometimes the same character also appears in the field as the HTML entity "&#733".

I'm trying to read the table and export the data to a CSV file and I need to represent this character as a double quote.

Reading the HTML-encoded character is easy enough. However, it appears that the actual Unicode character is converted before I can do anything with it, resulting in a "?".

How do I read in the Unicode character 733 (U+02DD), recognize it and convert it?

Here's a simplified (not tested) version of the code.

<?php
$testconn = odbc_connect("TESTLIB", "......", "......");

$query = "SELECT Description FROM TestTable";

$rsWeb = mysql_query($query);

$WebRow = mysql_fetch_row($rsWeb);
$Desc = $WebRow[0];
// Escape embedded double quotes for CSV by doubling them.
$Desc = str_replace('"', '""', $Desc);

fwrite($output, "\"" . $Desc . "\",\r\n");
?>
  • php.net/manual/en/function.html-entity-decode.php Commented Jan 16, 2012 at 15:27
  • I tried html_entity_decode(). However, the character has already been converted to a "?" before I get the chance to use html_entity_decode making it useless. It looks like it's converted to a "?" during either the mysql_fetch_row or mysql_query. Commented Jan 16, 2012 at 15:33

3 Answers

2

Also set the connection charset to UTF-8 when connecting to the MySQL server:

http://php.net/manual/en/mysqli.set-charset.php

$mysqli->set_charset("utf8");
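
A slightly fuller sketch of where that call might sit, assuming the mysqli extension is used instead of the question's mysql_* functions. The function name and its parameters are hypothetical, and this has not been checked against the asker's data; whether it is enough on its own depends on how the bytes were originally stored in the latin1 column.

```php
<?php
// Sketch only: fetch_descriptions() and its parameters are hypothetical names.
// The table and column names are taken from the question.

function fetch_descriptions(string $host, string $user, string $pass, string $db): array
{
    $mysqli = new mysqli($host, $user, $pass, $db);

    // Ask the server to deliver results as UTF-8 before running any query.
    if (!$mysqli->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $mysqli->error);
    }

    $rows = [];
    $result = $mysqli->query('SELECT Description FROM TestTable');
    while ($row = $result->fetch_row()) {
        $rows[] = $row[0];
    }

    $result->free();
    $mysqli->close();

    return $rows;
}
```

The charset is set immediately after connecting so that every later query on the connection is affected.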

0

I think your connection charset is not utf8, which is why characters are being converted to '?'.

Read this: http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html

Post the result of this query: SHOW VARIABLES LIKE 'char%';
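
To illustrate the kind of lossy conversion described here, the sketch below converts UTF-8 text to a single-byte Latin-1 charset in PHP itself. It is an analogy, not a reproduction of what the MySQL client does: it assumes the mbstring extension is available, and utf8_to_latin1() is a hypothetical helper name.

```php
<?php
// Illustration only: U+02DD has no Latin-1 equivalent, so a conversion to a
// single-byte Latin-1 charset has to substitute something for it.
// Requires the mbstring extension; utf8_to_latin1() is a hypothetical name.

function utf8_to_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$original  = "5\u{02DD} pipe";        // U+02DD is stored as the bytes CB 9D in UTF-8
$converted = utf8_to_latin1($original);

echo bin2hex($original), "\n";        // hex dump of the UTF-8 input
echo $converted, "\n";                // the character has been replaced
```

Characters that do exist in Latin-1, such as U+00E9, survive the same conversion as a single byte, which is why only some characters in a field turn into '?'.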


0

You really should store only the non-entity (Unicode) version in the database and entity-decode the rest. When you want to use UTF-8 with MySQL, there are a few things to remember:

  • Your table column's collation should be utf8_bin or similar.
  • Your table's collation and database collation should also be utf8_bin just in case.
  • Your connection charset should be UTF8. Do this by executing the "SET NAMES utf8" query.

Also, if you're outputting an HTML page, the page should declare the UTF-8 charset as well. If everything is correct, the UTF-8 characters should come out fine.
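
Once the data arrives as valid UTF-8, the replacement the question asks for could be sketched as follows. Both helper names are hypothetical, and the code assumes PHP 7 or later for the "\u{...}" escape. It handles the raw character and the decimal entity, with or without a trailing semicolon, and nothing else.

```php
<?php
// Sketch only: normalize_double_acute() and csv_field() are hypothetical names.
// Assumes $desc is already valid UTF-8 when it reaches these functions.

function normalize_double_acute(string $desc): string
{
    // Replace the decimal entity, with or without its semicolon.
    // The lookahead keeps longer entities such as &#7331; untouched.
    $desc = preg_replace('/&#733(?!\d);?/', '"', $desc) ?? $desc;

    // Replace the raw character U+02DD (UTF-8 bytes CB 9D).
    return str_replace("\u{02DD}", '"', $desc);
}

function csv_field(string $value): string
{
    // Double embedded quotes and wrap the value, as the question's code does.
    return '"' . str_replace('"', '""', $value) . '"';
}

$desc = "3\u{02DD} valve, 4&#733 hose";
echo csv_field(normalize_double_acute($desc)), ",\r\n";
```

The replacement runs before the CSV escaping so that the newly introduced double quotes are doubled along with any that were already present.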

Good luck!
