2

I'm facing a weird problem with inserting hebrew text into mysql.

Basically the problem is :
I have a php script which picks up hebrew text from a csv file then send it to mysql database. The charset of both database and all fields of tables are set to UTF8 and collation to utf8_bin. But when I insert it using mysql, random garbage value appears inside the text which renders it completely useless for output. NOTE : I can still see half of the words appear correctly.

Here is my homework which might help you in understanding :
1. As I mentioned the table charset and collation are utf8.
2. I've send header('Content-Type: text/html; charset=utf-8')
3. If I echo out the text, it appears perfectly. When I convert it using utf-8_encode it get converted properly. (eg. שי יפת get converted to ×©× ×פת)
4. When I use utf-8_decode on the converted variable and use echo, it still displays perfectly.
5. I've used these after mysql_connect

mysql_query("SET character_set_client = 'utf8';");
mysql_query("SET character_set_result = 'utf8';");
mysql_query("SET NAMES 'utf8'");
mysql_set_charset('utf8');

and even tried this :
mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con)

  1. Added default_charset = "UTF-8" in my php.ini file.
  2. I am unaware of the encoding used in csv file but when I open it with notepad++ the encoding is utf-8 without BOM.
  3. Here is a sample of the actual garbage :
    original text : שי יפת
    text after utf8_encode : ×©× ×פת
    text after utf8_decode in same script : שי יפת (perfect)
    text send to mysql database : ש×? ×?פת (notice the ? in between)
    text if we echo from mysql : ש�? �?פת (the output is close)
  4. Used addslashes and stripslashes before utf8_encoding. (even tried after no luck)
  5. Server is on windows running xamp 1.7.4
    • Apache 2.2.17
    • MySQL 5.5.8 (Community Server)
    • PHP 5.3.5 (VC6 X86 32bit)

EDIT 1 : Just to clarify that I did searched the site for similar questions and did implemented the suggestions found (SET NAME UTF8 and alot other options etc) but it didn't work out. So please don't mark this question as repeat.

EDIT 2 : Here is the full script :

    <?php
header('Content-Type: text/html; charset=utf-8'); 

if (isset($_GET['filename'])==true)
{
$databasehost = "localhost";
$databasename = "what_csv";


$databaseusername="root";
$databasepassword="";
$databasename= "csv";

$fieldseparator = "\n";
$lineseparator = "@contact\n";


$csvfile = $_GET['filename'];
/********************************/


if(!file_exists($csvfile)) {
    echo "File not found. Make sure you specified the correct path.\n";
    exit;
}

$file = fopen($csvfile,"r");

if(!$file) {
    echo "Error opening data file.\n";
    exit;
}

$size = filesize($csvfile);

if(!$size) {
    echo "File is empty.\n";
    exit;
}

$csvcontent = fread($file,$size);

fclose($file);

$con = @mysql_connect($databasehost,$databaseusername,$databasepassword) or die(mysql_error());

mysql_query( "SET NAMES utf8" );
mysql_set_charset('utf8',$con);
/*
mysql_query("SET character_set_client = 'utf8';"); 
mysql_query("SET character_set_result = 'utf8';");

mysql_query("SET NAMES 'utf8'");
mysql_set_charset('utf8');

mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con);
*/

@mysql_select_db($databasename) or die(mysql_error());



$lines = 0;
$queries = "";
$linearray = array();

foreach(explode($lineseparator,$csvcontent) as $line) {

$Name="";
$Landline1="";
$Landline2="";
$Mobile="";
$Address="";
$Email="";
$IMEI="temp";
$got_imei=false;

//echo $line.'<br>';
    $lines++;

    $line = trim($line," \t");

    $line = str_replace("\r","",$line);

    $linearray = explode($fieldseparator,$line);
    //check for values to insert
    foreach($linearray as $field)
    {
    if (is_numeric($field)){ $got_imei=true;$IMEI=trim($field);}
    if (stristr($field, 'Name:')) {$Name=trim(str_replace("Name:", "", $field));}   
    if (stristr($field, 'Landline:')) {$Landline1=trim(str_replace("Landline:", "", $field));}  
    if (stristr($field, 'Landline2:')) {$Landline2=trim(str_replace("Landline2:", "", $field));}    
    if (stristr($field, 'Mobile:')) {$Mobile=trim(str_replace("Mobile:", "", $field));} 
    if (stristr($field, 'Address:')) {$Address=trim(str_replace("Address:", "", $field));}
    if (stristr($field, 'Email:')) {$Email=trim(str_replace("Email:", "", $field));}



    }
    if ($got_imei==true)
    {

    $query = "UPDATE $databasetable SET imei=$IMEI where imei='temp'";
        mysql_query($query);

    }



    else if (($Name=="") &&  ($Landline1=="" ) && ($Landline2=="")  && ($Mobile=="")  && ($Address=="")) {echo "";}
    else
    {
        //$Name = utf8_encode("$Name");
        //$Name = addslashes("$Name");
        $Name = utf8_encode(mysql_real_escape_string("$Name"));

        echo"$Name,$Landline1,$Landline2,$Address,$IMEI<br>";
        $query = "insert into $databasetable (imei, name, landline1, landline2, mobile, address, email) values('$IMEI','$Name', '$Landline1','$Landline2','$Mobile', '$Address', '$Email');";
        mysql_query($query);
        $Name = utf8_decode(($Name));   
        echo $Name."<br>";

    }
}
@mysql_close($con);



echo "Found a total of $lines records in this csv file.\n";

}
?>


<form>
Enter file name <input type="text" name="filename" /><br />
<input type="submit" value="Submit" /><br>
NOTE : File must be present in same directory as this script. Please include full filename, for example filename.csv.
</form>

Here is a sample of csv file :

@contact
Name: שי יפת
Mobile: 0547939898

@IMEI
355310042074173

EDIT 3 :

If I directly enter the string via cmd I get this warning:

Warning Code : 1366
Incorrect string value: '\xD7\xA9\xD7\x99 \xD7...' for column 'name' at row 1

Here is something I found on the net that could be related, any help? http://bugs.mysql.com/bug.php?id=30131

2
  • 1
    i use utf8_general_ci, maybe it'll work better for you. utf8 without BOM is fine btw. use mysql_real_escape_string instead of addslashes Commented Oct 2, 2011 at 4:41
  • Thank for the quick response but I forgot to mentioned I've tried both, mysql_real_escape_string and changing and changing the collation to utf_general_ci, no luck. Commented Oct 2, 2011 at 5:08

2 Answers 2

2

I had this problem too. Thees lines solve it:

mysql_query( "SET NAMES utf8" );
mysql_query( "SET CHARACTER SET utf8" );

Shana Tova

Sign up to request clarification or add additional context in comments.

5 Comments

Hi, still getting those ? in the mysql table.
does the php file that save the csv to the db is also encoded in utf8 without BOM?
Hi, it was encoded in ANSI just created a copy of that file in utf, still same garbage entries!
sorry, i'm out of ideas... try to work with mb_detect_encoding and mb_convert_encoding
Hi I updated the question, could you please read the EDIT 2 and EDIT 3?
1

Use Text/LongText instead of varchar. Also use Collation as utf8_general_ci

Hope this will help you @Ajit

1 Comment

Abit late but setting to Text did solved this problem ( although not a proper solution since I did wanted varchar type for searching etc).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.