
I'm using Excel 2010 Professional Plus to create an Excel file, and later I'm trying to export it as a UTF-8 .csv file. I do this by saving it as CSV (semicolon separated... sorry, I don't have the English version, so the exact wording may differ). Then I click Tools -> Web Options and select Unicode (UTF-8) as the encoding. The example .csv looks like this:

ID;englishName;germanName
1;Austria;Österreich

So far so good, but if I open the file now with my php code:

 header('Content-Type: text/html; charset=UTF-8');
 iconv_set_encoding("internal_encoding", "UTF-8");
 iconv_set_encoding("output_encoding", "UTF-8");
 setlocale(LC_ALL, 'de_DE.utf8');
 $fp = fopen($filePathName, 'r');
 while (($dataRow = fgetcsv($fp, 0, ";", '"')) !== FALSE)
 {
     print_r($dataRow);
 }
 fclose($fp);
  • I get �sterreich as a result on screen (as that is the "error", I cut all other parts of the result).
  • If I open the file with Notepad++ and look at the encoding, it shows "ANSI" instead of UTF-8.
  • If I change the encoding in Notepad++ to UTF-8, the ö, ä, ... are replaced by special characters, which I have to correct manually.

If I go another route and create a new UTF-8 file with Notepad++, putting in the same data as in the Excel file, "Österreich" is shown on screen when I open it with the PHP script.
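One way to see what Excel actually wrote, without relying on Notepad++, is to inspect the raw bytes in PHP. This is only a hedged sketch (it assumes the mbstring extension is available and uses an in-memory string as a stand-in for the file contents):

```php
<?php
// An ANSI (Windows-1252) "Ö" is the single byte 0xD6, which is not
// valid UTF-8, so mb_check_encoding() can tell the two apart.
$bytes = "1;Austria;\xD6sterreich\n";   // stand-in for file_get_contents($filePathName)

$hasBom = substr($bytes, 0, 3) === "\xEF\xBB\xBF";   // UTF-8 BOM marker EF BB BF
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

var_dump($hasBom);   // bool(false)
var_dump($isUtf8);   // bool(false) -> the file is not UTF-8
```

If `$isUtf8` comes back false for the exported file, Excel really did write ANSI bytes, no matter what the export dialog said.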

Now my question is: why does it not work with Excel? Am I doing something wrong here, or am I overlooking something?

Edit: As the program will in the end be installed on Windows servers provided by customers, a solution is needed that does not require installing additional tools (PHP libraries etc. are OK, but having to install a VM or Cygwin is not). Also, there won't be Excel (or Office) installed locally on the server, as the customer will upload the .csv file via a file-upload dialog (the dialog itself is not part of the problem, as I know how to handle those; I stumbled over the problem when I created an Excel file and converted it to .csv on a test machine where Excel was installed locally).

Tnx

5 Answers


From the PHP documentation on fgetcsv:

Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

You can try

header('Content-Type: text/html; charset=UTF-8');
$fp = fopen("log.txt", "r");
echo "<pre>";
while ( ($dataRow = fgetcsv($fp, 1000, ";")) !== FALSE ) {
    $dataRow = array_map("utf8_encode", $dataRow);
    print_r($dataRow);
}

Output

Array
(
    [0] => ID
    [1] => englishName
    [2] => germanName
)
Array
(
    [0] => 1
    [1] => Austria
    [2] => Österreich
)
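As a side note: utf8_encode() assumes the input is ISO-8859-1 and has been deprecated since PHP 8.2. A hedged sketch of the same fix using mb_convert_encoding() instead, converting from Windows-1252 (which is what Excel on a Western-European Windows system typically writes, and a superset of ISO-8859-1 for these characters); the in-memory stream here stands in for the CSV file:

```php
<?php
// Same pattern as the answer above, but with mb_convert_encoding().
$fp = fopen('php://memory', 'r+');          // stand-in for the uploaded CSV
fwrite($fp, "1;Austria;\xD6sterreich\n");   // "Österreich" in Windows-1252
rewind($fp);

while (($dataRow = fgetcsv($fp, 1000, ';')) !== false) {
    $dataRow = array_map(
        fn ($field) => mb_convert_encoding($field, 'UTF-8', 'Windows-1252'),
        $dataRow
    );
    print_r($dataRow);
}
fclose($fp);
```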

1 Comment

The array_map was the solution, tnx. I didn't think I would have to map it explicitly to UTF-8 again if I had already told the program that what I'm reading is UTF-8 (interesting phenomenon... in addition to Excel still exporting it as ANSI). Tnx again.

I don't know why Excel is generating an ANSI file instead of UTF-8 (as you can see in Notepad++), but if that is the case, you can convert the file using iconv:

iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv

1 Comment

Is there another method to do that without relying on Unix commands? (The application must run on quite a few different Windows machines without installing additional tools, so installing something on those servers just to get Unix commands is not really an option.)
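The same conversion is possible in pure PHP via the iconv() function, which ships with PHP on Windows as well (assuming the iconv extension is enabled). A hedged sketch, with a temp file standing in for my_csv_file.csv:

```php
<?php
// Convert an ISO-8859-1 CSV to UTF-8 without shelling out to the
// iconv command-line tool.
$src = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($src, "1;Austria;\xD6sterreich\n");   // ISO-8859-1 bytes

$utf8 = iconv('ISO-8859-1', 'UTF-8', file_get_contents($src));
file_put_contents($src . '.utf8.csv', $utf8);

echo $utf8;   // 1;Austria;Österreich
```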

And for people from the Czech Republic:

function convert( $str ) {
    return iconv( "CP1250", "UTF-8", $str );
}
...
while (($data = fgetcsv($this->fhandle, 1000, ";")) !== FALSE) {
    $data = array_map( "convert", $data );
    ...
}

Comments


The problem must be your file's encoding; it looks like it's not UTF-8.

When I tried your example and double-checked that the file is indeed UTF-8, it worked for me; I get:

Array ( [0] => 1 [1] => Austria [2] => Österreich )

Use LibreOffice (OpenOffice); it's more reliable for this sort of thing.

Comments


From what you say, I suspect Excel writes a UTF-8 file without a BOM, which makes guessing that the encoding is UTF-8 slightly trickier. You can confirm this diagnosis by checking whether the characters appear correctly in Notepad++ when selecting Format -> Encode in UTF-8 (without BOM) (rather than Format -> Convert to UTF-8 (without BOM)).

And are you sure every user is going to use UTF-8? It sounds to me like you need something that does a little smart guessing of what your real input encoding is. By "smart", I mean guessing that recognizes BOM-less UTF-8.

To cut to the chase, I'd do something like this:

$f = fopen('file.csv', 'r');

while (($row = fgets($f)) !== false) {
    if (mb_detect_encoding($row, 'UTF-8', true) !== false) {
        var_dump(str_getcsv($row, ';'));
    } else {
        var_dump(str_getcsv(utf8_encode($row), ';'));
    }
}

fclose($f);

This works because you inspect the actual bytes to guess the encoding, rather than lazily trusting the first 3 characters, so UTF-8 without a BOM is still recognized as UTF-8. Of course, if your CSV file is not too big, you could do the encoding detection on the whole file contents: something like mb_detect_encoding(file_get_contents(...), ...).
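The whole-file variant mentioned above can be sketched like this (a hedged sketch: the candidate encoding list and the hard-coded sample input are assumptions; strict detection with UTF-8 listed first still recognizes BOM-less UTF-8):

```php
<?php
// Detect the encoding once over the full contents, convert if needed,
// then parse line by line.
$raw = "ID;englishName;germanName\n1;Austria;\xD6sterreich\n"; // e.g. ANSI input

$enc = mb_detect_encoding($raw, ['UTF-8', 'ISO-8859-1'], true);
if ($enc !== 'UTF-8') {
    // Fall back to ISO-8859-1 if detection failed entirely.
    $raw = mb_convert_encoding($raw, 'UTF-8', $enc ?: 'ISO-8859-1');
}

foreach (explode("\n", trim($raw)) as $line) {
    var_dump(str_getcsv($line, ';'));
}
```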

2 Comments

For Excel: I checked the files with Notepad++. Notepad++ told me they were still ANSI (even though exported as UTF-8). And when I import UTF-8 encoded files (created with Notepad++) into Excel, it is as if the wrong encoding were there (I get special characters instead of ö, ü, ...).
@ThomasE. Regardless, this solution should work, plus it should be able to cope with non-UTF-8 input. It is not that different from Baba's, since it applies utf8_encode to strings imported from your problematic CSV files.
