I am trying to convert some archived CSV data. The script worked well on a couple of thousand files: I parse out a date and convert it to a timestamp. On one file, however, it doesn't work. I use (int) $string to cast the parsed strings to int values, and it returns int(0). I also tried intval(), with the same result. When I use var_dump($string), I get odd output, for example string(9) "2008", which should be string(4) "2008". I also tried preg_match on the string, without success. Is this an encoding problem?

Here is some code, it's just pretty standard stuff:

date_default_timezone_set('UTC');
$ms = 0;
function convert_csv($filename)
{
global $ms; // import the counter defined above; without this, $ms is an undefined local variable
$target = "tmp.csv";
$fp = fopen("$filename","r") or die("Can't read the file!");
$fpo = fopen("$target","w") or die("Can't write the file!");
while($line = fgets($fp,1024))
{
    $linearr = explode(",","$line");

    $time = $linearr[2];
    $bid = $linearr[3];
    $ask = $linearr[4];
    $time = explode(" ",$time);
    $date = explode("-",$time[0]);
    $year = (int) $date[0];
    $month =  (int)$date[1];
    $day = (int)$date[2];
    $time = explode(":",$time[1]);

    $hour = (int)$time[0];
    $minute = (int)$time[1];
    $second = (int)$time[2];
    $time = mktime($hour,$minute,$second,$month,$day,$year);

    if($ms >= 9)
    {
        $ms = 0;
    }else
    {
        $ms ++;
    }
    $time = $time.'00'.$ms;
    $newline = "$time,$ask,$bid,0,0\n";
    fwrite($fpo,$newline);

}
fclose($fp);
fclose($fpo);
unlink($filename);
rename($target,$filename);

}

Here is a link to the file we are talking about:

  • Please show us some code. Also, you got string(9) "2008"? Commented Mar 15, 2012 at 12:13
  • A hex dump of the string(s) would certainly be a good idea, since the seemingly too-high string length indicates there are some bytes in there that your output viewer can't or won't show. Commented Mar 15, 2012 at 12:16

2 Answers

The file seems to be encoded in UTF-16, so it is indeed an encoding problem. The string(9) is caused by the null-bytes that you get if UTF-16 is interpreted as a single-byte encoding.
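To see where the extra bytes come from, here is a small sketch. It assumes the file is UTF-16LE and contains only ASCII characters; the sample row and the ascii_to_utf16le helper are made up for illustration and are not taken from the original file.

```php
<?php
// Hypothetical helper: encode an ASCII-only string as UTF-16LE
// by following every character with a null byte.
function ascii_to_utf16le(string $s): string
{
    $out = '';
    foreach (str_split($s) as $ch) {
        $out .= $ch . "\0";
    }
    return $out;
}

// Made-up sample row; column 2 holds the date, as in the question's code.
$line = ascii_to_utf16le("EUR/USD,1,2008-01-02 03:04:05,1.4,1.5");

// Same byte-wise splitting as in the question.
$fields   = explode(",", $line);
$datetime = explode(" ", $fields[2]);
$date     = explode("-", $datetime[0]);
$year     = $date[0];

var_dump(strlen($year));   // int(9)
var_dump(bin2hex($year));  // string(18) "003200300030003800"
var_dump((int) $year);     // int(0)
```

The comma itself is two bytes (2C 00), so splitting on the single byte "," leaves its trailing null byte at the start of the next field. That leading null plus four two-byte digits gives nine bytes, and the cast stops at the first byte because it is not a digit.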

This makes the file hard to read with functions like fgets, which operate on raw bytes and are not encoding-aware. You could read the entire file into memory and convert its encoding, but that is inefficient for large files.

I'm not sure if it's possible to read the file properly as UTF-16 using native PHP functions. You might need to write or use an external library.
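One option that may avoid loading the whole file is PHP's convert.iconv.* stream filter, which is available when the iconv extension is enabled. The sketch below is untested against the file in question; the function name read_utf16_lines and the default source encoding are assumptions. With 'UTF-16', byte-order detection depends on the underlying iconv implementation, so passing 'UTF-16LE' or 'UTF-16BE' explicitly may be safer (a byte-order mark, if present, may then show up at the start of the first line).

```php
<?php
// Sketch: read a UTF-16 file line by line as UTF-8 via an iconv stream filter.
// read_utf16_lines is a hypothetical name; requires the iconv extension.
function read_utf16_lines(string $path, string $from = 'UTF-16'): Generator
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Can't read $path");
    }
    try {
        // Convert bytes to UTF-8 as they are read from the stream.
        if (stream_filter_append($fp, "convert.iconv.$from/UTF-8", STREAM_FILTER_READ) === false) {
            throw new RuntimeException("iconv stream filter not available");
        }
        while (($line = fgets($fp)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($fp);
    }
}
```

Each yielded line is plain UTF-8, so explode() and (int) casts should then behave as they do for the other files.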

You could try converting your file to plain ASCII using iconv.

If you are on Linux or a similar system that has the iconv command:

$ iconv -f UTF16 -t ASCII EUR_USD_Week1.csv > clean.csv

Otherwise you may find the PHP iconv function useful:

http://php.net/manual/en/function.iconv.php
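As a rough sketch of the PHP route, the whole file can be read into memory and converted in one call. The function name convert_file_to_utf8 and its parameters are made up for illustration, and the example targets UTF-8 rather than ASCII so that non-ASCII characters do not make the conversion fail. It assumes the iconv extension is enabled and that the file fits in memory.

```php
<?php
// Sketch: convert an entire file from UTF-16 to UTF-8 with iconv().
// convert_file_to_utf8 is a hypothetical helper, not part of PHP.
function convert_file_to_utf8(string $source, string $target, string $from = 'UTF-16'): void
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        throw new RuntimeException("Can't read $source");
    }

    // iconv() returns false if the input cannot be converted.
    $utf8 = iconv($from, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Conversion from $from failed");
    }

    if (file_put_contents($target, $utf8) === false) {
        throw new RuntimeException("Can't write $target");
    }
}
```

After the conversion, the existing fgets/explode loop can be run on the target file unchanged.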
