PHP parsing/typecasting problems

Question

I'm trying to convert some archived CSV data. It all worked well on a couple of thousand files: I parse out a date and convert it to a timestamp. On one file, however, it doesn't work. I use (int) $string to cast the parsed strings to int values, and it returns int(0). I also tried intval(), with the same result. When I use var_dump($string), I get some weird output, for example string(9) "2008", which should actually be string(4) "2008". I tried to use preg_match on the string, without success. Is this an encoding problem?

Here is some code, it's just pretty standard stuff:

date_default_timezone_set('UTC');
$ms = 0;
function convert_csv($filename)
{
global $ms; // the counter is declared outside the function, so import it into this scope
$target = "tmp.csv";
$fp = fopen("$filename","r") or die("Can't read the file!");
$fpo = fopen("$target","w") or die("Can't write the file!");
while($line = fgets($fp,1024))
{
    $linearr = explode(",","$line");

    $time = $linearr[2];
    $bid = $linearr[3];
    $ask = $linearr[4];
    $time = explode(" ",$time);
    $date = explode("-",$time[0]);
    $year = (int) $date[0];
    $month =  (int)$date[1];
    $day = (int)$date[2];
    $time = explode(":",$time[1]);

    $hour = (int)$time[0];
    $minute = (int)$time[1];
    $second = (int)$time[2];
    $time = mktime($hour,$minute,$second,$month,$day,$year);

    if($ms >= 9)
    {
        $ms = 0;
    }else
    {
        $ms ++;
    }
    $time = $time.'00'.$ms;
    $newline = "$time,$ask,$bid,0,0\n";
    fwrite($fpo,$newline);

}
fclose($fp);
fclose($fpo);
unlink($filename);
rename($target,$filename);

}

Here is a link to the file we are talking about:

http://ratedata.gaincapital.com/2008/04%20April/EUR_USD_Week1.zip

A hex dump of the string(s) would certainly be a good idea, since the seemingly too-high string length indicates there are some bytes in there that your output viewer can't or won't show.
– Another Code, Commented Mar 15, 2012 at 12:16
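A minimal sketch of the hex dump suggested in the comment above. The helper name hex_dump_string is made up for illustration, and the 9-byte sample value is an assumption about what the string might contain, not bytes taken from the actual file.

```php
<?php
// Hypothetical helper: render every byte of a string as two hex digits.
function hex_dump_string(string $s): string
{
    return trim(chunk_split(bin2hex($s), 2, ' '));
}

// Assumed sample: the digits "2008" with null bytes around them (9 bytes).
$suspect = hex2bin('003200300030003800');

echo strlen($suspect), "\n";          // 9
echo hex_dump_string($suspect), "\n"; // 00 32 00 30 00 30 00 38 00
```

Bytes that a terminal or browser silently drops, such as 00, become visible this way.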

Another Code · Accepted Answer · 2012-03-15 12:31:22Z

2

The file seems to be encoded in UTF-16, so it is indeed an encoding problem. The string(9) is caused by the null bytes that you get if UTF-16 is interpreted as a single-byte encoding: each ASCII character then occupies two bytes, one of which is a null byte.
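The effect can be reproduced without the original file. The sketch below assumes little-endian UTF-16 and uses a made-up sample row in the column layout the question's code expects; ascii_to_utf16le is a hypothetical helper that only works for pure ASCII input.

```php
<?php
// Encode an ASCII string as UTF-16LE by appending a null byte to each character.
// (Assumption: pure ASCII input; avoids depending on mbstring or iconv.)
function ascii_to_utf16le(string $ascii): string
{
    $out = '';
    foreach (str_split($ascii) as $char) {
        $out .= $char . "\0";
    }
    return $out;
}

// Made-up sample row, not taken from the real data.
$line = ascii_to_utf16le("1,EUR/USD,2008-03-30 17:00:00,1.5,1.6");

// Same splitting steps as in the question: explode() works on single bytes,
// so the null byte that belongs to each delimiter stays attached to a field.
$fields = explode(",", $line);
$stamp  = explode(" ", $fields[2]);
$date   = explode("-", $stamp[0]);

var_dump(strlen($date[0]));  // int(9)
var_dump(bin2hex($date[0])); // string(18) "003200300030003800"
var_dump((int) $date[0]);    // int(0)
```

The year field starts with a null byte left over from the preceding comma, so the cast finds no leading digits and yields 0.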

This makes the file hard to read with functions like fgets, since they are binary-safe and thus not encoding-aware. You could read the entire file into memory and perform an encoding conversion, but this is inefficient for large files.

I'm not sure if it's possible to read the file properly as UTF-16 using native PHP functions. You might need to write or use an external library.
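One option that may be worth trying is PHP's convert.iconv.* stream filter, which decodes the data while it is being read. This is only a sketch: it assumes the iconv extension is loaded and that the file is UTF-16LE without a byte order mark (if the file has one, it would show up as an extra character at the start of the first line). The function name utf16le_lines is made up for illustration.

```php
<?php
// Sketch only: yields the lines of a UTF-16LE file, decoded to UTF-8.
function utf16le_lines(string $path): Generator
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Can't read $path");
    }
    // Decode on the fly, so fgets() sees UTF-8 instead of raw UTF-16 bytes.
    $filter = stream_filter_append($fp, 'convert.iconv.UTF-16LE/UTF-8', STREAM_FILTER_READ);
    if ($filter === false) {
        fclose($fp);
        throw new RuntimeException('iconv stream filter is not available');
    }
    try {
        while (($line = fgets($fp)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($fp);
    }
}

// Usage with a temporary file holding one made-up sample row.
$tmp = tempnam(sys_get_temp_dir(), 'u16');
file_put_contents($tmp, iconv('UTF-8', 'UTF-16LE', "1,EUR/USD,2008-03-30 17:00:00,1.5,1.6\n"));
foreach (utf16le_lines($tmp) as $line) {
    $fields = explode(',', $line);
    var_dump((int) substr($fields[2], 0, 4)); // year parsed from the decoded line
}
unlink($tmp);
```

Because the file is decoded chunk by chunk, it never has to be held in memory as a whole.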

edited Mar 15, 2012 at 12:31

answered Mar 15, 2012 at 12:26

Fran Marzoa · Accepted Answer · 2012-03-15 13:14:56Z

0

You may try to convert your file to plain ASCII using iconv.

If you are on Linux or a similar system that has the iconv command:

$ iconv -f UTF16 -t ASCII EUR_USD_Week1.csv > clean.csv

Otherwise you may find the PHP iconv function useful:

http://php.net/manual/en/function.iconv.php
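A rough PHP equivalent of the shell command, as a sketch: it assumes the iconv extension, little-endian UTF-16 input, and a file small enough to load into memory at once. The function name convert_utf16le_file is made up for illustration.

```php
<?php
// Sketch only: convert a whole UTF-16LE file to plain ASCII in one go.
function convert_utf16le_file(string $source, string $target): void
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        throw new RuntimeException("Can't read $source");
    }
    // Drop a leading UTF-16LE byte order mark (FF FE) if there is one.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    }
    // iconv() returns false if a character cannot be represented in ASCII.
    $converted = iconv('UTF-16LE', 'ASCII', $raw);
    if ($converted === false) {
        throw new RuntimeException("Can't convert $source to ASCII");
    }
    if (file_put_contents($target, $converted) === false) {
        throw new RuntimeException("Can't write $target");
    }
}
```

Calling convert_utf16le_file('EUR_USD_Week1.csv', 'clean.csv') would mirror the command above; the question's parsing code could then run on clean.csv unchanged.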

edited Mar 15, 2012 at 13:14

answered Mar 15, 2012 at 13:02
