I have a rather large CSV file (17 GB) which I'm trying to sanity check. I've written a little script which looks like this:

#!/usr/bin/php
<?php

$f = fopen($argv[1],'r');

$i=0;
while (!feof($f)) {
        $row = fgetcsv($f);
        $i++;
}
print $i."\n";

?>

This should just count the rows and print the total. The script outputs: 60770881

But if I do a wc -l the result is 60777200.

My csv file was generated from MySQL using:

INTO OUTFILE '/tmp/file.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n'

So it shouldn't have any unescaped newlines or anything like that. Does anyone have any idea what could be wrong?

  • Have you tried grepping the file for lines that do not match ^" or "$? Those would indicate an unescaped newline. I am not certain that \\ is the right way to escape newlines for this purpose. Commented Jun 11, 2012 at 14:35
  • Thanks for that, pointed the way to finding the problem, which was what Aleks G described Commented Jun 12, 2012 at 19:10

1 Answer

A CSV record can span multiple lines. If any of the values contains a newline, that record occupies two or more physical lines in the file (which is what wc -l counts), but fgetcsv reads it as a single CSV record.

Also, you don't need to check for feof($f), because fgetcsv will return FALSE on end-of-file.
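
As a minimal sketch of the loop without the feof() check: the helper name countCsvRecords is hypothetical (it is not from the original script), and the delimiter, enclosure and escape arguments are assumptions chosen to mirror the INTO OUTFILE options in the question.

```php
<?php
// Hypothetical helper, not part of the original script:
// counts CSV records (not physical lines) in a file.
function countCsvRecords(string $path): int
{
    $f = fopen($path, 'r');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // fgetcsv returns false at end-of-file, so no feof() check is needed.
    // Assumed settings: ',' delimiter, '"' enclosure, '\' escape,
    // matching the FIELDS ... options used for the export.
    while (($row = fgetcsv($f, 0, ',', '"', '\\')) !== false) {
        $count++;
    }
    fclose($f);

    return $count;
}
```

Calling countCsvRecords($argv[1]) in place of the original loop should give the record count. A quoted value that contains a newline is counted once here, whereas wc -l counts each of its physical lines, so the two numbers can legitimately differ.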
