I'm having trouble using array_combine in a foreach loop. It ends up with this error:

PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 85 bytes) in

Here is my code:

$data = array();
$csvData = $this->getData($file);
if ($columnNames) {
    $columns = array_shift($csvData);
    foreach ($csvData as $keyIndex => $rowData) {
        $data[$keyIndex] = array_combine($columns, array_values($rowData));
    }
}

return $data;

The source CSV file I'm using has approximately 1,000,000 rows. For this line:

$csvData = $this->getData($file)

I use a while loop to read the CSV and append each row to an array, and it works without any problem. The trouble comes from array_combine and the foreach loop.

Do you have any idea how to resolve this, or simply a better solution?

UPDATED

Here is the code that reads the CSV file (using a while loop):

$data = array();
if (!file_exists($file)) {
    throw new Exception('File "' . $file . '" does not exist');
}

$fh = fopen($file, 'r');
while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
    $data[] = $rowData;
}
fclose($fh);
return $data;

UPDATED 2

The code above works without any problem on CSV files of up to about 20,000 to 30,000 rows. From 50,000 rows upward, memory is exhausted.

  • Does $this->getData($file) only read the raw file? Commented May 20, 2016 at 8:36
  • @RomanPerekhrest: Yes. I added that method into the question. Commented May 20, 2016 at 8:37
  • Are you sure the error occurs within the foreach loop and not in the $this->getData($file) call? Commented May 20, 2016 at 8:41
  • @RomanPerekhrest: I am certain, because the logged error shows that it comes from the array_combine call in the foreach loop. And $csvData contains the correct data. Commented May 20, 2016 at 8:48

1 Answer

You're in fact keeping (or trying to keep) two distinct copies of the whole dataset in memory. First you load the whole CSV data into memory using getData(), and then you copy it into the $data array by looping over the in-memory data and building a new array.
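The duplication can be avoided even without generators (useful before PHP 5.5) by combining each row with the header as soon as it is read, so the raw rows are never stored next to the combined ones. This is a minimal sketch, not part of the original answer; readCsvAssoc() and the in-memory sample data are hypothetical, chosen only to keep the example self-contained.

```php
<?php
// Hypothetical helper: read a CSV stream and return associative rows,
// combining each row with the header as it is read (only one copy is built).
function readCsvAssoc($fh, $delimiter = ',', $enclosure = '"')
{
    $data = array();
    // The first line holds the column names.
    $columns = fgetcsv($fh, 0, $delimiter, $enclosure, '\\');
    if ($columns === false) {
        return $data; // empty input
    }
    while (($row = fgetcsv($fh, 0, $delimiter, $enclosure, '\\')) !== false) {
        // Skip rows whose field count does not match the header
        // (e.g. blank lines, which fgetcsv returns as array(null)).
        if (count($row) !== count($columns)) {
            continue;
        }
        $data[] = array_combine($columns, $row);
    }
    return $data;
}

// Usage with an in-memory stream instead of a real file.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "id,name\n1,alice\n2,bob\n");
rewind($fh);
$rows = readCsvAssoc($fh);
fclose($fh);
print_r($rows);
```

This still keeps every combined row in memory, so with around a million rows it may still hit the memory limit; it only removes the second copy.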

You should use stream-based reading when loading the CSV data to keep just one data set in memory. If you're on PHP 5.5+ (which you definitely should be, by the way), this is as simple as changing your getData method to look like this:

protected function getData($file) {
    if (!file_exists($file)) {
        throw new Exception('File "' . $file . '" does not exist');
    }

    $fh = fopen($file, 'r');
    while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
        yield $rowData;
    }
    fclose($fh);
}

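This makes use of a so-called generator, a feature available since PHP 5.5. The rest of your code should continue to work, as the inner workings of getData should be transparent to the calling code. That is only half of the truth, though: getData now returns a Generator object rather than an array, so array functions such as array_shift() can no longer be used on its result.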
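Since array_shift() does not accept a generator, one way to take the header row off it is to use the Generator methods current() and next() before processing the remaining rows. This is a sketch under stated assumptions: streamRows() is a hypothetical stand-in for getData(), and the CSV content comes from an in-memory stream.

```php
<?php
// Hypothetical stand-in for getData(): yields one CSV row at a time.
function streamRows($fh)
{
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        yield $row;
    }
}

$fh = fopen('php://memory', 'r+');
fwrite($fh, "id,name\n1,alice\n2,bob\n");
rewind($fh);

$gen = streamRows($fh);
$columns = $gen->current(); // starts the generator; first yielded row is the header
$gen->next();               // advance past the header

$data = array();
// Manual loop: foreach would try to rewind a generator that is already
// past its first yield, which throws an exception.
while ($gen->valid()) {
    $data[] = array_combine($columns, $gen->current());
    $gen->next();
}
fclose($fh);
print_r($data);
```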

UPDATE to explain how extracting the column headers works now:

$data = array();
$csvData = $this->getData($file);
if ($columnNames) { // don't know what this one does exactly
    $columns = null;
    foreach ($csvData as $keyIndex => $rowData) {
        if ($keyIndex === 0) {
            $columns = $rowData;
        } else {
            $data[$keyIndex/* -1 if you need 0-index */] = array_combine(
                $columns, 
                array_values($rowData)
            );
        }
    }
}

return $data;
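If the combined rows are only needed one at a time (for example, to insert them into a database), the header handling can also move into the generator itself, so that nothing accumulates in $data at all. This is an alternative to the code above rather than what the answer proposes; readAssocRows() and the sample data are hypothetical.

```php
<?php
// Hypothetical generator: yields each data row keyed by the header names.
function readAssocRows($fh, $delimiter = ',', $enclosure = '"')
{
    $columns = null;
    while (($row = fgetcsv($fh, 0, $delimiter, $enclosure, '\\')) !== false) {
        if ($columns === null) {
            $columns = $row;  // first line: remember the header
            continue;
        }
        if (count($row) !== count($columns)) {
            continue;         // skip blank or malformed lines
        }
        yield array_combine($columns, $row);
    }
}

$fh = fopen('php://memory', 'r+');
fwrite($fh, "id,name\n1,alice\n2,bob\n");
rewind($fh);

$names = array();
foreach (readAssocRows($fh) as $row) {
    // Process one row here; only the current CSV row is held by the generator.
    // ($names just collects results for this demo.)
    $names[] = $row['name'];
}
fclose($fh);
print_r($names);
```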

6 Comments

Thanks for your response, but what exactly is yield doing here?
yield is a little bit more complicated than what can be described here in a comment. You should definitely read php.net/manual/en/language.generators.overview.php, blog.ircmaxell.com/2012/07/what-generators-can-do-for-you.html and stackoverflow.com/questions/17483806/…
There is a small issue: I'm using $columns = array_shift($csvData); to move the CSV column names into an array, and array_combine then applies that array as keys to each source row (from the CSV). How can I force yield to return an array instead of an object?
You need to do that a little differently. yield will simply return row after row. If you need to handle row 0 (the first row, containing the headers) differently, you need to check $keyIndex: if $keyIndex === 0, you extract the columns; if $keyIndex > 0, you continue your normal handling of data rows.
Could you update your answer with this, please? It's still confusing me; this is the first time I've heard about yield and generators :)