
I have a big array of associative arrays. Each associative array has around 15 keys of different types (string, integer, float). A smaller example:

$array = [
    [
      "key1" => "string",
      "key2" => 10,
      "key3" => 4.05
    ],       
    [
      "key1" => "string2",
      "key2" => 20,
      "key3" => 1.05
    ],       
   ...
];

Now I want to iterate over this array and add some keys, like so:

$map = array_map(function (array $item) {
    $item['key4'] = 1;
    $item['key5'] = 1;
    $item['key6'] = 1;
    return $item;
}, $array);

Problem: for an array containing a large number of associative arrays, adding the new keys causes the memory limit to be reached and the script is terminated. Do you have any solutions?

  • Protip: start using objects with classes, and back those with a database if you really have so much data to work with that you run out of memory. That's how storage works: CPU cache; if you run out of that, RAM; if you run out of that, time to use the filesystem. Commented Oct 2, 2017 at 19:53
  • If you have enough memory you can always increase the memory-limit - php.net/manual/en/ini.core.php#ini.memory-limit Commented Oct 2, 2017 at 20:00
  • @Mike'Pomax'Kamermans Thanks for the reply. You mean that I should use an array of objects, is that right? Commented Oct 2, 2017 at 20:06
  • @daker Increasing the memory limit is not a solution in my case, and mostly not in general. The array can contain 200K arrays one time and 800K the next; I have to solve it at the code level. Commented Oct 2, 2017 at 20:07
  • @Joe yes, start by representing your data with objects, rather than associative arrays, and then you can load/save batches of objects using quite a few disk-based solutions. A simple object store running on your data, for instance, would make working with millions of objects quite manageable. Commented Oct 2, 2017 at 21:57

2 Answers


You could paginate your data, chunk your array to work with smaller pieces, or even increase memory_limit, but let's assume that you have a big array and can't do otherwise.
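If you can flush each slice somewhere (a database, a file, the HTTP response) instead of keeping the whole enriched result in memory, a slice-at-a-time loop bounds the extra memory to one batch. A minimal sketch of that idea, assuming a hypothetical handleBatch() consumer and an arbitrary batch size:

$batchSize = 10000; // arbitrary example value

for ($offset = 0, $total = count($array); $offset < $total; $offset += $batchSize) {
    $batch = array_slice($array, $offset, $batchSize, true);

    foreach ($batch as $key => $item) {
        $item['key4'] = 1;
        $item['key5'] = 1;
        $item['key6'] = 1;
        $batch[$key] = $item;
    }

    handleBatch($batch); // hypothetical: persist or output the enriched slice
    unset($batch);       // release the slice before loading the next one
}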

So let's play with a 1,000,000-element array and try different solutions. I'll give the memory consumption and compute time measured on my laptop for each one.
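The exact measurement harness isn't shown in this answer; something along these lines (peak memory via memory_get_peak_usage(), wall-clock time via hrtime()) can be used to take comparable measurements:

// Rough measurement harness (an assumption, not necessarily the one used
// for the numbers below): report peak memory and elapsed wall-clock time.
$start = hrtime(true);

// ... code under test goes here ...

printf(
    "peak memory: %.0f MB, time: %.0f ms\n",
    memory_get_peak_usage(true) / 1024 / 1024,
    (hrtime(true) - $start) / 1e6
);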

Current solution (857MB / 640ms)

$array = [];
for ($i = 0; $i < 1000000; $i++) {
    $array[$i] = [
        "key" => 'value',
        "key2" => $i,
        "key3" => $i / 3
    ];
}

$map = array_map(function (array $item) {
    $item['key4'] = 1;
    $item['key5'] = 1;
    $item['key6'] = 1;
    return $item;
}, $array);

With this piece of code the memory consumption on my laptop is 857MB and the compute time 640ms.

In your example you are creating a whole new $map variable from your $array. This means you are making a fresh copy of the array in memory.

Working with references (480MB / 220ms)

$array = [];
for ($i = 0; $i < 1000000; $i++) {
    $array[$i] = [
        "key" => 'value',
        "key2" => $i,
        "key3" => $i / 3
    ];
}

foreach ($array as &$item) {
    $item['key4'] = 1;
    $item['key5'] = 1;
    $item['key6'] = 1;
}
unset($item); // break the reference so later reuse of $item can't overwrite the last row

By using &$item we are asking PHP to give us access to each element by reference, meaning that we modify the data directly in memory without creating a new copy of it.

This is why this script consumes a lot less memory and compute time.

Working with classes (223MB / 95ms)

Under the hood, PHP uses C data structures to manage data in memory. Classes are predictable and much easier for PHP to optimize than arrays. It is well explained here.

class TestClass {
    public $key1, $key2, $key3, $key4, $key5, $key6;
}

$array = [];
for ($i = 0; $i < 1000000; $i++) {
    $array[$i] = new TestClass();
    $array[$i]->key1 = 'value';
    $array[$i]->key2 = $i;
    $array[$i]->key3 = $i / 3;
}

foreach ($array as $item) {
    $item->key4 = 1;
    $item->key5 = 1;
    $item->key6 = 1;
}

You can see that the memory consumption and the time to iterate are much lower. This is because PHP doesn't need to modify the structure of the data in memory: every field of the object is ready to receive data.

Be careful, though: if you add a field that wasn't declared in the class (e.g. $item->newKey = 1, so newKey is created dynamically), the memory optimisation is no longer possible and you jump to 620 MB of memory usage and 280 ms of compute time.
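A tiny illustration of the difference, with hypothetical class names:

// Hypothetical illustration: a declared property gets a fixed slot in the
// object's compact layout, while an undeclared one makes PHP attach a
// per-object hash table to hold it.
class Declared { public $key4; }
class Bare {}

$a = new Declared();
$a->key4 = 1; // stored in the pre-allocated property slot

$b = new Bare();
$b->key4 = 1; // dynamic property: stored in a per-object hash table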


If you want to go further and are not afraid of headaches, take a look at the Standard PHP Library (SPL): you will find a lot of optimized data structure solutions (fixed arrays, iterators and so on).
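For example, SplFixedArray gives you a pre-sized, integer-indexed container that is more compact than a regular PHP array of the same length. A minimal sketch reusing the TestClass above (the saving comes from the container itself, so the gain on top of the class-based version is modest):

$size = 1000000;
$fixed = new SplFixedArray($size); // pre-sized, integer-indexed container

for ($i = 0; $i < $size; $i++) {
    $item = new TestClass();
    $item->key1 = 'value';
    $item->key2 = $i;
    $item->key3 = $i / 3;
    $fixed[$i] = $item;
}

foreach ($fixed as $item) {
    $item->key4 = 1;
    $item->key5 = 1;
    $item->key6 = 1;
}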

PS: benchmarks were made with Xdebug disabled.



You should be able to save some memory if you utilize references. This removes many of the copy-on-write actions that happen in the background. In a small test case I was able to reduce memory usage by 30%-40% (depending on the PHP version). If you are on PHP 5, you can also profit from an upgrade to PHP 7. Obviously I can't predict whether either or both will save enough memory. Test case (just remove the /* before map or walk):

$cnt = 10000;
$array = [];
for ($i = 0; $i < $cnt; $i++) {
    // note: each $array[] appends a new element, so this builds 3 * $cnt
    // single-key rows
    $array[]['key1'] = 1;
    $array[]['key2'] = 2;
    $array[]['key3'] = 3;
}
/*array_walk($array, function (&$item, $key) {
    $item['key4'] = 1;
    $item['key5'] = 1;
    $item['key6'] = 1;
}); // memory used PHP7/PHP5: 13 437 720 - 25 924 944 */
/*$map = array_map(function (array $item) {
    $item['key4'] = 1;
    $item['key5'] = 1;
    $item['key6'] = 1;
    return $item;
}, $array); // memory used PHP7/PHP5: 25 050 360 - 40 850 480*/

