
I wrote this code that reads a folder; depending on the file name, the data from the CSV file is inserted into a different table. Once a file has been processed, it is moved to another folder. My code works well, but the second CSV file is more than 80,000 lines long and takes several hours to be loaded into my database.

How can I improve the performance of my code? I tried 'LOAD DATA LOCAL' but without success...

<?php

include("../connexion.php");

ini_set('max_execution_time', 54000);
$timestamp= date("y-m-d H:i");

$dir   = 'D:/xampp/htdocs/retail_BI/test/';
$allFiles = scandir($dir);
$dest = 'D:/xampp/htdocs/retail_BI/test/files/';

foreach ($allFiles as $file) {

    if (!in_array($file, array(".", ".."))) {

        $file = $dir . $file;
        $filename = basename($file);

        if (strpos($filename, 'BI1_') === 0) {

            if (($handle = fopen($file, "r")) !== false) {

                // To remove the BOM in the first cell
                fseek($handle, 3);

                while (($data = fgetcsv($handle, 9000000, ";")) !== false) {
                    // skip empty lines
                    if (empty(array_filter($data))) {
                        echo "not good";
                        continue;
                    }

                    $date = DateTime::createFromFormat('d/m/Y H:i:s A', $data[2]);
                    if ($date === false) {
                        break;
                    }

                    $date1 = $date->format('Y-m-d'); // 2020-07-07
                    $date2 = $date->format('Hi A');  // 1247 AM

                    // database entry
                    $query = "insert into dbo.Y2_Sales (storenumber, storename, date, time, TransRef, stylecode, color, size, quantity, unit_price, SalesExGST, cost, currency)
                        values('$data[0]', '$data[1]','$date1','$date2','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]', '$data[8]','$data[9]','$data[10]','$data[11]')";
                    $stmt = $conn->query($query);
                    if (!$stmt) { echo $conn->error; }
                }
                fclose($handle);
            }

            // Moving the file to another folder
            if (!rename($file, $dest . $filename)) {
                echo "error copy";
            }

        } else if (strpos($filename, 'BI2_') === 0) {

            if (($handle = fopen($file, "r")) !== false) {

                // To remove the BOM in the first cell
                fseek($handle, 3);

                while (($data = fgetcsv($handle, 9000000, ";")) !== false) {
                    // print_r($data);
                    if (empty(array_filter($data))) {
                        continue;
                    }

                    // database entry
                    $query = "insert into dbo.Y2_Inventory (storenumber, stylecode, colour, size, units, timestamp)
                        values('$data[0]', '$data[1]','$data[2]','$data[3]','$data[4]', '$timestamp')";
                    $stmt = $conn->query($query);
                    if (!$stmt) { echo $conn->error; }

                    // $query = "LOAD DATA LOCAL INFILE '$filename' INTO TABLE dbo.Y2_Inventory FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n' (storenumber, stylecode, colour, size, units, timestamp)";
                    // $stmt = $conn->query( $query );
                }
                fclose($handle);
            }

            // Moving the file to another folder
            if (!rename($file, $dest . $filename)) {
                echo "error copy";
            }
        }
    }
}

echo "good !";

?>

1 Answer


Instead of inserting data into the database one row at a time, try inserting in batches.

You can do a bulk insert that takes n entries at a time (use 1000) and inserts them into the table in a single statement.

https://www.mysqltutorial.org/mysql-insert-multiple-rows/

This reduces the number of DB calls and, with it, the overall time.
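
For example, for the BI2_ inventory file, batching could look roughly like this. This is only a sketch: it assumes $conn is a mysqli connection (since your code uses $conn->query() and $conn->error), that $file is the full path to the CSV, and it reuses the column list from your Y2_Inventory insert.

<?php
// Sketch: multi-row INSERT in batches of 1000 instead of one INSERT per line.
$timestamp = date("y-m-d H:i");
$batchSize = 1000;        // rows per INSERT statement
$values    = [];

if (($handle = fopen($file, "r")) !== false) {
    fseek($handle, 3);    // skip the UTF-8 BOM

    while (($data = fgetcsv($handle, 0, ";")) !== false) {
        if (empty(array_filter($data))) {
            continue;     // skip empty lines
        }

        // Escape each field and build one "(v1,v2,...)" tuple per row
        $row = array_map(function ($v) use ($conn) {
            return "'" . $conn->real_escape_string($v) . "'";
        }, array_slice($data, 0, 5));
        $values[] = "(" . implode(",", $row) . ",'$timestamp')";

        // Send one multi-row INSERT every $batchSize rows
        if (count($values) >= $batchSize) {
            $sql = "INSERT INTO dbo.Y2_Inventory (storenumber, stylecode, colour, size, units, timestamp) VALUES " . implode(",", $values);
            if (!$conn->query($sql)) { echo $conn->error; }
            $values = [];
        }
    }

    // Flush the last, partial batch
    if ($values) {
        $sql = "INSERT INTO dbo.Y2_Inventory (storenumber, stylecode, colour, size, units, timestamp) VALUES " . implode(",", $values);
        if (!$conn->query($sql)) { echo $conn->error; }
    }

    fclose($handle);
}

One 1000-row INSERT replaces 1000 separate round trips to the server, which is where most of the time is going.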

And for 80k entries there is a possibility that you might exceed the memory limit too.

You can overcome that by using generators in PHP. https://medium.com/@aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b

Although that example is in Laravel, the code that reads from the CSV (the part that uses a generator) is framework-independent, and the same logic can be used here.
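
A plain-PHP version of that generator pattern might look something like this (again only a sketch; insertChunk() is a placeholder for the multi-row INSERT shown above, not an existing function):

<?php
// Sketch: a generator that yields one CSV row at a time, so the whole
// 80k-line file never has to be held in memory at once.
function readRows(string $path): Generator
{
    if (($handle = fopen($path, "r")) === false) {
        return;
    }
    fseek($handle, 3); // skip the UTF-8 BOM
    while (($data = fgetcsv($handle, 0, ";")) !== false) {
        yield $data;
    }
    fclose($handle);
}

// Usage: collect rows into chunks of 1000 and bulk-insert each chunk.
$chunk = [];
foreach (readRows($file) as $row) {      // $file = full path to the CSV, as in the question
    $chunk[] = $row;
    if (count($chunk) >= 1000) {
        // insertChunk($conn, $chunk);   // placeholder: build and run the multi-row INSERT
        $chunk = [];
    }
}
if ($chunk) {
    // insertChunk($conn, $chunk);       // flush the final partial chunk
}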


7 Comments

I'm going to look into the SQL part. As for the memory limit, I added 'ini_set('max_execution_time', 54000);' at the beginning to let it run for several hours
In the MySQL example they add rows whose values are already known; in my case I have to go through all the lines of my CSV @Aashish gaba
'ini_set('max_execution_time', 54000)' is the execution time limit, it's not for memory.
Still, there is a possibility that the data might be more than 64M, and anyway, in your case, if it's not, then there's no need. I only added it because the generator uses roughly fixed memory, about the same as required for 1000 entries; it's just a way to improve memory efficiency.
All right, thanks. I'll see what I can do to insert 1000 entries at once.
