I'm trying to extract data from thousands of premade SQL files. I have a script that does what I need using the mysqli driver in PHP, but it's really slow because it processes one SQL file at a time. I modified the script to create a unique temp database name for each SQL file and load the file into that database. The data is extracted to an archive database table, then the temp database is dropped.

In an effort to speed things up, I created 4 scripts structured like the code below, where each for loop lives in its own PHP file (the code below is only a quick demo of what happens across the 4 separate files). Each script is set up to grab only 1/4 of the files from the source folder. All of this works perfectly: the scripts run, and there is zero interference with file handling.

The issue is that I seem to get almost zero performance boost. Maybe 10 seconds faster :( I quickly refreshed my phpMyAdmin database listing page and could see the 4 different databases loaded at any time, but I also noticed that it looked like it was still running more or less sequentially, as the DB names were changing on the fly. I went the extra step of creating a unique user for each script with its own connection. No improvement.

Can I get this to work with mysqli / PHP, or do I need to look into other options? I'd prefer to do this all in PHP if I can (version 7.0). I tested by running the PHP scripts in my browser. Is that the issue? I haven't written any code to execute them on the command line and run them in the background yet. One last note: none of the users in my MySQL database have limits on connections, etc.

$numbers = array('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20');

$numCount = count($numbers);

// Start offsets. In the real setup each loop below lives in its own PHP file.
$a = 0;
$b = 1;
$c = 2;
$d = 3;

$rebuild = array();

echo "<br>";

// File 1: items 0, 4, 8, ...
for (; $a < $numCount; $a += 4) {
    echo $numbers[$a] . "<br>";
}

echo "<br>";

// File 2: items 1, 5, 9, ...
for (; $b < $numCount; $b += 4) {
    echo $numbers[$b] . "<br>";
}

echo "<br>";

// File 3: items 2, 6, 10, ...
for (; $c < $numCount; $c += 4) {
    echo $numbers[$c] . "<br>";
}

echo "<br>";

// File 4: items 3, 7, 11, ...
for (; $d < $numCount; $d += 4) {
    echo $numbers[$d] . "<br>";
}
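
The four loops above differ only in their starting offset, so the same split can be expressed as one function that takes a worker index. This is a minimal sketch, assuming a hypothetical helper name `filesForWorker` and that each worker is told its index (0 to 3) when it starts; neither comes from the original script.

```php
<?php
/**
 * Return every $workers-th element of $files, starting at offset $worker.
 * filesForWorker is a hypothetical helper name, not part of the original script.
 */
function filesForWorker(array $files, int $worker, int $workers): array
{
    $files = array_values($files); // re-index so the offsets are contiguous
    $total = count($files);
    $mine = [];
    for ($i = $worker; $i < $total; $i += $workers) {
        $mine[] = $files[$i];
    }
    return $mine;
}

// Usage with the demo data: worker 1 of 4 receives the same items as the $b loop.
$numbers = array_map('strval', range(0, 20));
echo implode("<br>", filesForWorker($numbers, 1, 4)), "<br>";
```

With one parametrized script there is a single copy of the processing code to maintain, and the number of workers can be changed without creating more files.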
  • Well... your code runs one loop after another; the loops don't run in parallel. In other words, the second for loop won't run until everything before it has finished. For threading, you need to subclass the Thread class. Read this: mullie.eu/parallel-processing-multi-tasking-php Commented Oct 17, 2016 at 1:21
  • My bad... each for loop is its own file... I just formatted it that way for a quick display of the idea. Commented Oct 17, 2016 at 1:22
  • Oh ok, I get it. But anyway, for parallel processing, you should ("must") use multiple threads. Commented Oct 17, 2016 at 1:23
  • OK, so am I on the right path though? I have four files that won't mess with each other's file handling, etc.? Commented Oct 17, 2016 at 1:24
  • That's one way, but in production it will be extremely wasteful. Using 5 connections for 4 parallel tasks (parent + 4 children) is not the way to do it; instead, use one "parent" process that runs several "child" tasks without making more network calls. That's a task for threading. Some people use curl() or fopen() to load those child scripts... even if that works, the correct and accepted way to do it is using a Thread subclass. Commented Oct 17, 2016 at 1:27

1 Answer

Try this:

<?php
    // The Thread base class is provided by the pthreads extension
    class BackgroundTask extends Thread {
        public $output;
        protected $input;

        public function run() {
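            /* Processing here; store the results in $this->output */

            // Here you would implement your for() loops, for example, using $this->input as their data

            // Some dumb value to demonstrate. It must be assigned to the property:
            // a bare $output would be a local variable that is lost when run() returns.
            $this->output = "SOME DATA!";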
        }

        function __construct($input_data) {
            $this->input = $input_data;
        }
    }

    // Create instances with different input data
    // Each "quarter" will be a quarter of your data, as you're trying to do right now
    $job1 = new BackgroundTask($first_quarter);
    $job1->start();

    $job2 = new BackgroundTask($second_quarter);
    $job2->start();

    $job3 = new BackgroundTask($third_quarter);
    $job3->start();

    $job4 = new BackgroundTask($fourth_quarter);
    $job4->start();

    // ==================

    // "join" the first job, i.e. "wait until it's finished"
    $job1->join();
    echo "First output: " . $job1->output;

    $job2->join();
    echo "Second output: " . $job2->output;

    $job3->join();
    echo "Third output: " . $job3->output;

    $job4->join();
    echo "Fourth output: " . $job4->output;
?>

Making four HTTP calls to your own script ties up web server connections for no useful reason, and it takes away slots from other users who may be trying to access your website.
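
If the pthreads extension is not available, separate command-line processes are another way to run the work in parallel without going through HTTP. This is a different technique from the Thread subclass above. The following is a minimal sketch, assuming a hypothetical worker script that reads its worker index and the worker count from `$argv`; the function names `buildWorkerCommands` and `runInParallel` are illustrative too.

```php
<?php
/**
 * Build one command line per worker. The argument layout
 * (worker index, then worker count) is an assumption for illustration.
 */
function buildWorkerCommands(string $script, int $workers): array
{
    $commands = [];
    for ($i = 0; $i < $workers; $i++) {
        $commands[] = escapeshellarg(PHP_BINARY) . ' ' . escapeshellarg($script)
            . ' ' . $i . ' ' . $workers;
    }
    return $commands;
}

/**
 * Start every command without waiting, then wait for all of them.
 * Returns the exit codes in the same order as $commands (-1 if a start failed).
 */
function runInParallel(array $commands): array
{
    $procs = [];
    foreach ($commands as $cmd) {
        // No pipes are requested here; a real worker would log to a file or the database.
        $pipes = [];
        $procs[] = proc_open($cmd, [], $pipes);
    }

    $codes = [];
    foreach ($procs as $proc) {
        // proc_close() blocks until that child has exited.
        $codes[] = is_resource($proc) ? proc_close($proc) : -1;
    }
    return $codes;
}
```

A call such as `runInParallel(buildWorkerCommands('worker.php', 4))` would start four workers at once, where `worker.php` is a placeholder name. Whether this yields a real speed-up depends on where the time is spent: if the MySQL server itself is the bottleneck while importing, more client processes may not help much.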

8 Comments

Thanks for the code! Where is the thread class that's being extended?
Look at the first line: extends Thread. That brings in all the code needed to work with threads.
I just saw this in the PHP manual; I didn't realize it was part of the pthreads extension. Ok, so is my original concept ok, with 4 different scripts, or could I just take my original script that loops over all the files and duplicate it several times? My original design was meant to prevent file handling issues, since it also deletes the SQL file when it is done.
Well, you could use four different scripts (for example, using include() inside the run() method) and all should be good. But remember that all of them run in parallel, so you should check the output data of each one.
Ok, so I could use all four. I guess this is where I'm a bit confused. Say I put my original master script, the one that processed each file one at a time, into the run() method. Would this still boost the performance? In my mind, I imagine it will just run like usual and not do anything special... correct?