
I found a function here: http://archevery.blogspot.com/2013/07/php-curl-multi-threading.html

I am using it to send an array of URLs to be run and processed as quickly as possible via multi-threaded curl requests. This works great.

Some of the URLs I want to send must be processed in sequence, one after the other, rather than at the same time.

How can I achieve this?

Example:

URL-A URL-B URL-C --> All fire off at the same time

URL-D URL-E --> Must wait for URL-D to finish before URL-E is triggered.

This is for a task management system that lets me add PHP applications as "Tasks" in the database. The tasks have a header/detail relationship: a task with one header and one detail can be sent off multi-threaded, but a task with one header and multiple details must be sent off in the order of its detail tasks.

I can do this by calling curl requests in a loop, but I also want the base request (the first task of each sequence) to fire as part of the multi-threaded function. I don't want the sequential tasks to pile up and be processed one after another. In other words, the first task of each sequence should be multi-threaded, and each later task in that sequence should wait for the previous one to complete before it starts.

I tried a function that I send the sequential tasks to, but it waits for each task to finish before moving on to the next. I need to somehow combine the multi-threaded function from the URL above with this one. Here is my multi-threaded curl function:

function runRequests($url_array, $thread_width = 10) {
    $threads = 0;
    $master = curl_multi_init();
    $curl_opts = array(CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 5,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT => 15);
    $results = array();
    $count = 0;
    foreach($url_array as $url) {
        $ch = curl_init();
        $curl_opts[CURLOPT_URL] = $url; //add the URL to the shared options instead of overwriting them
        curl_setopt_array($ch, $curl_opts);
        curl_multi_add_handle($master, $ch); //push URL for single rec send into curl stack
        $results[$count] = array("url" => $url, "handle" => $ch);
        $threads++;
        $count++;
        if($threads >= $thread_width) { //start running when stack is full to width
            while($threads >= $thread_width) {
                //usleep(100);
                while(($execrun = curl_multi_exec($master, $running)) === -1){}
                curl_multi_select($master);
                // a request was just completed - find out which one and remove it from stack
                while($done = curl_multi_info_read($master)) {
                    foreach($results as &$res) {
                        if($res['handle'] == $done['handle']) {
                            $res['result'] = curl_multi_getcontent($done['handle']);
                        }
                    }
                    curl_multi_remove_handle($master, $done['handle']);
                    curl_close($done['handle']);
                    $threads--;
                }
            }
        }
    }
    do { //finish sending remaining queue items when all have been added to curl
        //usleep(100);
        while(($execrun = curl_multi_exec($master, $running)) === -1){}
        curl_multi_select($master);
        while($done = curl_multi_info_read($master)) {
            foreach($results as &$res) {
                if($res['handle'] == $done['handle']) {
                    $res['result'] = curl_multi_getcontent($done['handle']);
                }
            }
            curl_multi_remove_handle($master, $done['handle']);
            curl_close($done['handle']);
            $threads--;
        }
    } while($running > 0);
    curl_multi_close($master);
    return $results;
}

And here is my single-threaded curl function:

function runSingleRequests($url_array) {
    foreach($url_array as $url) {
        // Initialize a cURL session.
        $ch = curl_init();

        // Return the response instead of printing it; the page contents are not needed.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        // Set the URL to request.
        curl_setopt($ch, CURLOPT_URL, $url);

        // Process the request; this blocks until the request has finished.
        $result = curl_exec($ch);

        curl_close($ch);
    }
}

Both take an array of URLs as their input.

I currently have an array of all single tasks and another array of all multiple tasks with a "header id" that lets me know what header task each detail task is part of.
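
To make the two arrays concrete, here is a minimal sketch of how they could be merged into "chains", where each chain is a list of URLs that must run in order and separate chains are free to run in parallel. The row shapes (`id`, `header_id`, `seq`, `url`) and the function name `buildChains` are assumptions for illustration only; the real column names are not shown here.

```php
<?php
/**
 * Merge single tasks and detail tasks into chains of URLs.
 * Assumed (hypothetical) row shapes:
 *   single task: ['id' => int, 'url' => string]
 *   detail task: ['header_id' => int, 'seq' => int, 'url' => string]
 */
function buildChains(array $singleTasks, array $detailTasks): array
{
    $chains = [];

    // A single task is a chain with exactly one URL.
    foreach ($singleTasks as $task) {
        $chains['single-' . $task['id']] = [$task['url']];
    }

    // Sort detail tasks by header, then by their position in the sequence.
    usort($detailTasks, function ($a, $b) {
        return [$a['header_id'], $a['seq']] <=> [$b['header_id'], $b['seq']];
    });

    // Detail tasks that share a header end up in the same chain, in order.
    foreach ($detailTasks as $task) {
        $chains['header-' . $task['header_id']][] = $task['url'];
    }

    return $chains;
}
```

With this shape, URL-A, URL-B and URL-C would each be a one-element chain, and URL-D followed by URL-E would be one two-element chain.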

Any help on theory or code would be most appreciated. Thanks!

  • What did you try? Commented Jan 9, 2020 at 13:37
  • Hi @Dilek, OK, I added my function that works but doesn't multi-thread at all. Commented Jan 9, 2020 at 13:44
  • You have full control over when you queue a request. Just don't queue the requests that shouldn't be sent yet. To me that seems obvious, so I really wonder what your question actually is. Concerning your code, one thing to watch out for is error handling. Some of these functions can fail, and you need to check their return values. Commented Jan 9, 2020 at 13:52
  • You are right, sir. I guess I do not understand how I can send a URL to the multi-threaded curl function and know when it has completed before triggering the next one in the sequence. Commented Jan 9, 2020 at 14:06
  • @Yourguide First of all, the code you showed has, as far as I know, been used with CakePHP for years, so I am not sure it will still work on modern browsers. Replacing many of the required functions, such as sleep() or usleep(), with other functions would need a rewrite that is beyond my knowledge. The answer to your question is the most up to date, but it takes some knowledge to use. Alternatively, you can use ParallelCurl, which is very simple: github.com/petewarden/ParallelCurl. An updated and fixed version is here: github.com/marcushat/RollingCurlX/tree/master/src Commented Jan 9, 2020 at 15:34

2 Answers


Why don't you use a rudimentary task scheduler to schedule your requests and follow-ups, instead of running everything at once?

See it in action: https://ideone.com/suTUBS

<?php
class Task 
{
    protected $follow_up = [];
    protected $task_callback;

    public function __construct($task_callback) 
    {
        $this->task_callback = $task_callback;
    }

    public function addFollowUp(Task $follow_up) 
    {
        $this->follow_up[] = $follow_up;
    }

    public function complete() 
    {
        foreach($this->follow_up as $runnable) {
            $runnable->run();
        }
    }

    public function run() 
    {
        $callback = $this->task_callback;

        $callback($this);
    }
}



$provided_task_scheduler_from_somewhere = function() 
{
    $tasks = [];

    $global_message_thing = 'failed';

    $second_global_message_thing = 'failed';

    $task1 = new Task(function (Task $runner) 
    {
        $something_in_closure = function() use ($runner) {
            echo "running task one\n";
            $runner->complete();
        };
        $something_in_closure();
    });

    /**
     * use $global_message_thing as reference so we can manipulate it
     * This will make sure that the follow up on this one knows the status of what happened here
     */
    $second_follow_up = new Task(function(Task $runner) use (&$global_message_thing)
    { 
        echo "second follow up on task one.\n";
        $global_message_thing = "success";
        $runner->complete();
    });

    /**
     * Just doing things in random order to show that order doesn't really matter with a task scheduler
     * just the follow ups
     */
    $tasks[] = $task1;

    $tasks[] = new Task(function(Task $runner) 
    {
        echo "running task 2\n";
        $runner->complete();
    });

    $task1->addFollowUp(new Task(function(Task $runner) 
    { 
        echo "follow up on task one.\n";
        $runner->complete();
    }));

    $task1->addFollowUp($second_follow_up);

    /**
     * Adding the references to our "status" trackers here to know what to print
     * One will still be on failed because we did nothing with it. this way we know it works properly
     * as a control.
     */
    $second_follow_up->addFollowUp(new Task(function(Task $runner) use (&$global_message_thing, &$second_global_message_thing) {
        if($global_message_thing === "success") {
            echo "follow up on the second follow up, three layers now, w00007!\n";
        }
        if($second_global_message_thing === "success") {
            echo "you don't see this\n";
        }
        $runner->complete();
    }));
    return $tasks;
};
/**
 * Normally you'd use some aggregating function to build up your tasks
 * list or a collection of classes. I simulated that here with this callback function.
 */
$tasks = $provided_task_scheduler_from_somewhere();

foreach($tasks as $task) {
    $task->run();
}

This way you can nest tasks that need to follow one another. With some clever use of closures, you can pass parameters to the executing functions and to the encompassing objects outside them.

In my example the Task object itself is passed to the executing function, so the executing function can call complete when it's done with its job.
When complete is called, the Task checks whether it has scheduled follow-up tasks to execute. If so, those are run automatically, and the process works its way down the chain like that.

It's a rudimentary task scheduler, but it should help you on your way to getting steps planned in the order you want them executed.
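
The same follow-up idea can be applied directly to curl_multi. The sketch below is only one possible approach, not a drop-in replacement for the function in the question: `runChains` and its input shape (an array of chains, each chain a list of URLs) are hypothetical names of mine. Every chain gets its first URL queued right away, and a chain's next URL is only queued once `curl_multi_info_read` reports that the previous one has finished. The chain key is attached to each handle with `CURLOPT_PRIVATE` so the finished handle can be mapped back to its chain.

```php
<?php
/**
 * Run chains in parallel; URLs inside one chain run strictly in order.
 * $chains is an assumed shape: ['key' => ['url1', 'url2', ...], ...].
 * Returns one entry per request, in completion order.
 */
function runChains(array $chains, int $timeout = 15): array
{
    $master  = curl_multi_init();
    $active  = [];   // chain key => URL currently in flight for that chain
    $results = [];

    // Queue the next URL of one chain, if it has any left.
    $queueNext = function ($key) use (&$chains, &$active, $master, $timeout) {
        if (empty($chains[$key])) {
            return;
        }
        $url = array_shift($chains[$key]);
        $ch  = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_CONNECTTIMEOUT => $timeout,
            CURLOPT_TIMEOUT        => $timeout,
            CURLOPT_PRIVATE        => (string) $key, // remember the chain
        ]);
        curl_multi_add_handle($master, $ch);
        $active[$key] = $url;
    };

    // Start the first request of every chain at the same time.
    foreach (array_keys($chains) as $key) {
        $queueNext($key);
    }

    while ($active) {
        curl_multi_exec($master, $running);

        $finished = 0;
        while ($done = curl_multi_info_read($master)) {
            $ch  = $done['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[] = [
                'chain'  => $key,
                'url'    => $active[$key],
                'result' => curl_multi_getcontent($ch),
                'error'  => $done['result'] === CURLE_OK ? null : curl_error($ch),
            ];
            curl_multi_remove_handle($master, $ch);
            unset($active[$key]);
            $queueNext($key); // only now may the next URL of this chain start
            $finished++;
        }

        // Nothing finished this round: wait for activity instead of spinning.
        if ($finished === 0 && curl_multi_select($master, 1.0) === -1) {
            usleep(1000);
        }
    }

    curl_multi_close($master);
    return $results;
}
```

For the example in the question, URL-A, URL-B and URL-C would each be passed as a chain of one URL, and URL-D and URL-E as a single chain of two. Note that this sketch keeps going down a chain even when a request fails; stopping the chain on error would be a small change where `$queueNext($key)` is called.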


2 Comments

Thank you for your detailed response. I'm just getting started with OOP and classes, and I have no idea what the code you wrote above actually does. I will do some reading and get back to your response. Thanks!
@Yourguide Just try reading it as text. The function and method names speak for themselves in what they do. Maybe install xdebug and step through the code so you can see how it all changes. Basically every new Task is a small bundle of instructions to be carried out, triggering the next things in the chain until the chain is empty. Reading material: Classes, php anonymous functions

Here's an easier-to-follow example, from: http://arguments.callee.info/2010/02/21/multiple-curl-requests-with-php/

The curl_multi_init family of functions allows you to combine cURL handles and execute them simultaneously.

EXAMPLE

build the individual requests, but do not execute them

$ch_1 = curl_init('http://webservice.one.com/');
$ch_2 = curl_init('http://webservice.two.com/');
curl_setopt($ch_1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch_2, CURLOPT_RETURNTRANSFER, true);

build the multi-curl handle, adding both $ch

$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch_1);
curl_multi_add_handle($mh, $ch_2);

execute all queries simultaneously, and continue when all are complete

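$running = null;
do {
    curl_multi_exec($mh, $running);
    // wait for activity on any handle instead of spinning the CPU
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running);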

close the handles

curl_multi_remove_handle($mh, $ch_1);
curl_multi_remove_handle($mh, $ch_2);
curl_multi_close($mh);

all of our requests are done, we can now access the results

$response_1 = curl_multi_getcontent($ch_1);
$response_2 = curl_multi_getcontent($ch_2);
echo "$response_1 $response_2"; // output results

If both websites take one second to return, we cut our page load time roughly in half by running the requests in parallel instead of one after the other!

References: https://www.php.net/manual/en/function.curl-multi-init.php
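
The steps above can also be wrapped into one function that checks whether each transfer succeeded, since any of the requests can fail independently. This is a rough sketch under my own assumptions: the function name `fetchAll` and the shape of its return value are made up for illustration, and results are keyed by URL, so duplicate URLs would overwrite each other.

```php
<?php
/**
 * Fetch all URLs in parallel and report a body and an error per URL.
 * Returns ['url' => ['body' => ..., 'error' => string|null], ...].
 */
function fetchAll(array $urls): array
{
    $mh      = curl_multi_init();
    $handles = [];

    foreach ($urls as $i => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, (string) $i); // tag handle with its index
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    $errors = [];
    do {
        $status = curl_multi_exec($mh, $running);

        // Collect the outcome of every transfer that has finished so far.
        while ($done = curl_multi_info_read($mh)) {
            if ($done['result'] !== CURLE_OK) {
                $i = curl_getinfo($done['handle'], CURLINFO_PRIVATE);
                $errors[$i] = curl_strerror($done['result']);
            }
        }

        // Sleep until there is activity rather than looping at full speed.
        if ($running && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    } while ($running && $status === CURLM_OK);

    $out = [];
    foreach ($handles as $i => $ch) {
        $out[$urls[$i]] = [
            'body'  => curl_multi_getcontent($ch),
            'error' => $errors[$i] ?? null,
        ];
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $out;
}
```

A caller would then inspect the `error` entry of each result before using its `body`.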
