
I need to make over 1,000 requests simultaneously and get the responses in under a minute. I'm using PHP and cURL multi. For some reason cURL doesn't work as expected and cannot handle that many requests.

I'm using https://github.com/petewarden/ParallelCurl

$parallel_curl = new ParallelCurl(1000, [
    CURLOPT_SSL_VERIFYPEER => FALSE,
    CURLOPT_TIMEOUT => 10,
    CURLOPT_SSL_VERIFYHOST => FALSE,
    CURLOPT_HTTPHEADER => [
        'Accept-Encoding: gzip',
        'Accept: */*'
    ]
]);

$resp = function($content, $url, $ch, $search) {
    $info = curl_getinfo($ch);
    file_put_contents("result.csv", $info['url'] . ";" . $info['total_time'] . ";" . $info['http_code'] . "\n", FILE_APPEND);
};

$urls = explode("\n", file_get_contents("urls.csv"));
foreach(array_slice($urls, 0, 1000) as $url) {
    $parallel_curl->startRequest("http://" . $url, $resp);
}


$parallel_curl->finishAllRequests();

I set timeout to 10s.

When I open result.csv and sort by total_time descending, about half of the entries look like this:

domain;total_time;http_code
http://domain1.com;0.000785;0
http://domain2.com;0.000783;0
http://domain3.com;0.00077;0
http://domain4.com;0.000761;0
http://domain5.com;0.00076;0

cURL returns status code 0 and a very short response time, although the domain exists and loads normally in a browser. When I edit urls.csv so it contains only one URL (e.g. domain1.com), it works fine and returns the correct status 200...

Am I reaching some limit? Is there anything I can do about it?

  • You probably reached the max number of open sockets and files your process is allowed to use. That's normally 1024. Commented Jun 26, 2016 at 21:13

1 Answer


Am I reaching some limit? is there anything I can do with it?

Well, you could check with netstat whether you are hitting the maximum number of sockets.
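On Linux you can also compare your process's open-file limit against the number of descriptors actually in use; sockets count against the same limit, and 1024 is a common default, which would line up with ~1000 concurrent handles failing. A quick sketch (paths and defaults vary by distribution):

```shell
# Per-process open-file limit; 1024 is a common default.
ulimit -n

# File descriptors currently open in this shell's process.
ls /proc/self/fd | wc -l
```

While your script is running, `ls /proc/<pid>/fd | wc -l` for the PHP process shows how close you are to the ceiling; raising it with `ulimit -n 4096` (or via limits.conf) is one way out if the limit is the culprit.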

Please consider that the library you are using is four years old and deprecated, so I'd guess it's not your fault that the requests are not running concurrently. According to the issue tracker, other developers have had the same problems with this library; see https://github.com/petewarden/ParallelCurl/issues/20. RollingCurlX (https://github.com/marcushat/rollingcurlx) was created to address the issue.
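The rolling-window idea those libraries implement can also be done directly with PHP's built-in curl_multi, which sidesteps the deprecated code entirely. Below is a minimal sketch (PHP 8, not the libraries' actual APIs); it keeps at most $window transfers in flight and tops the window up as handles finish. It is demonstrated against local file:// URLs so it runs without a network; in practice you would feed it the http:// URLs from urls.csv and a larger window.

```php
<?php
// Rolling-window concurrent fetch using only ext-curl (PHP 8).
// A hypothetical sketch: keeps at most $window handles active,
// replacing each finished handle with the next queued URL.
function fetch_all(array $urls, int $window, callable $onDone): void
{
    $mh = curl_multi_init();
    $queue = $urls;
    $active = [];

    $add = function () use (&$queue, &$active, $mh) {
        $url = array_shift($queue);
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 10,
        ]);
        curl_multi_add_handle($mh, $ch);
        $active[spl_object_id($ch)] = $ch; // CurlHandle objects in PHP 8
    };

    // Fill the initial window.
    while ($queue && count($active) < $window) {
        $add();
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 1.0);

        // Harvest finished handles and top the window back up.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $onDone(curl_multi_getcontent($ch), curl_getinfo($ch));
            curl_multi_remove_handle($mh, $ch);
            unset($active[spl_object_id($ch)]);
            curl_close($ch);
            if ($queue) {
                $add();
            }
        }
    } while ($running || $queue || $active);

    curl_multi_close($mh);
}

// Demo: a few local files to "fetch" via the file:// protocol.
$paths = [];
$urls = [];
for ($i = 0; $i < 5; $i++) {
    $path = tempnam(sys_get_temp_dir(), 'pc');
    file_put_contents($path, "payload-$i");
    $paths[] = $path;
    $urls[] = 'file://' . $path;
}

$done = 0;
fetch_all($urls, 2, function ($content, $info) use (&$done) {
    $done++;
    echo $info['url'] . ';' . strlen($content) . "\n";
});
echo "completed=$done\n";

array_map('unlink', $paths); // clean up the temp files
```

With real HTTP URLs you would also log curl_errno()/curl_error() in the callback; http_code 0 with a sub-millisecond total_time usually means the transfer failed before any HTTP exchange (DNS failure, no free socket, etc.), and the cURL error explains which.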

I'd suggest going with Guzzle (https://github.com/guzzle/guzzle). The GuzzleHttp\Pool example in the quickstart (http://docs.guzzlephp.org/en/latest/quickstart.html#concurrent-requests) should get you started in no time...
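Adapted to your case, the Pool pattern from that quickstart might look roughly like this. It assumes guzzlehttp/guzzle is installed via Composer (the sketch skips itself when the autoloader is absent, so the file stays runnable); the concurrency value and options are illustrative, not tuned.

```php
<?php
// Sketch of the GuzzleHttp\Pool pattern from the Guzzle quickstart.
// Assumes a Composer-installed guzzlehttp/guzzle; when vendor/autoload.php
// is missing, the demo body is skipped.
$autoload = __DIR__ . '/vendor/autoload.php';
$results = [];

if (file_exists($autoload)) {
    require $autoload;

    $client = new \GuzzleHttp\Client(['timeout' => 10, 'verify' => false]);

    $urls = file_exists('urls.csv')
        ? array_filter(array_map('trim', file('urls.csv')))
        : [];

    // Lazily yield one request per URL so all 1,000+ never sit in memory at once.
    $requests = function () use ($urls) {
        foreach ($urls as $url) {
            yield new \GuzzleHttp\Psr7\Request('GET', 'http://' . $url);
        }
    };

    $pool = new \GuzzleHttp\Pool($client, $requests(), [
        'concurrency' => 100, // a window well under the ~1024 fd default
        'fulfilled' => function ($response, $index) use (&$results) {
            $results[$index] = $response->getStatusCode();
        },
        'rejected' => function ($reason, $index) use (&$results) {
            $results[$index] = 0; // DNS failure, timeout, connection refused…
        },
    ]);

    $pool->promise()->wait();
}

echo "collected " . count($results) . " results\n";
```

Note that Pool caps in-flight requests at 'concurrency'; pushing it to 1000 would run you straight back into the open-socket limit, so a smaller window that recycles connections is usually faster in practice.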


3 Comments

Thank you Jens. I have already used Guzzle and found it pretty awesome, but I was trying to find an alternative; Guzzle seemed too slow for my needs (described a few days ago here: stackoverflow.com/questions/38003681/… ). Now I know that I'm probably hitting the max sockets, and there's not much I can do about it. I will consider additional server resources to handle such an amount of requests in such a short time; maybe that would solve my problem.
3000 requests per minute? I tend to say that PHP isn't the right tool for this job. But it depends! On many things. For instance: on the type of request! Are you doing GET requests for gzipped content, which cause additional storage I/O? Or just lightweight HEAD requests to check whether a website is up? You need to optimize the request headers to ask for exactly what you want, to keep negotiation time minimal. | PHP is not multi-threaded. You can have concurrent requests, but in a single thread. You could try to run multiple PHP scripts at the same time; well, until it maxes out your bandwidth or CPU.
I'm using GET, but I will think about using HEAD. I was playing around with Guzzle + pthreads with no satisfying results due to this bug: github.com/guzzle/guzzle/issues/1398 . Now I'm thinking about going with Java, but I just wanted to make sure the goal is achievable with some other language. I'll play with multiple PHP scripts; thanks again for your advice!
