I am scraping data from a URL using cURL:

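for ($i = 0; $i < 1000000; $i++) {

    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, 'http://example.com?page='.$i);
    // Return the response as a string instead of printing it
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($curl_handle);
    curl_close($curl_handle);

    // some code to save the HTML page on HDD
}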

Is there some way I could speed up the process? Maybe multithreading? How could I do it?

2 Answers

cURL Multi does not make parallel requests; it makes asynchronous requests.

The documentation was wrong until 5 minutes ago; it will take some time for the corrected documentation to be deployed and translated.

Asynchronous I/O (using something like the cURL Multi API) is the simplest thing to do. However, only the requests themselves are asynchronous: processing the data once it is downloaded, for example writing it to disk, would still cause lots of blocking I/O, and any further processing of the data (parsing JSON, for example) would still occur synchronously, in a single thread of execution.
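As a rough illustration of the asynchronous approach, here is a minimal sketch built on the cURL Multi functions. The helper names `page_urls()` and `fetch_batch()`, the 30-second timeout, and the idea of working in fixed-size batches are assumptions made for this example, not something taken from the question; the URL pattern is the one from the question.

```php
<?php
// Hypothetical helper: builds the page URLs for one batch,
// using the URL pattern from the question.
function page_urls(int $start, int $count): array
{
    $urls = [];
    for ($i = $start; $i < $start + $count; $i++) {
        $urls[] = 'http://example.com?page=' . $i;
    }
    return $urls;
}

// Hypothetical helper: downloads all given URLs asynchronously.
// Returns an array mapping each URL to its body, or null on failure.
function fetch_batch(array $urls, int $timeout = 30): array
{
    $results = array_fill_keys($urls, null);
    $multi = curl_multi_init();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        // Remember which URL this handle belongs to.
        curl_setopt($ch, CURLOPT_PRIVATE, $url);
        curl_multi_add_handle($multi, $ch);
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && $status === CURLM_OK) {
            // Wait for socket activity instead of busy-looping.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the outcome of every finished transfer.
    while (($info = curl_multi_info_read($multi)) !== false) {
        $ch = $info['handle'];
        $url = curl_getinfo($ch, CURLINFO_PRIVATE);
        if ($info['result'] === CURLE_OK) {
            $results[$url] = curl_multi_getcontent($ch);
        }
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }

    curl_multi_close($multi);
    return $results;
}
```

Calling `fetch_batch(page_urls($i, 20))` in a loop that advances `$i` by 20 would cover the same pages as the original loop, with up to 20 transfers in flight at a time. The batch size of 20 is an arbitrary starting point, and the code that saves each page still runs synchronously after its batch completes.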

Multi-threading is the other option; it requires a thread-safe build of PHP with the pthreads extension installed.

Multi-threading has the advantage that each download and all of its subsequent processing can run in parallel, fully utilizing all the CPU cores available.

What is best depends largely on how much processing of the downloaded data your code must perform, and even then it can be considered a matter of opinion.


You're looking for the curl_multi_* set of functions: "Allows the processing of multiple cURL handles in parallel".

Take a look at the complete example on the curl_multi_init() page.

Check out these articles for more information about how curl_multi_exec() works:
