I have a PHP script for a URL status checker tool that checks the given URLs and reports the ones that return a 404 error.

StatusCheckerRequest holds the input, a string of URLs separated by "\n":

public function PostStatusChecker(StatusCheckerRequest $request){
    $urls = $request->source;
    $seperateURLs = explode("\n", $urls);
    // -- create all the individual cURL handles and set their options
    $curl_handles = array();
    foreach ($seperateURLs as $url) {
        $curl_handles[$url] = curl_init();
        curl_setopt($curl_handles[$url], CURLOPT_URL, $url);
        curl_setopt($curl_handles[$url], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_handles[$url], CURLOPT_CONNECTTIMEOUT, 20);
        curl_setopt($curl_handles[$url], CURLOPT_SSL_VERIFYPEER, false);
    }
    // -- start going through the cURL handles and running them
    $curl_multi_handle = curl_multi_init();
    $i = 0; // count where we are in the list so we can break up the runs into smaller blocks
    $block = array(); // to accumulate the curl_handles for each group we'll run simultaneously
    $results = array();
    $curlErrors = array();
    foreach ($curl_handles as $a_curl_handle) {
        $i++; // increment the position-counter

        // add the handle to the curl_multi_handle and to our tracking "block"
        curl_multi_add_handle($curl_multi_handle, $a_curl_handle);
        $block[] = $a_curl_handle;

        // -- check to see if we've got a "full block" to run or if we're at the end of our list of handles
        if (($i % BLOCK_SIZE == 0) or ($i == count($curl_handles))) {
            // -- run the block
            $running = NULL;
            do {
                // track the previous loop's number of handles still running so we can tell if it changes
                $running_before = $running;

                // run the block or check on the running block and get the number of sites still running in $running
                curl_multi_exec($curl_multi_handle, $running);
                print_r (curl_multi_info_read($curl_multi_handle));
            } while ($running > 0);


            // -- once the number still running is 0, curl_multi_ is done, so check the results
            foreach ($block as $handle) {
                // HTTP response code
                $code = curl_getinfo($handle,  CURLINFO_HTTP_CODE);
                $results['httpCode'][] = $code;

                // cURL error number
                $curl_errno = curl_errno($handle);
                $results['curlErrorNo'][] = $curl_errno;

                // cURL error message
                $curl_error = curl_error($handle);
                $results['curlErrorMessage'][] = $curl_error;        

                // remove the (used) handle from the curl_multi_handle
                curl_multi_remove_handle($curl_multi_handle, $handle);
            }

            // reset the block to empty, since we've run its curl_handles
            $block = array();
        }
    }
    // close the curl_multi_handle once we're done
    curl_multi_close($curl_multi_handle);
    print_r($results);
    die();
}

I based this on a curl_multi_exec example from Stack Overflow. When I test it with these URLs:

Array
(
    [0] => stackoverfloww.com
    [1] => www.laravel2.com
    [2] => http://stackoverflow.com
    [3] => http://laravel.com
)

The output is:

[httpCode] => Array
(
    [0] => 0
    [1] => 0
    [2] => 0
    [3] => 301
)

[curlErrorMessage] => Array
(
    [0] => Illegal characters found in URL
    [1] => Illegal characters found in URL
    [2] => Illegal characters found in URL
    [3] => 
)

I tried different inputs, and the result is always the same: the last URL returns 200 or 301, and all the others return 0. I also checked the output of curl_multi_info_read: the result is 3 for each of the "Illegal characters found in URL" URLs and 0 for the last one.

Can you please help me find what is wrong here? Thank you very much.

1 Answer

A quick search of the cURL source code shows that this error is raised when the URL supplied to CURLOPT_URL contains a \r or \n character.

From lib/url.c:

  /* We might pass the entire URL into the request so we need to make sure
   * there are no bad characters in there.*/
  if(strpbrk(data->change.url, "\r\n")) {
    failf(data, "Illegal characters found in URL");
    return CURLE_URL_MALFORMAT;
  }
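
The same check can be reproduced from PHP before the URL ever reaches cURL. The sketch below is not part of cURL; it applies PHP's own strpbrk() to the exploded input and uses json_encode() to make the invisible characters visible. The sample string and its "\r\n" line endings are assumptions made for illustration.

```php
<?php
// Assumed sample input with Windows-style line endings (illustrative only).
$source = "http://stackoverflow.com\r\nhttp://laravel.com";

$flags = array();
foreach (explode("\n", $source) as $url) {
    // Same test as cURL's strpbrk(url, "\r\n"): true if the URL contains CR or LF.
    $bad = strpbrk($url, "\r\n") !== false;
    $flags[] = $bad;

    // json_encode() escapes control characters, so a trailing \r becomes visible.
    echo json_encode($url, JSON_UNESCAPED_SLASHES), ' => ', $bad ? 'illegal characters' : 'ok', PHP_EOL;
}
// Prints:
// "http://stackoverflow.com\r" => illegal characters
// "http://laravel.com" => ok
```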

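There is probably a stray \r or \n left on the end of each URL, so run every URL through $url = trim($url); before passing it to cURL. If the input uses "\r\n" line endings, as text submitted from an HTML form typically does, explode("\n", $urls) leaves a trailing \r on every URL except the last one, which would explain why only the last URL succeeds.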
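
A minimal sketch of that cleanup, assuming the submitted text may use any mix of "\r\n", "\n" or "\r" line endings. The helper name splitUrls and the sample input are made up for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: turn a multi-line string into a list of clean URLs.
function splitUrls($source)
{
    // \R matches any line-break sequence (\r\n, \n or \r).
    $lines = preg_split('/\R/', $source);

    // trim() strips surrounding whitespace, including any stray \r or \n.
    $lines = array_map('trim', $lines);

    // Drop empty lines and reindex the array from 0.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

// Assumed sample input with Windows-style line endings and a trailing newline.
$urls = splitUrls("http://stackoverflow.com\r\nhttp://laravel.com\r\n");
print_r($urls);
```

In the question's code this would replace the explode("\n", $urls) call, so that every value passed to CURLOPT_URL is free of \r and \n.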
