PHP DOM Parsing URL did not return anything

Question

I'm using this example code as a starting point for parsing a particular website:

<?php

# Use the Curl extension to query Google and get back a page of results
$url = "http://www.google.com";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);

# Create a DOM parser object
$dom = new DOMDocument();

# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($html);

# Iterate over all the <a> tags
foreach($dom->getElementsByTagName('a') as $link) {
        # Show the <a href>
        echo $link->getAttribute('href');
        echo "<br />";
}
?>

Source

Then I changed the URL above to [removed for privacy reasons] and ran the script again, but this time I got no output, although it works with the Google URL. So what's the problem with my website? Are there protection methods that prevent parsing, or does the page not conform to the standard? I hope someone can help me.

Try outputting the HTML and see what it returns. Also take a look at the HTTP response headers. With that said, in all likelihood if the URL works in your browser and not in curl, it's probably because it rejects requests with no user agent set. I've seen this before a few times.
– Mike, Commented Dec 16, 2018 at 0:00
Is your curl extension enabled? I can retrieve links using your code.
– McBern, Commented Dec 16, 2018 at 1:45
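
The checks suggested in the comments (look at the raw body, the HTTP status and response headers, and try sending a user agent) could be sketched roughly as follows. The function names `parse_header_line` and `fetch_with_diagnostics` and the user agent string are made up for illustration; only the cURL calls themselves are standard.

```php
<?php
// Hypothetical helper: split one raw header line into [name, value].
// Returns null for lines without a colon (status line, blank separator).
function parse_header_line(string $line): ?array
{
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2) {
        return null;
    }
    return [strtolower(trim($parts[0])), trim($parts[1])];
}

// Hypothetical helper: fetch a URL and return the body together with
// the cURL error, HTTP status code and response headers for inspection.
function fetch_with_diagnostics(
    string $url,
    string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)' // assumed value
): array {
    $headers = [];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    // Some sites reject requests that carry no User-Agent header.
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    // Collect response headers; the callback must return the bytes handled.
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, string $line) use (&$headers): int {
        $parsed = parse_header_line($line);
        if ($parsed !== null) {
            $headers[$parsed[0]] = $parsed[1];
        }
        return strlen($line);
    });

    $body = curl_exec($ch);
    $result = [
        'body'    => $body === false ? null : $body,
        'error'   => $body === false ? curl_error($ch) : null,
        'status'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'headers' => $headers,
    ];
    curl_close($ch);
    return $result;
}
```

Calling `var_dump(fetch_with_diagnostics($url));` should show whether the request failed outright, returned a non-200 status, or came back with a `content-encoding` header that explains an unreadable body.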

Peter · Accepted Answer · 2018-12-16 02:20:44Z


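It looks like that site returns only gzip-encoded responses. Without decoding, $html contains compressed bytes rather than markup, so loadHTML() finds no <a> tags. You need to tell cURL which encoding to decode and send the matching Accept-Encoding header: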

$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept-Encoding: gzip, deflate, br',
));
$html = curl_exec($ch);
curl_close($ch);

This is working on my end.
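
As a possible variation (a sketch, not part of what was tested above): passing an empty string to CURLOPT_ENCODING makes cURL send an Accept-Encoding header listing every encoding the local build supports and decode the response automatically, so the header does not need to be set by hand. The helper names `fetch_html` and `extract_links` are hypothetical, and the error handling is just one way to make an empty or failed response visible instead of silently printing nothing.

```php
<?php
// Hypothetical helper: fetch a page and let cURL negotiate and decode
// the content encoding on its own.
function fetch_html(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    // Empty string: advertise all encodings this cURL build supports
    // and transparently decode the response body.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $html = curl_exec($ch);
    if ($html === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $html;
}

// Hypothetical helper: return the href of every <a> tag in the markup.
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $links[] = $link->getAttribute('href');
    }
    return $links;
}
```

With these in place, `print_r(extract_links(fetch_html($url)));` would replace the fetch and loop from the question.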
