Simple DOM file_get_html returns nothing

Question

I'm trying to scrape data from some websites. For several sites it all seems to go fine, but for one website it doesn't seem to be able to get any HTML. This is my code:

<?php include_once('simple_html_dom.php');

$html = file_get_html('https://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=' . $_POST['data']);

echo $html; ?>

I'm using ajax to fetch the data. When I log the returned value in my js it's completely empty.

Could it be because this website runs on https? If so, is there any way to work around it? (I've tried changing the URL to http, but I get the same result.)

Update:

If I var_dump the $html variable, I get bool(false).

My PHP error log says this:

[27-Feb-2014 22:20:50 Europe/Amsterdam] PHP Warning: file_get_contents(http://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=tarmogoyf): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /Users/leondewit/PhpstormProjects/Magic/stores/simple_html_dom.php on line 75

Just try standard debugging. Maybe there was an error. Try turning error_reporting on or check your error logs. Try echoing something other than $html to see if you get any result. Also, try var_dump($html); instead of just echoing it.
– Jonathan Kuhn, Commented Feb 27, 2014 at 21:28
Updated my question with feedback. Also, if I echo something else (i.e. a string) I get a normal result.
– Leon, Commented Feb 27, 2014 at 21:33
The 403 Forbidden error code is sent from the server you are trying to contact (magiccardmarket) and is usually sent when the page you are requesting requires a login. It is possible they are blocking automated requests from user agents that are not browsers. You could try changing your user agent, but that is really a guess. If that is the case, though, they are blocking it for a reason, most likely that they don't want people to abuse their website.
– Jonathan Kuhn, Commented Feb 27, 2014 at 21:41
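
The 403 discussed in the comments can also be checked from PHP rather than from the error log. The following is only a sketch: parse_status_code is a hypothetical helper (not part of PHP or simple_html_dom), and the commented-out usage assumes the 'ignore_errors' option of the http stream context, which lets file_get_contents return the response body even when the status indicates failure.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a status line
// such as "HTTP/1.0 403 Forbidden". Returns null if the line does not match.
function parse_status_code($statusLine)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Usage sketch (makes a real request, so it is left commented out):
// $context = stream_context_create(array('http' => array('ignore_errors' => true)));
// $body    = file_get_contents($url, false, $context);
// $status  = parse_status_code($http_response_header[0]); // first status line
// if ($status !== 200) { error_log("Request failed with HTTP $status"); }
```

If the request follows redirects, $http_response_header holds more than one status line, so the first entry may not describe the final response.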

pguardiario · Accepted Answer · 2014-02-28 00:41:21Z

5

It's your user agent. file_get_contents doesn't send one by default, so pass one in through a stream context:

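include_once('simple_html_dom.php'); // provides str_get_html()

$url = 'http://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=tarmogoyf';
// PHP's http wrapper sends no User-Agent unless one is configured, so set it explicitly.
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents($url, false, $context);
$html = str_get_html($response);
echo $html;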
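
A possible alternative, sketched here under the assumption that file_get_html calls file_get_contents without a context of its own: PHP's user_agent ini setting supplies the User-Agent header for the http wrapper, so setting it once may let the original file_get_html call work unchanged. The value 'Mozilla compatible' is simply the string used above, not something the site is known to require.

```php
<?php
// Set the user agent for requests made through PHP's http:// wrapper
// when no stream context overrides it.
ini_set('user_agent', 'Mozilla compatible');

// With that in place, the original call could stay as it was
// (commented out here because it makes a network request):
// include_once('simple_html_dom.php');
// $html = file_get_html($url);

// Confirm the setting took effect.
echo ini_get('user_agent'), "\n";
```

This changes the setting for the rest of the script, whereas the stream context above affects only the one request it is passed to.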

