I'm trying to scrape data from some websites. For several sites it all seems to go fine, but for one website it doesn't seem to be able to get any HTML. This is my code:

<?php
include_once('simple_html_dom.php');

$html = file_get_html('https://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=' . $_POST['data']);

echo $html;
?>

I'm using ajax to fetch the data. When I log the returned value in my js it's completely empty.

Could it be because this website runs on HTTPS? If so, is there any way to work around it? (I've tried changing the URL to http, but I get the same result.)

Update:

If I var_dump the $html variable, I get bool(false).

My PHP error log says this:

[27-Feb-2014 22:20:50 Europe/Amsterdam] PHP Warning: file_get_contents(http://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=tarmogoyf): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /Users/leondewit/PhpstormProjects/Magic/stores/simple_html_dom.php on line 75

  • Just try standard debugging. Maybe there was an error. Try turning error_reporting on or check your error logs. Try echo'ing something else out instead of $html to see if you get any result. Also, maybe try to var_dump($html); instead of just echo it. Commented Feb 27, 2014 at 21:28
  • Updated my question with feedback. Also, if I echo something else (ie a string) I get a normal result. Commented Feb 27, 2014 at 21:33
  • The 403 Forbidden error code is sent from the server you are trying to contact (magiccardmarket) and is usually sent when the page you are requesting requires a login. It is possible they are blocking automated requests from user agents that are not browsers. You could try to change your user agent, but that is really a guess. If that is the case though, they are blocking it for a reason which is most likely that they just don't want people to abuse their website. Commented Feb 27, 2014 at 21:41

1 Answer

It's your user agent: file_get_contents doesn't send one by default, so the server rejects the request. Set one explicitly through a stream context:

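// Target page; the search term is hard-coded here for illustration.
$url = 'http://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=tarmogoyf';
// Send an explicit User-Agent header with the request.
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents($url, false, $context);
// Parse the fetched markup with simple_html_dom.
$html = str_get_html($response);
echo $html;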
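
To apply the same idea to the search term posted from the form, something like the following might work. It is only a sketch: `build_search_url` and `fetch_page` are hypothetical helper names, not part of simple_html_dom, and it assumes the site accepts the same query parameters as in the question. It encodes the term with `http_build_query` and checks for a failed request before parsing.

```php
<?php
// Hypothetical helper: builds the search URL with the term safely encoded,
// so spaces and characters like "&" cannot break the query string.
function build_search_url(string $term): string
{
    $query = http_build_query(
        ['mainPage' => 'showSearchResult', 'searchFor' => $term],
        '',
        '&'
    );
    return 'https://www.magiccardmarket.eu/?' . $query;
}

// Hypothetical helper: fetches a page while sending an explicit User-Agent.
// Returns the response body, or null when the request fails (for example
// when the server answers 403 Forbidden).
function fetch_page(string $url, string $userAgent = 'Mozilla compatible'): ?string
{
    $context = stream_context_create([
        'http' => [
            'header'  => 'User-Agent: ' . $userAgent,
            'timeout' => 10, // seconds; an arbitrary value chosen for this sketch
        ],
    ]);

    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        error_log('Request failed for ' . $url);
        return null;
    }
    return $body;
}
```

Calling `fetch_page(build_search_url($_POST['data']))` and passing a non-null result to `str_get_html` would then replace the hard-coded URL. Checking for null first avoids echoing an empty response back to the ajax call without any hint of what went wrong.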