1

I am using the PHP library Simple HTML DOM Parser, as suggested in "How do you parse and process HTML/XML in PHP?", to parse a webpage's HTML content.

To create the DOM, I have to do:

$html = file_get_html('http://www.example.com/');

The problem is that if I do:

$html = file_get_html('www.example.com');

without specifying the URL's protocol, I will get an error.

My question is: given only the string "www.example.com", how can I find out whether the full URL is "http://www.example.com/" or "https://www.example.com/"?

1
  • Well, you can't. Domain names are independent of the protocol used; it might as well be ftp://, or something even more exotic. (As for the error: it's trying to open a local file named www.example.com, which you probably don't have on your disk :)) Commented Aug 26, 2011 at 1:08
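
To illustrate the point about the missing protocol, here is a minimal sketch that checks whether a string already carries a scheme before it is handed to file_get_html. The helper name has_scheme is hypothetical; parse_url is the standard PHP function.

```php
<?php
// Hypothetical helper: true when the string already names a scheme such as http or https.
function has_scheme(string $url): bool
{
    // parse_url() returns null for the scheme component when none is present.
    return is_string(parse_url($url, PHP_URL_SCHEME));
}

var_dump(has_scheme('www.example.com'));         // bool(false)
var_dump(has_scheme('http://www.example.com/')); // bool(true)
```

A bare host name such as 'www.example.com' is parsed as a path, which is consistent with PHP treating it as a local file name.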

3 Answers

2

I can't think of anything smarter than assuming "http://" as the default and, if that fails, trying "https://":

if (!$html = file_get_html('http://' . $url)) $html = file_get_html('https://' . $url);
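
The same fallback can be sketched as a small function that also reports which URL worked. The name resolve_scheme and its $fetch parameter are assumptions made for illustration; $fetch stands for any callable that returns a falsy value on failure, which is how file_get_html is used in the one-liner above.

```php
<?php
// Hypothetical helper: try each scheme in order and return the first URL that loads.
// $fetch is any callable returning a falsy value on failure (for example 'file_get_html').
function resolve_scheme(string $host, callable $fetch, array $schemes = ['http', 'https']): ?string
{
    foreach ($schemes as $scheme) {
        $url = $scheme . '://' . $host;
        if ($fetch($url)) {
            return $url; // first scheme that produced a document
        }
    }
    return null; // neither scheme worked
}
```

With Simple HTML DOM Parser loaded, a call might look like resolve_scheme('www.example.com', 'file_get_html'). Note that this fetches the page just to test it, so you may prefer to keep the returned document instead of fetching it a second time.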

2

There is no way to know, because both could be valid. I would assume http://, though, because normal practice is to redirect http to https where it is required, and file_get_html should follow an HTTP 301 or 302 redirect.
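
If you want to see where such a redirect ends up, one option is to inspect the Location headers of the response. This is only a sketch: final_scheme is a hypothetical helper, and it assumes it is given header lines in the form that get_headers() returns, which as far as I know includes the headers of each response in a followed redirect chain.

```php
<?php
// Hypothetical helper: scheme of the last absolute Location header, or $default if there is none.
// $headers is a list of raw header lines, e.g. the result of get_headers('http://www.example.com/').
function final_scheme(array $headers, string $default = 'http'): string
{
    $scheme = $default;
    foreach ($headers as $line) {
        if (stripos($line, 'Location:') === 0) {
            $target = trim(substr($line, strlen('Location:')));
            $found = parse_url($target, PHP_URL_SCHEME);
            if (is_string($found)) {
                $scheme = strtolower($found); // relative redirects keep the current scheme
            }
        }
    }
    return $scheme;
}
```

A result of 'https' for a request that started on http:// suggests the site redirects to HTTPS.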


1

You could try calling get_headers() on the http:// address and checking the response headers, for example for an Upgrade: header. If you get a valid response, use http; otherwise, try https.
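
A rough sketch of that probe, assuming "valid response" means a 2xx or 3xx status line. The name pick_scheme and the optional $getHeaders parameter are hypothetical; the parameter exists only so the function can be tried without network access, and it falls back to PHP's get_headers().

```php
<?php
// Hypothetical helper: return the first scheme whose status line is 2xx or 3xx, else null.
// $getHeaders defaults to PHP's get_headers(); pass a stub to try this offline.
function pick_scheme(string $host, ?callable $getHeaders = null): ?string
{
    $getHeaders = $getHeaders ?? 'get_headers';
    foreach (['http', 'https'] as $scheme) {
        // get_headers() returns false (and emits a warning) when the request fails.
        $headers = @$getHeaders($scheme . '://' . $host);
        if (is_array($headers) && isset($headers[0])
            && preg_match('#^HTTP/\S+\s+[23]\d\d#', $headers[0])) {
            return $scheme;
        }
    }
    return null;
}
```

Since get_headers() only reads the response headers, this avoids parsing the whole page just to decide which scheme to use.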
