
I'm trying to get the content of a webpage that requires authentication, using PHP.
Ideally, I'd like to use the Simple HTML DOM parser: http://simplehtmldom.sourceforge.net.
Does anyone know of a way to do this?

Edit:
Tried the following code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'username=' . urlencode($username) . '&password=' . urlencode($pass));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
echo curl_exec($ch);
curl_close($ch);

But now I get a 405 HTTP error.

  • What kind of authentication? Simple HTTP authentication or actual session-stored data? Commented May 10, 2011 at 15:10
  • Yeah, I'm not talking about a simple HTTP authentication, I'm talking about a regular login form. Commented May 10, 2011 at 16:59

1 Answer

I've never used that parser, but its sample code makes it look like it can load data from either a file or a URL. I would use PHP's curl functions, which make it easy to access a page with several types of authentication, save the result to a file, and then use the library to parse the file.

http://www.php.net/manual/en/book.curl.php

Check out the CURLOPT_HTTPAUTH option specifically.
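
As a rough sketch of what I mean (this only applies when the page is protected by HTTP-level authentication, not a login form): the helper names `basic_auth_options` and `fetch_to_file`, the URL, the credentials and the file name are all placeholders I made up for illustration.

```php
<?php
// Hypothetical helper: builds the cURL options for a page behind HTTP Basic auth.
function basic_auth_options($url, $user, $pass)
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,      // use HTTP Basic authentication
        CURLOPT_USERPWD        => $user . ':' . $pass, // credentials as "user:password"
        CURLOPT_RETURNTRANSFER => true,                // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                // follow redirects
    ];
}

// Hypothetical helper: fetches the page and writes the body to $file.
// Returns true on success, false if the request or the write failed.
function fetch_to_file(array $options, $file)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        return false;
    }
    return file_put_contents($file, $body) !== false;
}

// Example usage (placeholder URL and credentials, not run here):
// if (fetch_to_file(basic_auth_options('https://example.com/protected', 'user', 'secret'), 'page.html')) {
//     $html = file_get_html('page.html'); // file_get_html() comes from simple_html_dom.php
// }
```

Once the file is on disk, the parser should be able to load it the same way it loads any other local HTML file.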

Hope this helps.

Edit:

I had to look up 405; I've never seen one. It means "Method Not Allowed": the server refused the HTTP method used for that URL. It sounds like the server doesn't allow POST requests to that URL, or possibly doesn't allow them without SSL:

http://www.checkupdown.com/status/E405.html

I would talk to whoever runs your server about the 405. Your code looks good to me. Does posting the login form return the page you want, or are you going to have to pull down another once you have the session info saved?
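
If it turns out you do need a second request, here is a minimal sketch of how I would approach it, assuming the site keeps the login in a session cookie. The helper names, the URLs and the cookie file path are placeholders; the `username` and `password` field names are taken from your code and may not match the real form. One guess worth checking for the 405: the POST has to go to the URL in the form's `action` attribute, which is not always the page that displays the form.

```php
<?php
// Hypothetical helper: options for POSTing the login form and keeping the session cookie.
function login_options($loginUrl, array $fields, $cookieFile)
{
    return [
        CURLOPT_URL            => $loginUrl,                  // the form's action URL
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),  // URL-encodes each field
        CURLOPT_COOKIEJAR      => $cookieFile,                // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile,                // enables cookie handling on this handle
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: options for the follow-up GET of the protected page.
function page_options($pageUrl)
{
    return [
        CURLOPT_URL     => $pageUrl,
        CURLOPT_HTTPGET => true, // switch the handle back from POST to GET
    ];
}

// Logs in, then fetches the protected page on the same handle so the
// session cookie set by the login response is sent with the second request.
// Returns the page body as a string, or false on failure.
function fetch_protected_page($loginUrl, array $fields, $pageUrl, $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, login_options($loginUrl, $fields, $cookieFile));
    if (curl_exec($ch) === false) {
        return false;
    }
    curl_setopt_array($ch, page_options($pageUrl));
    return curl_exec($ch);
}

// Example usage (placeholder URLs, not run here):
// $body = fetch_protected_page('https://example.com/login',
//     ['username' => $username, 'password' => $pass],
//     'https://example.com/members', '/tmp/cookies.txt');
// $html = str_get_html($body); // str_get_html() comes from simple_html_dom.php
```

Real login forms often include extra hidden fields (a CSRF token, for example), so you may need to fetch the form first and add those fields to the POST.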


4 Comments

Could you please explain further? I tried CURLOPT_HTTPAUTH but couldn't get it to work. By the way, to make it absolutely clear, I'm not talking about htaccess authentication, but form authentication, like Gmail or Facebook or whatever.
Ah. Most people doing Facebook or Gmail authentication would be using OAuth. That is still possible; I found an example here (YouTube, not Facebook, but OAuth is OAuth): stackoverflow.com/questions/1522869/…. I recommend checking out the API of the site you're interested in and altering the particulars.
It looks like Google still allows basic auth, actually, checkout.google.com/support/sell/bin/…
Google and Facebook were just examples. I can't use OAuth with the site I'm trying to pull info from.
