
I've opened a .php page on a website that has a bunch of hyperlinks on it. I want to copy their URLs into a .txt file. I could do that manually, of course, but there are too many of them, so I'd like to do it automatically.

Previously I would have done it this way: look at the page source, that is, its HTML code, and parse it with a small script written specifically for that. But this is a .php page, and I guess all the links are piped in from a database on the server rather than from the source code. Either way, they are not in the page's HTML code.

I wonder if this is still possible. I believe it should be: all the links are displayed on my screen, and they are all clickable and working, so there should be some way of capturing them.

  • You can use the same script to parse the links. Did you try that? Commented Jan 3, 2014 at 12:13
  • If they don't show up in the source, then they are added by JavaScript, not PHP. Commented Jan 3, 2014 at 12:13
  • Have you tried preg_match_all? Commented Jan 3, 2014 at 12:14
  • Maybe you'll find what you're looking for here: stackoverflow.com/questions/34120/html-scraping-in-php Commented Jan 3, 2014 at 12:15
  • Using file_get_contents(), you can also do it with the same script. Commented Jan 3, 2014 at 12:15
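Both the asker's "small script" and the preg_match_all suggestion come down to pattern-matching the raw HTML. Below is a minimal JavaScript sketch of that idea, using String.prototype.matchAll as a stand-in for PHP's preg_match_all; extractHrefs is a hypothetical name. It assumes href values are quoted and, like any regex over HTML, it is brittle. It also only sees the HTML as served, so it will not find links that JavaScript adds afterwards, which is the situation described in the question.

```javascript
// Hypothetical helper: pull quoted href values out of raw HTML with a regex.
// Assumes attributes look like href="..." or href='...'; unquoted values are missed.
function extractHrefs(html) {
  // <a\b           an <a> tag (but not <abbr>, <aside>, ...)
  // [^>]*?\s       any other attributes in the same tag, then whitespace before "href"
  // (["'])(.*?)\1  the quoted value, captured in group 2
  const re = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(html.matchAll(re), m => m[2]);
}
```

For example, extractHrefs('<a href="page.php?id=1">One</a>') returns ["page.php?id=1"].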

2 Answers


As I understand it, you want to do this from the browser itself. In that case, open Chrome's developer tools (press F12), go to the Console tab, paste the following code and press Enter. Then copy the list of links from the console and put it in a .txt file.

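// Collect every <a> element in the live page, including ones added by JavaScript
var tags = document.getElementsByTagName("a");
for (var i = 0; i < tags.length; i++) {
    var href = tags[i].getAttribute("href");
    if (href) {               // skip anchors that have no href attribute
        console.log(href);
    }
}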
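getAttribute("href") returns the attribute exactly as written, so relative links (e.g. page.php?id=3) come out relative, and duplicates are printed as often as they appear. A minimal sketch, assuming you want one absolute http(s) URL per line with duplicates removed; linksToText is a hypothetical helper name, not something the answer defines.

```javascript
// Hypothetical helper: turns raw href values into the text of a .txt file,
// one absolute http(s) URL per line, with duplicates removed.
function linksToText(hrefs, baseUrl) {
  const unique = new Set(); // a Set keeps first-seen order
  for (const href of hrefs) {
    if (!href) continue; // an <a> without an href attribute gives null
    let url;
    try {
      url = new URL(href, baseUrl); // resolves relative links like "page.php?id=3"
    } catch (e) {
      continue; // skip values that are not valid URLs
    }
    if (url.protocol === "http:" || url.protocol === "https:") {
      unique.add(url.href); // drops mailto:, javascript:, etc.
    }
  }
  return Array.from(unique).join("\n");
}

// Only runs in a browser console, where `document` exists.
if (typeof document !== "undefined") {
  const hrefs = Array.from(document.getElementsByTagName("a"), a => a.getAttribute("href"));
  console.log(linksToText(hrefs, document.baseURI));
}
```

Paste the whole block into the console. In Chrome you can replace console.log(...) with copy(...), a DevTools Console Utilities function, to put the list straight on the clipboard.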

Comments

Make sure your console's level filter is set to All, not Debug.
WOW!!! It worked just like that! Thank you. Can you, please, tell me what language is your code written in?
It's plain JavaScript :)
Ah! I see. I didn't know that Chrome accepts JavaScript. Thanks again!
@HarryDenley - Thank you! Do you know any resource on the internet where I could learn how to use the console with JavaScript?

Here's what you need to do:

Use PHP's cURL library to fetch the page as a string, or, better yet, use file_get_contents:

https://www.php.net/file_get_contents

$homepage = file_get_contents('http://www.example.com/');

Use the DOMDocument class to build an HTML document: https://www.php.net/domdocument

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about malformed HTML quiet
$doc->loadHTML($homepage);

From here you can get all the <a> tags in the HTML and read their href attributes. Start by calling:

$elements = $doc->getElementsByTagName("a");

Then just iterate over the elements, pulling out each href:

foreach($elements as $el) {
    $link = $el->getAttribute("href");
    echo $link . "\n";
}
//untested code

You can then reuse the script on any page; just change the URL you fetch (with file_get_contents or the cURL request).
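The question asks for a .txt file, while the code above only echoes the links. In PHP, file_put_contents would write them out; since the accepted answer's code is JavaScript, here is a minimal Node.js sketch instead. saveLinks and the file name links.txt are hypothetical, and it assumes you already have the links as an array of strings, for example copied from the console snippet above.

```javascript
const fs = require("fs");

// Hypothetical helper: writes each distinct, non-empty link on its own line
// to a plain-text file and returns the number of links written.
function saveLinks(links, filePath) {
  const cleaned = links.map(link => (link || "").trim()).filter(link => link !== "");
  const unique = [...new Set(cleaned)]; // a Set keeps first-seen order
  fs.writeFileSync(filePath, unique.map(link => link + "\n").join(""), "utf8");
  return unique.length;
}
```

For example, saveLinks(["https://example.com/a", "https://example.com/b"], "links.txt") writes two lines and returns 2.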
