Extracting links from website using Python, NOT IN HTML

Question

I need to gather PDF files from this page: http://www.anp.gov.br/?id=532.

I wonder how this is possible in Python when I can't find the links in the HTML source code. Previously, I have found the links to such files by using BeautifulSoup and pandas.

Thanks for any kind of answer!

Can you explain why you can't find the links in the HTML source code? I'm not sure I'm clear on the goal here. – Alex W, commented Jul 7, 2015 at 17:15

Hi, Alex W! The developers who made the page did not write the links directly in the HTML source code; instead, they are only called when clicked. I want these links so I can collect all the data and merge it into one Excel sheet. Thanks for the response, btw! – Mathias Lia Carlsen, commented Jul 7, 2015 at 17:18

Accepted Answer (score: 4)

It looks like all of the PDF links are in <a> tags, so you can use BeautifulSoup to grab them. If you need further advice, I recommend you reference this discussion to see how to accomplish that task.

[Screenshot: the page's HTML showing the links to the PDF files inside <a> tags; image not preserved]
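A minimal sketch of that approach, assuming the PDF links are ordinary `<a href="...">` tags in the HTML the server returns. It swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages; with BeautifulSoup the equivalent is looping over `soup.find_all("a", href=True)`. Treating any path ending in `.pdf` as a PDF link, and the sample markup, are illustrative assumptions. Relative `href` values are resolved against the page URL with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags that point at PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; names are already lowercased.
        href = dict(attrs).get("href")
        # Assumption: PDF links are recognized by a path ending in ".pdf",
        # ignoring case, query string and fragment.
        if href and urlparse(href).path.lower().endswith(".pdf"):
            # Resolve relative links such as "/files/x.pdf" against the page URL.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.pdf_links:  # skip duplicates, keep order
                self.pdf_links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return the de-duplicated absolute PDF URLs found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


if __name__ == "__main__":
    # Hypothetical sample markup standing in for the real page.
    sample = '<a href="/files/report.pdf">Report</a> <a href="contact.html">Contact</a>'
    print(extract_pdf_links(sample, "http://www.anp.gov.br/?id=532"))
    # prints ['http://www.anp.gov.br/files/report.pdf']
```

To use it on the real page you would first fetch the HTML, for example with `urllib.request.urlopen(...).read().decode(...)`; the page's character encoding is not stated here, so the decode step is left to you. This only sees links present in the HTML the server sends, not ones built later by JavaScript.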

answered Jul 7, 2015 at 17:20 by gffbss · edited May 23, 2017 at 11:51 by Community


3 Comments

Mathias Lia Carlsen

The problem is just that the links are not in <a> tags.

gffbss

Check the image I uploaded. I can see the links to the files; hopefully you can as well! If so, you can use the discussion I linked to in order to get the URL from the href attribute of the <a> tag.

Mathias Lia Carlsen

Thanks a lot! Found it now!
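Once the URLs have been read from the `href` attributes, the PDF files themselves still have to be downloaded. A minimal standard-library sketch, assuming each URL can be fetched directly with no login, cookies, or JavaScript; the `download_pdfs` name and the `pdfs` output folder are hypothetical choices.

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlparse


def download_pdfs(urls, out_dir="pdfs"):
    """Download each URL into out_dir and return the list of local file paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Name the local file after the last segment of the URL path.
        name = os.path.basename(unquote(urlparse(url).path)) or "download.pdf"
        path = os.path.join(out_dir, name)
        # Stream the response body straight to disk.
        with urllib.request.urlopen(url) as response, open(path, "wb") as fh:
            shutil.copyfileobj(response, fh)
        saved.append(path)
    return saved
```

Files are named after the last URL path segment, so two URLs ending in the same file name would overwrite each other; a real scraper might add a counter or a subfolder per source.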
