I need to collect the PDF files from this page: http://www.anp.gov.br/?id=532.

How can I do this in Python when I can't find the links in the HTML source code? Previously, I have found the links to such files using BeautifulSoup and pandas.

Thanks for any answers!

  • Can you explain why you can't find the links in the HTML source code? I'm not sure I'm clear on the goal here. Commented Jul 7, 2015 at 17:15
  • Hi, Alex W! The developers who made the page have not written the links directly in the HTML source code; instead, they are loaded when clicked. I want these links so I can collect all the data and merge it into one Excel sheet. Thanks for the response, by the way! Commented Jul 7, 2015 at 17:18

1 Answer

It looks like all of the PDF links are in <a> tags, so you can use BeautifulSoup to grab them. If you need further advice, I recommend this discussion on how to accomplish that task.
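
As a rough illustration of pulling PDF links out of <a> tags, here is a minimal sketch. To stay dependency-free it uses Python's standard-library html.parser rather than BeautifulSoup; with BeautifulSoup the equivalent loop would go over soup.find_all("a", href=True). The names PdfLinkParser and extract_pdf_links and the sample markup are hypothetical, not taken from the real page, and the sketch assumes the hrefs end in .pdf.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose URL path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Look only at the path so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in the given HTML."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical markup for illustration; the real page will differ.
sample_html = """
<a href="/arquivos/report_2014.pdf">Report 2014</a>
<a href="http://www.anp.gov.br/docs/Data.PDF?download=1">Data</a>
<a href="/?id=532">Home</a>
<a>No href</a>
"""

pdf_links = extract_pdf_links(sample_html, "http://www.anp.gov.br/?id=532")
for link in pdf_links:
    print(link)
```

To run this against the live page, the HTML string could be fetched first, for example with urllib.request.urlopen(url).read().decode(), and each collected link downloaded afterwards. If the links are only injected by JavaScript after a click, they will not appear in the fetched HTML and this approach will return an empty list.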

[Screenshot showing the links to the PDF files]

3 Comments

The problem is that the links are not in <a> tags.
Check the image I uploaded. I can see the links to the files, hopefully you can as well! If so, you can reference the discussion I linked to in order to get the url from the href in the <a> tag.
Thanks a lot! Found it now!
