
I've been trying to extract the table from this website: https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/

I'm using the code below:

from bs4 import BeautifulSoup
import requests

url = "https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/"
page = requests.get(url, verify=False)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

for table in soup.find_all('table'):
    print(table)
    for subtable in table.find_all('table'):
        print(subtable)

It prints nothing for the table, because somehow the table seems to be hidden. I can see the tags and the table in the Firefox inspector (image below), but BeautifulSoup can't find them with any of the methods I've tried so far.

[Image: the table as shown in the Firefox inspector]

How can I find hidden nested tables like this one? I've already tried soup.find(), soup.find_all() and soup.body.div.table.find_all(), with no success so far.

Thank you guys in advance! =)

  • Always look in your soup first; therein lies the truth. The content can always be slightly to extremely different from the view in the developer tools. This content is loaded dynamically, so you should try it with Selenium. Commented Dec 31, 2021 at 17:02
  • https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/25351532892201972 Commented Dec 31, 2021 at 18:40
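A likely reason the soup comes back without the table, in line with the first comment: everything after # in the question's URL is a fragment, which the browser keeps to itself and never sends in the HTTP request. The sketch below uses only the standard library's urllib.parse to show which part of that URL the server actually sees (just the path /). That the page's JavaScript then reads the fragment and calls the API is an inference from the answer below, not something verified here.

```python
from urllib.parse import urlsplit

# The product page URL from the question.
url = "https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/"

parts = urlsplit(url)

# Host and path are what an HTTP client such as requests asks the server for.
print(parts.netloc)      # consultas.anvisa.gov.br
print(repr(parts.path))  # '/'

# The fragment never leaves the browser; client-side code has to use it
# to decide which product to load.
print(parts.fragment)    # /medicamentos/25351532892201972/
```

So requests.get(url) downloads the same generic page shell for every product, which is why no product table shows up in the soup.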

1 Answer


The data you're looking for is loaded through an API call (you can find it in the Network tab of the browser's developer tools). The call returns JSON, so you don't need BeautifulSoup:

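import json
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'br' dropped: requests can only decode Brotli if the brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'https://consultas.anvisa.gov.br/',
    'If-Modified-Since': 'Mon, 26 Jul 1997 05:00:00 GMT',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Authorization': 'Guest',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
}

api_url = 'https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/25351532892201972'
# verify=False (as in the question) disables SSL certificate verification
page = requests.get(api_url, headers=headers, verify=False)
page.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page

data = page.json()  # equivalent to json.loads(page.text)
print(json.dumps(data, indent=2, ensure_ascii=False))  # keep Portuguese accents readable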

That JSON is where the information in the table comes from.
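To reuse this call for other products: the number at the end of the API path is the same product ID that appears after #/medicamentos/ in the page URL. Below is a minimal sketch, assuming (from this single example only) that every product page follows that pattern. api_url_from_page_url is a hypothetical helper, not part of any ANVISA or requests API.

```python
from urllib.parse import urlsplit

# Assumed pattern, inferred from the one example in this thread:
#   page: https://consultas.anvisa.gov.br/#/medicamentos/<id>/
#   API:  https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/<id>
API_BASE = "https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/"


def api_url_from_page_url(page_url: str) -> str:
    """Hypothetical helper: map a product page URL to its API URL."""
    # The ID lives in the fragment, e.g. "/medicamentos/25351532892201972/".
    segments = [s for s in urlsplit(page_url).fragment.split("/") if s]
    if len(segments) != 2 or segments[0] != "medicamentos" or not segments[1].isdigit():
        raise ValueError(f"unexpected product page URL: {page_url!r}")
    return API_BASE + segments[1]


page_urls = [
    "https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/",
]
for page_url in page_urls:
    print(api_url_from_page_url(page_url))
```

Each resulting URL could then be fetched with the same requests.get call and headers; whether the API accepts every product ID this way is untested here.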


3 Comments

I also noticed this call, but I wasn't sure how to verify whether the number at the end (produtos/25351532892201972) is always the same, or how to get it automatically from the HTML. Did you figure it out?
I intend to scrape many pages. This number identifies a single product. Thanks for your response.
@eduardosteps You're welcome!
