How to scrape hidden nested table tag with BeautifulSoup and Python?

Question

I've been trying to find the table from this website: https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/

I'm using this method below:

from bs4 import BeautifulSoup
import requests

url = "https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/"
page = requests.get(url, verify=False)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

for table in soup.find_all('table'):
    print(table)
    for subtable in table.find_all('table'):
        print(subtable)

It prints no tables at all, as if the table were somehow hidden. I can see the tags and the table in the Firefox inspector, but BeautifulSoup can't find them with any of the methods I've tried so far.

What can I do to find this kind of hidden nested table? I have already tried soup.find(), soup.find_all(), and soup.body.div.table.find_all(), with no success so far.

Thank you guys in advance! =)

Always look in your soup first; therein lies the truth. The content can be slightly to extremely different from what the development tools show. Here the content is loaded dynamically, so you should try Selenium.
– HedgeHog, Commented Dec 31, 2021 at 17:02
https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/25351532892201972
– QHarr, Commented Dec 31, 2021 at 18:40

Jack Fleeting · Accepted Answer · 2021-12-31 19:33:36Z


The data you're looking for is loaded through an API call (which you can find with the browser's development tools). The call returns JSON, so there is no need for BeautifulSoup:

import json
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://consultas.anvisa.gov.br/',
    'If-Modified-Since': 'Mon, 26 Jul 1997 05:00:00 GMT',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Authorization': 'Guest',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
}

page = requests.get('https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/25351532892201972', headers=headers, verify=False)

# Parse the JSON response body into Python objects
data = json.loads(page.text)
print(data)

That response is where the information in the table comes from.
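As a side note on why the original request found no table: everything after `#` in the page URL is a fragment, which the browser keeps for client-side routing and does not send to the server, so `requests` only ever fetched the bare site root. The sketch below shows this with the standard library. The helper `api_url_from_page_url` is hypothetical (not part of any library), and it assumes that the trailing number in the fragment is the same one the API path expects, which holds for this one example but has not been checked in general.

```python
from urllib.parse import urlsplit

# Endpoint prefix taken from the API call shown in the answer above.
API_BASE = "https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/"


def api_url_from_page_url(page_url):
    """Hypothetical helper: map a front-end page URL to the JSON endpoint.

    Assumes the last path-like segment of the fragment is the product's
    process number.
    """
    fragment = urlsplit(page_url).fragment
    process_number = fragment.strip("/").split("/")[-1]
    if not process_number.isdigit():
        raise ValueError(f"no process number found in {page_url!r}")
    return API_BASE + process_number


page_url = "https://consultas.anvisa.gov.br/#/medicamentos/25351532892201972/"
parts = urlsplit(page_url)

print(parts.path)      # the only path the server is asked for
print(parts.fragment)  # kept by the browser, never sent over HTTP
print(api_url_from_page_url(page_url))
```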


3 Comments

HedgeHog Over a year ago

I also noticed this call, but was not sure how to verify whether the number at the end (produtos/25351532892201972) is always the same, or how to get it automatically from the HTML. Did you figure it out?

eduardosteps Over a year ago

I intend to scrape many pages. This number identifies one product. Thanks for your response.

Jack Fleeting Over a year ago

@eduardosteps You're welcome!
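For the many-pages case raised in the comments, one possible approach is to loop over a list of process numbers and call the same endpoint for each. This is only a sketch under several assumptions: `fetch_products` is a hypothetical function name, the list of process numbers has to come from somewhere else (the thread does not say how to obtain them), the reduced header set may not be enough (fall back to the full set from the answer if the server rejects it), and the one-second pause is an arbitrary politeness delay.

```python
import time

API_BASE = "https://consultas.anvisa.gov.br/api/consulta/medicamento/produtos/"

# Assumption: a subset of the headers from the answer; the server may need more.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0",
    "Accept": "application/json, text/plain, */*",
    "Authorization": "Guest",
}


def fetch_products(process_numbers, session=None, delay=1.0, verify=True):
    """Hypothetical helper: return {process_number: parsed JSON} for each number."""
    if session is None:
        # Imported here so a stand-in session can be passed without requests.
        import requests
        session = requests.Session()

    results = {}
    for number in process_numbers:
        response = session.get(
            API_BASE + str(number),
            headers=HEADERS,
            timeout=30,
            verify=verify,  # the answer used verify=False for this site
        )
        response.raise_for_status()  # stop on HTTP errors instead of parsing them
        results[str(number)] = response.json()
        time.sleep(delay)  # pause between calls to avoid hammering the server
    return results


# Example call (performs real network requests, so it is left commented out):
# data = fetch_products(["25351532892201972"], verify=False)
```

Reusing one `requests.Session` keeps the connection open between calls, which tends to matter once the list of products grows.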
