How to get the json data in this site in python 3?

Question

My task is basically:

- Open this website: "https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/preenchimento_municipio_cras_new1.php"

- Fill in the 2 form fields (with AC - Acre and Bujari, for example)

- Click on "Dados Detalhados" (detailed data) in the last column of the generated table. (Clicking "Dados Detalhados" generates a second table with one month of data per row.)

- Access the data behind the second table by clicking "Visualizar Relatório" in the last column of each row. <---- THAT'S the data I'm trying to scrape. But the website is dynamic, and I can't get the data just by accessing url2 (when you click "Visualizar Relatório", the website returns to the initial URL, but with the tables I want to scrape). Here is the code:

import requests
from bs4 import BeautifulSoup  
import pandas as pd

url = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/preenchimento_municipio_cras_new1.php'
params = {
    'uf_ibge': '12',
    'nome_estado': 'AC - Acre',
    'p_ibge': '1200138',
    'nome_municipio': 'Bujari',
}


r = requests.post(url, params = params, verify = False)
soup = BeautifulSoup(r.text, "lxml")
tables = pd.read_html(r.text)
unidades = tables[1]
print(unidades)


url2 = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/rel_preenchidos_cras.php?&p_id_cras=12001301971'
params2 = {
    'p_id_cras': '12001301971',
    'mes_referencia': '2019-02-01',
}
r2 = requests.post(url2, json = params2, verify = False)
soup2 = BeautifulSoup(r2.text, 'lxml')

soup2

Note that url2 is the URL generated when you click "Dados Detalhados", and it contains the same 'p_id_cras' value as the second dictionary.

params2 should be the dict used to scrape the data I'm talking about. I've tried the params, data and json arguments in the second POST request, but none of them works.

furas · Accepted Answer · 2019-04-05 18:36:19Z

url2 should be requested with GET, without extra parameters.
The page it returns contains a table whose links have href="javascript:"
but also onclick='enviadados(12001301971,"2019-02-01")',
so the onclick attribute gives you the parameters for the next request.
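As a minimal sketch of pulling those two values out of the onclick attribute, a regular expression can be used instead of fixed string offsets. The helper name parse_onclick and the pattern are illustrative assumptions; the only thing taken from the page is the enviadados(id,"date") shape shown above.

```python
import re

# Assumed shape of the attribute, based on the example above:
#   enviadados(12001301971,"2019-02-01")
ONCLICK_RE = re.compile(r"enviadados\(\s*(\d+)\s*,\s*[\"']([\d-]+)[\"']\s*\)")


def parse_onclick(onclick):
    """Return the POST parameters found in an onclick string, or None."""
    match = ONCLICK_RE.search(onclick)
    if match is None:
        return None
    return {
        'p_id_cras': match.group(1),
        'mes_referencia': match.group(2),
    }


print(parse_onclick('enviadados(12001301971,"2019-02-01")'))
# {'p_id_cras': '12001301971', 'mes_referencia': '2019-02-01'}
```

Unlike slicing by position, this keeps working if the id has a different number of digits, and it returns None for links that carry no enviadados call.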

The last request is a POST with those parameters (12001301971 and 2019-02-01) to the URL

https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/visualiza_preenchimento_cras.php
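Since the question mentions trying params, data and json, here is a rough standard-library illustration of where each of those requests arguments places the payload. The function describe_payload is hypothetical and only mimics the encoding; it sends nothing, and which form this particular server accepts is not something the sketch can confirm.

```python
import json
from urllib.parse import urlencode

URL = ('https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/'
       'visualiza_preenchimento_cras.php')
PAYLOAD = {'p_id_cras': '12001301971', 'mes_referencia': '2019-02-01'}


def describe_payload(url, payload, mode):
    """Mimic where requests puts a dict passed as params=, data= or json=."""
    if mode == 'params':
        # params= is appended to the URL as a query string
        return {'url': url + '?' + urlencode(payload), 'body': None}
    if mode == 'data':
        # data= is sent as a form-encoded request body
        return {'url': url, 'body': urlencode(payload)}
    if mode == 'json':
        # json= is sent as a JSON document in the request body
        return {'url': url, 'body': json.dumps(payload)}
    raise ValueError('unknown mode: ' + mode)


for mode in ('params', 'data', 'json'):
    print(mode, describe_payload(URL, PAYLOAD, mode))
```

With params= the values travel in the URL even for a POST, so a server that reads the query string will see them, while a server that only reads form fields needs data=.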

My code. I hope it works correctly.

import requests
from bs4 import BeautifulSoup  
import pandas as pd

base = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/'

url = base + 'preenchimento_municipio_cras_new1.php'
#print('url:', url)
params ={
    'uf_ibge': '12',
    'nome_estado': 'AC - Acre',
    'p_ibge': '1200138',
    'nome_municipio': 'Bujari',
}


r = requests.post(url, params=params, verify=False)
soup1 = BeautifulSoup(r.text, "lxml")
tables = pd.read_html(r.text)

#unidades = tables[1]
#print(unidades)

all_td1 = soup1.find('table', class_="panel-body").find_all('td')
#print('len(all_td1):', len(all_td1))
for td1 in all_td1:

    all_a1 = td1.find_all('a')[:1]
    #print('len(all_a1):', len(all_a1))
    for a1 in all_a1:

        url = base + a1['href']
        print('url:', url)

        r = requests.get(url, verify=False)
        soup2 = BeautifulSoup(r.text, "lxml")
        #print(soup2.text)

        all_td2 = soup2.find('table', class_="panel-body").find_all('td')
        #print('len(all_td2):', len(all_td2))
        for td2 in all_td2:
            all_a2 = td2.find_all('a')
            #print('len(all_a2):', len(all_a2))
            for a2 in all_a2:
                print('onclick:', a2['onclick'])

                params = {
                    'p_id_cras': a2['onclick'][11:22], #'12001301971',
                    'mes_referencia': a2['onclick'][24:-2], #'2019-02-01',
                }

                print(params)

                url = 'https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/visualiza_preenchimento_cras.php'
                r = requests.post(url, params=params, verify=False)
                soup = BeautifulSoup(r.text, "lxml")
                all_table = soup.find_all('table')
                print(all_table)
