
I want to scrape the dynamic form on [this][1] page. I'm using Selenium for that right now and getting some results.

My questions:

  1. Is it possible to replace the Selenium + WebDriver code with a POST request? (I have worked with Requests before, but only when an API was available; I can't figure out how to reverse-engineer this form.)

  2. Is there a better way to clean up the result page so I get only the table? (In my example, the resulting "data" variable is a mess, but I did obtain the last value, which was the main purpose of the script.)

  3. Any recommendations?

My code:

from selenium import webdriver
import pandas as pd

from bs4 import BeautifulSoup

def get_tables(htmldoc):
    soup = BeautifulSoup(htmldoc, "html.parser")
    return soup.find_all('table')

driver = webdriver.Chrome()
driver.get("http://dgasatel.mop.cl/visita_new.asp")
estacion1 = driver.find_element_by_name("estacion1")
estacion1.send_keys("08370007-6")
driver.find_element_by_xpath("//input[@name='chk_estacion1a' and @value='08370007-6_29']").click()
driver.find_element_by_xpath("//input[@name='period' and @value='1d']").click()
driver.find_element_by_xpath("//input[@name='tiporep' and @value='I']").click()
driver.find_element_by_name("button22").click()

data = pd.read_html(driver.page_source)

print(data[4].tail(1).iloc[0][2])

Thanks in advance.

  [1]: http://dgasatel.mop.cl/visita_new.asp

  • I have used Postman to find the API calls... try it! Commented Dec 11, 2018 at 12:27
  • Never heard of Postman, but I'll need to check that out. The requests-html package might be a possibility; it has the option to let the page render before pulling the source HTML. Commented Dec 11, 2018 at 19:25

1 Answer


The short answer to your question is yes, you can use the requests library to make the POST request. To get a working example, open your browser's inspector, copy the request as cURL, and convert it with the following site:

https://curl.trillworks.com/

Then you can feed the response.text into BeautifulSoup to parse out the tables you want.
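For instance, here is a minimal sketch of pulling one specific table out of a page instead of all of them. The HTML string and the `class="data"` attribute are made up for illustration; the real page's table markup may differ:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for response.text; the real page wraps the data table
# in several layout tables, so we select just the one we want.
html = """
<table><tr><td>layout junk</td></tr></table>
<table class="data">
  <tr><th>Fecha</th><th>Valor</th></tr>
  <tr><td>10/12/2018</td><td>1.23</td></tr>
  <tr><td>11/12/2018</td><td>4.56</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
target = soup.find("table", class_="data")

# Build a DataFrame from just that table's rows.
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in target.find_all("tr")]
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df.iloc[-1, 1])  # prints 4.56, the last value
```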

When I do this with the site in your example I get the following:

import requests

cookies = {
    'ASPSESSIONIDCQTTBCRB': 'BFDPGLCCEJMKPFKGJJFHKHFC',
}

headers = {
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Origin': 'http://dgasatel.mop.cl',
    'Upgrade-Insecure-Requests': '1',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Referer': 'http://dgasatel.mop.cl/filtro_paramxestac_new.asp',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
}

data = {
  'estacion1': '-1',
  'estacion2': '-1',
  'estacion3': '-1',
  'accion': 'refresca',
  'tipo': 'ANO',
  'fecha_fin': '11/12/2018',
  'hora_fin': '0',
  'period': '1d',
  'fecha_ini': '11/12/2018',
  'fecha_finP': '11/12/2018',
  'UserID': 'nobody',
  'EsDL1': '0',
  'EsDL2': '0',
  'EsDL3': '0'
}

response = requests.post(
    'http://dgasatel.mop.cl/filtro_paramxestac_new.asp',
    headers=headers, cookies=cookies, data=data)
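If you want to sanity-check the form encoding without hitting the server, requests can build the request without sending it. This sketch uses a trimmed-down field set; the full `data` dict above works the same way:

```python
import requests

# Prepare (but do not send) the POST so we can inspect the
# url-encoded body that would go over the wire.
req = requests.Request(
    'POST',
    'http://dgasatel.mop.cl/filtro_paramxestac_new.asp',
    data={'estacion1': '-1', 'period': '1d', 'accion': 'refresca'},
).prepare()

print(req.body)  # estacion1=-1&period=1d&accion=refresca
```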

For cleaning up the data, I recommend mapping the data points you want onto a dictionary, or writing them to a CSV, with loops:

import pandas as pd

for table in pd.read_html(response.text):
    if not table.empty and table.shape[1] > 2:
        print(table.iloc[-1, 2])
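To go one step further than printing, the extracted rows can be written out with the standard csv module. This is just a sketch; the field names and values here are placeholders for whatever columns the real table has:

```python
import csv
import io

# Placeholder rows standing in for the values pulled from the table.
rows = [
    {'fecha': '10/12/2018', 'valor': '1.23'},
    {'fecha': '11/12/2018', 'valor': '4.56'},
]

buf = io.StringIO()  # swap for open('salida.csv', 'w', newline='') to write a file
writer = csv.DictWriter(buf, fieldnames=['fecha', 'valor'])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```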

1 Comment

Perfect, thank you! The only change I had to make was pointing the POST request to the results page (dgasatel.mop.cl/cons_det_instan.asp in my case), and it worked.
