
I have found and read quite a few articles about scraping, but as a beginner I am somewhat overwhelmed. I want to get data from a table (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750).

I experimented with BeautifulSoup and can get a list of the available option tags (see option_tags in the code below).

Where I am stuck is getting the actual content: how do I access the table for each date/option and save it into, e.g., a pandas DataFrame?

Any advice on where to begin?

Here is my code to get the options:

from bs4 import BeautifulSoup
import requests
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750")

html = resp.content
soup = BeautifulSoup(html, "html.parser")  # name a parser explicitly to avoid the "no parser specified" warning

option_tags = soup.find_all("option")
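To see what each option stands for, it can help to pull out both the value attribute and the visible label of every option tag. A minimal sketch with BeautifulSoup follows; option_values is a hypothetical helper name, and what the values actually look like on this site is not shown in the question, so inspect them before relying on them.

```python
from bs4 import BeautifulSoup


def option_values(html):
    """Return (value, label) pairs for every <option> tag in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for tag in soup.find_all("option"):
        # .get() returns None if an option has no value attribute
        value = tag.get("value")
        # strip=True removes surrounding whitespace from the visible label
        label = tag.get_text(strip=True)
        pairs.append((value, label))
    return pairs
```

Usage: option_values(resp.content) with the resp from the code above returns a list you can loop over, one entry per option.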

1 Answer

Looking at the URL you gave, the table is not part of that page itself; it is embedded in it through an iframe:

 <iframe src="_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M" name="contenedor" width="600" marginwidth="0" height="560" marginheight="0" scrolling="NO" align="center"  frameborder="0" id="interior"></iframe>

If you open the iframe's src directly (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M), it shows the same table, so you can scrape that page instead. I tried it for you and it returns the expected result.
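Because the src attribute is relative, you can also let the code find it for you and resolve it against the outer page's URL with urllib.parse.urljoin. A minimal sketch, assuming the table is the only thing you want from the iframes on the page; iframe_urls is a hypothetical helper name.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_urls(page_url, html):
    """Return the absolute URL of every <iframe src=...> on a page."""
    soup = BeautifulSoup(html, "html.parser")
    # src is relative (e.g. "_dat_esta_tipo02.php?..."), so resolve it
    # against the URL of the page that contains the iframe
    return [urljoin(page_url, frame["src"])
            for frame in soup.find_all("iframe", src=True)]
```

For example, passing the question's URL and resp.content should yield the absolute _dat_esta_tipo02.php URL, which you can then fetch with requests.get like any other page.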

**Full code:**

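from bs4 import BeautifulSoup
import requests

# URL taken from the iframe's src; it must stay on one line to be a valid string
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M")

html = resp.content
soup = BeautifulSoup(html, "lxml")  # "lxml" needs the lxml package; "html.parser" works without extra installs

# The page's rows appear to use a misspelled attribute, aling="center" (sic)
option_tags = soup.find_all("tr", attrs={'aling': 'center'})

for a in option_tags:
    print(a.find('div').text)  # the first <div> in each row holds the date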

Output:

Día/mes/año
Prom
01-02-2019
02-02-2019
03-02-2019
04-02-2019
05-02-2019
06-02-2019
07-02-2019
08-02-2019
09-02-2019
10-02-2019
11-02-2019
12-02-2019
13-02-2019
14-02-2019
15-02-2019
16-02-2019
17-02-2019
18-02-2019

The code above only gets the date. If you want all the values for each date, collect each row's text into a list instead. Just change the loop to:

array = []
for a in option_tags:
    array.append(a.text.split())  # one list of whitespace-separated values per row

print(array)
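Since the question asks for a pandas DataFrame, here is a minimal sketch that goes one step further. It assumes each data row keeps its values in td cells with the date (dd-mm-yyyy, as in the output above) in the first cell; table_to_dataframe is a hypothetical name. Reading whole cells with get_text(strip=True) avoids splitting a multi-word cell into several values, which text.split() would do, and filtering on the date pattern skips header rows such as "Día/mes/año" and "Prom" without hard-coding them. Columns are left unnamed because the header layout is not shown here.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Matches dates like 01-02-2019 in the first column
DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")


def table_to_dataframe(html):
    """Build a DataFrame with one row per <tr> whose first cell is a date."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Keep only data rows; header rows do not start with a date
        if cells and DATE_RE.fullmatch(cells[0]):
            rows.append(cells)
    df = pd.DataFrame(rows)
    if not df.empty:
        df[0] = pd.to_datetime(df[0], format="%d-%m-%Y")
        # Non-numeric placeholders become NaN instead of raising
        for col in df.columns[1:]:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```

Calling table_to_dataframe(resp.content) on the iframe page should give one row per day; set df.columns yourself once you know which measurement each column holds.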

2 Comments

Great! That works :) One more question, since this is my basic problem: how did you find the URL for a specific station and date? The browser only shows the station name...
When I open view-source, there is an iframe in it. An iframe is used to display one web page inside another, so I figured the table is a separate page that is just shown inside your URL. @SamuelMüller
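To get other stations or months without clicking through the site, you can build the iframe URL yourself. The src above carries estaciones=472CA750 (the same code as the outer page) and CBOFiltro=201902, and the scraped dates are all February 2019, which suggests CBOFiltro is the month as YYYYMM. That reading is an assumption, as is keeping tipo=SUT and t_e=M, whose meaning is not known here and may differ for other stations. table_url and months are hypothetical helpers.

```python
from urllib.parse import urlencode

BASE = "https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php"


def table_url(station, yyyymm, tipo="SUT", t_e="M"):
    """Build the iframe URL for one station and one month.

    Assumption: 'estaciones' is the station code and 'CBOFiltro' the month
    as YYYYMM; 'tipo' and 't_e' are copied from the observed URL.
    """
    query = urlencode({
        "estaciones": station,
        "tipo": tipo,
        "CBOFiltro": yyyymm,
        "t_e": t_e,
    })
    return f"{BASE}?{query}"


def months(year):
    """YYYYMM strings for January through December of one year."""
    return [f"{year}{m:02d}" for m in range(1, 13)]
```

Looping over months(2019), fetching each table_url("472CA750", m) and parsing it, then combining the results with pd.concat, would give a whole year in one DataFrame, provided the site really accepts those parameter values.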
