Scraping a Java Web-page

Question

I have found and read quite a few articles about scraping, but as a beginner I am somewhat overwhelmed. I want to get data from a table (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750).

I experimented with BeautifulSoup and can get a list of the available option tags (see the options in the soup object).

I am now struggling to get the actual content: how do I access the table for each date/option and save it into, e.g., a pandas DataFrame?

Any advice on where to begin?

Here is my code to get the options:

from bs4 import BeautifulSoup
import requests
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750")

html = resp.content 
soup = BeautifulSoup(html)

option_tags = soup.find_all("option")

Omer Tekbiyik · Accepted Answer · 2019-02-18 21:16:21Z


Looking at the URL you gave, I think the table is embedded in the page through an iframe:

 <iframe src="_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M" name="contenedor" width="600" marginwidth="0" height="560" marginheight="0" scrolling="NO" align="center"  frameborder="0" id="interior"></iframe>

When you open the iframe's src directly (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M), the page that opens shows the same table, so you can scrape that page instead. I tried it for you and it gives the correct result.
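As a rough sketch of how the iframe lookup could be automated rather than done through view-source: the snippet below pulls the `src` out of an iframe tag and resolves it against the page URL. It runs on the iframe markup quoted above (trimmed to the relevant attributes) instead of fetching the live page, and it uses the standard-library parser only to stay dependency-free. The names `IframeSrcParser` and `iframe_urls` are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html."""
    parser = IframeSrcParser()
    parser.feed(html)
    # The src is relative, so resolve it against the page it appeared on.
    return [urljoin(page_url, src) for src in parser.sources]


page_url = ("https://www.senamhi.gob.pe/mapas/mapa-estaciones/"
            "_dat_esta_tipo.php?estaciones=472CA750")
# Iframe tag as quoted in this answer, trimmed to the relevant attributes.
html = ('<iframe src="_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT'
        '&CBOFiltro=201902&t_e=M" name="contenedor" id="interior"></iframe>')

urls = iframe_urls(page_url, html)
print(urls[0])
```

With BeautifulSoup already loaded, `soup.find("iframe")["src"]` should give the same relative src, which `urljoin` can then resolve in the same way.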

**Full code:**

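from bs4 import BeautifulSoup
import requests

# URL taken from the iframe's src attribute, resolved against the page URL
url = ("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php"
       "?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M")
resp = requests.get(url)

html = resp.content
soup = BeautifulSoup(html, "lxml")  # or "html.parser" if lxml is not installed

# 'aling' is spelled as in the original answer, which reports working output;
# presumably it matches the attribute name used in the page's markup
option_tags = soup.find_all("tr", attrs={'aling': 'center'})

for a in option_tags:
    print(a.find('div').text)  # text of the first <div> in each row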

Output:

Día/mes/año
Prom
01-02-2019
02-02-2019
03-02-2019
04-02-2019
05-02-2019
06-02-2019
07-02-2019
08-02-2019
09-02-2019
10-02-2019
11-02-2019
12-02-2019
13-02-2019
14-02-2019
15-02-2019
16-02-2019
17-02-2019
18-02-2019

The code above prints only the first cell of each row (the date). If you want all the values for each date, create a list and append every row's text to it. Only the loop changes:

array = []
for a in option_tags:
    # split() breaks the row's text on whitespace, giving one token per value
    array.append(a.text.split())

print(array)
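To move from those token lists toward the pandas DataFrame mentioned in the question, one possible next step is to drop the header rows and turn each data row into a record. The sketch below keeps only rows whose first token looks like a DD-MM-YYYY date, as in the output above. The helper `rows_to_records`, the column names `col_1` and `col_2`, and the sample values are all made up for illustration; the real table's columns are not shown in this thread.

```python
import re
from datetime import datetime

# Matches the DD-MM-YYYY dates seen in the printed output
DATE_RE = re.compile(r"^\d{2}-\d{2}-\d{4}$")


def rows_to_records(rows, value_columns):
    """Keep rows starting with a DD-MM-YYYY date and map them to dicts."""
    records = []
    for tokens in rows:
        if not tokens or not DATE_RE.match(tokens[0]):
            continue  # skip header rows such as ['Día/mes/año'] or ['Prom']
        record = {"date": datetime.strptime(tokens[0], "%d-%m-%Y").date()}
        # zip stops at the shorter sequence, so surplus tokens are ignored
        record.update(zip(value_columns, tokens[1:]))
        records.append(record)
    return records


# Made-up sample in the shape produced by a.text.split(); not real station data
sample = [
    ["Día/mes/año"],
    ["Prom"],
    ["01-02-2019", "10.5", "0.0"],
    ["02-02-2019", "11.2", "1.4"],
]

records = rows_to_records(sample, ["col_1", "col_2"])
print(records)
```

A list of dicts like this can be passed directly to `pandas.DataFrame(records)`. The values are still strings at this point, so converting them to numbers is left as a separate step.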


2 Comments

shmulik90 Over a year ago

Great! That works :) One more question, since this is my basic trouble: how did you find the URL for a specific station and date? The browser only shows the station name...

Omer Tekbiyik Over a year ago

When I clicked view-source, I saw an iframe in the page. An iframe is used to display a web page within a web page, so I figured the table lives on a separate page that is merely displayed at your URL. @SamuelMüller
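Following up on the station-and-date question, here is a hedged sketch of how the iframe URL could be rebuilt for other stations or months. It assumes that `CBOFiltro` is the month in YYYYMM form (201902 lines up with the February 2019 dates in the output) and that `estaciones` is the station code; the thread does not confirm either, and the meaning of `tipo` and `t_e` is unknown, so their values are simply copied from the iframe src. The function name `table_url` is invented for this example.

```python
from urllib.parse import urlencode

# Path of the iframe page, as seen in the iframe's src
BASE = "https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php"


def table_url(station, year, month, tipo="SUT", t_e="M"):
    """Build the iframe URL for one station and month (parameter meanings assumed)."""
    params = {
        "estaciones": station,
        "tipo": tipo,                            # copied from the iframe src
        "CBOFiltro": f"{year:04d}{month:02d}",   # assumed to be YYYYMM
        "t_e": t_e,                              # copied from the iframe src
    }
    return f"{BASE}?{urlencode(params)}"


print(table_url("472CA750", 2019, 2))
```

For station 472CA750 and February 2019 this reproduces the URL used in the answer. Whether other months or stations respond the same way would need to be checked against the site.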
