I am trying to get data from a public site using Python. The site offers several types of search, one of which is by letter. When I search for the letter 'A', it sends a GET request and the page returns a response from the URL below.

http://www.museumsusa.org/museums/?k=1271393%2cAlpha%3aA%3bDirectoryID%3a200454

This displays only the first page, and I can get all the data on it. But when I click on the second page, it sends a POST request, through the JavaScript postback function, to the same URL that was used for the GET request but with different parameters.

data={
'__EVENTTARGET':"ctl08$ctl00$BottomPager$Page2",
'__EVENTARGUMENT':"",
'__VIEWSTATE':VIEWSTATE,
'__EVENTVALIDATION':EVENTVALIDATION,
'ctl04$phrase':"",
'ctl04$directoryList':"/museums/|/museums/search/"
}

__EVENTTARGET carries the page name. I have successfully extracted the VIEWSTATE and EVENTVALIDATION values, but whenever I send a POST request I always get the first page. This is my complete code.

import requests
import json
from bs4 import BeautifulSoup
import urllib



url="http://www.museumsusa.org/museums/?k=1271393%2cAlpha%3aA%3bDirectoryID%3a200454";
headers={
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) "
                 "Chrome/60.0.3112.101 Safari/537.36",
    "Content-Type":"application/x-www-form-urlencoded"}

session = requests.Session()
session.headers.update(headers)
r=session.get(url)
soup=BeautifulSoup(r.content)
#?k=1271393%2cAlpha%3aA%3bDirectoryID%3a200454
VIEWSTATE=soup.find(id="__VIEWSTATE")['value']
#VIEWSTATEGENERATOR=soup.find(id="__VIEWSTATEGENERATOR")['value']
EVENTVALIDATION=soup.find(id="__EVENTVALIDATION")['value']


data_in={
'__EVENTTARGET':"ctl08$ctl00$BottomPager$Page2",
'__EVENTARGUMENT':"",
'__VIEWSTATE':VIEWSTATE,
'__EVENTVALIDATION':EVENTVALIDATION,
'ctl04$phrase':"",
'ctl04$directoryList':"/museums/|/museums/search/"
#"k":"1271393,Alpha:A;DirectoryID:200454"
      }


r2 = session.post(url, data=json.dumps(data_in))

print (r2)

How can I get the data from the other pages? This script always returns the data of the first page, no matter what page number I try. I am using Python 3.6 on Mac OS X.

1 Answer


You can go to the next page by changing the value of data_in['__EVENTTARGET'] to "ctl08$ctl00$BottomPager$Next". Then use a for loop to fetch a specific number of pages, e.g. 10:

import requests
from bs4 import BeautifulSoup

url = "http://www.museumsusa.org/museums/?k=1271393%2cAlpha%3aA%3bDirectoryID%3a200454"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko)"
}
session = requests.Session()
session.headers.update(headers)
r = session.get(url)  # first page
pages = 10

for _ in range(pages):
    # Every response carries fresh state tokens, so re-read them before each post.
    soup = BeautifulSoup(r.content, 'html.parser')
    VIEWSTATE = soup.find(id="__VIEWSTATE")['value']
    EVENTVALIDATION = soup.find(id="__EVENTVALIDATION")['value']
    data_in = {
        '__EVENTTARGET': 'ctl08$ctl00$BottomPager$Next',
        '__EVENTARGUMENT': "",
        '__VIEWSTATE': VIEWSTATE,
        '__EVENTVALIDATION': EVENTVALIDATION,
        'ctl04$phrase': "",
        'ctl04$directoryList': "/museums/|/museums/search/"
    }
    # Pass the dict itself (not json.dumps) so that requests form-encodes it.
    r = session.post(url, data=data_in)
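
A further difference from the question's script is worth noting: the question posts data=json.dumps(data_in), while the loop here posts the dict. requests sends a string body as-is and form-encodes a dict, so this is a plausible reason the original script stayed on page 1, though that has not been verified against the site. The sketch below only illustrates the two body shapes with the standard library (urlencode stands in for the encoding requests performs); the token values are placeholders, not real tokens.

```python
import json
from urllib.parse import parse_qs, urlencode

# Placeholder values; the real tokens come from the hidden inputs on the page.
data_in = {
    "__EVENTTARGET": "ctl08$ctl00$BottomPager$Next",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": "PLACEHOLDER_VIEWSTATE",
    "__EVENTVALIDATION": "PLACEHOLDER_EVENTVALIDATION",
}

# Roughly what is sent for data=data_in: a form-encoded body.
form_body = urlencode(data_in)
# What is sent for data=json.dumps(data_in): the JSON text, unchanged.
json_body = json.dumps(data_in)

# Parse both bodies the way a form handler would.
form_fields = parse_qs(form_body, keep_blank_values=True)
json_fields = parse_qs(json_body, keep_blank_values=True)

print(form_fields["__EVENTTARGET"])    # ['ctl08$ctl00$BottomPager$Next']
print("__EVENTTARGET" in json_fields)  # False
```

In the JSON case a form parser finds no __EVENTTARGET field at all, so the server would have no postback target to act on.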

4 Comments

let me try that
Is there a command for the previous page? Something like ctl08$ctl00$BottomPager$Previous?
What if I have to go two pages ahead?
I'm afraid the site won't allow you to select an arbitrary page (e.g. "ctl08$ctl00$BottomPager$Page4"). If you want to skip a page, you can just ignore the result; you can track the current page with the value of _ + 1, which is the page held in r at the start of each iteration. However, you can go back a page with "ctl08$ctl00$BottomPager$Prev".
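
The "skip by ignoring the result" idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: fetch_page and hidden_fields are hypothetical helper names, the standard-library HTMLParser is swapped in for BeautifulSoup so the snippet has no third-party dependency, and nothing here has been tested against the live site. The session argument is expected to behave like a requests.Session.

```python
from html.parser import HTMLParser

NEXT_TARGET = "ctl08$ctl00$BottomPager$Next"


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every hidden <input> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if attrs.get("type") == "hidden" and "name" in attrs:
                self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def fetch_page(session, url, target_page):
    """Return the HTML of target_page (1-based) by posting 'Next' repeatedly.

    Intermediate pages are fetched only for their state tokens and discarded.
    """
    if target_page < 1:
        raise ValueError("target_page must be >= 1")
    response = session.get(url)  # page 1
    for _ in range(target_page - 1):
        fields = hidden_fields(response.text)
        data = {
            "__EVENTTARGET": NEXT_TARGET,
            "__EVENTARGUMENT": "",
            "__VIEWSTATE": fields["__VIEWSTATE"],
            "__EVENTVALIDATION": fields["__EVENTVALIDATION"],
            "ctl04$phrase": "",
            "ctl04$directoryList": "/museums/|/museums/search/",
        }
        response = session.post(url, data=data)
    return response.text
```

With a real session this would be called as fetch_page(requests.Session(), url, 3) to land on page 3 at the cost of two extra posts. Going back would work the same way with the "ctl08$ctl00$BottomPager$Prev" target mentioned above.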
