0

I'm using an Indeed API from Rapid API to collect job data. The snippet below returns results for only one page. How can I set up a for loop that iterates through multiple pages and appends the results together?

import requests

url = "https://indeed11.p.rapidapi.com/"


payload = {
    "search_terms": "data visualization",
    "location": "New York City, NY",
    "page": 1,
    "fetch_full_text": "yes"
}

headers = {
    "content-type": "application/json",
    "X-RapidAPI-Key": "{api key here}",  # insert your RapidAPI key here
    "X-RapidAPI-Host": "indeed11.p.rapidapi.com"
}

response = requests.request("POST", url, json=payload, headers=headers)

As seen in the code above, the key "page" is set to a value of 1. How would I parameterize this value, and how would I construct the for loop while appending the results from each page?

1
  • "How would I parameterize this value": in "page": 1, replace the 1 with a variable, which you can get trivially by incrementing a number in the loop. The other part you need is to detect, by looking at the response content, when there is no longer a 'next' page of results, so that you can break out of the loop. Beyond that, it is just a matter of appending the results to a list on each pass through the loop. Commented Jul 9, 2022 at 16:45
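
The approach in this comment could be sketched roughly as follows. This is a minimal sketch, not the API's documented behavior: it assumes that each page's body decodes to a JSON list of jobs and that a page past the end comes back as an empty list, neither of which the question confirms. The names fetch_all_pages, post and max_pages are made up for illustration; post is any callable with the same signature as requests.post, which also makes the loop easy to try against a stub.

```python
def fetch_all_pages(post, url, payload, headers, max_pages=10):
    """Request pages 1..max_pages and return all results in one list."""
    all_results = []
    for page in range(1, max_pages + 1):
        # Build a fresh payload per page so the caller's dict is not modified.
        page_payload = {**payload, "page": page}
        response = post(url, json=page_payload, headers=headers)
        response.raise_for_status()  # raise on HTTP errors (4xx/5xx)
        results = response.json()
        # ASSUMPTION: an empty list means there is no further page.
        if not results:
            break
        all_results.extend(results)
    return all_results
```

With the variables from the question, this would be called as fetch_all_pages(requests.post, url, payload, headers).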

3 Answers

2

You can paginate by updating the "page" value in the payload inside a for loop over range():

import requests

url = "https://indeed11.p.rapidapi.com/"

payload = {
    "search_terms": "data visualization",
    "location": "New York City, NY",
    "page": 1,
    "fetch_full_text": "yes"
}

headers = {
    "content-type": "application/json",
    "X-RapidAPI-Key": "{api key here}",  # insert your RapidAPI key here
    "X-RapidAPI-Host": "indeed11.p.rapidapi.com"
}
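
results = []
for page in range(1, 11):  # pages 1 through 10
    payload['page'] = page
    response = requests.post(url, json=payload, headers=headers)
    results.append(response)  # keep each page's response for later processing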

1 Comment

Writing for payload['page'] in range(1,11): skips the creation of the page variable :) +1
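
To illustrate this comment: Python accepts any assignment target in a for statement, including a dictionary subscript, so the loop can write the page number straight into the payload. A small sketch that only records the payloads instead of sending requests (sent_pages is a name introduced here for illustration):

```python
payload = {"search_terms": "data visualization", "page": 1}

sent_pages = []
# The loop target is payload["page"], so each iteration updates the dict in place.
for payload["page"] in range(1, 4):
    # A real loop would call requests.post(url, json=payload, headers=headers) here.
    sent_pages.append(dict(payload))  # snapshot of what would be sent

print([p["page"] for p in sent_pages])  # prints [1, 2, 3]
```

Note that the payload keeps the last page number once the loop finishes.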
1

You can try this:

max_page = 100
result = {}  # maps page number -> response
for i in range(1, max_page + 1):
    payload.update({'page': i})
    try:
        result[i] = requests.post(url, json=payload, headers=headers)
    except requests.RequestException:
        # skip pages whose request fails (e.g. a connection error)
        continue

1

I think you could do this with a while loop. You would need code that detects when there are no more pages to read, for example by inspecting each response. Here's what I would do:

import requests

url = "https://indeed11.p.rapidapi.com/"

payload = {
    "search_terms": "data visualization",
    "location": "New York City, NY",
    "page": 1,
    "fetch_full_text": "yes"
}

headers = {
    "content-type": "application/json",
    "X-RapidAPI-Key": "{api key here}",  # insert your RapidAPI key here
    "X-RapidAPI-Host": "indeed11.p.rapidapi.com"
}

responses = []
while True:
    response = requests.post(url, json=payload, headers=headers)
    # Assumption: a failed request or an empty JSON body means there are no more pages to read
    if not response.ok or not response.json():
        break
    responses.append(response)
    payload['page'] += 1

Once the loop is done, you could use the responses list to access the data.
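
One way to get from the responses list to a single list of records is sketched below. It assumes, without confirmation from the API's documentation, that each successful response body is a JSON list of job entries; combine_responses is a hypothetical helper name.

```python
def combine_responses(responses):
    """Merge the JSON bodies of several page responses into one list."""
    jobs = []
    for response in responses:
        # Skip pages that did not come back with HTTP 200.
        if response.status_code != 200:
            continue
        # ASSUMPTION: the body decodes to a list of job entries.
        jobs.extend(response.json())
    return jobs
```

Calling combine_responses(responses) after the loop would then give one flat list covering all pages.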
