
I want to scrape the full page to get the links of the accounts, but the problem is:

  1. I need to click the Load More button many times to get the full list of accounts to scrape.

  2. There is a popup which appears occasionally; how do I detect it and click its cancel button?

If possible, I would prefer to scrape the full page with requests only, but since I have to click buttons I thought of using Selenium (a rough sketch of the loop I have in mind is at the end of the question).

Here is my code:

import time
import requests
from bs4 import BeautifulSoup
import lxml
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://society6.com/franciscomffonseca/followers')

time.sleep(3)

# dismiss the popup if it is present
try:
    driver.find_element_by_class_name('bx-button').click()  # button to remove the popup
except:
    print("no popups")

driver.find_element_by_class_name('loadMore').click()  # click the Load More button

I am using a test page which has 10K followers, and I want to scrape each follower's account link. I have already written the scraper, so I just need to see the full webpage:

https://society6.com/franciscomffonseca/followers

Scraping code just in case:

r2 = requests.get('https://society6.com/franciscomffonseca/followers')
print(r2.status_code)
r2.raise_for_status()

soup2 = BeautifulSoup(r2.content, "html.parser")
a2_tags = soup2.find_all(attrs={"class": "user"})

#attrs={"class": "user-list clearfix"}

follow_accounts = []

for a2 in a2_tags:
    follow_accounts.append('https://society6.com'+a2['href'])

print(follow_accounts)
print("number of accounts scraped: " + str(len(follow_accounts)))

HTML of the Load More button:

<button class="loadMore" onclick="loadMoreFollowers();">Load More</button>
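
For reference, this is roughly the Selenium loop I have in mind, using the class names above (untested sketch; I am assuming the Load More button can no longer be found once the full list is loaded):

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://society6.com/franciscomffonseca/followers')
time.sleep(3)

while True:
    # dismiss the popup if it has appeared
    try:
        driver.find_element_by_class_name('bx-button').click()
    except:
        pass
    # click Load More; stop once the button is gone
    try:
        driver.find_element_by_class_name('loadMore').click()
    except:
        break
    time.sleep(2)

html = driver.page_source  # full page to pass to BeautifulSoup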

2 Comments
  • Did you try to scrape the required data with requests to society6.com/api/users/franciscomffonseca/… ? Just use a for loop and increase the page number in page=1 by 1 on each iteration. Commented Sep 17, 2018 at 8:44
  • No, I don't know much about APIs. Commented Sep 17, 2018 at 9:39

1 Answer


You can make a direct request to the Society6 API, as below:

import requests

counter = 1

while True:
    # the same endpoint the "Load More" button calls, one page per request
    source = requests.get('https://society6.com/api/users/franciscomffonseca/followers?page=%s' % counter).json()
    if source['data']['attributes']['followers']:
        for i in source['data']['attributes']['followers']:
            print(i['card']['link']['href'])
        counter += 1
    else:
        # an empty followers list means there are no more pages
        break

This will print relative hrefs such as:

/wickedhonna
/wiildrose
/williamconnolly
/whiteca1x

If you want absolute hrefs, just replace

print(i['card']['link']['href'])

with

print("https://society6.com" + i['card']['link']['href'])

4 Comments

Awesome! How did you find the URL, and where can I learn more about connecting to website APIs to get information like this?
You can open DevTools (F12) -> switch to the Network tab -> enable the "XHR" filter only -> click the "Load More" button on the page and check the URL/headers/response/etc. of the request that is sent.
So if I just replace the account in the API URL, will I get the followers of that particular account, or is this API link unique to only that account?
I'm not sure I understand your question... You can do for i in source['data']['attributes']['followers']: print(i['id']) to get the IDs of the followers, then pass a received ID to the URL https://society6.com/api/users/franciscomffonseca/followers?page=%s in place of franciscomffonseca to scrape the followers of other users.
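
For example, a small sketch of that idea (follower_ids is just an illustrative name; it only swaps the received ID into the same URL):

import requests

def follower_ids(username, page=1):
    # illustrative helper: one page of follower IDs from the same endpoint as in the answer
    source = requests.get('https://society6.com/api/users/%s/followers?page=%s' % (username, page)).json()
    return [i['id'] for i in source['data']['attributes']['followers']]

# IDs of the original account's followers, then the first page of one follower's followers
ids = follower_ids('franciscomffonseca')
print(ids)
if ids:
    print(follower_ids(ids[0]))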
