
I want to scrape the full page to get the links of the accounts, but the problem is:

  1. I need to click the Load More button many times to get the full list of accounts to scrape.

  2. There is a popup which appears occasionally; how do I detect it and click its cancel button?

If possible, I would prefer to scrape the full page with requests only, but since I have to click buttons I thought of using Selenium (a rough sketch of the loop I have in mind is at the end of the question).

Here is my code:

import time
import requests
from bs4 import BeautifulSoup
import lxml
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://society6.com/franciscomffonseca/followers')

time.sleep(3)

# dismiss the popup if it is present
try:
    driver.find_element_by_class_name('bx-button').click()  # button to remove the popup
except:
    print("no popups")

driver.find_element_by_class_name('loadMore').click()  # click the Load More button

I am using a test page which has 10K followers, and I want to scrape each follower's account link. I have already written the scraper, so I just need to see the full webpage:

https://society6.com/franciscomffonseca/followers

Scraping code just in case:

r2 = requests.get('https://society6.com/franciscomffonseca/followers')
print(r2.status_code)
r2.raise_for_status()

soup2 = BeautifulSoup(r2.content, "html.parser")
a2_tags = soup2.find_all(attrs={"class": "user"})

#attrs={"class": "user-list clearfix"}

follow_accounts = []

for a2 in a2_tags:
    follow_accounts.append('https://society6.com'+a2['href'])

print(follow_accounts)
print("number of accounts scraped: " + str(len(follow_accounts)))

HTML of the Load More button:

<button class="loadMore" onclick="loadMoreFollowers();">Load More</button>
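
For reference, this is roughly the Selenium loop I have in mind, using the class names above (untested sketch; I am assuming the Load More button can no longer be found once the full list is loaded):

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://society6.com/franciscomffonseca/followers')
time.sleep(3)

while True:
    # dismiss the popup if it has appeared
    try:
        driver.find_element_by_class_name('bx-button').click()
    except:
        pass
    # click Load More; stop once the button is gone
    try:
        driver.find_element_by_class_name('loadMore').click()
    except:
        break
    time.sleep(2)

html = driver.page_source  # full page to pass to BeautifulSoup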

2 Comments
  • Did you try to scrape the required data with requests to society6.com/api/users/franciscomffonseca/… ? Just use a for loop and increase the page number in page=1 by 1 on each iteration. Commented Sep 17, 2018 at 8:44
  • No, I don't know much about APIs. Commented Sep 17, 2018 at 9:39

1 Answer


You can make a direct request to the Society6 API, as below:

import requests

counter = 1

while True:
    # the same endpoint the "Load More" button calls, one page per request
    source = requests.get('https://society6.com/api/users/franciscomffonseca/followers?page=%s' % counter).json()
    if source['data']['attributes']['followers']:
        for i in source['data']['attributes']['followers']:
            print(i['card']['link']['href'])
        counter += 1
    else:
        # an empty followers list means there are no more pages
        break

This will print relative hrefs such as:

/wickedhonna
/wiildrose
/williamconnolly
/whiteca1x

If you want absolute hrefs, just replace

print(i['card']['link']['href'])

with

print("https://society6.com" + i['card']['link']['href'])

4 Comments

Awesome! How did you find the URL, and where can I learn more about connecting to website APIs to get information like this?
You can open DevTools (F12) -> switch to the Network tab -> enable the "XHR" filter only -> click the "Load More" button on the page and check the URL/headers/response/etc. of the request that is sent.
So if I just replace the account in the API URL, will I get the followers of that particular account, or is this API link unique to only that account?
I'm not sure I understand your question... You can do for i in source['data']['attributes']['followers']: print(i['id']) to get the IDs of the followers, then pass a received ID to the URL https://society6.com/api/users/franciscomffonseca/followers?page=%s in place of franciscomffonseca to scrape the followers of other users.
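
For example, a small sketch of that idea (follower_ids is just an illustrative name; it only swaps the received ID into the same URL):

import requests

def follower_ids(username, page=1):
    # illustrative helper: one page of follower IDs from the same endpoint as in the answer
    source = requests.get('https://society6.com/api/users/%s/followers?page=%s' % (username, page)).json()
    return [i['id'] for i in source['data']['attributes']['followers']]

# IDs of the original account's followers, then the first page of one follower's followers
ids = follower_ids('franciscomffonseca')
print(ids)
if ids:
    print(follower_ids(ids[0]))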
