
I am using BeautifulSoup to scrape websites in Python.

Where URLs have a predictable structure for pagination, I have been able to loop successfully:

import requests
from bs4 import BeautifulSoup

baseUrl = "https://www.example.com/inventory/page="
outputDataframe = list()

for pageNumber in range(1, 10):
    url = baseUrl + str(pageNumber)
    print(url)

    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

However, I now have a CSV of URLs to scrape. The page content has uniform classes and attributes, but the URLs themselves are unique and do not follow a pattern.

How do I get BeautifulSoup to loop through a CSV of URLs efficiently?

Many thanks.

So far, I have had success with uniform URLs using a loop. However, I do not know how to import a CSV of unique URLs and then perform the same function on each one.

1 Answer


To import a CSV, I would work with pandas:

import pandas as pd 
df = pd.read_csv('URLs.csv', delimiter=',')
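
If URLs.csv has no header row, pandas would otherwise treat the first URL as a column name; passing header=None avoids that. This is a sketch based on an assumption about the file layout, so adjust it to match your CSV:

import pandas as pd

# Assumes URLs.csv is a single column of URLs with no header row
df = pd.read_csv('URLs.csv', header=None)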

Then convert the DataFrame column to a list (I assume it only has one column):

urlList = list(df.iloc[:, 0])
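
If the file might contain blank lines, dropping missing values first avoids passing NaN to requests later; this is a small variation on the same idea:

# Drop empty rows before converting to a list, in case the CSV has blank lines
urlList = df.iloc[:, 0].dropna().tolist()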

After that simply iterate through the list:

for url in urlList:
   page = requests.get(url)
   soup = BeautifulSoup(page.content, "html.parser")
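
Putting the pieces together, here is a minimal end-to-end sketch. The class name "item" and the output columns are placeholders assumed for illustration; swap in the uniform class and attributes your pages actually use:

import pandas as pd
import requests
from bs4 import BeautifulSoup

df = pd.read_csv('URLs.csv')
urlList = list(df.iloc[:, 0])

rows = []
for url in urlList:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    # "item" is a placeholder class; replace it with the class shared by your pages
    for tag in soup.find_all(class_="item"):
        rows.append({"url": url, "text": tag.get_text(strip=True)})

outputDataframe = pd.DataFrame(rows)
print(outputDataframe.head())

Collecting plain dicts in a list and building the DataFrame once at the end is generally faster than appending to a DataFrame inside the loop.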

