
I am using BeautifulSoup to scrape websites in Python.

Where URLs have a predictable structure for pagination, I have been able to loop successfully:

import requests
from bs4 import BeautifulSoup

baseUrl = "https://www.example.com/inventory/page="
outputDataframe = list()

for pageNumber in range(1, 10):
    url = baseUrl + str(pageNumber)
    print(url)

    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

However, I now have a CSV of URLs to scrape. The page content has uniform classes and attributes, but the URLs themselves are unique and do not follow a pattern.

How do I get BeautifulSoup to loop through a CSV of URLs efficiently?

Many thanks.

So far, I have had success with uniform URLs using a loop. However, I do not know how to import a CSV of unique URLs and then perform the same function on each one.

1 Answer


To import a CSV, I would work with pandas:

import pandas as pd 
df = pd.read_csv('URLs.csv', delimiter=',')
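
If URLs.csv has no header row, pandas would otherwise treat the first URL as a column name; passing header=None avoids that. This is a sketch based on an assumption about the file layout, so adjust it to match your CSV:

import pandas as pd

# Assumes URLs.csv is a single column of URLs with no header row
df = pd.read_csv('URLs.csv', header=None)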

Then convert the DataFrame column to a list (I assume it only has one column):

urlList = list(df.iloc[:, 0])
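
If the file might contain blank lines, dropping missing values first avoids passing NaN to requests later; this is a small variation on the same idea:

# Drop empty rows before converting to a list, in case the CSV has blank lines
urlList = df.iloc[:, 0].dropna().tolist()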

After that simply iterate through the list:

for url in urlList:
   page = requests.get(url)
   soup = BeautifulSoup(page.content, "html.parser")
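
Putting the pieces together, here is a minimal end-to-end sketch. The class name "item" and the output columns are placeholders assumed for illustration; swap in the uniform class and attributes your pages actually use:

import pandas as pd
import requests
from bs4 import BeautifulSoup

df = pd.read_csv('URLs.csv')
urlList = list(df.iloc[:, 0])

rows = []
for url in urlList:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    # "item" is a placeholder class; replace it with the class shared by your pages
    for tag in soup.find_all(class_="item"):
        rows.append({"url": url, "text": tag.get_text(strip=True)})

outputDataframe = pd.DataFrame(rows)
print(outputDataframe.head())

Collecting plain dicts in a list and building the DataFrame once at the end is generally faster than appending to a DataFrame inside the loop.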

