The current script scrapes only a single page, but I would like to scrape all 5 pages from the source URL. How can I loop/iterate through the remaining 4 pages?

#Import Libraries
from bs4 import BeautifulSoup
import requests
import csv
source = requests.get('https://www.sustainalytics.com/esg-ratings/?industry=Aerospace%20&%20Defense&currentpage=1').text
soup = BeautifulSoup(source, 'lxml')

#Start CSV
csv_file = open('aerospacedata_1.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['company_name', 'company_exchange', 'company_risk'])

#Scrape Data from Web and write to csv
for company_info in soup.find_all(class_='company-row d-flex'):
    company_name = company_info.a.text
    company_exchange = company_info.find("small").text
    company_risk = company_info.find("div", class_="col-2").text
    print(company_name, company_exchange,company_risk)
    csv_writer.writerow([company_name, company_exchange, company_risk])
csv_file.close()

Output:

company_name company_exchange company_risk
AECC Aviation Power Co Ltd SHG:600893 53.3
Airbus SE PAR:AIR 30.3
Aselsan Elektronik Sanayi ve Ticaret AS IST:ASELS 31.6
AVIC Aircraft Co., Ltd. SHE:000768 54.4
AVIC Shenyang Aircraft Co. Ltd. SHG:600760 51.3
AviChina Industry & Technology Company Limited HKG:2357 45.2
BAE Systems PLC LON:BA 34.3
Bombardier Inc. TSE:BBD.B 30
BWX Technologies, Inc. NYS:BWXT 42.3
CAE Inc. TSE:CAE 32.4

1 Answer

Wrap the scraping code in a for loop and use the loop variable to construct both the URL and the file name, so that each page is fetched in turn and written to its own CSV file:

#Import Libraries
from bs4 import BeautifulSoup
import requests
import csv

pages = 5
for i in range(1, pages+1):
    print(f"Page - {i}")
    source = requests.get(f'https://www.sustainalytics.com/esg-ratings/?industry=Aerospace%20&%20Defense&currentpage={i}').text
    soup = BeautifulSoup(source, 'lxml')

    # Start a CSV file for this page; newline='' keeps the csv module from writing blank rows on Windows
    with open(f'aerospacedata_{i}.csv', 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['company_name', 'company_exchange', 'company_risk'])

        # Scrape data from the page and write it to the CSV
        for company_info in soup.find_all(class_='company-row d-flex'):
            company_name = company_info.a.text
            company_exchange = company_info.find("small").text
            company_risk = company_info.find("div", class_="col-2").text
            print(company_name, company_exchange, company_risk)
            csv_writer.writerow([company_name, company_exchange, company_risk])

    print("---" * 30)

Output:

Page - 1
AECC Aviation Power Co Ltd SHG:600893 53.3
Airbus SE PAR:AIR 30.3
Aselsan Elektronik Sanayi ve Ticaret AS IST:ASELS 31.6
AVIC Aircraft Co., Ltd. SHE:000768 54.4
AVIC Shenyang Aircraft Co. Ltd. SHG:600760 51.3
AviChina Industry & Technology Company Limited HKG:2357 45.2
BAE Systems PLC LON:BA 34.3
Bombardier Inc. TSE:BBD.B 30
BWX Technologies, Inc. NYS:BWXT 42.3
CAE Inc. TSE:CAE 32.4
------------------------------------------------------------------------------------------
Page - 2
China Avionics Systems Co.,Ltd. SHG:600372 54.8
Cobham PLC LON:COB 34.7
Curtiss-Wright Corp NYS:CW 39
Dassault Aviation S.A. PAR:AM 31.8
Embraer S.A. BSP:EMBR3 36.3
FACC AG WBO:FACC 37.9
General Dynamics Corp NYS:GD 37.5
Heico Corp NYS:HEI 39.3
Hexcel Corp NYS:HXL 31.6
Huntington Ingalls Industries, Inc. NYS:HII 41.3
------------------------------------------------------------------------------------------
Page - 3
Kongsberg Gruppen ASA OSL:KOG 29
Korea Aerospace Industries, Ltd. KRX:047810 49.9
L3Harris Technologies, Inc. NYS:LHX 38.8
Leonardo S.p.a. MIL:LDO 28.7
Lockheed Martin Corp NYS:LMT 30.6
Macquarie Infrastructure Corp NYS:MIC 44.7
Meggitt PLC LON:MGGT 32.7
MTU Aero Engines AG ETR:MTX 23.8
Northrop Grumman Corp. NYS:NOC 31.1
QinetiQ Group PLC LON:QQ 23
------------------------------------------------------------------------------------------
Page - 4
Raytheon Co NYS:RTN 32.9
Rheinmetall AG ETR:RHM 35.4
Rolls-Royce Holdings PLC LON:RR 28.6
Saab AB OME:SAAB.B 31.5
Safran SA PAR:SAF 30.7
Senior PLC LON:SNR 31.9
Signature Aviation Plc LON:SIG 35.4
Singapore Technologies Engineering Ltd. SES:S63 29.2
Spirit AeroSystems Holdings Inc NYS:SPR 36.8
Teledyne Technologies, Inc. NYS:TDY 37.5
------------------------------------------------------------------------------------------
Page - 5
Textron Inc. NYS:TXT 37.8
Thales SA PAR:HO 28.6
The Boeing Company NYS:BA 39
TransDigm Group Inc NYS:TDG 40.9
Ultra Electronics Holdings PLC LON:ULE 37.4
United Technologies Corp NYS:UTX 29.3
------------------------------------------------------------------------------------------
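
If you would rather end up with one CSV file for all pages instead of one file per page, the same loop can collect the rows first and write them out once. The following is only a sketch under a few assumptions: the names BASE_URL, HEADER, scrape_page, collect_rows and save_csv are made up for illustration, and the selectors are simply the ones from the code above, so they depend on the page markup staying the same.

```python
import csv

# Same URL as above, with a placeholder for the page number
BASE_URL = 'https://www.sustainalytics.com/esg-ratings/?industry=Aerospace%20&%20Defense&currentpage={}'
HEADER = ['company_name', 'company_exchange', 'company_risk']


def scrape_page(page):
    """Return one [name, exchange, risk] row per company on the given page."""
    # Imported here so the helpers below can be used without the scraping libraries
    import requests
    from bs4 import BeautifulSoup

    source = requests.get(BASE_URL.format(page)).text
    soup = BeautifulSoup(source, 'lxml')
    rows = []
    for company_info in soup.find_all(class_='company-row d-flex'):
        rows.append([
            company_info.a.text,
            company_info.find('small').text,
            company_info.find('div', class_='col-2').text,
        ])
    return rows


def collect_rows(pages, fetch_rows=scrape_page):
    """Concatenate the rows of pages 1..pages, in page order."""
    all_rows = []
    for i in range(1, pages + 1):
        all_rows.extend(fetch_rows(i))
    return all_rows


def save_csv(path, rows):
    """Write the header once, then every row, to a single CSV file."""
    # newline='' is what the csv module documentation recommends for csv.writer
    with open(path, 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(HEADER)
        csv_writer.writerows(rows)
```

Calling save_csv('aerospacedata_all.csv', collect_rows(5)) would then fetch the five pages and write a single file (the file name is just an example). Passing the fetch function as a parameter also lets you try the loop with a stand-in function before hitting the site.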
1 Comment

That works perfectly. Thank you so much for your help!
