edited Mar 28, 2020 at 20:50

Ben A

asked Mar 28, 2020 at 19:51

Viktor Stefanov

Web Scraping with Python

My target website is https://www.indeed.com/. I tried to make the scraper convenient for the end user. The code is introduced in the README file of my GitHub repo.

I am fairly new to programming, so I am looking for guidance on whether the libraries I used are appropriate, on the code itself, and on how to make the script better in general.

Link to the README file - https://github.com/Viktor-stefanov/Jobs-Crawler/blob/master/README.md

import requests
from bs4 import BeautifulSoup

jobName = input('Enter your desired position: ').replace(' ', '-')
place = input("Enter the location for your desired work(City, state or zip): ")

URL = 'https://www.indeed.com/q-'+jobName+'-l-'+place.replace(' ', '-')+'-jobs.html'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')
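pages = soup.find(id='searchCountPages')
# The counter is expected to read like "Page 1 of 1,234 jobs"; keep only the job total.
noPag = pages.text.split('of')[1].replace('jobs', '').replace(',', '').strip()

nmoPag = input(f"There are {noPag} jobs listed (10 per page). If you want to scrape all of them write 'Max' else write number of pages you wish to scrape: ")
if nmoPag == 'Max':
    # Convert the job total to a page count, rounding up for a partial last page.
    nmoPag = (int(noPag) + 9) // 10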

for i in range(0, int(nmoPag)*10, 10):
    URL = 'https://www.indeed.com/jobs?q='+jobName+'&l='+place.replace(' ', '+')+'&start='+str(i)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    results = soup.find(id='resultsCol')
    listings = results.find_all('div', class_='result')

    for job in listings:
        jobT = job.find('a', class_='jobtitle')
        jobL = job.find('span', class_='location').text.strip()
        jobS = job.find('div', class_='summary').text.strip()
        link = jobT['href']
        # Keep the listing only if one of the search keywords appears in its title or summary.
        if any(any(subs in s for s in (jobT.text.strip().lower(), jobS.lower())) for subs in jobName.lower().split('-')):
            print('Your job in '+jobL+' as a '+ jobT.text.strip()+
                    '.\nHere is a quick summary of your job here: '+
                        jobS+'\nLink for more information and application for the job - https://indeed.com'+link, end='\n\n\n')
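
The URL construction, result-count parsing and keyword filter in the script above can each be pulled out into a small pure function, which makes them checkable without sending any requests. What follows is only a minimal sketch under stated assumptions: the counter text is assumed to look like "Page 1 of 1,234 jobs" and each page is assumed to hold ten results (both inferred from the script, not confirmed against the site), and all function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumption: ten results per page, inferred from the step of 10 in the script's loop.
RESULTS_PER_PAGE = 10


def build_search_url(job, place, start=0):
    """Build a search URL; urlencode escapes spaces and special characters."""
    query = urlencode({'q': job, 'l': place, 'start': start})
    return 'https://www.indeed.com/jobs?' + query


def parse_job_count(counter_text):
    """Extract the total from text assumed to look like 'Page 1 of 1,234 jobs'."""
    match = re.search(r'of\s+([\d,]+)', counter_text)
    if match is None:
        raise ValueError(f'unexpected counter text: {counter_text!r}')
    return int(match.group(1).replace(',', ''))


def page_count(job_count):
    """Number of result pages needed, rounding up for a partial last page."""
    return -(-job_count // RESULTS_PER_PAGE)


def matches_keywords(keywords, title, summary):
    """True if any search keyword occurs in the title or the summary."""
    haystack = (title + ' ' + summary).lower()
    return any(word.lower() in haystack for word in keywords.split())
```

With helpers like these, the main loop would only fetch each page and print the listings for which `matches_keywords` returns true, and the user's raw input would not need manual `replace(' ', '-')` or `replace(' ', '+')` calls.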

python web-scraping
