The existing question on this site about this error hasn't helped me. My goal for this program is to scrape job postings from Indeed.com, but I'm running into an AttributeError. I don't understand why, because I've checked that the tags and class names in my Python code match the HTML. Can anyone help me with this?

Code:

import urllib.request as urllib
from bs4 import BeautifulSoup
import csv

# empty array for results
results = []

# initialize the Indeed URL to url string
url = 'https://www.indeed.com/jobs?q=software+developer&l=Phoenix,+AZ&jt=fulltime&explvl=entry_level'
soup = BeautifulSoup(urllib.urlopen(url).read(), 'html.parser')
results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})

for i in results:
    title = i.find('div', attrs={"class":"title"})
    print('\ntitle:', title.text.strip())

    salary = i.find('span', attrs={"class":"salaryText"})
    print('salary:', salary.text.strip())

    company = i.find('span', attrs={"class":"company"})
    print('company:', company.text.strip())

Error log:

Traceback (most recent call last):
  File "c:/Users/Scott/Desktop/code/ScrapingIndeed/index.py", line 16, in <module>
    print('salary:', salary.text.strip())
AttributeError: 'NoneType' object has no attribute 'text'

Code from indeed.com I'm trying to scrape:

<span class="salaryText">
$15 - $30 an hour</span>
  • Hello Scott, please provide a minimal working example; we don't use screenshots here. Commented Jan 29, 2020 at 4:03
  • Please provide the full error log. Also, some of the scraped fields are empty, so you are not getting full data for them; you need to handle that. Commented Jan 29, 2020 at 4:09

1 Answer

The answer is relatively simple: look closely at the source of the HTML you are attempting to scrape.

Not all of the jobsearch-SerpJobCard div elements contain the salary information you are looking for. For those cards, find() returns None instead of a tag, and accessing .text on None raises the AttributeError you are seeing.
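The failure can be reproduced without any scraping at all. The sketch below assigns None to salary by hand, standing in for a find() call that matched nothing, and captures the resulting error message:

```python
# Stand-in for i.find('span', attrs={"class": "salaryText"}) on a job card
# that has no salary span: find() returns None in that case.
salary = None

try:
    salary.text  # same attribute access as in the question's loop
except AttributeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object has no attribute 'text'
```

The message matches the last line of the traceback in the question, which is why the error points at the salary line rather than at the title or company lines.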

All you need to do to overcome that is check whether the salary lookup returned None before using its text.

For example take a look at the code as modified below:

    salary = i.find('span', attrs={"class":"salaryText"})
    if salary is not None:
        print('salary:', salary.text.strip())

The entire code is as follows:

import urllib.request as urllib
from bs4 import BeautifulSoup
import csv

# empty array for results
results = []

# initialize the Indeed URL to url string
url = 'https://www.indeed.com/jobs?q=software+developer&l=Phoenix,+AZ&jt=fulltime&explvl=entry_level'
soup = BeautifulSoup(urllib.urlopen(url).read(), 'html.parser')
results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})

for i in results:
    title = i.find('div', attrs={"class":"title"})
    print('\ntitle:', title.text.strip())

    # not every job card has a salary span, so find() may return None
    salary = i.find('span', attrs={"class":"salaryText"})
    if salary is not None:
        print('salary:', salary.text.strip())

    company = i.find('span', attrs={"class":"company"})
    print('company:', company.text.strip())