
I am trying to scrape data from the PGA website to get a list of all the golf courses in the USA. I want to scrape the data and write it to a CSV file. My problem is that after running my script I get this error. Can anyone help me fix it and explain how I can go about extracting the data?

Here is the error message:

File "/Users/AGB/Final_PGA2.py", line 44, in
writer.writerow(row)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 35: ordinal not in range(128)

Script below:

import csv
import requests 
from bs4 import BeautifulSoup

courses_list = []
for i in range(906):      # Number of pages plus one 
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
r = requests.get(url)
soup = BeautifulSoup(r.content)

g_data2=soup.find_all("div",{"class":"views-field-nothing"})

for item in g_data2:
    try:
          name = item.contents[1].find_all("div",{"class":"views-field-title"})[0].text
          print name
    except:
          name=''
    try:
          address1=item.contents[1].find_all("div",{"class":"views-field-address"})[0].text
    except:
          address1=''
    try:
          address2=item.contents[1].find_all("div",{"class":"views-field-city-state-zip"})[0].text
    except:
          address2=''
    try:
          website=item.contents[1].find_all("div",{"class":"views-field-website"})[0].text
    except:
          website=''   
    try:
          Phonenumber=item.contents[1].find_all("div",{"class":"views-field-work-phone"})[0].text
    except:
          Phonenumber=''      

    course=[name,address1,address2,website,Phonenumber]

    courses_list.append(course)


with open ('PGA_Final.csv','a') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row)
  • stackoverflow.com/questions/30551429/…
  • There are several issues in your code: 0. indentation matters in Python, check the requests.get(url) line; 1. don't use a bare except:, since it may catch too much and hide bugs; add error logging for debugging; 2. avoid the repetitive code by creating a function instead (see the sketch after these comments); 3. make sure you know which Python version you are using: you should not see this error on Python 3.
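
A minimal sketch of point 2 in the comment above: a small helper (hypothetical name get_field) that replaces the repeated try/except blocks and avoids the bare except. It assumes the same BeautifulSoup objects as the question's script and is untested against the live site:

def get_field(item, css_class, default=''):
    """Return the text of the first div with the given CSS class, or default."""
    divs = item.find_all("div", {"class": css_class})
    return divs[0].get_text(strip=True) if divs else default

# Inside the scraping loop from the question:
# course = [get_field(item, "views-field-" + suffix)
#           for suffix in ("title", "address", "city-state-zip",
#                          "website", "work-phone")]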

2 Answers


You should not get the error on Python 3. Here's a code example that also fixes some unrelated issues in your code. It parses the specified fields on a given web page and saves them as CSV:

#!/usr/bin/env python3
import csv
from urllib.request import urlopen
import bs4 # $ pip install beautifulsoup4

page = 905
url = ("http://www.pga.com/golf-courses/search?page=" + str(page) +
       "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0"
       "&course_type=both&has_events=0")
with urlopen(url) as response:
    field_content = bs4.SoupStrainer('div', 'views-field-nothing')
    soup = bs4.BeautifulSoup(response, parse_only=field_content)

fields = [bs4.SoupStrainer('div', 'views-field-' + suffix)
          for suffix in ['title', 'address', 'city-state-zip', 'website', 'work-phone']]

def get_text(tag, default=''):
    return tag.get_text().strip() if tag is not None else default

with open('pga.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    for div in soup.find_all(field_content):
        writer.writerow([get_text(div.find(field)) for field in fields])
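
To cover every result page as the original script intends, the single-page code above could be wrapped in a loop. This is only a sketch: the 906-page count and the header column names are assumptions carried over from the question, not something confirmed against pga.com:

#!/usr/bin/env python3
import csv
from urllib.request import urlopen

import bs4  # $ pip install beautifulsoup4

# Parse only the divs we care about to keep memory use down.
field_content = bs4.SoupStrainer('div', 'views-field-nothing')
fields = [bs4.SoupStrainer('div', 'views-field-' + suffix)
          for suffix in ['title', 'address', 'city-state-zip',
                         'website', 'work-phone']]

def get_text(tag, default=''):
    return tag.get_text().strip() if tag is not None else default

with open('pga.csv', 'w', newline='', encoding='utf-8') as output_file:
    writer = csv.writer(output_file)
    writer.writerow(['name', 'address', 'city_state_zip', 'website', 'phone'])
    for page in range(906):  # page count taken from the question
        url = ("http://www.pga.com/golf-courses/search?page=" + str(page) +
               "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50"
               "&price_range=0&course_type=both&has_events=0")
        with urlopen(url) as response:
            soup = bs4.BeautifulSoup(response, 'html.parser',
                                     parse_only=field_content)
        for div in soup.find_all(field_content):
            writer.writerow([get_text(div.find(field)) for field in fields])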

with open ('PGA_Final.csv','a') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row)

Change that to:

with open ('PGA_Final.csv','a') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row.encode('utf-8'))

Or:

import codecs
....
with codecs.open('PGA_Final.csv','a', encoding='utf-8') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row)

Comments

  • You can also use codecs.open, which works like the regular open but also accepts an encoding kwarg.
  • I added another solution with your suggestion.
  • AttributeError: 'list' object has no attribute 'encode' for the first option. Any suggestion on how I can fix that?
  • I had import codes instead of import codecs; that should fix the problem. The second error is because row is itself a list, so you'll have to loop over it as well (see the sketch below). I didn't know that, since you didn't include a data sample.
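
A Python 2 sketch of that per-field loop, reusing courses_list from the question's script (binary 'ab' mode is what the Python 2 csv docs recommend for file objects):

import csv

with open('PGA_Final.csv', 'ab') as output_file:  # binary mode for the Python 2 csv module
    writer = csv.writer(output_file)
    for row in courses_list:
        # row is a list, so encode each field rather than the list itself
        writer.writerow([field.encode('utf-8') for field in row])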
