
I am trying to scrape data from the PGA website to get a list of all the golf courses in the USA. I want to scrape the data and write it to a CSV file. My problem is that after running my script I get this error. Can anyone help me fix it and explain how I can go about extracting the data?

Here is the error message:

File "/Users/AGB/Final_PGA2.py", line 44, in
writer.writerow(row)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 35: ordinal not in range(128)

Script below:

import csv
import requests 
from bs4 import BeautifulSoup

courses_list = []
for i in range(906):      # Number of pages plus one 
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
r = requests.get(url)
soup = BeautifulSoup(r.content)

g_data2=soup.find_all("div",{"class":"views-field-nothing"})

for item in g_data2:
    try:
          name = item.contents[1].find_all("div",{"class":"views-field-title"})[0].text
          print name
    except:
          name=''
    try:
          address1=item.contents[1].find_all("div",{"class":"views-field-address"})[0].text
    except:
          address1=''
    try:
          address2=item.contents[1].find_all("div",{"class":"views-field-city-state-zip"})[0].text
    except:
          address2=''
    try:
          website=item.contents[1].find_all("div",{"class":"views-field-website"})[0].text
    except:
          website=''   
    try:
          Phonenumber=item.contents[1].find_all("div",{"class":"views-field-work-phone"})[0].text
    except:
          Phonenumber=''      

    course=[name,address1,address2,website,Phonenumber]

    courses_list.append(course)


with open ('PGA_Final.csv','a') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row)
  • stackoverflow.com/questions/30551429/…
  • There are several issues in your code: 0. indentation matters in Python, check the requests.get(url) line; 1. don't use a bare except:, since it may catch too much and hide bugs; add error logging for debugging; 2. avoid the repetitive code by creating a function instead (see the sketch after these comments); 3. make sure you know which Python version you are using: you should not see this error on Python 3.
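
A minimal sketch of point 2 in the comment above: a small helper (hypothetical name get_field) that replaces the repeated try/except blocks and avoids the bare except. It assumes the same BeautifulSoup objects as the question's script and is untested against the live site:

def get_field(item, css_class, default=''):
    """Return the text of the first div with the given CSS class, or default."""
    divs = item.find_all("div", {"class": css_class})
    return divs[0].get_text(strip=True) if divs else default

# Inside the scraping loop from the question:
# course = [get_field(item, "views-field-" + suffix)
#           for suffix in ("title", "address", "city-state-zip",
#                          "website", "work-phone")]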

2 Answers


You should not get the error on Python 3. Here's a code example that also fixes some unrelated issues in your code. It parses the specified fields on a given web page and saves them as CSV:

#!/usr/bin/env python3
import csv
from urllib.request import urlopen
import bs4 # $ pip install beautifulsoup4

page = 905
url = ("http://www.pga.com/golf-courses/search?page=" + str(page) +
       "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0"
       "&course_type=both&has_events=0")
with urlopen(url) as response:
    field_content = bs4.SoupStrainer('div', 'views-field-nothing')
    soup = bs4.BeautifulSoup(response, parse_only=field_content)

fields = [bs4.SoupStrainer('div', 'views-field-' + suffix)
          for suffix in ['title', 'address', 'city-state-zip', 'website', 'work-phone']]

def get_text(tag, default=''):
    return tag.get_text().strip() if tag is not None else default

with open('pga.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    for div in soup.find_all(field_content):
        writer.writerow([get_text(div.find(field)) for field in fields])
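
To cover every result page as the original script intends, the single-page code above could be wrapped in a loop. This is only a sketch: the 906-page count and the header column names are assumptions carried over from the question, not something confirmed against pga.com:

#!/usr/bin/env python3
import csv
from urllib.request import urlopen

import bs4  # $ pip install beautifulsoup4

# Parse only the divs we care about to keep memory use down.
field_content = bs4.SoupStrainer('div', 'views-field-nothing')
fields = [bs4.SoupStrainer('div', 'views-field-' + suffix)
          for suffix in ['title', 'address', 'city-state-zip',
                         'website', 'work-phone']]

def get_text(tag, default=''):
    return tag.get_text().strip() if tag is not None else default

with open('pga.csv', 'w', newline='', encoding='utf-8') as output_file:
    writer = csv.writer(output_file)
    writer.writerow(['name', 'address', 'city_state_zip', 'website', 'phone'])
    for page in range(906):  # page count taken from the question
        url = ("http://www.pga.com/golf-courses/search?page=" + str(page) +
               "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50"
               "&price_range=0&course_type=both&has_events=0")
        with urlopen(url) as response:
            soup = bs4.BeautifulSoup(response, 'html.parser',
                                     parse_only=field_content)
        for div in soup.find_all(field_content):
            writer.writerow([get_text(div.find(field)) for field in fields])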

with open ('PGA_Final.csv','a') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row)

Change that to:

with open ('PGA_Final.csv','a') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row.encode('utf-8'))

Or:

import codecs
....
with codecs.open('PGA_Final.csv','a', encoding='utf-8') as file:
          writer=csv.writer(file)
          for row in courses_list:
               writer.writerow(row)

Comments

  • You can also use codecs.open, which works like the regular open but also accepts an encoding kwarg.
  • I added another solution with your suggestion.
  • AttributeError: 'list' object has no attribute 'encode' for the first option. Any suggestion on how I can fix that?
  • I had import codes instead of import codecs; that should fix the problem. The second error is because row is itself a list, so you'll have to loop over it as well (see the sketch below). I didn't know that, since you didn't include a data sample.
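
A Python 2 sketch of that per-field loop, reusing courses_list from the question's script (binary 'ab' mode is what the Python 2 csv docs recommend for file objects):

import csv

with open('PGA_Final.csv', 'ab') as output_file:  # binary mode for the Python 2 csv module
    writer = csv.writer(output_file)
    for row in courses_list:
        # row is a list, so encode each field rather than the list itself
        writer.writerow([field.encode('utf-8') for field in row])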
