
I have a very basic Python script that reads searches from a text file and returns the first URL from Google for each one. I'm receiving an error when the Google result contains a non-ASCII character (such as the é in montréal).

Ideally, I'd like to write out any character that comes back, regardless of language.

import requests
from bs4 import BeautifulSoup

with open("searches.txt") as input:  # look at each line in our input file
    content = input.readlines()
content = [x.strip() for x in content]  # and strip of newline characters

print '---'  # some formatting so it looks nice in terminal and our output file
header = '<Query>, <Link>' + '\n' + '---------------' + '\n'
output = open("links.txt", "w")  # open file we want to write to
output.write(header)

for x in content:  # for each line in our input file
    print x
    query = x  # search google for that query
    goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + query
    r = requests.get(goog_search)
    soup = BeautifulSoup(r.text, "html.parser")  # parse so we just get the link
    link = soup.find('cite').text
    formatted = query + ', ' + link + '\n'  # more output formatting
    print query + ', ' + link
    output.write(formatted)

output.close()
print '---'

The error I'm receiving is: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 53: ordinal not in range(128)
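
For context, the same error can be reproduced in isolation. This is only a sketch: it assumes the failure comes from a unicode string being encoded as ASCII (which Python 2 does implicitly when a unicode object is written to a file opened with plain open()), and the names text, ascii_failed and utf8_bytes are illustrative, not part of the script above.

```python
# -*- coding: utf-8 -*-
# u'\xe9' from the traceback is the character "é".
text = u"montr\xe9al"

# Encoding to ASCII fails because "é" has no ASCII representation.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 works for any Unicode character.
utf8_bytes = text.encode("utf-8")
```

If that assumption holds, the fix is to make sure the text is encoded as UTF-8 instead of ASCII at the point where it is written.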

  • Is there a specific reason you're using Python 2.7 and not Python 3? Commented Mar 3, 2017 at 18:10
  • See a similar question here: stackoverflow.com/questions/19833440/…. Basically, when you open a file, open it with an explicit utf-8 encoding, and do the same when you write. Commented Mar 3, 2017 at 20:32
  • @L3viathan I'm very new to Python and my buddy just suggested I start with 2.7. Commented Mar 4, 2017 at 19:54
  • @j.kaplan I suggest otherwise, especially when you're doing things with text. Python 3 comes with Unicode strings by default; in most cases you won't have to worry about encodings anymore. Commented Mar 4, 2017 at 21:16
  • @L3viathan I did not know that, thanks for that! That sounds like it would solve my issue right away. Do you know what the script would be in Python 3? Like I said, I'm very new to Python. Commented Mar 6, 2017 at 3:29
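
Following the suggestions in the comments, a Python 3 version of the script might look roughly like the sketch below. It is not a tested drop-in replacement. The function names first_link and write_links are made up for illustration, and the long list of URL parameters from the question is reduced to just q as a simplifying assumption. Whether Google still returns a cite element, or allows this kind of request at all, is not something this sketch can guarantee. The part that addresses the error is that both files are opened with an explicit utf-8 encoding.

```python
def first_link(query):
    """Return the text of the first <cite> element for a search, or ''."""
    # Third-party imports are kept local so that the file-handling code
    # below can be tried without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    # requests URL-encodes the parameters, so "montréal" is sent safely.
    r = requests.get("https://www.google.co.uk/search", params={"q": query})
    soup = BeautifulSoup(r.text, "html.parser")
    cite = soup.find("cite")
    # find() returns None when no <cite> element is present.
    return cite.text if cite is not None else ""


def write_links(in_path, out_path, lookup=first_link):
    """Read one query per line and write '<query>, <link>' lines as UTF-8."""
    # An explicit encoding means the platform default is never used.
    with open(in_path, encoding="utf-8") as infile:
        queries = [line.strip() for line in infile if line.strip()]

    with open(out_path, "w", encoding="utf-8") as outfile:
        outfile.write("<Query>, <Link>\n---------------\n")
        for query in queries:
            outfile.write(query + ", " + lookup(query) + "\n")
```

Calling write_links("searches.txt", "links.txt") would then mirror the original script. Passing a different lookup function makes it possible to try the file handling without sending any requests.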
