Python New York Times Web Scraping Error ("bytes to string")

Question

Here is the code, which I got from the web. When I execute it, I get the error below. I am new to web scraping, so I am utterly confused. Can anyone tell me where my code went wrong? Thank you for your help!

from nytimesarticle import articleAPI
api = articleAPI('a0de895aa110431eb2344303c7105a9f')


articles = api.search( q = 'Obama', 
     fq = {'headline':'Obama', 'source':['Reuters','AP', 'The New York Times']}, 
     begin_date = 20111231 )


def parse_articles(articles):
    news = []
    for i in articles['response']['docs']:
        dic = {}
        dic['id'] = i['_id']
        # In Python 3, str is already Unicode; calling .encode("utf8") here
        # would store bytes, which the CSV writer would write out as b'...'.
        if i['abstract'] is not None:
            dic['abstract'] = i['abstract']
        dic['headline'] = i['headline']['main']
        dic['desk'] = i['news_desk']
        dic['date'] = i['pub_date'][0:10] # cutting time of day.
        dic['section'] = i['section_name']
        if i['snippet'] is not None:
            dic['snippet'] = i['snippet']
        dic['source'] = i['source']
        dic['type'] = i['type_of_material']
        dic['url'] = i['web_url']
        dic['word_count'] = i['word_count']
        # locations
        locations = []
        for x in range(0,len(i['keywords'])):
            if 'glocations' in i['keywords'][x]['name']:
                locations.append(i['keywords'][x]['value'])
        dic['locations'] = locations
        # subject
        subjects = []
        for x in range(0,len(i['keywords'])):
            if 'subject' in i['keywords'][x]['name']:
                subjects.append(i['keywords'][x]['value'])
        dic['subjects'] = subjects   
        news.append(dic)
    return(news)

def get_articles(date,query):
    all_articles = []
    for i in range(0,100): #NYT limits pager to first 100 pages. But rarely will you find over 100 pages of results anyway.
        articles = api.search(q = query,
               fq = {'source':['Reuters','AP', 'The New York Times']},
               begin_date = date + '0101',
               end_date = date + '1231',
               sort='oldest',
               page = str(i))
        articles = parse_articles(articles)
        all_articles = all_articles + articles
    return(all_articles)

Amnesty_all = []
for i in range(1980,2014):
    print('Processing ' + str(i) + '...')
    Amnesty_year =  get_articles(str(i),'Amnesty International')
    Amnesty_all = Amnesty_all + Amnesty_year

import csv
keys = Amnesty_all[0].keys()
with open('amnesty-mentions.csv', 'w', newline='') as output_file:  # Python 3: csv needs text mode, not 'wb'
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(Amnesty_all)

This is the output when I run it on Python 3.4:

OUTPUT:

Traceback (most recent call last):
  File "/Users/niharika/Documents/nyt.py", line 7, in <module>
    begin_date = 20111231 )
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nytimesarticle.py", line 111, in search
    API_ROOT, response_format, self._options(**kwargs), key
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nytimesarticle.py", line 84, in _options
    v = _format_fq(v)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nytimesarticle.py", line 69, in _format_fq
    d[k] = '"' + v + '"'
TypeError: Can't convert 'bytes' object to str implicitly
>>>

Source for the code: http://dlab.berkeley.edu/blog/scraping-new-york-times-articles-python-tutorial

Possible duplicate of Python3 Error: TypeError: Can't convert 'bytes' object to str implicitly
– Josh Lee, Commented Feb 13, 2017 at 14:46

Scott Hunter · Accepted Answer · 2017-02-13 14:45:14Z

0

The error is telling you to convert v (the bytes object) to a string explicitly.
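To make that concrete: in Python 3, concatenating `str` and `bytes` raises a TypeError, and the explicit conversion is `bytes.decode()`. The snippet below is a standalone illustration, not the library's code; the value `"Reuters"` simply stands in for one of the `fq` values from the question, assuming (as the traceback shows) that `v` reached `_format_fq` as a bytes object.

```python
# Standalone illustration: "Reuters" stands in for one of the fq values,
# assumed to have been encoded to bytes before reaching _format_fq.
v = "Reuters".encode("utf-8")   # v is now a bytes object: b'Reuters'

try:
    quoted = '"' + v + '"'      # str + bytes is not allowed in Python 3
except TypeError as exc:
    print("TypeError:", exc)

# The explicit conversion: decode the bytes back to a str first.
quoted = '"' + v.decode("utf-8") + '"'
print(quoted)  # prints "Reuters"
```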

answered Feb 13, 2017 at 14:45

Scott Hunter

6 Comments

Rakesh Naga Chinta Over a year ago

Yes, please tell me how.

Scott Hunter Over a year ago

Because you spent zero effort to find out yourself?

Rakesh Naga Chinta Over a year ago

Harsh, but not quite true.

Rakesh Naga Chinta Over a year ago

I searched and found that I need to use 'utf-8' somewhere, I think.

Rakesh Naga Chinta Over a year ago

But I don't know where to add it in the script.

as - if · Accepted Answer · 2018-01-31 06:50:51Z

0

Basically, I copied the code from NYTimesArticleAPI/NYTimesArticleAPI/search_api.py and used it to replace the contents of my installed nytimesarticle module file, nytimesarticle.py.

Doing so removed

def _utf8_encode(self, d): ......

which is what prevented the nytimesarticle module from working with Python 3: it made the API's search function throw "TypeError: must be str, not bytes".
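If you would rather patch the formatting step than swap in another file, a Python 3-safe version only needs to keep every value a `str`. The sketch below is hypothetical: `format_fq_py3` is not a function of nytimesarticle, and it assumes the `fq` dict shape from the question and the Lucene-style `field:("value" ...)` filter syntax of the NYT Article Search API. It decodes any bytes instead of encoding strings.

```python
def format_fq_py3(fq):
    """Build a filter-query string from a dict, keeping all values as str.

    Hypothetical helper for illustration; not nytimesarticle's own code.
    """
    clauses = []
    for field in sorted(fq):                  # sorted for a stable clause order
        value = fq[field]
        # Each field may map to a single value or to a list of values.
        values = value if isinstance(value, list) else [value]
        # Decode any bytes instead of mixing bytes and str (the original error).
        values = [v.decode("utf-8") if isinstance(v, bytes) else v for v in values]
        quoted = " ".join('"' + v + '"' for v in values)
        clauses.append("%s:(%s)" % (field, quoted))
    return " AND ".join(clauses)


fq = {'headline': 'Obama', 'source': ['Reuters', 'AP', 'The New York Times']}
print(format_fq_py3(fq))
# prints: headline:("Obama") AND source:("Reuters" "AP" "The New York Times")
```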

answered Jan 31, 2018 at 6:50

as - if
