
I am new to Python and am in the process of scraping a site to collect inventory information. The inventory items are spread across 6 pages on the site. The scraping went very smoothly and I was able to parse out all of the HTML elements I wanted to select.

I am now taking this to the next step and trying to export the data to a CSV file using the csv module included in Python 3. The script runs in my command line without any syntax errors, but the CSV file never gets created. I am wondering if there are any obvious issues with my script, or something I may have left out, when attempting to write the parsed HTML elements to a CSV file.

Here is my code:

import requests
import csv
from bs4 import BeautifulSoup

main_used_page = 'https://www.necarconnection.com/used-vehicles/'
page = requests.get(main_used_page)
soup = BeautifulSoup(page.text,'html.parser')

def get_items(main_used_page,urls):
    main_site = 'https://www.necarconnection.com/'
    counter = 0
    for x in urls:
        site = requests.get(main_used_page + urls[counter])
        soup = BeautifulSoup(site.content,'html.parser')
        counter +=1
        for item in soup.find_all('li'):
            vehicle = item.find('div',class_='inventory-post')
            image = item.find('div',class_='vehicle-image')
            price = item.find('div',class_='price-top')
            vin = item.find_all('div',class_='vinstock')

            try:
                url = image.find('a')
                link = url.get('href')
                pic_link = url.img
                img_url = pic_link['src']
                if 'gif' in pic_link['src']:img_url = pic_link['data-src']

                landing = requests.get(main_site + link)
                souped = BeautifulSoup(landing.content,'html.parser')
                comment = ''




                for comments in souped.find_all('td',class_='results listview'):
                    com = comments.get_text()
                    comment += com



                with open('necc-december.csv','w',newline='') as csv_file:
                    fieldnames = ['CLASSIFICATION','TYPE','PRICE','VIN',
                          'INDEX','LINK','IMG','DESCRIPTION']
                    writer = csv.DictWriter(csv_file,fieldnames=fieldnames)
                    writer.writeheader()
                    writer.writerow({
                        'CLASSIFICATION':vehicle['data-make'],
                        'TYPE':vehicle['data-type'],
                        'PRICE':price,
                        'VIN':vin,
                        'INDEX':vehicle['data-location'],
                        'LINK':link,
                        'IMG':img_url,
                        'DESCRIPTION':comment})

            except TypeError: None
            except AttributeError: None
            except UnboundLocalError: None

urls = ['']
counter = 0
prev = 0

for x in range(100):

    site = requests.get(main_used_page + urls[counter])
    soup = BeautifulSoup(site.content,'html.parser')

    for button in soup.find_all('a',class_='pages'):
        if button['class'] == ['prev']:
            prev +=1

        if button['class'] == ['next']:
            next_url = button.get('href')

        if next_url not in urls:
            urls.append(next_url)
            counter +=1

        if prev - 1 > counter:break


get_items(main_used_page,urls)

Here is a screenshot of what happens when the script is run from the command line:

[screenshot: command-line output]

It takes a while for the script to run, so I know that the script is being read and processed. I am just unsure what is going wrong between that and actually making the csv file.

I hope this was clear. Again, any tips or tricks for working with the Python 3 csv module would be much appreciated, as I have tried multiple variations.

  • You are writing the CSV inside the loop, so every pass overwrites the file. Instead of writing, try appending to the file: with open('necc-december.csv','a',newline='') Commented Dec 13, 2018 at 17:05
  • Thanks - would you recommend placing that section elsewhere in the script? I have tried un-indenting it and I receive syntax errors with "unexpected unindent" Commented Dec 13, 2018 at 17:15
  • What was the last change you made since the last time it passed all of its tests? The error will be there. Are you doing test-driven development? If not, then you need to debug it. Debugging is the hardest thing that you will do in programming, maybe the hardest thing you will ever do. To avoid debugging, do test-driven development. Commented Dec 13, 2018 at 17:24
  • @Sabrina I would store all the info in a dictionary; after you have all the data, call a function to write it all to a CSV file. Commented Dec 13, 2018 at 17:27
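As the comments suggest, the usual pattern is to open the file once, write the header once, and then write one row per scraped item, instead of reopening the file in 'w' mode (which truncates it) inside the loop. A minimal sketch with placeholder rows standing in for the scraped vehicles:

```python
import csv

fieldnames = ['CLASSIFICATION', 'TYPE', 'PRICE', 'VIN',
              'INDEX', 'LINK', 'IMG', 'DESCRIPTION']

# Placeholder data standing in for the parsed vehicles.
rows = [
    {'CLASSIFICATION': 'Buick', 'TYPE': 'Sedan', 'PRICE': '8000.00',
     'VIN': 'VIN1', 'INDEX': 'Bronx', 'LINK': 'link1',
     'IMG': 'img1', 'DESCRIPTION': 'Fine Car'},
    {'CLASSIFICATION': 'Honda', 'TYPE': 'SUV', 'PRICE': '9500.00',
     'VIN': 'VIN2', 'INDEX': 'Queens', 'LINK': 'link2',
     'IMG': 'img2', 'DESCRIPTION': 'Clean'},
]

# Open once in 'w' mode, write the header once, then write each row
# inside the loop -- reopening per item would truncate the file each pass.
with open('necc-december.csv', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

In the scraper itself the `with open(...)` block would wrap the page loop, with only the `writer.writerow(...)` call left inside it.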

1 Answer


I found that your code that writes the CSV works fine. Here it is in isolation:

import csv

vehicle = {'data-make': 'Buick',
           'data-type': 'Sedan',
           'data-location': 'Bronx',
           }
price = '8000.00'
vin = '11040VDOD330C0D0D003'
link = 'https://www.necarconnection.com/someplace'
img_url = 'https://www.necarconnection.com/image/someimage'
comment = 'Fine Car'

with open('necc-december.csv','w',newline='') as csv_file:
    fieldnames = ['CLASSIFICATION','TYPE','PRICE','VIN',
                  'INDEX','LINK','IMG','DESCRIPTION']
    writer = csv.DictWriter(csv_file,fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({
        'CLASSIFICATION':vehicle['data-make'],
        'TYPE':vehicle['data-type'],
        'PRICE':price,
        'VIN':vin,
        'INDEX':vehicle['data-location'],
        'LINK':link,
        'IMG':img_url,
        'DESCRIPTION':comment})

It creates the necc-december.csv fine:

CLASSIFICATION,TYPE,PRICE,VIN,INDEX,LINK,IMG,DESCRIPTION
Buick,Sedan,8000.00,11040VDOD330C0D0D003,Bronx,https://www.necarconnection.com/someplace,https://www.necarconnection.com/image/someimage,Fine Car

I think the problem is that the code is not finding any buttons with class='next'.

To run your code I had to initialize next_url:

next_url = None

And then change your condition from

if next_url not in urls:

to

if next_url and next_url not in urls:

I added a debug print inside your for loop:

for button in soup.find_all('a',class_='pages'):
    print ('button:', button)

And got this output:

button: <a class="pages current" data-page="1" href="javascript:void(0);">1</a>
button: <a class="pages" data-page="2" href="javascript:void(0);">2</a>
button: <a class="pages" data-page="3" href="javascript:void(0);">3</a>
button: <a class="pages" data-page="4" href="javascript:void(0);">4</a>
button: <a class="pages" data-page="5" href="javascript:void(0);">5</a>
button: <a class="pages" data-page="6" href="javascript:void(0);">6</a>
button: <a class="pages current" data-page="1" href="javascript:void(0);">1</a>
button: <a class="pages" data-page="2" href="javascript:void(0);">2</a>
button: <a class="pages" data-page="3" href="javascript:void(0);">3</a>
button: <a class="pages" data-page="4" href="javascript:void(0);">4</a>
button: <a class="pages" data-page="5" href="javascript:void(0);">5</a>
button: <a class="pages" data-page="6" href="javascript:void(0);">6</a>

So there were no buttons with class = 'next'.
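Putting the two changes together, here is a self-contained sketch of the guarded pagination scan. The `buttons` list stands in for the result of `soup.find_all('a', class_='pages')` (using the button attributes from the debug output above), so this runs without hitting the site:

```python
# Stand-ins for the <a class="pages"> tags seen in the debug output;
# note that none of them carries the 'next' class.
buttons = [
    {'class': ['pages', 'current'], 'href': 'javascript:void(0);'},
    {'class': ['pages'], 'href': 'javascript:void(0);'},
]

urls = ['']
counter = 0
next_url = None  # initialize so the guard below never raises NameError

for button in buttons:
    if button['class'] == ['next']:
        next_url = button['href']

    # Guard: only record a URL if a 'next' button was actually found.
    if next_url and next_url not in urls:
        urls.append(next_url)
        counter += 1

# With no 'next' buttons on the page, urls stays [''] and counter stays 0,
# so get_items only ever scrapes the first page.
```

That also explains why only one page's worth of data is ever processed: the pagination links are JavaScript-driven (`href="javascript:void(0);"`), so the follow-up URLs the loop expects simply are not in the HTML.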
