I want to insert my scraped data directly into a PostgreSQL database, but I'm struggling to write the query for this. Any help would be appreciated.

The code I've come up with so far:

import csv
import urllib.request
import psycopg2
from bs4 import BeautifulSoup
conn = psycopg2.connect(database='--', user='--', password='--', port='--')
cursor = conn.cursor()
soup = BeautifulSoup(urllib.request.urlopen("http://tis.nhai.gov.in/TollInformation?TollPlazaID=236").read(),'lxml')
tbody = soup('table' ,{"class":"tollinfotbl"})[0].find_all('tr')
for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]
    writer.writerow(cols)
    print(cols)

My table's details are as follows:

    Column     |  Type   | Modifiers
---------------+---------+-----------
 vehicle_type  | text    | not null
 one_time      | integer | not null
 return_trip   | integer |
 monthly_pass  | integer | not null
 local_vehicle | integer | not null

1 Answer

This assumes that each data row yields five cells in cols, in the same order as the columns of your table; otherwise, adjust the indexes.

import urllib.request

import psycopg2
from bs4 import BeautifulSoup

conn = psycopg2.connect(database='--', user='--', password='--', port='--')
cursor = conn.cursor()
soup = BeautifulSoup(urllib.request.urlopen(
    "http://tis.nhai.gov.in/TollInformation?TollPlazaID=236").read(), 'lxml')
tbody = soup('table', {"class": "tollinfotbl"})[0].find_all('tr')

# One %s placeholder per listed column; table_name is a placeholder
query = "INSERT INTO table_name (vehicle_type, one_time, return_trip, monthly_pass, local_vehicle) VALUES (%s, %s, %s, %s, %s);"

for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]
    # Skip spacer rows (empty lists) and the header row (non-numeric cells)
    if len(cols) != 5 or not cols[1].replace('.', '', 1).isdigit():
        continue
    vehicle_type = cols[0]
    # The page reports fees as strings such as '45.00'
    one_time = int(float(cols[1]))
    return_trip = int(float(cols[2]))
    monthly_pass = int(float(cols[3]))
    local_vehicle = int(float(cols[4]))

    data = (vehicle_type, one_time, return_trip, monthly_pass, local_vehicle)
    cursor.execute(query, data)

# Commit the transaction
conn.commit()
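
An INSERT like this fails whenever the column list and the %s placeholders get out of step. One way to keep them aligned is to generate both from a single tuple of column names. The sketch below is only an illustration: build_insert is a hypothetical helper, table_name is the same placeholder as above, and the column names are taken from the table definition in the question.

```python
def build_insert(table, columns):
    """Return a parameterized INSERT with one %s placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders});"


# Column names taken from the table definition in the question.
COLUMNS = ("vehicle_type", "one_time", "return_trip", "monthly_pass", "local_vehicle")

# table_name is a placeholder; substitute the real table name.
query = build_insert("table_name", COLUMNS)
print(query)
```

The table and column names are interpolated directly into the SQL text, so this is only reasonable for trusted constants in your own code; the row values themselves should still be passed separately to cursor.execute(query, data). For identifiers that are not hard-coded, the psycopg2.sql module is probably the safer route.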

8 Comments

Hi, I am facing this issue: vehicle_type = cols[0] raises IndexError: list index out of range.
It means that cols is empty for that row, so nothing was scraped from it.
But when I exported to a CSV file, all the scraped data came through fine. Here are the details:
     Type of vehicle     | Single Journey | Return Journey | Monthly Pass | Commercial Vehicle Registered within the district of plaza
     Car/Jeep/Van        | 45             | 70             | 1565         | 25
     LCV                 | 75             | 115            | 2525         | 40
     Bus/Truck           | 160            | 240            | 5290         | 80
     Upto 3 Axle Vehicle | 175            | 260            | 5770         | 85
     4 to 6 Axle         | 250            | 375            | 8295         | 125
     HCM/EME             | 250            | 375            | 8295         | 125
     7 or more Axle      | 305            | 455            | 10100        | 150
Here it picks up an empty [] after each row. I tried to remove it with .strip("[ ]"), but that does not work.
This is what I see when debugging:
    >>> for row in tbody:
    ...     cols = row.findChildren(recursive=False)
    ...     [ele.text.strip() for ele in cols]
    ...
    ['Type of vehicle', 'Single Journey', 'Return Journey', 'Monthly Pass', 'Commercial Vehicle Registered within the district of plaza']
    ['Car/Jeep/Van', '45.00', '70.00', '1565.00', '25.00']
    []
    ['LCV', '75.00', '115.00', '2525.00', '40.00']
    []
    ['Bus/Truck', '160.00', '240.00', '5290.00', '80.00']
    []
    ['Upto 3 Axle Vehicle', '175.00', '260.00', '5770.00', '85.00']
    []
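
The empty entries in that output are lists, not strings, so a string method such as .strip("[ ]") cannot remove them; they have to be filtered out as rows. A rough sketch of that filtering, run against a few of the rows pasted above, might look like this. The helper name to_record is hypothetical, and int(float(...)) assumes the fees are always whole amounts written like '45.00'; any fractional part would be dropped.

```python
# Rows copied from the debugging output above.
rows = [
    ['Type of vehicle', 'Single Journey', 'Return Journey', 'Monthly Pass',
     'Commercial Vehicle Registered within the district of plaza'],
    ['Car/Jeep/Van', '45.00', '70.00', '1565.00', '25.00'],
    [],
    ['LCV', '75.00', '115.00', '2525.00', '40.00'],
    [],
]


def to_record(cols):
    """Return (vehicle_type, int, int, int, int), or None for non-data rows."""
    if len(cols) != 5:
        return None  # spacer rows come through as empty lists
    try:
        fees = [int(float(value)) for value in cols[1:]]
    except ValueError:
        return None  # header row: its cells are labels, not numbers
    return (cols[0], *fees)


records = [rec for rec in map(to_record, rows) if rec is not None]
print(records)
```

Each tuple in records then has the shape that a five-placeholder INSERT expects, so it can be passed as the parameters to cursor.execute.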