I am trying to store data scraped with Scrapy in a SQL database, but my code does not write anything to it, and no error is reported when it runs.

I am using MySQL Connector (mysql.connector) because I could not install MySQL-python. My SQL database seems to be running well, and when I run the code the traffic (KB/s) rises. Please find my pipelines.py code below.

import mysql.connector
from mysql.connector import errorcode

class CleaningPipeline(object):
    ...

class DatabasePipeline(object):

    def _init_(self):
        self.create_connection()
        self.create_table()

    def create_connection(self):
        self.conn = mysql.connector.connect(
            host = 'localhost',
            user = 'root',
            passwd = '********',
            database = 'lecturesinparis_db'
        )
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("""DROP TABLE IF EXISTS mdl""")
        self.curr.execute("""create table mdl(
                        title text,
                        location text,
                        startdatetime text,
                        lenght text,
                        description text,
                        )""")

    def process_item(self, item, spider):
        self.store_db(item)
        return item

    def store_db(self, item):
        self.curr.execute("""insert into mdl values (%s,%s,%s,%s,%s)""", (
            item['title'][0],
            item['location'][0],
            item['startdatetime'][0],
            item['lenght'][0],
            item['description'][0],
        ))
        self.conn.commit()

1 Answer

You need to register the class in ITEM_PIPELINES first, so that Scrapy knows you want to use this pipeline.

In your settings.py file, update the lines below with your class names, as follows.

# https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'projectname.pipelines.CleaningPipeline': 700,
    'projectname.pipelines.DatabasePipeline': 800,
}

The numbers 700 and 800 determine the order in which the pipelines process items. By convention they are integers in the 0-1000 range. Items pass through the pipelines from lower to higher values, so the pipeline with 700 processes each item before the pipeline with 800.
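To illustrate the ordering rule, here is a small sketch in plain Python. It is not Scrapy's internal code; pipeline_order is a hypothetical helper that only mimics the "lower value runs first" behaviour, and 'projectname' is a placeholder.

```python
# Same mapping as in settings.py; the dict order itself does not matter,
# only the integer values do. 'projectname' is a placeholder.
ITEM_PIPELINES = {
    'projectname.pipelines.DatabasePipeline': 800,
    'projectname.pipelines.CleaningPipeline': 700,
}


def pipeline_order(item_pipelines):
    """Hypothetical helper: return pipeline paths sorted by ascending value."""
    return [path for path, _ in sorted(item_pipelines.items(), key=lambda kv: kv[1])]


# Prints the CleaningPipeline path first, then the DatabasePipeline path.
for path in pipeline_order(ITEM_PIPELINES):
    print(path)
```

With these values, items are cleaned before they reach the database pipeline, which is usually the order you want.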

Note: Replace projectname in 'projectname.pipelines.CleaningPipeline' with your actual project name.
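Once the pipeline is enabled, two details in the question's DatabasePipeline may surface as errors: the constructor is spelled _init_ with single underscores, so Python never calls it and self.curr is never created, and the create table statement has a trailing comma after the last column. Below is a minimal sketch of how the class could look, assuming mysql.connector is installed. It swaps the constructor for Scrapy's open_spider and close_spider hooks (renaming the method to __init__ would also work), and the credentials are the placeholders from the question.

```python
class DatabasePipeline:
    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts.
        self.create_connection()
        self.create_table()

    def create_connection(self):
        # Imported here so that loading this module does not itself
        # require the driver; a top-level import works just as well.
        import mysql.connector
        self.conn = mysql.connector.connect(
            host='localhost',
            user='root',
            password='********',  # placeholder, as in the question
            database='lecturesinparis_db',
        )
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("DROP TABLE IF EXISTS mdl")
        # No trailing comma after the last column definition.
        self.curr.execute("""CREATE TABLE mdl(
            title TEXT,
            location TEXT,
            startdatetime TEXT,
            lenght TEXT,
            description TEXT
        )""")

    def process_item(self, item, spider):
        # Column name 'lenght' is kept exactly as spelled in the question.
        self.curr.execute(
            "INSERT INTO mdl VALUES (%s, %s, %s, %s, %s)",
            (
                item['title'][0],
                item['location'][0],
                item['startdatetime'][0],
                item['lenght'][0],
                item['description'][0],
            ),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.curr.close()
        self.conn.close()
```

Note that create_table drops and recreates the table on every run, as in the question, so each crawl starts from an empty table.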
