I'm working on a project where a class instance needs methods that connect to a database and fetch data from it (I'm using SQLite as the backend). I have some experience with Flask-SQLAlchemy, but I'm lost when it comes to plain SQLAlchemy. The concept is as follows: the user creates an instance of DataSet and passes a path to the database as an __init__ parameter. If the database already exists, I just want to connect to it and run queries; if it doesn't, I want to create a new one from a model. I can't work out how to do that.

Here's the DataSet code:

from os.path import normcase, split, join, isfile
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import errors
import trainset
import testset


class DataSet:
    def __init__(self, path_to_set, path_to_db, train_set=False, path_to_labels=None, label_dict=None,
                 custom_name=None):
        self.__path_to_set = path_to_set
        self.__label_dict = label_dict

        if custom_name is None:
            dbpath = join(path_to_db, 'train.db')
            if train_set is False:
                dbpath = join(path_to_db, 'test.db')
        else:
            dbpath = join(path_to_db, custom_name)
        if isfile(dbpath):
            self.__prepopulated = True
        else:
            self.__prepopulated = False
        self.__dbpath = dbpath

        if train_set is True and path_to_labels is None:
            raise errors.InsufficientData('labels', 'specified')
        if train_set is True and not isfile(path_to_labels):
            raise errors.InsufficientData('labels', 'found at specified path', path_to_labels)

    def prepopulate(self):
        engine = create_engine('sqlite:////' + self.__dbpath)
        self.__prepopulated = True

Here's the trainset code:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, String, PickleType, Integer, MetaData

Base = declarative_base()
metadata = MetaData()


class TrainSet(Base):
    __tablename__ = 'train set'
    id = Column(Integer, primary_key=True)
    real_id = Column(String(60))
    path = Column(String(120))
    labels = Column(PickleType)
    features = Column(PickleType)

Here's the testset code:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, String, PickleType, Integer, MetaData

Base = declarative_base()
metadata = MetaData()


class TestSet(Base):
    __tablename__ = 'test set'
    id = Column(Integer, primary_key=True)
    real_id = Column(String(60))
    path = Column(String(120))
    features = Column(PickleType)

So, if the user passes train_set=True when creating a DataSet instance, I would like to create a database using the TrainSet model, and a TestSet database otherwise. I would like this to happen in the prepopulate method, but I don't understand how to do it: the documentation calls for Base.metadata.create_all(engine), and I'm lost as to where to put this code.

1 Answer

First, save the train_set parameter:

class DataSet:
    def __init__(self, path_to_set, path_to_db, train_set=False, path_to_labels=None, label_dict=None,
                 custom_name=None):
        self._train_set = train_set
        # ...

Then use it in prepopulate to create the proper model(s):

def prepopulate(self):
    engine = create_engine('sqlite:////' + self.__dbpath)
    if self._train_set:
        trainset.Base.create_all(engine)
    else:
        testset.Base.create_all(engine)
    self.__prepopulated = True
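
For reference, here is a minimal self-contained sketch of the "connect if it exists, create if it doesn't" step. It assumes SQLAlchemy 1.4 or newer (where declarative_base can be imported from sqlalchemy.orm). The open_or_create helper and the temporary directory are hypothetical names used only for illustration. Note that create_all is a method of the MetaData object, so the call is spelled Base.metadata.create_all(engine). By default it checks whether each table already exists before creating it, so it should be safe to call on an existing database too. The URL is built as 'sqlite:///' plus the path, which is my choice here so that relative paths stay relative.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, PickleType, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TrainSet(Base):
    # Same columns as the TrainSet model in the question.
    __tablename__ = 'train set'
    id = Column(Integer, primary_key=True)
    real_id = Column(String(60))
    path = Column(String(120))
    labels = Column(PickleType)
    features = Column(PickleType)


def open_or_create(dbpath, base):
    # Hypothetical helper: connect to the SQLite file at dbpath and create
    # any tables declared on `base` that are not there yet.
    engine = create_engine('sqlite:///' + dbpath)
    base.metadata.create_all(engine)  # skips tables that already exist
    return engine


# Illustration only: use a fresh temporary directory for the database file.
workdir = tempfile.mkdtemp()
dbpath = os.path.join(workdir, 'train.db')

engine = open_or_create(dbpath, Base)
table_names = inspect(engine).get_table_names()
engine.dispose()
```

In the DataSet class from the question, the body of open_or_create would go into prepopulate, with trainset.Base or testset.Base chosen according to self._train_set.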

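One more thing: do not prefix your "private" variables with a double underscore. A double leading underscore triggers name mangling, which is meant for avoiding name clashes in subclasses, not for privacy. PEP 8 -- Style Guide for Python Code recommends a single leading underscore for non-public attributes.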
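
A small sketch of what the double underscore actually does, using a stripped-down stand-in for the DataSet class (the constructor signature here is simplified for illustration and is not the one from the question):

```python
class DataSet:
    def __init__(self, dbpath):
        # Inside the class body, __dbpath is rewritten to _DataSet__dbpath.
        self.__dbpath = dbpath
        # A single underscore is only a naming convention; no rewriting happens.
        self._train_set = False


ds = DataSet('train.db')

# The names actually stored on the instance.
attribute_names = sorted(vars(ds))
```

Here attribute_names contains '_DataSet__dbpath' rather than '__dbpath', which is why such attributes are awkward to reach from subclasses, tests, or a debugger.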

2 Comments

Great, thanks! I only had to change testset.Base.create_all(engine) to testset.Base.metadata.create_all(engine) (same goes for trainset), but it all works now. Also, since I will be writing a lot of functions that run queries, is it a good idea to store the engine in the class instance instead of creating it in each method?
I would not store the engine in any of the model classes: you would be mixing unrelated things together and coupling everything in a weird way (a model should not know anything about the connection). Instead, in the model's methods, just use object_session(self) to get the session (à la transaction) and query other data as required.
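
A rough sketch of the object_session approach from the last comment, assuming SQLAlchemy 1.4 or newer. The model is a trimmed copy of TrainSet, and the same_path_rows method, the in-memory database, and the sample rows are all hypothetical and only there to make the example runnable:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, object_session, sessionmaker

Base = declarative_base()


class TrainSet(Base):
    __tablename__ = 'train set'
    id = Column(Integer, primary_key=True)
    real_id = Column(String(60))
    path = Column(String(120))

    def same_path_rows(self):
        # Hypothetical method: find other rows sharing this row's path.
        # object_session returns the session this instance is attached to
        # (or None if the instance is not attached to any session).
        session = object_session(self)
        return (session.query(TrainSet)
                .filter(TrainSet.path == self.path, TrainSet.id != self.id)
                .all())


# The engine and session live outside the model.
engine = create_engine('sqlite://')  # in-memory database, illustration only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add_all([
    TrainSet(real_id='a', path='/data/x'),
    TrainSet(real_id='b', path='/data/x'),
    TrainSet(real_id='c', path='/data/y'),
])
session.commit()

first = session.query(TrainSet).filter_by(real_id='a').one()
sibling_ids = [row.real_id for row in first.same_path_rows()]
session.close()
```

The model never sees the engine or the database path; it only borrows whatever session loaded it.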
