10

I'm wondering if it's possible to prevent committing duplicates to the database. For example, presume there is a class as follows

class Employee(Base):
   id = Column(Integer, primary_key=True)
   name = Column(String)

If I were to make a series of these objects,

employee1 = Employee(name='bob')
employee2 = Employee(name='bob')

session.add_all([employee1, employee2])
session.commit()

I would like only a single row to be added to the database, and employee1 and employee2 to point to the same object in memory (if possible).

Is there functionality within SQLAlchemy to accomplish this? Or would I need to ensure duplicates don't exist programmatically?

6
  • It sounds like you have the wrong primary key. You should make the name the primary key if that's what you want. Though then people can't really change names without incurring a huge multi-table database update as all the foreign keys everywhere have to change. Also, what happens if you try to change a name to an existing one? Would you just want to wipe out the row that's already there? Commented Jun 13, 2017 at 2:13
  • 3
    Creating a unique index by name, even if it's not the primary key, looks like the right way to enforce name uniqueness. Commented Jun 13, 2017 at 4:36
  • To prevent committing duplicates do what @9000 said. To have the session only create bob once in your second example, see the unique object recipes. Commented Jun 13, 2017 at 6:01
  • 1
    @nven - And how would the database resolve the issue other than by preventing the commit from happening? Commented Jun 13, 2017 at 13:39
  • 1
    Ideally, I would like the ORM to realize that there are duplicate entries, only commit one, and have all references to the duplicates reference the same object Commented Jun 13, 2017 at 19:29

3 Answers 3

5

An alternate get_or_create() solution:

from sqlalchemy.orm.exc import NoResultFound
# ...

def get_or_create(self, model, **kwargs):
    """
    Usage:
    class Employee(Base):
        __tablename__ = 'employee'
        id = Column(Integer, primary_key=True)
        name = Column(String, unique=True)

    get_or_create(Employee, name='bob')
    """
    instance = get_instance(model, **kwargs)
    if instance is None:
        instance = create_instance(model, **kwargs)
    return instance


def create_instance(model, **kwargs):
    """create instance"""
    try:
        instance = model(**kwargs)
        sess.add(instance)
        sess.flush()
    except Exception as msg:
        mtext = 'model:{}, args:{} => msg:{}'
        log.error(mtext.format(model, kwargs, msg))
        sess.rollback()
        raise(msg)
    return instance


def get_instance(self, model, **kwargs):
    """Return first instance found."""
    try:
        return sess.query(model).filter_by(**kwargs).first()
    except NoResultFound:
        return

Sign up to request clarification or add additional context in comments.

Comments

4

You could create a class method to get or create an Employee -- get it if it exists, otherwise create:

@classmethod
def get_or_create(cls, name):
    exists = db.session.query(Employee.id).filter_by(name=name).scalar() is not None
    if exists:
        return db.session.query(Employee).filter_by(name=name).first()
    return cls(name=name)


employee1 = Employee(name='bob')
db.session.add(employee1)
employee2 = Employee(name='bob')

employee1 == employee2  # False


bob1 = Employee.get_or_create(name='bob')
if bob1 not in db.session:
    db.session.add(bob1)

len(add_to_session) # 1

bob2 = Employee.get_or_create(name='bob')
if bob2 not in db.session:
    db.session.add(bob2)

len(add_to_session) # 1

bob1 == bob2  # True

3 Comments

credit to stackoverflow.com/a/32952421/4386191 for the quick exists expression.
This is interesting. I was hoping for a way to resolve this on commit. I presume bob1 would need to first be added to the session before get_or_create would return the bob1 object to the bob2 = Employee.get_or_create(name='bob') call? In the current workflow of my package, I first generate a large list of these types of objects and am not interacting with the database at all until commit
@nven oops, you're right that it has to be added to the session. I added this to the example and showed a pattern that might work for you. No commit needed, just add objects to the session after checking for uniqueness.
2

There are at least 2 approaches:

  • The database approach: create a relevant primary key; with SQLAlchemy you would define e.g. based on your minimalistic example name = Column('First Name', String(20), primary_key=True)
  • The coding approach: check whether the attribute, set of attributes already exists in the table, otherwise create it. See relevant code examples here.

In terms of performance, I believe the database approach is better. It also is the one which makes more sense.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.