
I'm using the following method with SQLAlchemy to perform a bulk insert and, optionally, to avoid inserting duplicates:

def bulk_insert_users(self, users, allow_duplicates=False):
    if not allow_duplicates:
        users_new = []

        for user in users:
            if not self.SQL_IO.db.query(User_DB.id).filter_by(user_id=user.user_id).scalar():
                users_new.append(user)

        users = users_new

    self.SQL_IO.db.bulk_save_objects(users)
    self.SQL_IO.db.commit()

Can the above functionality be implemented such that the function is faster?

2 Answers


You can load all existing user IDs first, put them into a set, and then use user.user_id in existing_user_ids to decide whether to add each user, instead of sending a SELECT query per user. Even with tens of thousands of users this will be quite fast, especially compared to one database round trip per user.
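A sketch of that approach (hedged: the model, columns, and session setup below are illustrative stand-ins for the question's `User_DB` and `self.SQL_IO.db`, here backed by an in-memory SQLite database):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User_DB(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, unique=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

def bulk_insert_users(session, users, allow_duplicates=False):
    if not allow_duplicates:
        # One query for all existing IDs, held in a set for O(1) membership tests,
        # instead of one SELECT per user.
        existing_ids = {row[0] for row in session.query(User_DB.user_id)}
        users = [u for u in users if u.user_id not in existing_ids]
    session.bulk_save_objects(users)
    session.commit()

bulk_insert_users(session, [User_DB(user_id=1, name="a"), User_DB(user_id=2, name="b")])
# user_id=2 already exists, so only user_id=3 is inserted here.
bulk_insert_users(session, [User_DB(user_id=2, name="dup"), User_DB(user_id=3, name="c")])
print(session.query(User_DB).count())  # 3
```

Note this only de-duplicates against rows already in the table; duplicates within the incoming batch itself would still need to be handled separately.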


1 Comment

I was wondering whether there's some built-in SQLAlchemy functionality that essentially combines the steps described above, but I didn't quite articulate that in the question. Reading over the sources, I'm finding nothing of the sort, so I guess I'll go with this approach.
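(For completeness: there is no generic cross-database version, but dialect-specific constructs can push the de-duplication into the database itself. A sketch assuming the SQLite dialect and a unique constraint on `user_id`; the PostgreSQL dialect offers the same `on_conflict_do_nothing` API. The model and table names are illustrative:)

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.dialects.sqlite import insert  # dialect-specific INSERT ... ON CONFLICT

Base = declarative_base()

class User_DB(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, unique=True)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Rows that conflict on user_id (including within the batch) are silently skipped
# by the database, so no pre-filtering query is needed.
rows = [{"user_id": 1}, {"user_id": 2}, {"user_id": 2}, {"user_id": 3}]
stmt = insert(User_DB.__table__).on_conflict_do_nothing(index_elements=["user_id"])
session.execute(stmt, rows)
session.commit()
print(session.query(User_DB).count())  # 3
```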

How many users do you have? You're querying for the users one at a time, on every single iteration of that loop. You might have more luck querying for ALL user IDs once, putting them in a set, and then checking membership against that set.

# One query for every existing ID, kept in a set for O(1) membership tests
existing_ids = {row[0] for row in self.SQL_IO.db.query(User_DB.user_id)}
users_new = [user for user in new_users if user.user_id not in existing_ids]

