
I'm using the following method with SQLAlchemy to perform a bulk insert and, optionally, to avoid inserting duplicates:

def bulk_insert_users(self, users, allow_duplicates=False):
    if not allow_duplicates:
        users_new = []

        for user in users:
            if not self.SQL_IO.db.query(User_DB.id).filter_by(user_id=user.user_id).scalar():
                users_new.append(user)

        users = users_new

    self.SQL_IO.db.bulk_save_objects(users)
    self.SQL_IO.db.commit()

Can the above functionality be implemented such that the function is faster?

2 Answers


You can load all existing user IDs first, put them into a set, and then use user.user_id in existing_user_ids to decide whether to add each user, instead of sending a SELECT query per user. Even with tens of thousands of users this will be quite fast, especially compared to one database round trip per user.
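A sketch of that approach (hedged: the model, columns, and session setup below are illustrative stand-ins for the question's `User_DB` and `self.SQL_IO.db`, here backed by an in-memory SQLite database):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User_DB(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, unique=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

def bulk_insert_users(session, users, allow_duplicates=False):
    if not allow_duplicates:
        # One query for all existing IDs, held in a set for O(1) membership tests,
        # instead of one SELECT per user.
        existing_ids = {row[0] for row in session.query(User_DB.user_id)}
        users = [u for u in users if u.user_id not in existing_ids]
    session.bulk_save_objects(users)
    session.commit()

bulk_insert_users(session, [User_DB(user_id=1, name="a"), User_DB(user_id=2, name="b")])
# user_id=2 already exists, so only user_id=3 is inserted here.
bulk_insert_users(session, [User_DB(user_id=2, name="dup"), User_DB(user_id=3, name="c")])
print(session.query(User_DB).count())  # 3
```

Note this only de-duplicates against rows already in the table; duplicates within the incoming batch itself would still need to be handled separately.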


1 Comment

I was wondering whether there's some built-in SQLAlchemy functionality that essentially combines the steps described above, but I didn't quite articulate that in the question. Reading over the sources, I'm finding nothing of the sort, so I guess I'll go with this approach.
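(For completeness: there is no generic cross-database version, but dialect-specific constructs can push the de-duplication into the database itself. A sketch assuming the SQLite dialect and a unique constraint on `user_id`; the PostgreSQL dialect offers the same `on_conflict_do_nothing` API. The model and table names are illustrative:)

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.dialects.sqlite import insert  # dialect-specific INSERT ... ON CONFLICT

Base = declarative_base()

class User_DB(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, unique=True)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Rows that conflict on user_id (including within the batch) are silently skipped
# by the database, so no pre-filtering query is needed.
rows = [{"user_id": 1}, {"user_id": 2}, {"user_id": 2}, {"user_id": 3}]
stmt = insert(User_DB.__table__).on_conflict_do_nothing(index_elements=["user_id"])
session.execute(stmt, rows)
session.commit()
print(session.query(User_DB).count())  # 3
```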

How many users do you have? You're querying for the users one at a time, on every single iteration of that loop. You might have more luck querying for ALL user IDs once, putting them in a set, and then checking membership against that set.

# One query for every existing ID, kept in a set for O(1) membership tests
existing_ids = {row[0] for row in self.SQL_IO.db.query(User_DB.user_id)}
users_new = [user for user in new_users if user.user_id not in existing_ids]

