
We have a few features in our Flask web application that consist of a single function call which invokes many sub-functions and does a lot of work behind the scenes. For example, such a call adds (financial) transactions to the (MSSQL) database, writes entries to a log table in the database, and changes properties of specific objects, resulting in changed columns in specific tables. All of this is done through SQLAlchemy objects.

In a new approach, for testability and because we sometimes want to only display these changes without actually committing them to the database, we have these functions return a compound Python object that contains all of the changed objects. Instead of committing database changes inside the function and its sub-functions, we return the changed objects, so the decision to show or save them can be made outside the main function.

So the main function returns a compound object with all of these changed objects included, and outside of the main function we add these changed objects to our SQLAlchemy session and commit the session to the database (or, if we just need to display information, we don't add and commit). Concretely, the compound result object has a save_to_session() method that saves the changed objects with SQLAlchemy's bulk_save_objects() operation:

if result:
    result.save_to_session(current_app.db_session)
    current_app.db_session.commit()

def save_to_session(self, session):
    session.bulk_save_objects(self.adminlog)
    ...
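For reference, a minimal self-contained sketch of this pattern (class, table and column names here are illustrative, not the real schema):

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Adminlog(Base):
    __tablename__ = "adminlog"
    alog_id = Column(String, primary_key=True)
    alog_comment = Column(String)

class CompoundResult:
    """Collects changed objects so the caller decides whether to persist them."""
    def __init__(self):
        self.adminlog = []

    def save_to_session(self, session):
        # Bulk-insert the collected rows; bulk_save_objects does no
        # identity-map bookkeeping, it just emits the INSERT statements.
        session.bulk_save_objects(self.adminlog)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

result = CompoundResult()
result.adminlog.append(Adminlog(alog_id="A-1", alog_comment="added"))
result.save_to_session(session)
session.commit()
```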

This new approach has led to an unexpected error on the current_app.db_session.commit() line. At the end of the process, when we add the returned objects to the session and try to commit the session to the database, an error occurs about a duplicate key. It looks like, during the process, the returned objects were already added to the session somewhere, and SQLAlchemy tries to insert them twice.

We have come to this conclusion because when we comment out the bulk_save_objects() call, the error no longer occurs, yet the changed data is still committed to the database correctly, and exactly once.

When we inspect the database after this error occurs, there are no records with the primary key mentioned in the error message, because of the rollback that happens on error. So it's not that the record already existed in the database either; rather, the session tries to insert the same record twice.
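One way to confirm that suspicion is to inspect the session right before bulk_save_objects() runs: if the same objects already show up as pending, the flush at commit time will insert them a second time. A sketch, with an illustrative Adminlog mapping:

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Adminlog(Base):
    __tablename__ = "adminlog"
    alog_id = Column(String, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

entry = Adminlog(alog_id="0E5537FF-E45C-40C5-98FC-7B1ACAD8104E")
session.add(entry)  # stands in for whatever implicitly added the object

# Just before calling bulk_save_objects(), check what the session tracks:
already_pending = entry in session                  # membership test
pending_ids = [obj.alog_id for obj in session.new]  # all pending objects
```

If already_pending is True for objects you are about to bulk-save, the duplicate INSERT is explained.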

This is the error we get, using pymssql as a driver:

sqlalchemy.exc.IntegrityError: (pymssql.IntegrityError) (2627, 
b"Violation of PRIMARY KEY constraint 'PK_adminlog_id'. 
Cannot insert duplicate key in object 'dbo.adminlog'. 
The duplicate key value is (0E5537FF-E45C-40C5-98FC-7B1ACAD8104E).
DB-Lib error message 20018, severity 14:\n
General SQL Server error: Check messages from the SQL Server\n
") 
[SQL: 
'INSERT INTO adminlog (
    alog_id, 
    alog_ppl_id, 
    alog_user_ppl_id, 
    alog_user_name, 
    alog_datetime, 
    [alog_ipAddress], 
    [alog_macAddress], 
    alog_comment, 
    alog_type, 
    alog_act_id, 
    alog_comp_id, 
    alog_artc_id) 
VALUES (
    %(alog_id)s, 
    %(alog_ppl_id)s, 
    %(alog_user_ppl_id)s, 
    %(alog_user_name)s, 
    %(alog_datetime)s, 
    %(alog_ipAddress)s, 
    %(alog_macAddress)s, 
    %(alog_comment)s, 
    %(alog_type)s, 
    %(alog_act_id)s, 
    %(alog_comp_id)s, 
    %(alog_artc_id)s)'] 

[parameters: (
    {'alog_act_id': None, 
    'alog_comment': 'Le service a été ajouté. Cours Coll (119,88)', 
    'alog_datetime': datetime.datetime(2018, 10, 29, 13, 46, 54, 837178), 
    'alog_macAddress': b'4A-NO-NY-MO-US', 
    'alog_type': b'user', 
    'alog_artc_id': None, 
    'alog_comp_id': None, 
    'alog_id': b'0E5537FF-E45C-40C5-98FC-7B1ACAD8104E', 
    'alog_user_ppl_id': b'99999999-9999-9999-1111-999999999999', 
    'alog_user_name': 'System', 
    'alog_ipAddress': b'0.0.0.0', 
    'alog_ppl_id': b'AE841D1C-5D8D-47F7-B81F-89C5C931BD14'}, 

    {'alog_act_id': None, 
    'alog_comment': 'Le service a été supprimé. 
    01/12/2019 Cours Coll (119,88)', 
    'alog_datetime': datetime.datetime(2018, 10, 29, 13, 46, 55, 71600), 
    'alog_macAddress': b'4A-NO-NY-MO-US', 
    'alog_type': b'user', 
    'alog_artc_id': None, 
    'alog_comp_id': None, 
    'alog_id': b'E22176FB-7490-470F-A8BA-A35D5F55A96A', 
    'alog_user_ppl_id': b'99999999-9999-9999-1111-999999999999', 
    'alog_user_name': 'System', 
    'alog_ipAddress': b'0.0.0.0', 
    'alog_ppl_id': b'AE841D1C-5D8D-47F7-B81F-89C5C931BD14'}
    )]

We get a similar error using PyODBC:

sqlalchemy.exc.IntegrityError: (pyodbc.IntegrityError) ('23000', 
"[23000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Violation of PRIMARY KEY constraint 'PK_adminlog_id'. 
Cannot insert duplicate key in object 'dbo.adminlog'. 
The duplicate key value is (F5CABD8F-E000-4677-8F5F-78B4CD3B9560). (2627) (SQLExecDirectW); 
[23000] [Microsoft][SQL Server Native Client 11.0][SQL Server]The statement has been terminated. (3621)") 
[SQL: 'INSERT INTO adminlog (
        alog_id, 
        alog_ppl_id, 
        alog_user_ppl_id, 
        alog_user_name, 
        alog_datetime, 
        [alog_ipAddress], 
        [alog_macAddress], 
        alog_comment, 
        alog_type, 
        alog_act_id, 
        alog_comp_id, 
        alog_artc_id) 
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] 

        [parameters: ((
        b'F5CABD8F-E000-4677-8F5F-78B4CD3B9560', 
        b'0D10D3EF-F37E-45BE-8EED-B5987AE80732', 
        b'99999999-9999-9999-1111-999999999999', 
        'System', 
        datetime.datetime(2018, 10, 29, 13, 51, 30, 555495), 
        b'0.0.0.0', 
        b'4A-NO-NY-MO-US', 
        'Le service a été ajouté. Cours Coll (119,88)', 
        b'user', 
        None, 
        None, 
        None), 
        (
        b'39395ACA-0AFB-4C5F-90D4-0C6F95D7B8BC', 
        b'0D10D3EF-F37E-45BE-8EED-B5987AE80732', 
        b'99999999-9999-9999-1111-999999999999', 
        'System', 
        datetime.datetime(2018, 10, 29, 13, 51, 30, 777909), 
        b'0.0.0.0', 
        b'4A-NO-NY-MO-US', 
        'Le service a été supprimé. 01/12/2019 Cours Coll (119,88)', 
        b'user', 
        None, 
        None, 
        None)
        )]

My question is, is there an automatic process that adds (changed) objects to the session, without us using session.add()? Is there an option in SQLAlchemy to disable this behaviour and only commit to the session when it's explicitly done using session.add(object)?

  • There is at least one automatic "process": SQLAlchemy's ORM relationship cascades. If you assign/append a transient object to a relationship property having default configuration, it will add that object to the session. Commented Oct 30, 2018 at 11:43
  • @IljaEverilä Thanks for your comment. In an earlier question, you stated that new object instances, with a relation to an existing object would not be added to the session (this question: stackoverflow.com/questions/40745175/…). I'm confused now... Commented Oct 30, 2018 at 15:27
  • The linked question is different and you've misread a bit: in it the foreign key is assigned a value and it was expected that that would then also populate the ORM relationship attribute, which it doesn't by default. In that question the new instance really wasn't added to the session at first, and it could not have done any loading before it was added manually (or implicitly, through cascades...). Commented Oct 30, 2018 at 16:33
  • Please read about save-update cascade to clear any confusion. Commented Oct 30, 2018 at 16:41
  • Or you relate a transient object through an ORM relationship with an object that is already in a session (an instance from a query, for example): "save-update cascade also affects attribute operations for objects that are already present in a Session." I'm not saying this is happening in your case, but it is something to look for. Commented Oct 31, 2018 at 14:16

1 Answer


My question is, is there an automatic process that adds (changed) objects to the session, without us using session.add()?

There is at least one feature that pulls objects into a Session without them being explicitly added: the save-update cascade. When an object is added to a Session, all objects associated with it through relationship() attributes that have this cascade configured are placed in the Session as well. The same happens when an object is associated with another object that is already in a Session.
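A minimal demonstration of the cascade, with an illustrative Parent/Child mapping; note that Child is never passed to session.add():

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # relationship() defaults to cascade="save-update, merge"
    children = relationship("Child")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent()
session.add(parent)

child = Child()
parent.children.append(child)  # no session.add(child) anywhere
in_session = child in session  # True: save-update cascade pulled it in
```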

Is there an option in SQLAlchemy to disable this behaviour and only commit to the session when it's explicitly done using session.add(object)?

You can of course configure the relationship() attributes to not include this behaviour, but there does not seem to be a global switch that would disable cascades altogether.
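Per relationship, the behaviour can be switched off by configuring cascade without "save-update" (the default is "save-update, merge"); a sketch, reusing an illustrative Parent/Child mapping:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # Drop "save-update" from the default "save-update, merge":
    children = relationship("Child", cascade="merge")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent()
session.add(parent)

child = Child()
parent.children.append(child)  # no cascade: the child stays transient
in_session = child in session  # False now
```

The flip side is that every such child must then be session.add()-ed explicitly, or it is silently skipped at flush time.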

If this is what is happening in your code, then the objects are inserted twice because you have, in effect, asked for it twice. The bulk operations omit most of the Session's more advanced features in favour of raw performance: for example, they do not check with the Session whether an object is already pending or persisted, nor do they attach the objects they save to the Session:

The objects as given have no defined relationship to the target Session, even when the operation is complete, meaning there’s no overhead in attaching them or managing their state in terms of the identity map or session.
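Put together, this reproduces the duplicate-key error in miniature: an object that is already pending in the session (here via an explicit add(); in the question presumably via a cascade) and is also passed to bulk_save_objects() gets two INSERTs. A sketch against sqlite instead of MSSQL, with an illustrative mapping:

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Adminlog(Base):
    __tablename__ = "adminlog"
    alog_id = Column(String, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

entry = Adminlog(alog_id="0E5537FF-E45C-40C5-98FC-7B1ACAD8104E")
session.add(entry)                  # pending in the session...
session.bulk_save_objects([entry])  # ...and bulk-inserted regardless

duplicate_key = False
try:
    session.commit()                # the flush emits a second INSERT
except IntegrityError:
    session.rollback()
    duplicate_key = True
```

After the rollback the table is empty again, matching the observation that the "duplicate" key is nowhere to be found in the database.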

As to why the problem arises in the first place, you should not need to manually keep a "staging area" – your compound object – around for objects. That is exactly what the Session is for, combined with proper use of transactions. The functions and subfunctions should add objects to the Session when it makes sense, but they should not control the ongoing transaction. That should happen only outside your main function, where you are now handling your compound object. If you rollback, all the changes go away.

In testing you can hand the code under test a Session that has joined an external transaction, which the test then explicitly rolls back, no matter what the code under test does.
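A sketch of that recipe (the "joining a Session into an external transaction" pattern from the SQLAlchemy docs), again with an illustrative mapping:

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Adminlog(Base):
    __tablename__ = "adminlog"
    alog_id = Column(String, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

connection = engine.connect()
trans = connection.begin()          # the test owns the outermost transaction
session = Session(bind=connection)  # hand this session to the code under test

# --- code under test runs here ---
session.add(Adminlog(alog_id="T-1"))
session.flush()
visible_inside = session.query(Adminlog).count()  # visible inside the transaction

# --- teardown: discard everything the test did ---
session.close()
trans.rollback()
connection.close()

left_behind = Session(bind=engine).query(Adminlog).count()  # rows after rollback
```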


1 Comment

It seems this behaviour is by design, but it is troublesome in specific situations, such as when we want to use the same logic both to process and to display a result. When displaying, we don't want to actually save the objects to the database, but underlying code might still inadvertently save the entire session (for example, writing a log message to the database might commit the session). We are going to try to solve this by creating a secondary connection that saves independently; we use this for logging and other things we do want to persist. This answer is correct though, so it is accepted.
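A sketch of the workaround mentioned above: a second session on its own connection (here a second engine on a shared database file, since sqlite in-memory databases are per-connection), so log writes commit independently of whatever happens to the main session. All names are illustrative:

```python
import os
import tempfile

from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Adminlog(Base):
    __tablename__ = "adminlog"
    alog_id = Column(String, primary_key=True)

class FinancialTransaction(Base):
    __tablename__ = "transactions"
    tx_id = Column(String, primary_key=True)

# One shared database file, two independent engines/sessions.
db_url = "sqlite:///" + os.path.join(tempfile.mkdtemp(), "app.db")
main_engine = create_engine(db_url)
Base.metadata.create_all(main_engine)

MainSession = sessionmaker(bind=main_engine)
LogSession = sessionmaker(bind=create_engine(db_url))

main, log = MainSession(), LogSession()

log.add(Adminlog(alog_id="L-1"))
log.commit()                                 # the log row is persisted...

main.add(FinancialTransaction(tx_id="T-1"))
main.rollback()                              # ...even though the main work is discarded

persisted_logs = MainSession().query(Adminlog).count()
persisted_txs = MainSession().query(FinancialTransaction).count()
```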
