
I'm trying to update existing rows and add new rows to a SQL DB table from a pandas DataFrame.

I have 2 queries. The first imports all data into a DataFrame (more than 100,000 rows) and writes it to the SQL table with this code:

df.to_sql(table_name, con=engine, if_exists='replace', index=False)

The second is the same import and query, but it only loads data for a specific period into the DataFrame and writes it to the same SQL table. The code used is the same:

 df.to_sql(table_name, con=engine, if_exists='replace', index=False)

My issue is: when I run the second script, it erases all existing data in the SQL table that is not part of the partial import.

Could someone give me advice?

For info, my database is on Azure.

Thanks, and happy new year.

1 Answer


if_exists='replace' is not a row-wise operation, so it does not check whether each row already exists and replace only that row. It checks whether the whole table already exists; if it finds the table, it drops the old table and inserts your new one.

Quoted from the docs:

replace: Drop the table before inserting new values.

What I think you should do is use if_exists='append' and then check for duplicate rows and remove them. That would, for now, be the safest approach.
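A minimal sketch of that append-then-deduplicate approach. The table name "sales", the column names, and the in-memory SQLite engine are all assumptions for illustration; swap in your Azure SQL connection string and real table name:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder engine; replace with your Azure SQL connection string
engine = create_engine("sqlite://")

# Simulate the full import already sitting in the table
existing = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
existing.to_sql("sales", con=engine, if_exists="replace", index=False)

# Partial import: one overlapping row (id 2) and one new row (id 3).
# 'append' adds rows without dropping the table, but creates a duplicate.
partial = pd.DataFrame({"id": [2, 3], "value": ["b", "c"]})
partial.to_sql("sales", con=engine, if_exists="append", index=False)

# Read everything back, drop exact duplicate rows, and rewrite the table
merged = pd.read_sql("SELECT * FROM sales", con=engine).drop_duplicates()
merged.to_sql("sales", con=engine, if_exists="replace", index=False)
```

Note that drop_duplicates only removes rows that are identical in every column; if the partial import contains updated values for an existing key, both versions will survive, which is where an upsert is really needed.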

The method you are looking for is called upsert and is being worked on at the moment. It will only insert records which do not "clash", and you can prioritise the new or the old records. See GitHub ticket
