
I'm trying to update existing rows and add new rows to a SQL DB table from a pandas DataFrame.

I have 2 queries. The first imports all data into a DataFrame (more than 100,000 rows) and writes it to the SQL table with this code:

df.to_sql(table_name, con=engine, if_exists='replace', index=False)

The second is the same import and query, but it only loads data for a specific period into the DataFrame and writes it to the same SQL table. The code used is the same:

 df.to_sql(table_name, con=engine, if_exists='replace', index=False)

My issue is: when I run the second script, it erases all existing data in the SQL table that is not part of the partial import.

Could someone give me advice?

For info, my database is on Azure.

Thanks, and happy new year.

1 Answer


if_exists='replace' is not a row-wise operation, so it does not check whether each row already exists and replace only that row. It checks whether the whole table already exists; if it finds the table, it drops the old table and inserts your new one.

Quoted from the docs:

replace: Drop the table before inserting new values.

What I think you should do is use if_exists='append' and then check for duplicate rows and remove them. That would, for now, be the safest approach.
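A minimal sketch of that append-then-deduplicate approach. The table name "sales", the column names, and the in-memory SQLite engine are all assumptions for illustration; swap in your Azure SQL connection string and real table name:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder engine; replace with your Azure SQL connection string
engine = create_engine("sqlite://")

# Simulate the full import already sitting in the table
existing = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
existing.to_sql("sales", con=engine, if_exists="replace", index=False)

# Partial import: one overlapping row (id 2) and one new row (id 3).
# 'append' adds rows without dropping the table, but creates a duplicate.
partial = pd.DataFrame({"id": [2, 3], "value": ["b", "c"]})
partial.to_sql("sales", con=engine, if_exists="append", index=False)

# Read everything back, drop exact duplicate rows, and rewrite the table
merged = pd.read_sql("SELECT * FROM sales", con=engine).drop_duplicates()
merged.to_sql("sales", con=engine, if_exists="replace", index=False)
```

Note that drop_duplicates only removes rows that are identical in every column; if the partial import contains updated values for an existing key, both versions will survive, which is where an upsert is really needed.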

The method you are looking for is called upsert and is being worked on at the moment. It will only insert records which do not "clash", and you can prioritise the new or the old records. See GitHub ticket
