
I have a table on SQL Server that looks like this, where each row has a unique combination of Event A and Event B.

Global Rules Table

ID  |  Event 1  |  Event 2  |  Validated as  |  Generated as  |  Generated with score
1   |  EA1      |  EB1      |  Rule          |  Anti-Rule     |  0.01
2   |  EA1      |  EB2      |  Rule          |  Rule          |  0.95
... |  ...      |  ...      |  ...           |  ...           |  ...

I have another table with a Foreign Key constraint to Global Rules Table called Local Rules Table.

I have a Pandas DataFrame that looks like this

Event 1  |  Event 2  |  Validated as  |  Generated as  |  Generated with score
EA1      |  EB1      |  Rule          |  Rule          |  0.85
EA1      |  EB2      |  Rule          |  Rule          |  0.95
...      |  ...      |  ...           |  ...           |  ...

Since I have this Foreign Key constraint between the Local Rules and Global Rules tables, I can't use `df.to_sql('Global Rules', con, if_exists='replace')`.

The columns I want to update in the database table, based on the values in the DataFrame, are Generated as and Generated with score. What is the best way to update only those columns? Is there some out-of-the-box function or library I don't know about?

1 Answer


I haven't found a library that accomplishes this. I started writing one myself to host on PyPI but haven't finished it yet.

An inner join against an SQL temporary table works well in this case. It will only update a subset of columns in SQL and can be efficient for updating many records.
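For illustration, the whole pattern (stage the new values in a temporary table, then update only the target columns via a join on the key) can be sketched end to end with the standard library's sqlite3. This is a sketch only; the table and column names are shortened stand-ins, and the T-SQL syntax on SQL Server differs (#temp tables, `UPDATE ... FROM`).

```python
import sqlite3

# in-memory database standing in for SQL Server (illustration only)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE rules (ID INTEGER PRIMARY KEY, gen_as TEXT, score REAL)")
cur.executemany("INSERT INTO rules VALUES (?, ?, ?)",
                [(1, "Anti-Rule", 0.01), (2, "Rule", 0.95)])

# 1) stage the updated rows in a temporary table
cur.execute("CREATE TEMP TABLE upd (ID INTEGER PRIMARY KEY, gen_as TEXT, score REAL)")
cur.executemany("INSERT INTO upd VALUES (?, ?, ?)", [(1, "Rule", 0.85)])

# 2) update only the target columns for rows present in the temp table
cur.execute("""
    UPDATE rules
    SET gen_as = (SELECT u.gen_as FROM upd u WHERE u.ID = rules.ID),
        score  = (SELECT u.score  FROM upd u WHERE u.ID = rules.ID)
    WHERE ID IN (SELECT ID FROM upd)
""")

# 3) drop the temp table and commit
cur.execute("DROP TABLE upd")
con.commit()

print(cur.execute("SELECT * FROM rules ORDER BY ID").fetchall())
# [(1, 'Rule', 0.85), (2, 'Rule', 0.95)]  -- row 1 updated, row 2 untouched
```

Note that only the staged row changes; the primary keys and any columns not listed in the SET clause are left alone, which is what keeps the Foreign Key constraint happy.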

I assume you are using pyodbc for the connection to SQL Server.
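A typical pyodbc connection string looks like the sketch below; the SERVER and DATABASE values are placeholders you would replace with your own.

```python
# connection-string sketch; SERVER/DATABASE values are placeholders
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;"
    "DATABASE=your_database;"
    "Trusted_Connection=yes;"
)
# conn = pyodbc.connect(conn_str)
# cursor = conn.cursor()
```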

SQL Cursor

# create a cursor from the pyodbc connection,
# then enable fast_executemany to quickly stream records into the temp table
cursor = conn.cursor()
cursor.fast_executemany = True

Create Temporary Table

# assuming your DataFrame also has the ID column to perform the SQL join
statement = "CREATE TABLE [#Update_Global Rules Table] (ID BIGINT PRIMARY KEY, [Generated as] VARCHAR(200), [Generated with score] FLOAT)"
cursor.execute(statement)

Insert DataFrame into a Temporary Table

# insert only the key and the updated values
subset = df[['ID','Generated as','Generated with score']]

# form SQL insert statement; bracket the column names since they contain spaces
columns = ", ".join("[" + col + "]" for col in subset.columns)
values = "(" + ", ".join(["?"] * len(subset.columns)) + ")"

# insert
statement = "INSERT INTO [#Update_Global Rules Table] ("+columns+") VALUES "+values
insert = [tuple(x) for x in subset.values]

cursor.executemany(statement, insert)
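One pitfall: ODBC parameters cannot bind NaN (or pd.NaT), so missing values need to be converted to plain Python None before the executemany call. A stdlib-only sketch of that conversion, using made-up row data in place of subset.values:

```python
import math

# example rows as they might come out of subset.values (made-up data)
rows = [(1, "Rule", 0.85), (2, "Rule", float("nan"))]

# replace float NaN with None so the ODBC driver binds SQL NULL instead
insert = [
    tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
    for row in rows
]
print(insert)  # [(1, 'Rule', 0.85), (2, 'Rule', None)]
```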

Update Values in Main Table from Temporary Table

statement = '''
UPDATE
     t
SET
     t.[Generated as] = u.[Generated as],
     t.[Generated with score] = u.[Generated with score]
FROM
     [Global Rules Table] AS t
INNER JOIN
     [#Update_Global Rules Table] AS u
ON
     u.ID = t.ID;
'''

cursor.execute(statement)

Drop Temporary Table

cursor.execute("DROP TABLE [#Update_Global Rules Table]")

4 Comments

It throws an error: pyodbc.DataError: ('22003', '[22003] [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range (0) (SQLExecute)'). Do you have to modify something in order to allow NaN values in all columns except for ID?
NaN values, or any other missing values such as pd.NaT, first need to be changed to the standard Python None data type.
The error you listed makes me think there may be another issue, although I haven't tested. It may be that the "Generated with score" column in SQL is defined as a decimal type but you are attempting to write a float to it. Essentially, in Python more decimal places are being generated than the SQL column can accept.
I had to replace the np.nan values with None values, that solved it.
