
I have a table on SQL Server that looks like this, where each row has a unique combination of Event A and Event B.

Global Rules Table

ID  |  Event 1  |  Event 2  |  Validated as  |  Generated as  |  Generated with score
1   |  EA1      |  EB1      |  Rule          |  Anti-Rule     |  0.01
2   |  EA1      |  EB2      |  Rule          |  Rule          |  0.95
... |  ...      |  ...      |  ...           |  ...           |  ...

I have another table with a Foreign Key constraint to Global Rules Table called Local Rules Table.

I have a Pandas DataFrame that looks like this

Event 1  |  Event 2  |  Validated as  |  Generated as  |  Generated with score
EA1      |  EB1      |  Rule          |  Rule          |  0.85
EA1      |  EB2      |  Rule          |  Rule          |  0.95
...      |  ...      |  ...           |  ...           |  ...

Since I have this Foreign Key constraint between the Local Rules and Global Rules tables, I can't use `df.to_sql('Global Rules', con, if_exists='replace')`.

The columns I want to update in the database table, based on the values in the DataFrame, are Generated as and Generated with score. What is the best way to update only those columns? Is there some out-of-the-box function or library I don't know about?

1 Answer


I haven't found a library that accomplishes this. I started writing one myself to host on PyPI but haven't finished it yet.

An inner join against an SQL temporary table works well in this case. It will only update a subset of columns in SQL and can be efficient for updating many records.
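For illustration, the whole pattern (stage the new values in a temporary table, then update only the target columns via a join on the key) can be sketched end to end with the standard library's sqlite3. This is a sketch only; the table and column names are shortened stand-ins, and the T-SQL syntax on SQL Server differs (#temp tables, `UPDATE ... FROM`).

```python
import sqlite3

# in-memory database standing in for SQL Server (illustration only)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE rules (ID INTEGER PRIMARY KEY, gen_as TEXT, score REAL)")
cur.executemany("INSERT INTO rules VALUES (?, ?, ?)",
                [(1, "Anti-Rule", 0.01), (2, "Rule", 0.95)])

# 1) stage the updated rows in a temporary table
cur.execute("CREATE TEMP TABLE upd (ID INTEGER PRIMARY KEY, gen_as TEXT, score REAL)")
cur.executemany("INSERT INTO upd VALUES (?, ?, ?)", [(1, "Rule", 0.85)])

# 2) update only the target columns for rows present in the temp table
cur.execute("""
    UPDATE rules
    SET gen_as = (SELECT u.gen_as FROM upd u WHERE u.ID = rules.ID),
        score  = (SELECT u.score  FROM upd u WHERE u.ID = rules.ID)
    WHERE ID IN (SELECT ID FROM upd)
""")

# 3) drop the temp table and commit
cur.execute("DROP TABLE upd")
con.commit()

print(cur.execute("SELECT * FROM rules ORDER BY ID").fetchall())
# [(1, 'Rule', 0.85), (2, 'Rule', 0.95)]  -- row 1 updated, row 2 untouched
```

Note that only the staged row changes; the primary keys and any columns not listed in the SET clause are left alone, which is what keeps the Foreign Key constraint happy.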

I assume you are using pyodbc for the connection to SQL Server.
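A typical pyodbc connection string looks like the sketch below; the SERVER and DATABASE values are placeholders you would replace with your own.

```python
# connection-string sketch; SERVER/DATABASE values are placeholders
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;"
    "DATABASE=your_database;"
    "Trusted_Connection=yes;"
)
# conn = pyodbc.connect(conn_str)
# cursor = conn.cursor()
```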

SQL Cursor

# create a cursor from the pyodbc connection,
# then enable fast_executemany to quickly stream records into the temp table
cursor = conn.cursor()
cursor.fast_executemany = True

Create Temporary Table

# assuming your DataFrame also has the ID column to perform the SQL join
statement = "CREATE TABLE [#Update_Global Rules Table] (ID BIGINT PRIMARY KEY, [Generated as] VARCHAR(200), [Generated with score] FLOAT)"
cursor.execute(statement)

Insert DataFrame into a Temporary Table

# insert only the key and the updated values
subset = df[['ID','Generated as','Generated with score']]

# form SQL insert statement; bracket the column names since they contain spaces
columns = ", ".join("[" + col + "]" for col in subset.columns)
values = "(" + ", ".join(["?"] * len(subset.columns)) + ")"

# insert
statement = "INSERT INTO [#Update_Global Rules Table] ("+columns+") VALUES "+values
insert = [tuple(x) for x in subset.values]

cursor.executemany(statement, insert)
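One pitfall: ODBC parameters cannot bind NaN (or pd.NaT), so missing values need to be converted to plain Python None before the executemany call. A stdlib-only sketch of that conversion, using made-up row data in place of subset.values:

```python
import math

# example rows as they might come out of subset.values (made-up data)
rows = [(1, "Rule", 0.85), (2, "Rule", float("nan"))]

# replace float NaN with None so the ODBC driver binds SQL NULL instead
insert = [
    tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
    for row in rows
]
print(insert)  # [(1, 'Rule', 0.85), (2, 'Rule', None)]
```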

Update Values in Main Table from Temporary Table

statement = '''
UPDATE
     t
SET
     t.[Generated as] = u.[Generated as],
     t.[Generated with score] = u.[Generated with score]
FROM
     [Global Rules Table] AS t
INNER JOIN
     [#Update_Global Rules Table] AS u
ON
     u.ID = t.ID;
'''

cursor.execute(statement)

Drop Temporary Table

cursor.execute("DROP TABLE [#Update_Global Rules Table]")

4 Comments

It throws an error: pyodbc.DataError: ('22003', '[22003] [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range (0) (SQLExecute)'). Do you have to modify something in order to allow NaN values in all columns except for ID?
NaN values, or any other missing values such as pd.NaT, first need to be changed to the standard Python None data type.
The error you listed makes me think there may be another issue, although I haven't tested. It may be that the "Generated with score" column in SQL is defined as a decimal type but you are attempting to write a float to it. Essentially, in Python more decimal places are being generated than the SQL column can accept.
I had to replace the np.nan values with None values, that solved it.
