Context: I am using MSSQL, pandas, and pyodbc.
Steps:
- Obtain dataframe from query using pyodbc (no problemo)
- Process columns to generate the content of a column that already exists in the table but is still empty
- Fill an auxiliary column with UPDATE statements (i.e. UPDATE t SET t.value = df.value FROM dbo.table t WHERE t.ID = df.ID)
Now how do I execute the SQL code in the auxiliary column without looping through each row?
Sample data:
The first two columns are obtained by querying dbo.table. The third column exists in the database but is empty. The fourth column exists only in the dataframe and holds the SQL statement that would update the corresponding row of dbo.table.
| ID | raw | processed | strSQL |
|---|---|---|---|
| 1 | [email protected] | lorum ipsum | UPDATE t SET t.processed = 'lorum ipsum' FROM dbo.table t WHERE t.ID = 1 |
| 2 | [email protected] | rumlo sumip | UPDATE t SET t.processed = 'rumlo sumip' FROM dbo.table t WHERE t.ID = 2 |
| 3 | ... | ... | ... |
I would like to execute the SQL script in each row in an efficient manner.
[(processed, id), (processed, id), …] (note the order of the columns) and then use crsr.executemany("UPDATE t SET t.processed=? WHERE t.ID=?", my_list_of_tuples). Be sure to call crsr.fast_executemany = True first for best performance.
update t ... from yourtable t join @yourTVP p on ...
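
The executemany suggestion above can be sketched roughly as follows. This is a minimal sketch, not a tested solution for this exact schema: `build_params` and `bulk_update` are hypothetical helper names, `conn` is assumed to be an open pyodbc connection, and `dbo.[table]` is the placeholder table name from the question (bracketed on the assumption that TABLE is treated as a reserved word). With this approach the strSQL column is not needed at all.

```python
def build_params(df):
    """Return [(processed, ID), ...] in the same order as the ? markers."""
    # Works on a pandas DataFrame; column order matters, not the names.
    return list(zip(df["processed"], df["ID"]))


def bulk_update(conn, params):
    """Run one parameterized UPDATE for every (processed, ID) pair."""
    # Placeholder table and column names taken from the question.
    sql = "UPDATE dbo.[table] SET processed = ? WHERE ID = ?"
    crsr = conn.cursor()
    # pyodbc option: send the parameter sets in bulk instead of one
    # round trip per row. Must be set before calling executemany.
    crsr.fast_executemany = True
    crsr.executemany(sql, params)
    conn.commit()
    return len(params)  # number of parameter sets submitted
```

Usage would look like `bulk_update(conn, build_params(df))`, where `conn` comes from `pyodbc.connect(...)` with your own connection string. Passing the values as parameters also avoids quoting problems that string-built UPDATE statements run into when the processed text contains an apostrophe.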