Failing bulk insert data from Pandas dataframe into Sybase database table using to_sql

Question

The code below fetches data from a RESTful service, normalizes it, stores the required columns in a DataFrame, and finally loads it into a Sybase table using pandas' to_sql.

Error:

File "C:\Program Files\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 467, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Sybase][ODBC Driver][Adaptive Server Enterprise]Incorrect syntax near ','.\n (102) (SQLExecDirectW)")
[SQL: 'INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") VALUES (?, ?, ?, ?)']
[parameters: (('0050/TAIEX', 'TAIEX', 'TWD', 0), ('035420/KORE', 'KORE', 'KRW', 0), ('0TL/LIF', 'LIF', 'NOK', 1), ('100FTSE/LIF', 'LIF', 'GBP', 0), ('101FTSE/LIF', 'LIF', 'GBP', 0), ('10STAT/OM', 'OM', 'SEK', 0), ('10TB/KFX', 'KFX', 'KRW', 0), ('10TBA/KFX', 'KFX', 'KRW', 0) ... displaying 10 of 4525 total bound parameter sets ... ('ZURF/DTB', 'DTB', 'CHF', 0), ('ZX/NYCE', 'NYCE', 'USD', 0))]

Process finished with exit code 1

Code:

import json
import urllib.request as request

import pandas as pd
from pandas.io.json import json_normalize
from sqlalchemy import create_engine
from sqlalchemy.connectors.pyodbc import PyODBCConnector
from sqlalchemy.engine.url import make_url

# Fetch the JSON payload from the REST service.
response = request.urlopen('http://tfsdscsw5XX/mdsclass/CONTFUTURES--O.json')
output = response.read()
data = json.loads(output)

# Flatten the JSON and keep only the columns the table needs.
df = json_normalize(data)
df1 = df[['CONTRACT_ID', 'EXCHANGE_ID', 'CURRENCY', 'TRADING_CODE']]
df2 = pd.DataFrame(df1)
print(df2)
print(df2.CONTRACT_ID)

connector = PyODBCConnector()
url = make_url("sybase+pyodbc://myhost/mydatabase?driver=Adaptive Server Enterprise&port=2306")
print(connector.create_connect_args(url))
engine = create_engine(url)

# It fails here:
df2.to_sql("contract_test", engine, index=False, if_exists="append", schema="dbo")

response.close()

Sample of the data in DataFrame df2:

      CONTRACT_ID EXCHANGE_ID CURRENCY  TRADING_CODE
0      0050/TAIEX       TAIEX      TWD             0
1     035420/KORE        KORE      KRW             0
2         0TL/LIF         LIF      NOK             1
3     100FTSE/LIF         LIF      GBP             0
4     101FTSE/LIF         LIF      GBP             0

Definition of table contract_test:

CREATE TABLE contract_test (
    CONTRACT_ID char(12) NOT NULL,
    EXCHANGE_ID char(12),
    CURRENCY char(4) NOT NULL,
    TRADING_CODE smallint
) 
GO

How can this be resolved? I am stuck here.

Parfait · Accepted Answer · 2020-07-02 14:07:36Z

0

Your issue may simply come down to incompatibilities between Python database APIs. Under the hood, pandas' to_sql runs an executemany() call through pyodbc. pyodbc is most commonly used with SQL Server, especially together with SQLAlchemy. Integration with Sybase, however, is not fully supported. As mentioned on the SQLAlchemy Sybase docs page:

Note

The Sybase dialect within SQLAlchemy is not currently supported. It is not tested within continuous integration and is likely to have many issues and caveats not currently handled. Consider using the external dialect instead.

Specifically, executemany appears to be sending a multi-row VALUES insert, which SQL Server supports but Sybase does not (even though both dialects are T-SQL variants with a shared history):

INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('0050/TAIEX', 'TAIEX', 'TWD', 0), 
       ('035420/KORE', 'KORE', 'KRW', 0), 
       ('0TL/LIF', 'LIF', 'NOK', 1), 
...

Instead, Sybase requires classic ANSI SQL with a separate INSERT INTO statement for each row:

INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('0050/TAIEX', 'TAIEX', 'TWD', 0) 
INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('035420/KORE', 'KORE', 'KRW', 0)
INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('0TL/LIF', 'LIF', 'NOK', 1)
...

To resolve this, instead of pandas' convenient to_sql method, consider calling SQLAlchemy directly: pass the parameterized statement to execute together with the DataFrame rows as a list of lists, built with DataFrame.to_numpy().tolist(), so that it is run as an executemany call. The code below assumes the contract_test table already exists.

engine = create_engine(url)
sql = """INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
         VALUES (?, ?, ?, ?)"""

with engine.connect() as connection:
    result = connection.execute(sql, df2.to_numpy().tolist())

If the above still hits the same error, fall back to a for-loop that executes one row at a time:

with engine.connect() as connection:
    for row in df2.to_numpy().tolist():
        result = connection.execute(sql, row)
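
A variation on the two options above, offered only as a minimal sketch and not as part of the original answer: send the rows with the DB-API executemany in chunks and commit once at the end, rolling back if any chunk fails. Committing per row is one possible contributor to slow loads, although nothing in this thread confirms it for Sybase. The helper name insert_in_chunks and its chunk_size parameter are hypothetical, and the standard-library sqlite3 module stands in for Sybase purely so the example runs anywhere (like pyodbc, it uses ? placeholders).

```python
import sqlite3
from itertools import islice


def insert_in_chunks(conn, sql, rows, chunk_size=1000):
    """Insert rows through a DB-API connection, committing once at the end.

    conn       -- a DB-API 2.0 connection (sqlite3 here; pyodbc is assumed to behave alike)
    sql        -- INSERT statement with one ? placeholder per column
    rows       -- iterable of row sequences, e.g. df2.to_numpy().tolist()
    chunk_size -- hypothetical tuning knob: rows sent per executemany call
    """
    cursor = conn.cursor()
    iterator = iter(rows)
    total = 0
    try:
        while True:
            chunk = list(islice(iterator, chunk_size))
            if not chunk:
                break
            cursor.executemany(sql, chunk)
            total += len(chunk)
        conn.commit()      # one commit for the whole load
    except Exception:
        conn.rollback()    # undo the uncommitted chunks if anything fails
        raise
    finally:
        cursor.close()
    return total


# Stand-in database; with Sybase this would be a pyodbc connection instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contract_test ("
    "CONTRACT_ID char(12) NOT NULL, EXCHANGE_ID char(12), "
    "CURRENCY char(4) NOT NULL, TRADING_CODE smallint)"
)

sql = ("INSERT INTO contract_test (CONTRACT_ID, EXCHANGE_ID, CURRENCY, TRADING_CODE) "
       "VALUES (?, ?, ?, ?)")
rows = [
    ["0050/TAIEX", "TAIEX", "TWD", 0],
    ["035420/KORE", "KORE", "KRW", 0],
    ["0TL/LIF", "LIF", "NOK", 1],
]

inserted = insert_in_chunks(conn, sql, rows, chunk_size=2)
count = conn.execute("SELECT COUNT(*) FROM contract_test").fetchone()[0]
print(inserted, count)
```

With a real load, rows would be df2.to_numpy().tolist() and conn a pyodbc connection. A suitable chunk_size would have to be found by measurement.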

edited Jul 2, 2020 at 14:07

answered Jul 1, 2020 at 14:41


7 Comments

jmenghan Over a year ago

I used the code suggested above, but I am getting the following error: result = connection.executemany(sql, row) AttributeError: 'Connection' object has no attribute 'executemany' Process finished with exit code 1

Parfait Over a year ago

Try running with execute; see the edit. The docs indicate that the parameters determine when executemany is used: a list/tuple of iterables triggers it, while a single iterable does not.

jmenghan Over a year ago

It now works with execute, but it is very slow: it takes 12 minutes to insert 4,500 rows into the Sybase table. How can I resolve this performance issue?

Parfait Over a year ago

Which version of execute (with or without the for-loop)? Try a non-SQLAlchemy version with a plain pyodbc connection and use cursor.executemany.

jmenghan Over a year ago

It is still tremendously slow with pyodbc. I tried the following combinations: 1) pyodbc: execute and executemany; 2) SQLAlchemy: execute and executemany. I tried both 1 and 2 with and without a for-loop. It takes 9 minutes to insert 4,500 rows. :(


Gord Thompson · Accepted Answer · 2020-08-10 10:53:31Z

0

The external SAP ASE (Sybase) dialect is now the recommended SQLAlchemy dialect for Sybase, and it does support fast_executemany if you use the SAP ASE ODBC driver.
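
As a rough illustration of the fast_executemany idea, the sketch below uses plain pyodbc rather than the external dialect (a deliberate simplification, not the setup described in this answer). cursor.fast_executemany is an existing pyodbc cursor attribute. The helper name bulk_insert is hypothetical, and whether the flag actually speeds things up depends on the ODBC driver in use, which this thread does not measure.

```python
def bulk_insert(conn, sql, rows, fast=True):
    """Run a single executemany call, optionally with pyodbc's fast_executemany.

    conn -- an open pyodbc connection (assumed; anything with cursor()/commit()/rollback() works)
    sql  -- INSERT statement with one ? placeholder per column
    rows -- iterable of row sequences, e.g. df2.to_numpy().tolist()
    fast -- when True, set cursor.fast_executemany so pyodbc sends the
            parameters as an array instead of one round trip per row
    """
    rows = list(rows)
    cursor = conn.cursor()
    try:
        if fast:
            cursor.fast_executemany = True
        cursor.executemany(sql, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return len(rows)


# Usage sketch (the connection string is not given in the thread, so it is left as a variable):
#   import pyodbc
#   conn = pyodbc.connect(connection_string)
#   bulk_insert(conn, sql, df2.to_numpy().tolist())
```

If the driver rejects parameter arrays, calling bulk_insert with fast=False falls back to pyodbc's default executemany behavior.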

answered Aug 10, 2020 at 10:53
