9

I have a 1,000,000 x 50 Pandas DataFrame that I am currently writing to a SQL table using:

df.to_sql('my_table', con, index=False)

It takes an incredibly long time. I've seen various explanations about how to speed up this process online, but none of them seem to work for MSSQL.

  1. If I try the method in:

    Bulk Insert A Pandas DataFrame Using SQLAlchemy

    then I get a "no attribute copy_from" error.

  2. If I try the multithreading method from:

    http://techyoubaji.blogspot.com/2015/10/speed-up-pandas-tosql-with.html

    then I get a "QueuePool limit of size 5 overflow 10 reached, connection timed out" error.

Is there any easy way to speed up to_sql() to an MSSQL table? Either via BULK COPY or some other method, but entirely from within Python code?

2
  • Are you writing into an existing table, or will it be created? Commented Jan 9, 2017 at 19:03
  • I would use this or a similar approach - BCP should be very fast Commented Jan 9, 2017 at 19:06

3 Answers

8

In pandas 0.24 you can use method='multi' with a chunk size of 1000, which is the SQL Server limit on rows per INSERT statement:

df.to_sql('my_table', con, index=False, chunksize=1000, method='multi')

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

New in version 0.24.0.

The parameter method controls the SQL insertion clause used. Possible values are:

  • None: Uses standard SQL INSERT clause (one per row).
  • 'multi': Pass multiple values in a single INSERT clause. It uses a special SQL syntax not supported by all backends. This usually provides better performance for analytic databases like Presto and Redshift, but has worse performance for traditional SQL backends if the table contains many columns. For more information check the SQLAlchemy documentation.
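Putting it together with an engine, here is a minimal sketch; the pyodbc connection string and table name are placeholders, not taken from the question:

from sqlalchemy import create_engine

# placeholder connection string -- substitute your own server, database and ODBC driver
engine = create_engine('mssql+pyodbc://user:password@server/mydb?driver=ODBC+Driver+17+for+SQL+Server')

# SQL Server caps a multi-row VALUES clause at 1000 rows; with many columns the
# 2100-parameter limit may also apply, in which case lower chunksize further
df.to_sql('my_table', engine, index=False, chunksize=1000, method='multi')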

2 Comments

I keep getting PrestoUserError when I try to_sql with a Presto connection. Could you please share an example of pandas writing to Presto? I obtained the connection to Presto using the prestodb.dbapi.connect API
It tries this query first: SELECT name FROM sqlite_master WHERE type='table' AND name=?; and complains about the ; at the end :-/
6

I've used ctds to do a bulk insert and it's a lot faster with SQL Server. In the example below, df is the pandas DataFrame. The column order in the DataFrame must match the schema of the target table in mydb.

import ctds

# connect with ctds (FreeTDS-based) and stream the rows via a TDS bulk insert
conn = ctds.connect('server', user='user', password='password', database='mydb')
conn.bulk_insert('table', df.to_records(index=False).tolist())
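
If the DataFrame is too large to materialize as one list of records, the rows can also be streamed lazily. This is a sketch under the assumption that bulk_insert accepts any iterable of row tuples (it is documented to take an iterable of sequences), reusing the same conn as above:

# yield plain tuples lazily instead of building one big list in memory
rows = df.itertuples(index=False, name=None)
conn.bulk_insert('table', rows)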

1 Comment

Using this every day and it is fast, very fast!
3

I had the same issue, so I used SQLAlchemy with fast_executemany.

from sqlalchemy import event, create_engine

engine = create_engine('connection_string_with_database')

@event.listens_for(engine, 'before_cursor_execute')
def plugin_bef_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True  # switch pyodbc from executemany to fast_executemany

Always make sure the listener function is defined after the engine is created and before any cursor execution.

conn = engine.connect()
df.to_sql('table', con=conn, if_exists='append', index=False)  # see the pandas to_sql documentation for details
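
On SQLAlchemy 1.3+ the same behaviour can also be enabled without the event listener, by passing fast_executemany directly to create_engine for the mssql+pyodbc dialect. A minimal sketch; the connection string is a placeholder:

from sqlalchemy import create_engine

# fast_executemany=True enables the same pyodbc fast path at the dialect level (SQLAlchemy 1.3+)
engine = create_engine(
    'mssql+pyodbc://user:password@server/mydb?driver=ODBC+Driver+17+for+SQL+Server',
    fast_executemany=True,
)
df.to_sql('table', con=engine, if_exists='append', index=False)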

1 Comment

Adding the decorator will cause this issue: ('HY090', '[HY090] [Microsoft][ODBC Driver Manager] Invalid string or buffer length (0) (SQLBindParameter)')
