
I want to migrate data from a large CSV file to an SQLite3 database.

My code (Python 3.5, using pandas):

import sqlite3
import pandas as pd

con = sqlite3.connect(DB_FILENAME)
df = pd.read_csv(MLS_FULLPATH)
df.to_sql(con=con, name="MLS", if_exists="replace", index=False)

Is it possible to print the current status (a progress bar) during the execution of the to_sql method?

I looked at the article about tqdm, but couldn't find how to do this.

3 Answers


Unfortunately, DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm would need to update its status. However, you can process the DataFrame chunk by chunk yourself:

import sqlite3
import pandas as pd
from tqdm import tqdm

DB_FILENAME='/tmp/test.sqlite'

def chunker(seq, size):
    # from http://stackoverflow.com/a/434328
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10) # 10%
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(chunksize)
            
df = pd.DataFrame({'a': range(0,100000)})
insert_with_progress(df, DB_FILENAME)

Note that I'm generating the DataFrame inline here for the sake of a complete, workable example without external dependencies.

The result is quite stunning:

[screenshot: tqdm progress bar advancing during the insert]


2 Comments

My CSV file takes up 1.7 GB on disk, so df = pd.read_csv(csv_filename, ...) is very slow. But I found a solution here: stackoverflow.com/a/28371706/5856795, so your answer and the answer by @sebastian-raschka helped me get this task done.
With range() instead of xrange() this also works in Python 3. Very nicely, I must say!
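For reference, a rough sketch of the chunked pd.read_csv approach the first comment points to, combined with the progress bar from this answer; the file path and chunk size below are placeholders, not values from the question:

import sqlite3
import pandas as pd
from tqdm import tqdm

CSV_FILENAME = '/tmp/big.csv'    # hypothetical input file
DB_FILENAME = '/tmp/test.sqlite'
CHUNKSIZE = 100000               # rows per read, tune to your memory budget

con = sqlite3.connect(DB_FILENAME)
with tqdm(desc="rows written") as pbar:
    # read_csv with chunksize returns an iterator of DataFrames,
    # so the whole CSV never has to sit in memory at once
    for i, chunk in enumerate(pd.read_csv(CSV_FILENAME, chunksize=CHUNKSIZE)):
        chunk.to_sql(con=con, name="MLS",
                     if_exists="replace" if i == 0 else "append",
                     index=False)
        pbar.update(len(chunk))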

I wanted to share a variant of the solution posted by miraculixx, which I had to alter for SQLAlchemy:

# these need to be customized: myDataFrame, myDBEngine, myDBTable
from tqdm import tqdm

df = myDataFrame

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df):
    con = myDBEngine.connect()
    chunksize = int(len(df) / 10)
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(name="myDBTable", con=con, if_exists=replace, index=False)
            pbar.update(chunksize)
            tqdm._instances.clear()

insert_with_progress(df)
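For completeness, one way to create the engine this snippet assumes (the connection string is a placeholder; adjust it for your own database):

from sqlalchemy import create_engine

# placeholder connection URL pointing at a local SQLite file
myDBEngine = create_engine("sqlite:///test.sqlite")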

1 Comment

You defined the variable replace but don't use it. Did you mean if_exists=replace?

User miraculixx has a nice example above, thank you for that. But if you want to use it with files of all sizes, you should add something like this:

chunksize = int(len(df) / 10)
if chunksize == 0:
    # the DataFrame has fewer than 10 rows, so insert it in one go
    df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
else:
    with tqdm(total=len(df)) as pbar:
        ...
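Putting the guard together with miraculixx's function, a complete version might look like the following sketch (same table and chunking scheme as above; updating the bar by len(cdf) instead of chunksize is a small tweak so it ends exactly at 100%):

import sqlite3
import pandas as pd
from tqdm import tqdm

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10)
    if chunksize == 0:
        # fewer than 10 rows: a single insert, nothing worth tracking
        df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
        return
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(len(cdf))  # advance by the actual chunk length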

1 Comment

Is there any way you could finish the example you posted above? When I set the integer in the chunksize variable, I only get that many rows into my db, e.g. with chunksize = int(len(df) / 10) only 1/10 of the total records end up in my db.
