
I am constantly getting a timeout error when attempting to read in tables from my SQLite database. This has happened many times recently, and it's something I need to address since I intend to use the db in tandem with a dashboard going forward. See below an instance of me getting the error:

empty_tables = []
for table in rs_tables:
    df = pd.read_sql_table(table,RSengine.connect())
    if df.empty:
        empty_tables.append(table)

Where RSengine is defined as `RSengine = sqlalchemy.create_engine('sqlite:///RS.db')`.

I have an image attached of the error I keep getting.

I tried splitting up the large list of tables into smaller chunks:

chunks = [tables[i:i + 10] for i in range(0, len(tables), 10)]

I also tried putting a sleep at the end of each iteration in the for loop, but I have a feeling this is even less efficient. Is there something I am missing? The first time I got this error, it was during a larger calculation in a for loop, as shown below:

initialErrors = {}
initialRS = {}
#iterate through the chunks:
for chunk in chunks:
    #print chunk
    #print(f'Processing chunk: {chunk}')
    for ticker in tqdm(chunk):
        try:
            stock = pd.read_sql_table(ticker, stockengine.connect(), index_col='Date') # I think the timeout happens here
            #take the last 2 years of data:
            stock1 = stock.iloc[-504:]
            RSframe = relative_strength(df1=stock1, df2=INDEX1, windows=[21,63])
            initialRS[ticker] = RSframe
            #write to RSengine:
            #RSframe.to_sql(ticker, RSengine, if_exists='replace',index=False)
        except Exception as e:
            initialErrors[ticker] = e
            #print(e)
            #break

The line calling relative_strength is the calculation, and on average it takes about 7 seconds per table. The error occurs when reading in the tables.

I have also tried reading in each table, as shown in the `stock = pd.read_sql_table(ticker, stockengine.connect(), index_col='Date')` line, and performing the calculations with the tables in memory, but I still managed to get a timeout error.

I tried the recommended solution as shown here but still ended up getting the error.

I should mention that the db contains over 1000 tables and will keep growing. Is there a more efficient way to read in this many tables without risking a timeout error? Is SQLite the way to go?

1 Answer


I think you probably need to clean up the connections as you make them, either with a context manager (below) or by calling conn.close() explicitly, maybe after each read or after each chunk of reads.

empty_tables = []
for table in rs_tables:
    with RSengine.connect() as conn:
        df = pd.read_sql_table(table, conn)
        if df.empty:
            empty_tables.append(table)
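If you go the explicit conn.close() route instead, a rough sketch of closing once per chunk (reusing the chunks, stockengine, relative_strength, INDEX1, initialRS, and initialErrors names from your question) could look something like this:

for chunk in chunks:
    conn = stockengine.connect()
    try:
        for ticker in chunk:
            try:
                stock = pd.read_sql_table(ticker, conn, index_col='Date')
                # take the last 2 years of data
                stock1 = stock.iloc[-504:]
                RSframe = relative_strength(df1=stock1, df2=INDEX1, windows=[21, 63])
                initialRS[ticker] = RSframe
            except Exception as e:
                initialErrors[ticker] = e
    finally:
        conn.close()  # return the connection to the pool after each chunk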

It also seems like you could improve performance here by limiting the data to the last 2 years with a query instead of using iloc. Again, you should clean up the connections as you make them.

ticker_query = select(ticker_table).order_by(ticker_table.c.Date.desc()).limit(504)
with stockengine.connect() as conn:
    stock = pd.read_sql_query(ticker_query, conn, index_col='Date')
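Since each ticker lives in its own table, ticker_table has to be built per ticker. A minimal sketch of the reflection step, assuming SQLAlchemy 1.4+ (Table, MetaData, and autoload_with; the stockengine and ticker names follow your question):

from sqlalchemy import MetaData, Table, select

metadata = MetaData()

# Reflect the per-ticker table from the existing database so the query
# above can reference its Date column
ticker_table = Table(ticker, metadata, autoload_with=stockengine)

Note that the query returns rows newest-first, so you may want to sort the frame back into chronological order (e.g. stock.sort_index()) before passing it to relative_strength.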

Depending on which SQLAlchemy version you are using, you might be piling up either connections or transactions by iterating without closing the connection. You can turn on echo_pool="debug" to see what the connection pool is doing.
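For example, something like this (same engine URL as in your question, with pool logging turned on):

import sqlalchemy

# echo_pool="debug" logs connection creation, checkouts, and checkins,
# which makes it easy to spot connections that are never returned
RSengine = sqlalchemy.create_engine('sqlite:///RS.db', echo_pool="debug")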


2 Comments

What would be the downside of adding conn.close() after each iteration of the for loop? I tried it with a subset of 100 tables and I was able to read in every table without getting the error. Before I posted the question, I checked at what iteration I got the timeout, and it seemed to be around the low 30s, which I think is the default number of connections according to docs.sqlalchemy.org/en/20/errors.html#error-3o7r
I guess if no one else is using the same file database, then closing every iteration might be wasting time, but you probably wouldn't notice if the whole script takes 2 hours anyway. Also, if you are using the default pool, the connection is just returned to the pool rather than actually closed. The important thing is to close each connection you open so it can be returned to the pool and connections don't pile up.
