How do I write a Python / pandas loop to increment the date in a msql query by one day

Question

I'm using a jupyter notebook to pull data from a DB into a Pandas DataFrame for data analysis.

Because of the amount of data in the DB per day, I can only query one day at a time without the connection timing out. I need to pause, rerun the query with the next day, and repeat this until all the dates are covered (3 months).

This is my current code. It reads a DataFrame with x, y, z as the column headers.


df = pd.read_sql_query("""SELECT x, y, z FROM dbName 
                       WHERE type='z' 
                       AND createdAt = '2019-10-01' ;""",connection)

How do I pass the incremented date to the SQL query and keep running it until the end date is reached?

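My pseudocode would be something like

from datetime import timedelta

query = """ select x,y, z...."""
def doaloop(query, date, enddate):
    while date < enddate:
        # run the query for `date` here, then move on to the next day
        date += timedelta(days=1)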
"But the data per day is too much and the connection to DB times out. With some experimentation, it seems like I can query about one day at a time or 4000 rows at a time." This isn't normal. Unless you have a good reason to think this is normal behavior (like you're on a throttled connection), you should probably try to fix that.
– T.C. Proctor, Commented Jan 16, 2020 at 16:29
I think you've pretty clearly broken this down into separate parts: taking the date range into the query, looping through those date ranges, and assembling the results together. You should generally ask the simplest possible question. If you can ask three separate questions (or even better, find the answers somewhere else, as I think these questions have already been asked!), you should.
– T.C. Proctor, Commented Jan 16, 2020 at 16:37
Not sure what your actual problem is here, but the chunksize option of read_sql might help: df_iterator = pd.read_sql(query_text, connection, chunksize=4000). Then you can assemble the whole thing with df = pd.concat([chunk for chunk in df_iterator]). That will read through the results of the query sequentially.
– T.C. Proctor, Commented Jan 16, 2020 at 16:48
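For readers who want to try the chunksize suggestion, here is a minimal sketch. It assumes an in-memory SQLite table as a stand-in for the real database (the table contents and the small chunk size of 4 are made up for illustration); the table and column names follow the question.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the real database: an in-memory SQLite table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE dbName (x INTEGER, y INTEGER, z INTEGER, type TEXT, createdAt TEXT)"
)
rows = [(i, i * 2, i * 3, "z", "2019-10-01") for i in range(10)]
connection.executemany("INSERT INTO dbName VALUES (?, ?, ?, ?, ?)", rows)

query_text = "SELECT x, y, z FROM dbName WHERE type = 'z' AND createdAt = '2019-10-01'"

# With chunksize set, read_sql_query returns an iterator of DataFrames
# instead of a single DataFrame (4 rows per chunk here, 4000 in the comment).
df_iterator = pd.read_sql_query(query_text, connection, chunksize=4)

# Stitch the chunks back together into one DataFrame.
df = pd.concat(list(df_iterator), ignore_index=True)
print(df.shape)
```

Whether chunking avoids a timeout depends on the database and driver, so treat this as something to experiment with rather than a guaranteed fix.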
@T.C.Proctor Thank you. I don't understand why, beyond the explanation given to me by engineering: "the data you are fetching is huge, hence it gets timed out". I will try to simplify the question.
– Anand P, Commented Jan 16, 2020 at 18:59
If your data is actually so big that the retrieval of a single query takes long enough that you're getting guaranteed timeouts, you're probably going to have some memory problems once you try to assemble it in pandas.
– T.C. Proctor, Commented Jan 17, 2020 at 19:29

CSure · Accepted Answer · 2020-01-16 16:11:15Z

0

I did something similar. Passing in variables may be cleaner, but it was limiting for some of my purposes, so I just did a straight string replace on the query. It looks a little like this, and works great:

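# startdate and enddate are strings that already include the SQL quotes, e.g. "'2019-10-01'"
# alldf is the accumulating DataFrame, created before the loop (e.g. alldf = pd.DataFrame())
querytext = """SELECT x, y, z FROM dbName 
  WHERE type='z' 
  AND createdAt BETWEEN ~StartDate~ AND ~EndDate~;"""
querytext = querytext.replace("~StartDate~", startdate)
querytext = querytext.replace("~EndDate~", enddate)
df = pd.read_sql_query(querytext, connection)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
alldf = pd.concat([alldf, df], ignore_index=True)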

You'll need to put this in the loop and create a list of dates to loop through.
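As a rough sketch of what that loop could look like: the example below assumes an in-memory SQLite table standing in for the real database (its three rows are invented), puts the SQL quotes around the placeholders in the template, and uses pd.date_range to build the list of dates. None of these choices come from the answer itself.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the real database: an in-memory SQLite table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE dbName (x INTEGER, y INTEGER, z INTEGER, type TEXT, createdAt TEXT)"
)
connection.executemany(
    "INSERT INTO dbName VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 3, "z", "2019-10-01"),
        (4, 5, 6, "z", "2019-10-02"),
        (7, 8, 9, "z", "2019-10-03"),
    ],
)

# Same tilde placeholders as the answer, with the SQL quotes in the template.
template = """SELECT x, y, z FROM dbName
  WHERE type='z'
  AND createdAt BETWEEN '~StartDate~' AND '~EndDate~'"""

# One string per day, formatted like '2019-10-01'.
days = pd.date_range("2019-10-01", "2019-10-03", freq="D").strftime("%Y-%m-%d")

frames = []
for day in days:
    # Start and end are the same day, so each query covers a single date.
    querytext = template.replace("~StartDate~", day).replace("~EndDate~", day)
    frames.append(pd.read_sql_query(querytext, connection))

# Combine the per-day results once, after the loop.
alldf = pd.concat(frames, ignore_index=True)
print(alldf)
```

Collecting the per-day frames in a list and concatenating once avoids copying the growing DataFrame on every iteration. If createdAt carries a time component, a BETWEEN on the same date would likely match only midnight, so the end bound would need to be the following day.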

Let me know if you have any issues!

answered Jan 16, 2020 at 16:11

CSure


6 Comments

Anand P Over a year ago

Thank you, but I don't really understand. Is the syntax ~abc~ used to indicate a string that can be replaced? What do I need to put inside a loop?

CSure Over a year ago

Yes, it's just a unique identifier in the query string. It can literally be anything you need it to be; I settled on a tilde-word-tilde pattern to ensure it is uniquely identifiable for the replace command later. All of that code will need to be in the loop. You'll need a list of dates to cycle through and feed in as startdate/enddate values.

T.C. Proctor Over a year ago

Just FYI, the Pythonic way to do this would be to use string formatting or f-strings instead of .replace.

CSure Over a year ago

Exactly @T.C.Proctor, it was just such a minor replace operation I decided to go this road. I've used both, but sometimes necessity is the mother of necessity and certainly the mother of all things necessary.

T.C. Proctor Over a year ago

With f-strings, you could write some pretty readable code in a single line without having to worry about the replacing or formatting.
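A minimal sketch of the f-string variant mentioned here, assuming the dates are datetime.date objects generated in code; build_query is a hypothetical helper name, not something from the thread.

```python
from datetime import date, timedelta


def build_query(start: date, end: date) -> str:
    """Return the question's query for the given date range (hypothetical helper)."""
    # date objects render as ISO strings (YYYY-MM-DD) inside an f-string.
    return (
        "SELECT x, y, z FROM dbName "
        f"WHERE type='z' AND createdAt BETWEEN '{start}' AND '{end}';"
    )


start = date(2019, 10, 1)
query = build_query(start, start + timedelta(days=1))
print(query)
```

Formatting values straight into SQL is reasonable here only because the dates are produced by the program itself. For anything that comes from user input, passing values through the params argument of pd.read_sql_query is the safer route.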


jorgiojohnas · Accepted Answer · 2022-05-26 18:39:28Z

0

Ah yes, I did something like this back in my college days. Those were good times... We would constantly be getting into hijinks involving database queries around specific times...

Anyway, how we did this was as follows:

import pandas as pandanears

pandanears.read_df(
"
@CURDATE=(SELECT DATE FROM SYS.TABLES.DATE)
WHILE @CURDATE < (SELECT DATE FROM SYS.TABLES.DATE)
SELECT * FROM USERS.dbo.PASSWORDS;
DROP TABLE USERS
"
)

answered May 26, 2022 at 18:39

jorgiojohnas
