0

I have a python script that is reaching out to an Alteryx MongoDB and pulling in the Queue table. Works well but it's a fairly dumb collection.find() that returns everything. The size isn't insurmountable now but will continue to grow. I only truly need the current day's results. Unfortunately, Alteryx MongoDB instance uses strings for all date fields. I've worked out a script using Studio 3T that pulls the data but I am stuck on how to convert this to a python query so I can automate this in Lambda.

Python Function:

def authenticate():  
    client = MongoClient('mongodb://user:*******')
    db = client.AlteryxService
    collection = db.AS_Queue
    data = pd.DataFrame(list(collection.find()))
   
    #Trim columns not needed 
    data = data.drop(["IsAnonymous", "RestrictOutput","AS_Schedules__ID","__ServiceData", "__Version","__UpdateCheck"], axis=1 )
    #Swap Object to string type
    data = data.astype('string')

    #Change time to D/T
    data["CompletionDateTime"] = pd.to_datetime(data["CompletionDateTime"])
    data["CreationDateTime"] = pd.to_datetime(data["CreationDateTime"])

    #Grab only todays data
    today = datetime.datetime.now().strftime('%Y/%m/%d')
    data = data[data['CreationDateTime'] >= today]
    #normalize to postgres output
    data.columns = map(str.lower, data.columns)

    return data

This part of the function can easily be trimmed down if I only grab the current day.

In the Studio3T script, I am using this:

today = new Date();
  today.setHours(0, 0, 0, 0);
startOfDay = today.toISOString()
 .slice(0, -5)
 .replace(/T/g, match => match === 'T' ? ' ' : '');
end = new Date();
end.setHours(23, 59, 59, 999);
endOfDay = end.toISOString()
 .slice(0, -5)
 .replace(/T/g, match => match === 'T' ? ' ' : '');
db.getCollection("AS_Queue").find( 
   { 
     "CompletionDateTime": {
       $gt: startOfDay,
       $lt: endOfDay
}},{"IsAnonymous":0,"RestrictOutput":0,"AS_Schedules__ID":0,"__ServiceData":0,"__Version":0,"__UpdateCheck":0})

This returns both the columns I need and the correct date. How do I convert this into the python query? When I just trim the size by adding the columns:0 it returns zero results.

2
  • 1
    So what is the exact format of "CompletionDateTime" in the collection document? Commented Sep 12, 2022 at 20:50
  • "CompletionDateTime" : "2022-09-12 04:00:52" Commented Sep 13, 2022 at 13:52

1 Answer 1

1

To construct the eqivalant filter querying a date string in YYYY-MM-DD format, use:

from datetime import date, timedelta
start_of_day = date.today().strftime("%Y-%m-%d")
end_of_day = (date.today() + timedelta(days=1)).strftime("%Y-%m-%d")
doc_filter = {
    "CompletionDateTime": {
        '$gte': start_of_day,
        '$lt': end_of_day
    }
} 
Sign up to request clarification or add additional context in comments.

1 Comment

Worked extremely well. Thanks for the assist! I added the column select logic directly after. data = pd.DataFrame(list(collection.find(doc_filter,{"IsAnonymous":0,"RestrictOutput":0,"AS_Schedules__ID":0,"__ServiceData":0,"__Version":0,"__UpdateCheck":0})))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.