I am very new to pandas. How do I convert the following query into pandas syntax? I am no longer querying an MS Access table; I am now querying a pandas DataFrame called df.

The query is:

SELECT 
    Short_ID, 
    SUM(IIF(Status = 'Completed', 1, 0)) / COUNT(Status) AS completion_metric
FROM 
    PROMIS_LT_Long_ID
GROUP BY 
    Short_ID;

The query results would be something like this:

Short_ID | completion_metric
---------+------------------
1004     | 0.125
1005     | 0
1004     | 0.5

I have created the pandas df with the following code and now I would like to query the pandas DataFrame and obtain the same result as the query above.

import pyodbc
import pandas as pd 

def connect_to_db():
    db_name = "imuscigrp"
    conn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=tcp:SQLDCB301P.uhn.ca\SQLDCB301P;DATABASE=imucsigrp'
                             r';UID=imucsigrp_data_team;PWD=Kidney123!')
    cursor = conn.cursor()
    return cursor, conn

def completion_metric():
    # open the connection that read_sql_query needs
    cursor, conn = connect_to_db()
    SQL_Query = pd.read_sql_query('SELECT PROMIS_LT_Long_ID.Short_ID, PROMIS_LT_Long_ID.Status FROM PROMIS_LT_Long_ID', conn)
    # converts SQL_Query into a pandas DataFrame
    df = pd.DataFrame(SQL_Query, columns = ["Short_ID", "Status"])
    # querying the df to obtain longitudinal completion metric values
    
    return

Any contributions will help, thank you.

1 Answer

You can use NumPy functions to perform the equivalent operations.

For example, use numpy.where to map each Status value to 1 or 0 based on a condition.

import numpy as np

df = pd.DataFrame(SQL_Query, columns = ["Short_ID", "Status"])
df["completion_metric"] = np.where(df.Status == "Completed", 1,  0)

Then use numpy.average to compute the average of that column within each Short_ID group.

completion_metric = df.groupby("Short_ID").agg({"completion_metric": np.average})
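As a self-contained check of the two steps above, here is a minimal sketch on a made-up DataFrame. The sample rows are invented for illustration and are not taken from the PROMIS_LT_Long_ID table, and the aggregation uses the string "mean", which should give the same result as np.average here. It also shows a possible shortcut: the mean of a boolean column is the fraction of True values. Note that this only matches SUM(IIF(...)) / COUNT(Status) when Status has no missing values, because COUNT(Status) skips NULLs while the mean counts every row.

```python
import numpy as np
import pandas as pd

# Invented sample data standing in for the real query result.
df = pd.DataFrame({
    "Short_ID": [1004, 1004, 1004, 1004, 1005, 1005],
    "Status": ["Completed", "Pending", "Pending", "Pending", "Pending", "Pending"],
})

# Step 1: flag completed rows with 1, everything else with 0.
df["completion_metric"] = np.where(df["Status"] == "Completed", 1, 0)

# Step 2: average the flag within each Short_ID group.
completion_metric = df.groupby("Short_ID").agg({"completion_metric": "mean"})

# Possible shortcut: take the per-group mean of a boolean Series directly.
shortcut = df["Status"].eq("Completed").groupby(df["Short_ID"]).mean()

print(completion_metric)
print(shortcut)
```

With this sample, one of four rows for Short_ID 1004 is completed and none for 1005, so both approaches should agree on the per-group fractions.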

3 Comments

What is the purpose of writing df["Completion"]? You do not use it when you call numpy.average, so how will Python know that we want to count where Status == "Completed"? PS: I am happy to see the Naija name; I'm from Zimbabwe. Good to see another African.
Should the second block of code be ... .agg({"completion":np.average}) instead of ... .agg({"completion_metric":np.average})? If so, then this resolves the previous comment. I have another question, though: this completion_metric is in the form of a pandas column, right? Would it be possible to have the completion_metric column together with the corresponding Short_ID? I need to know which completion metric goes with which Short_ID.
That was my bad. I have updated the answer. df['completion_metric'] is the field that gets averaged after grouping by Short_ID. completion_metric is a pandas.DataFrame. It holds each Short_ID together with its average completion_metric value.
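To illustrate how each Short_ID stays attached to its value, here is a small sketch on invented data (the rows and the variable names grouped, flat, and value_1004 are assumptions for illustration only). After a groupby aggregation, Short_ID becomes the index of the result; reset_index turns it back into an ordinary column if that is easier to work with.

```python
import pandas as pd

# Invented rows: the flag column is already 1 for completed, 0 otherwise.
df = pd.DataFrame({
    "Short_ID": [1004, 1004, 1005],
    "completion_metric": [1, 0, 0],
})

# After grouping, Short_ID is the index of the resulting DataFrame.
grouped = df.groupby("Short_ID").agg({"completion_metric": "mean"})

# Look up the metric for one Short_ID through the index.
value_1004 = grouped.loc[1004, "completion_metric"]

# Move Short_ID back into a regular column next to its metric.
flat = grouped.reset_index()

print(flat)
```

Either form keeps the pairing; which one to use depends on whether you prefer index lookups or a plain two-column table.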
