Converting SQL query into pandas syntax

Question

I am very new to pandas. How do I convert the following query into pandas syntax? I am no longer querying an MS Access table; I am now querying a pandas DataFrame called df.

The query is:

SELECT 
    Short_ID, 
    SUM(IIF(Status = 'Completed', 1, 0)) / COUNT(Status) AS completion_metric
FROM 
    PROMIS_LT_Long_ID
GROUP BY 
    Short_ID;

The query results would be something like this:

Short_ID | completion_metric
---------+------------------
1004     | 0.125
1005     | 0
1004     | 0.5

I have created the pandas df with the following code and now I would like to query the pandas DataFrame and obtain the same result as the query above.

import pyodbc
import pandas as pd 

def connect_to_db():
    db_name = "imuscigrp"
    conn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=tcp:SQLDCB301P.uhn.ca\SQLDCB301P;DATABASE=imucsigrp'
                             r';UID=imucsigrp_data_team;PWD=Kidney123!')
    cursor = conn.cursor()
    return cursor, conn

def completion_metric():
    # open the connection so that conn is defined inside this function
    cursor, conn = connect_to_db()
    SQL_Query = pd.read_sql_query('SELECT PROMIS_LT_Long_ID.Short_ID, PROMIS_LT_Long_ID.Status FROM PROMIS_LT_Long_ID', conn)
    #converts SQL_Query into Pandas dataframe 
    df = pd.DataFrame(SQL_Query, columns = ["Short_ID", "Status"])
    #querying the df to obtain longitudinal completion metric values 
    
    return

Any contributions will help, thank you

related: stackoverflow.com/q/45865608/2144390

– Gord Thompson, Commented Sep 17, 2021 at 21:49

Oluwafemi Sule · Accepted Answer · 2021-09-18 14:25:51Z

You can use some numpy functions for performing similar operations.

For example, use numpy.where to map each Status value to 1 or 0 based on a condition.

import numpy as np

df = pd.DataFrame(SQL_Query, columns = ["Short_ID", "Status"])
df["completion_metric"] = np.where(df.Status == "Completed", 1,  0)

Then use numpy.average to compute the average of that flag within each Short_ID group.

completion_metric = df.groupby("Short_ID").agg({"completion_metric": np.average})
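The two steps above can be tried end to end without a database. The following is only a minimal sketch: the sample rows are made up for illustration, and it passes the string "mean" to agg instead of np.average, which should give the same result here. It also assumes Status has no missing values, since SQL's COUNT(Status) skips NULLs while a plain mean counts every row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the PROMIS_LT_Long_ID table.
df = pd.DataFrame(
    {
        "Short_ID": [1004, 1004, 1004, 1004, 1005, 1005],
        "Status": ["Completed", "Pending", "Pending", "Pending", "Pending", "Pending"],
    }
)

# Step 1: flag completed rows, like IIF(Status = 'Completed', 1, 0).
df["completion_metric"] = np.where(df["Status"] == "Completed", 1, 0)

# Step 2: average the flag per Short_ID, like SUM(...) / COUNT(Status).
completion_metric = df.groupby("Short_ID").agg({"completion_metric": "mean"})

print(completion_metric)
```

With these sample rows, Short_ID 1004 has one completed row out of four and 1005 has none out of two.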

edited Sep 18, 2021 at 14:25

answered Sep 17, 2021 at 20:29

Oluwafemi Sule


3 Comments

Ruva Over a year ago

What is the purpose of writing df["Completion"]? You do not use it when you call numpy.average, so how will Python know that we want to count the rows where Status == "Completed"? PS: I am happy to see the Naija name, I'm from Zimbabwe. Good to see another African.

Ruva Over a year ago

Should the second block of code be ... .agg({"completion":np.average}) instead of ... .agg({"completion_metric":np.average})? If so, then this resolves the previous comment. I have another question though: this completion_metric is in the form of a pandas column, right? Would it be possible to have the completion_metric column alongside the corresponding Short_ID? I need to know which completion metric goes with which Short_ID.

Oluwafemi Sule Over a year ago

That was my bad. I have updated the answer. df['completion_metric'] should be the averaged field after grouping by Short_ID. completion_metric is a pandas.DataFrame. You have the Short_ID and their average completion_metric value in the completion_metric dataframe.
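As a small illustration of that last point, after a groupby the Short_ID values sit in the index of the result. The sketch below builds a hypothetical grouped result by hand (the numbers are made up) and shows two ways to pair each metric with its Short_ID.

```python
import pandas as pd

# Hypothetical grouped result: Short_ID is the index, as groupby leaves it.
completion_metric = pd.DataFrame(
    {"completion_metric": [0.25, 0.0]},
    index=pd.Index([1004, 1005], name="Short_ID"),
)

# Look up the metric for one Short_ID directly through the index.
value_1004 = completion_metric.loc[1004, "completion_metric"]

# Or move Short_ID back into an ordinary column next to the metric.
flat = completion_metric.reset_index()

print(flat)
```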
