SQL data wrangling help using the Having statement

Question

The below code (Databricks SQL) produces the table following it. I am trying to adjust this code so that the output only includes zip5 records that have only 1 (or less) of each facility_type associated with it. Facility_type has 4 possible values (Hospital, ASC, Other, and null). In the table below, I want zip5 (10003) to be output, as it has only 1 of each of it's associated facility_types. Zip5 10016 would not be output, as it has 2 Hospital values. Zip5 10021 has 3 values with a Hospital facility_type, so it would also not be output. Zip5 10025 and 10029 would both be output.

I've tried using different having statements, but they all have allowed some unwanted zip5's to sneak into the output. For example, the following statement allows zip5 10016 into the output.

How can I achieve what I need to here? Is the having statement not the way to go?

HAVING (COUNT(DISTINCT ok.facility_type) <= 1 and count(distinct a.org_id) <=1)

SELECT a.zip5, a.org_id, ok.facility_type 
FROM sales_table a 
LEFT JOIN (SELECT ok.org_id, 
                  CASE WHEN cot.COT_DESC IN ('Outpatient') THEN 'ASC'
                       WHEN cot.cot_desc IN ('Hospital') THEN cot.cot_desc 
                       ELSE 'Other' 
                       END AS facility_type
           FROM ref_table1 ok
           LEFT JOIN ref_table2 cot ON ok.ID = cot.ID) ok ON a.org_id = ok.org_id 
GROUP BY a.zip5, a.org_id, ok.facility_type

zip5	org_id	facility_type
10003	845307	Other
10003	001564	Hospital
10003	006054	null
10016	932258	Hospital
10016	005484	Hospital
10016	null	null
10016	584790	ASC
10021	005491	Hospital
10021	005154	Hospital
10021	002166	Hospital
10021	null	null
10025	001565	Hospital
10029	005425	Other
10029	005483	Hospital

player0 · Accepted Answer · 2024-06-06 22:31:37Z

cte calculates the count of each facility_type per zip5
then filters out zip5s where any facility_type count is greater than 1
next, selects records where zip5 is not in the filtered list, making sure that only zip5s with 1 or less of each facility_type are included (10003, 10025, and 10029 are included while 10016 and 10021 excluded)

-- Step 1: Calculate the counts of each facility_type per zip5
WITH facility_counts AS (
  SELECT 
    a.zip5, 
    ok.facility_type, 
    COUNT(a.org_id) AS facility_count
  FROM sales_table a 
  LEFT JOIN (
    SELECT 
      ok.org_id, 
      CASE 
        WHEN cot.COT_DESC IN ('Outpatient') THEN 'ASC'
        WHEN cot.cot_desc IN ('Hospital') THEN cot.cot_desc 
        ELSE 'Other' 
      END AS facility_type
    FROM ref_table1 ok
    LEFT JOIN ref_table2 cot ON ok.ID = cot.ID
  ) ok ON a.org_id = ok.org_id 
  GROUP BY a.zip5, ok.facility_type
)

-- Step 2: Filter zip5s where any facility_type count is greater than 1
, filtered_zip5s AS (
  SELECT zip5
  FROM facility_counts
  WHERE facility_count > 1
  GROUP BY zip5
)

-- Step 3: Select zip5s that do not appear in the filtered_zip5s
SELECT a.zip5, a.org_id, ok.facility_type 
FROM sales_table a 
LEFT JOIN (
  SELECT 
    ok.org_id, 
    CASE 
      WHEN cot.COT_DESC IN ('Outpatient') THEN 'ASC'
      WHEN cot.cot_desc IN ('Hospital') THEN cot.cot_desc 
      ELSE 'Other' 
    END AS facility_type
  FROM ref_table1 ok
  LEFT JOIN ref_table2 cot ON ok.ID = cot.ID
) ok ON a.org_id = ok.org_id 
WHERE a.zip5 NOT IN (SELECT zip5 FROM filtered_zip5s)
GROUP BY a.zip5, a.org_id, ok.facility_type
ORDER BY a.zip5, a.org_id;

Collectives™ on Stack Overflow

SQL data wrangling help using the Having statement

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related