0

Those who have helped me before, i tend to use SAS9.4 a lot for my day to day work, however there are times when i need to use SQL Server

There is a output table i have with 2 variables (attached output.csv) output table

ID, GROUP, DATE

The table has 830 rows: 330 have a "C" group 150 have a "A" group 50 have a "B" group

the remaining 300 have group as "TEMP"

within SQL i do not now how to programatically work out the total volume of A+B+C. The aim is to update "TEMP" column to ensure there is an Equal amount of "A" and "B" totalling 250 of each (the remainder of the total count)

so the table totals

330 have a "C" group 250 have a "A" group 250 have a "B" group

2 Answers 2

2

You want to proportion the "temp" to get equal amounts of "A" and "B".

So, the idea is to count up everything in A, B, and Temp and divide by 2. That is the final group size. Then you can use arithmetic to allocate the rows in Temp to the two groups:

select t.*,
       (case when seqnum + a_cnt <= final_group_size then 'A' else 'B' end) as allocated_group
from (select t.*, row_number() over (order by newid()) as seqnum
      from t
      where group = 'Temp'
     ) t cross join
     (select (cnt_a + cnt_b + cnt_temp) / 2 as final_group_size,
             g.*
      from (select sum(case when group = 'A' then 1 else 0 end) as cnt_a,
                   sum(case when group = 'B' then 1 else 0 end) as cnt_b,
                   sum(case when group = 'Temp' then 1 else 0 end) as cnt_temp
            from t
           ) g
     ) g

SQL Server makes it easy to put this into an update:

with toupdate as (
      select t.*,
             (case when seqnum + a_cnt <= final_group_size then 'A' else 'B' end) as allocated_group
      from (select t.*, row_number() over (order by newid()) as seqnum
            from t
            where group = 'Temp'
           ) t cross join
           (select (cnt_a + cnt_b + cnt_temp) / 2 as final_group_size,
                   g.*
            from (select sum(case when group = 'A' then 1 else 0 end) as cnt_a,
                         sum(case when group = 'B' then 1 else 0 end) as cnt_b,
                         sum(case when group = 'Temp' then 1 else 0 end) as cnt_temp
                  from t
                 ) g
           ) g
      )
update toupdate
    set group = allocated_group;
             
Sign up to request clarification or add additional context in comments.

7 Comments

Error in SQL statement: AnalysisException: Undefined function: 'newid'. This function is neither a registered
@teelove . . . Your question says that you are using SQL Server, and this is definitely a SQL Server function: learn.microsoft.com/en-us/sql/t-sql/functions/….
My bad - I am actually using SQL server, but just to test, i have tried it in databricks first whilst the server undergoes maintenance..
as soon as its back up and running i will double check
hello, server will be back up in 30 mins finally, just checking the syntax, what is select "t" and "g" refering to, are we creating temp tables here
|
1

I'd go with a top 250 update style approach

update top (250) [TableName] set Group = 'A' where exists (Select * from [TableName] t2 where t2.id = [TableName].id order by newid()) and Group = 'Temp'

update top (250) [TableName] set Group = 'B' where exists (Select * from [TableName] t2 where t2.id = [TableName].id order by newid()) and Group = 'Temp'

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.