I have a table with the columns accrual_date, absence_type, employee_id and duration_days.

accrual_date        absence_type  duration_days  employee_id
01JAN2001:00:00:00  010           10.20          1
01JAN2001:00:00:00  014           11             1
01JAN2002:00:00:00  015           30             2
01JAN2001:00:00:00  015           20             2

I would like to create a query that sums duration_days per employee_id per absence type. The result should look like this:

employee_id       duration_days_010   duration_days_014  duration_days_015
1                 10.20               11                 .
2                 .                   .                  50

Add a column containing the duration_days per absence_type per employee_id:

proc sql;
create table sort_second as
select 
        case when absence_type='014' then sum(duration_days) else . end as duration_days_014,
        case when absence_type='015' then sum(duration_days) else . end as duration_days_015,
        case when absence_type='010' then sum(duration_days) else . end as duration_days_010,
        employee_id, absence_type
    from sort_first
    group by emplid;

quit;

Then remove the duplicate keys:

proc sort data=sort_second out=test1 nodupkey;
by emplid;
quit;

But this code disregards whether a row is 014, 015 or 010 and adds up everything for an employee. Like this:

employee_id       duration_days_010   duration_days_014  duration_days_015
    1                 21.20               21.20          .
    2                 .                   .                  50

Kindly advise what went wrong. Thank you in advance.

1 Answer

First off, if you're in SAS I would recommend using SAS tools!

In this case, either PROC FREQ or, better, PROC TABULATE would do this directly, and if you want a dataset you can get that with ODS OUTPUT.

ods output table=want;
proc tabulate data=have;
where absence_type in ('010','014','015');
class absence_type employee_id;
var duration_days;
tables employee_id,absence_type*duration_days*sum;
run;

proc transpose data=want out=final prefix=duration_days_;
by employee_id;
id absence_type;
var duration_days_sum;
run;

If you want to stick with SQL, what you need to do is change how the case statements work.

case when absence_type='014' then sum(duration_days) else . end as duration_days_014,

should be

sum(case when absence_type='014' then duration_days else . end) as duration_days_014,

I.e., you want to sum an imaginary column that contains only the 014 duration days. In your example, the SUM sits inside the CASE, so it is taken over all of the employee's rows regardless of absence type, and that total is then placed in whichever column matches the row's own absence_type. You also should be able to skip most of the steps above; you can do this from the initial dataset.

proc sql;
create table final as
select 
        sum (case when absence_type='014' then duration_days else . end) as duration_days_014,
        sum (case when absence_type='015' then duration_days else . end) as duration_days_015,
        sum (case when absence_type='010' then duration_days else . end) as duration_days_010,
        employee_id
    from have
    group by employee_id;

quit;
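
To check this against the sample rows in the question, here is a minimal sketch. It assumes the input dataset is called have, that absence_type is a character variable (the quoted literals in the question suggest so), and that accrual_date is stored as a datetime; adjust the informats if your data differs.

```sas
/* Sample rows from the question. The dataset name "have" and the
   variable types (character absence_type, datetime accrual_date)
   are assumptions. */
data have;
    input accrual_date :datetime18. absence_type :$3. duration_days employee_id;
    format accrual_date datetime18.;
    datalines;
01JAN2001:00:00:00 010 10.20 1
01JAN2001:00:00:00 014 11 1
01JAN2002:00:00:00 015 30 2
01JAN2001:00:00:00 015 20 2
;
run;

/* One output row per employee: each SUM only sees the rows whose
   absence_type matches, because non-matching rows contribute a missing value. */
proc sql;
    create table final as
    select employee_id,
           sum(case when absence_type='010' then duration_days else . end) as duration_days_010,
           sum(case when absence_type='014' then duration_days else . end) as duration_days_014,
           sum(case when absence_type='015' then duration_days else . end) as duration_days_015
    from have
    group by employee_id;
quit;

proc print data=final noobs;
run;
```

With these four rows, employee 1 should get 10.2 and 11 with a missing 015 column, and employee 2 should get 50 in the 015 column only, which is the layout asked for in the question. Because every selected column is either the GROUP BY key or an aggregate, no PROC SORT with NODUPKEY is needed afterwards.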

2 Comments

In PROC SQL you can also use a boolean (1 or 0) to conditionally sum, like sum(duration_days *(absence_type=14)) as duration_14. The 'absence_type = 14' bit will resolve to 1 when true and 0 when false. About the same, I just find it more readable than SQL's awful 'CASE'.
That certainly does work; I tend to prefer case, despite its awfulness, because it's more clear to other readers of the code exactly what it's doing. Equal signs can be easily missed, unfortunately.
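
For anyone comparing the two styles from the comments, a small sketch that runs both side by side. It assumes the same hypothetical have dataset with a character absence_type, so the comparison uses a quoted literal; the column names bool_014 and case_014 are made up for illustration.

```sas
/* Sample rows from the question; dataset name and types are assumptions. */
data have;
    input accrual_date :datetime18. absence_type :$3. duration_days employee_id;
    format accrual_date datetime18.;
    datalines;
01JAN2001:00:00:00 010 10.20 1
01JAN2001:00:00:00 014 11 1
01JAN2002:00:00:00 015 30 2
01JAN2001:00:00:00 015 20 2
;
run;

proc sql;
    select employee_id,
           /* boolean style: the comparison evaluates to 1 or 0 */
           sum(duration_days * (absence_type = '014')) as bool_014,
           /* CASE style: non-matching rows contribute a missing value */
           sum(case when absence_type = '014' then duration_days else . end) as case_014
    from have
    group by employee_id;
quit;
```

One difference to be aware of: for an employee with no 014 rows (employee 2 here), the boolean version sums zeros and returns 0, while the CASE version returns a missing value, which is what the desired output in the question shows.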
