I'm looking for a way to extract unique values of char variables in my datasets. I came across this paper https://pharmasug.org/proceedings/2015/IB/PharmaSUG-2015-IB07.pdf

This is an example from the article and it's exactly what I need.

a figure from the article

The problem is that I don't understand HOW the author got this output. The paper references vcolumn, vtable and proc freq, but using them doesn't give me that kind of output from my data.

Maybe a long vacation has affected my brain and I'm missing something obvious...

Many thanks in advance for any help.

  • Not much of a paper. A lot of "hand waving" and very little actual details. Commented Jul 20 at 15:40

1 Answer

I suspect they intended you to combine the metadata with the distinct values to generate that report.

Three of those columns are just the variables MEMNAME, NAME and LABEL, which you can get from PROC CONTENTS or, as they mentioned, from the metadata table DICTIONARY.COLUMNS (which you can reference outside of PROC SQL using the view SASHELP.VCOLUMN).
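As a minimal sketch of the metadata step, a query like the following should list the character variables of one dataset. SASHELP.CLASS is only an example input and CHARVARS is a hypothetical output table name; note that LIBNAME and MEMNAME are stored in uppercase in DICTIONARY.COLUMNS.

```sas
proc sql;
  /* CHARVARS is a hypothetical name for the resulting metadata table */
  create table charvars as
  select memname, name, label
  from dictionary.columns
  where libname = 'SASHELP'   /* library names are stored in uppercase */
    and memname = 'CLASS'     /* member names are stored in uppercase  */
    and type = 'char'         /* keep character variables only         */
  order by name;
quit;
```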

The fourth column, I assume, holds the actual distinct values of the named variable. They mentioned you could find those using PROC FREQ, but you could use other code such as PROC SUMMARY or just PROC SQL.
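For a single variable, either approach might look like the sketch below. The variable SEX in SASHELP.CLASS is just an example, and the output names SEX_VALUES_FREQ and SEX_VALUES_SQL are hypothetical.

```sas
/* Distinct values via PROC FREQ: the OUT= dataset has one row per value */
proc freq data=sashelp.class noprint;
  tables sex / out=sex_values_freq(keep=sex);
run;

/* The same list of values via PROC SQL */
proc sql;
  create table sex_values_sql as
  select distinct sex
  from sashelp.class;
quit;
```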

You could automate the process by using the list of variables to generate the code that finds the distinct values.

Here is one way to use the list of names to generate PROC SQL code that builds such a table. This example just uses a data step to write the generated code to a file that can then be run by using %INCLUDE. You could also use a data step with CALL EXECUTE(). Or even write a SAS macro to help you generate the code.

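/* Example input; point DSNAME at your own dataset */
%let dsname=sashelp.class;

/* Get the variable metadata (TYPE=2 means character) */
proc contents data=&dsname noprint out=contents; run;

/* Write one SELECT DISTINCT per character variable to a temporary file */
filename code temp;
data _null_;
  set contents end=eof;
  where type=2;
  file code;
  if _n_=1 then put 'create table summary as ';
  else put 'union';
  put 'select distinct'
    / '       ' memname :$quote. 'as memname length=32'
    / '     , ' name :$quote. 'as name length=32'
    / '     , ' name 'as value'
    / '     , ' label :$quote. 'as label length=256'
    / 'from ' libname +(-1) '.' memname
  ;
  if eof then put ';' ;
run;

/* Run the generated query; SOURCE2 echoes the included code to the log */
proc sql;
%include code / source2;
quit;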

So if you run it for SASHELP.CLASS it will generate this SQL code:

create table summary as
select distinct
       "CLASS" as memname length=32
     , "Name" as name length=32
     , Name as value
     , "" as label length=256
from SASHELP.CLASS
union
select distinct
       "CLASS" as memname length=32
     , "Sex" as name length=32
     , Sex as value
     , "" as label length=256
from SASHELP.CLASS
;

And it will produce this dataset:

Obs    memname    name    value      label

  1     CLASS     Name    Alfred
  2     CLASS     Name    Alice
  3     CLASS     Name    Barbara
  4     CLASS     Name    Carol
  5     CLASS     Name    Henry
  6     CLASS     Name    James
  7     CLASS     Name    Jane
  8     CLASS     Name    Janet
  9     CLASS     Name    Jeffrey
 10     CLASS     Name    John
 11     CLASS     Name    Joyce
 12     CLASS     Name    Judy
 13     CLASS     Name    Louise
 14     CLASS     Name    Mary
 15     CLASS     Name    Philip
 16     CLASS     Name    Robert
 17     CLASS     Name    Ronald
 18     CLASS     Name    Thomas
 19     CLASS     Name    William
 20     CLASS     Sex     F
 21     CLASS     Sex     M
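
The CALL EXECUTE() alternative mentioned above could be sketched roughly as follows. This is an untested outline rather than a definitive implementation: it assumes the same CONTENTS dataset produced by PROC CONTENTS, ordinary variable names (no name literals), and labels without macro triggers such as & or %. SUMMARY2 is a hypothetical output name chosen so that it does not overwrite SUMMARY.

```sas
%let dsname=sashelp.class;  /* assumption: example input dataset */

proc contents data=&dsname noprint out=contents; run;

data _null_;
  set contents end=eof;
  where type=2;  /* character variables only */
  /* Open the PROC SQL step before the first query, else chain with UNION */
  if _n_=1 then call execute('proc sql; create table summary2 as');
  else call execute('union');
  /* Push one SELECT DISTINCT per character variable onto the input stack */
  call execute(
    catx(' ', 'select distinct'
       , quote(trim(memname)), 'as memname length=32,'
       , quote(trim(name)), 'as name length=32,'
       , name, 'as value,'
       , quote(trim(label)), 'as label length=256'
       , 'from', catx('.', libname, memname))
  );
  /* Close the statement and the PROC SQL step after the last variable */
  if eof then call execute('; quit;');
run;
```

The generated statements run after the data step finishes, so no temporary file or %INCLUDE is needed; the trade-off is that the generated code is harder to inspect than a file you can open.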