0

Using a data step, I want to include a variable as array parameter.

Why? I want to declare an array, then fill it up with data (the needed length for the array is unknown/can change in the future). My current dataset looks like:

Row1: "val1=x val2=y val3=xx val4=yy" etc.
Row2: "val1=x"
Row3: "val1=x val2=y"

Now, I want to create columns, using an array for the valX read-outs. So a data step is used to count the number of occurrences of "val", which gave me a count column with the number of counts.

(1) Now I want to build an expression like: "array{count}", or "array{max(of count)}"; how can this be achieved? Since array needs an integer input?

(2) Another option would be to create an array{100} and then simply drop columns with only missing values, how can this be achieved?

Sample of the data step:

data count;
set input;
counter = count(column,'val','i');
run;
data output;
set count;
array Values{100};
do i = 1 to counter;
Values(i) = scan(column,i+1);
end;
run;
3
  • 1
    Please add some sample DATA step code showing what you are trying to do, with sample data. It's not clear to me. Note that you can do things like array myarray {*} val: ; . If you are creating variables, you may need to use one step to compute the number of variables to create, and store that in a macro variable, then in separate step use array myarray{&maxcount}... Commented Jun 6, 2018 at 10:42
  • data count; set input; counter = count(column,'val','i'); run; data output; set count; array Values{100}; do i = 1 to counter; Values(i) = scan(column,i+1); end; run; Commented Jun 6, 2018 at 10:58
  • The macro-variable use of &maxcount gave me the same result: array expected an integer, or a *. Commented Jun 6, 2018 at 11:04

2 Answers 2

2

After you create COUNTER, you can find the maximum value of counter and read that into a macro variable, then use the macro variable to define the size of your array.

data count;
  input column $50.;
  counter = count(column,'val','i');
  cards;
val1=x val2=y val3=xx val4=yy
val1=x
val1=x val2=y
;
run;

proc sql ;
  select max(counter) into :maxcount
  from count
  ;
quit ;

%put &=maxcount ;

data output;
  set count;
  array Values{&maxcount} $;
  do i = 1 to counter;
    Values(i) = scan(column,2*i,'= ');
  end;
  drop i ;
run;

SAS needs to know the size of the array when the data step compiles, which is why you can't use a data step variable to define the size of an array.

Sign up to request clarification or add additional context in comments.

Comments

0

Quentin has all the right ideas. However, there's some room for improvement:

  1. There's no need for the extra SQL step (and hence extra pass through the data) since the needed macro variables can be populated in the first DATA step.
  2. If any value exceeds 8 in length, the array element length specification $ will truncate it because it defaults to 8. The remedy is to compute the max array element length in the same step as the array dimension.

For example:

data count (keep = column) ;                             
  infile cards eof = eof ;                               
  input column $50. ;                                    
  retain dim len ;                                       
  do _n_ = 1 to countW (column) ;                        
    len = len max length (scan (column, 2 * _n_, "= ")) ;
  end;                                                   
  dim = dim max (_n_ - 1) ;                              
  return ;                                               
  eof: call symputx ("dim", dim) ;                       
       call symputx ("len", len) ;                       
       delete ;                                          
  cards;                                                 
val1=x val2=y val3=xx val4=yy12345678                    
val1=x                                                   
val1=x val2=y                                            
;                                                        
run ;                                                    

data output ;                                            
  set count ;                                            
  array values [&dim] $ &len ;                           
  do _n_ = 1 to countW (column) ;                        
    values[_n_] = scan (column, 2 * _n_, "= ") ;         
  end;                                                   
run ;

Best regards,

Paul Dorfman

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.