Transpose Efficiently with Proc SQL

transpose efficiently with proc sql

If you're in SAS, use PROC TRANSPOSE for this option. There is no particularly good way to do this in PROC SQL; while many SQL variants have their own way to pivot data, SAS has PROC TRANSPOSE and expects you to use it.

The SAS datastep also does this very efficiently, perhaps even better than PROC TRANSPOSE. Here's an example, including creating a view as noted in the comments.

data want/view=want;
set have;
array vars a b c d; *array of your columns to transpose;
do _t = 1 to dim(vars); *iterate over the array (dim(vars) gives # of elements);
if not missing(vars[_t]) then do; *if the current array element's value is nonmissing;
col1=vname(vars[_t]); *then store the variable name from that array element in a var;
col2=vars[_t]; *and store the value from that array element in another var;
output; *and finally output that as a new row;
end;
end;
drop a b c d _t; *Drop the old vars (cols) and the dummy variable _t;
run;

Using PROC SQL in SAS to transpose a table from long to wide

I'd prefer a transpose solution as its dynamic, Proc SQL requires hard coding the answers.

Here's 'a' Proc SQL solution:

proc sql;
create table want as
select ed, sum(smoke=1) as smoke1, sum(smoke=0) as smoke0
from sashelp.bweight
group by ed
order by ed;
quit;

Transpose a table with one row and many columns using SAS SQL

Why not use PROC TRANSPOSE you don't have to know how many columns or even the names PROC TRANSPOSE will transpose all numeric columns by default.

proc transpose data= out= ;
run;

Dealing with repetitive questions or text in a Proc Transpose

You will need do two things

  • determine what you want for repeated questions, such as:
    • for the case of occurs twice in the same BY group.
    • do you want 2 columns, or
    • one column from the first occurrence, or
    • one column from the second occurrence, or
    • one column with YES if both YES & NO occur, or
    • one column with NO if both YES & NO occur, or
    • one column with MULTI if both YES & NO occur, or
    • one column with YES, NO if they occur in that order per period, day, hour, or
    • one column with NO, YES if they occur in that order per period, day, hour
    • one column with NO, YES regardless of order
    • one column with YES, NO regardless of order
    • one column with (freq) suffix such as YES(2) or NO(2)
    • same ideas for cases of BY GROUP replicates of 3 or more
    • each idea will require some preprocessing of the data before TRANSPOSE
  • Use IDLABEL to have the whole original question in the column header when output.

Pivoting survey data may be useful for modeling and forecasting purposes, however, if done for reporting purposes you might be better off using Proc TABULATE or Proc REPORT.

Possible 'Fix'

When an ID value occurs more than once in a BY group there are more than one VAR values going into a single pivot destination. Hence the ERROR:

For the case of a question repeated and having the same answer within group ptno period day hour you can SORT by one key more, adding FQ_QST and specify option NODUPKEY. After such sorting no duplicates of FQ_QST occur, only a single FQ_YN value is being pivoted into a column via ID.

proc sort NODUPKEY; 
by ptno period day hour FQ_QST;
run;

If you have data with repeated questions within group and the questions have different answers, the answer remaining per NODUPKEY is dependent on how SORT is being run. From help:

When the SORT procedure’s input is a Base SAS engine data set and the sorting is done by SAS, then the order of observations within an output BY group is predictable. The order of the observations within the group is the same as the order in which they were written to the data set when it was created. Because the Base SAS engine maintains observations in the order that they were written to the data set, they are read by PROC SORT in the same order. While processing, PROC SORT maintains the order of the observations because it uses a stable sorting algorithm. The stable sorting algorithm is used because the EQUALS option is set by default. Therefore, the observation that is selected by PROC SORT to be written to the output data set for a given BY group is the first observation in the data set having the BY variable values that define the group.



Related Topics



Leave a reply



Submit