transpose efficiently with proc sql
If you're in SAS
, use PROC TRANSPOSE
for this option. There is no particularly good way to do this in PROC SQL;
while many SQL
variants have their own way to pivot data, SAS
has PROC TRANSPOSE
and expects you to use it.
The SAS datastep also does this very efficiently, perhaps even better than PROC TRANSPOSE
. Here's an example, including creating a view as noted in the comments.
data want/view=want;
set have;
array vars a b c d; *array of your columns to transpose;
do _t = 1 to dim(vars); *iterate over the array (dim(vars) gives # of elements);
if not missing(vars[_t]) then do; *if the current array element's value is nonmissing;
col1=vname(vars[_t]); *then store the variable name from that array element in a var;
col2=vars[_t]; *and store the value from that array element in another var;
output; *and finally output that as a new row;
end;
end;
drop a b c d _t; *Drop the old vars (cols) and the dummy variable _t;
run;
Using PROC SQL in SAS to transpose a table from long to wide
I'd prefer a transpose solution as its dynamic, Proc SQL requires hard coding the answers.
Here's 'a' Proc SQL solution:
proc sql;
create table want as
select ed, sum(smoke=1) as smoke1, sum(smoke=0) as smoke0
from sashelp.bweight
group by ed
order by ed;
quit;
Transpose a table with one row and many columns using SAS SQL
Why not use PROC TRANSPOSE you don't have to know how many columns or even the names PROC TRANSPOSE will transpose all numeric columns by default.
proc transpose data= out= ;
run;
Dealing with repetitive questions or text in a Proc Transpose
You will need do two things
- determine what you want for repeated questions, such as:
- for the case of
occurs twice in the same BY group.
- do you want 2 columns, or
- one column from the first occurrence, or
- one column from the second occurrence, or
- one column with
YES
if bothYES
&NO
occur, or - one column with
NO
if bothYES
&NO
occur, or - one column with
MULTI
if bothYES
&NO
occur, or - one column with
YES, NO
if they occur in that order perperiod, day, hour
, or - one column with
NO, YES
if they occur in that order perperiod, day, hour
- one column with
NO, YES
regardless of order - one column with
YES, NO
regardless of order - one column with (freq) suffix such as
YES(2)
orNO(2)
- same ideas for cases of BY GROUP replicates of 3 or more
- each idea will require some preprocessing of the data before
TRANSPOSE
- for the case of
- Use
IDLABEL
to have the whole original question in the column header when output.
Pivoting survey data may be useful for modeling and forecasting purposes, however, if done for reporting purposes you might be better off using Proc TABULATE
or Proc REPORT
.
Possible 'Fix'
When an ID
value occurs more than once in a BY
group there are more than one VAR
values going into a single pivot destination. Hence the ERROR:
For the case of a question repeated and having the same answer within group ptno period day hour
you can SORT
by one key more, adding FQ_QST
and specify option NODUPKEY
. After such sorting no duplicates of FQ_QST
occur, only a single FQ_YN
value is being pivoted into a column via ID
.
proc sort NODUPKEY;
by ptno period day hour FQ_QST;
run;
If you have data with repeated questions within group and the questions have different answers, the answer remaining per NODUPKEY
is dependent on how SORT
is being run. From help:
When the SORT procedure’s input is a Base SAS engine data set and the sorting is done by SAS, then the order of observations within an output BY group is predictable. The order of the observations within the group is the same as the order in which they were written to the data set when it was created. Because the Base SAS engine maintains observations in the order that they were written to the data set, they are read by PROC SORT in the same order. While processing, PROC SORT maintains the order of the observations because it uses a stable sorting algorithm. The stable sorting algorithm is used because the EQUALS option is set by default. Therefore, the observation that is selected by PROC SORT to be written to the output data set for a given BY group is the first observation in the data set having the BY variable values that define the group.
Related Topics
Identity_Insert Is Already on for Table 'X'. Cannot Perform Set Operation for Table 'Y'
Using Variables for Asc and Desc in Order By
Disable SQL Cache Temporary in Rails
Rename Single Column in Select * in SQL, Select All But a Column
Sql: Performance Comparison for Exclusion (Join VS Not In)
Testing Postgresql Functions That Consume and Return Refcursor
Two Columns in Subquery in Where Clause
Characters That Must Be Escaped in T-Sql
4 Byte Unsigned Int in SQL Server
H2 SQL Database - Insert If the Record Does Not Exist
Get List with Start and End Values from Table of Datetimes
Differencebetween Temporary Table and Table Variable in SQL 2008
How to Run Raw SQL with Kotlin's Exposed Library
Sqlite Get Name of Attached Databases
Are There Downsides to Using Prepared Statements
How to Subtract 2 Dates in Oracle to Get the Result in Hour and Minute
Bulk Load Data Conversion Error (Truncation)
Django.Db.Utils.Integrityerror: Column "Venue_City" Contains Null Values