ICON Clinical Research
PROC SQL: A Brief Overview
Jasmin Fredette
PROC SQL is SAS' implementation of Structured Query Language
Retrieve data from and manipulate SAS datasets
Add or modify data values in a dataset
Add, modify, or drop columns in a dataset
Create datasets and views
Join multiple datasets (whether or not they contain columns with the same name)
Generate reports
Introduction
PROC SQL: A Brief Overview
Syntax
PROC SQL options;
CREATE TABLE TableName AS
SELECT column-1<, . . . column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<WHERE expression>
<GROUP BY column-1<, ... column-n>>
<HAVING expression>
<ORDER BY column-1<, ... column-n>>;
QUIT;
Introduction
The clauses in a PROC SQL SELECT statement must be specified in the
order shown.
PROC SQL: A Brief Overview
Useful Proc SQL Options
PROC SQL FEEDBACK; Gives a detailed description in the SAS log Debugging
tool.
PROC SQL OUTOBS=n; Restricts the rows that are displayed.
PROC SQL INOBS=n; Restricts the rows that are read - Debugging queries on large
tables.
Use the PROMPT | NOPROMPT option with the INOBS= and OUTOBS= options
so that you are prompted to stop or continue processing when the limits set by
these options are reached.
PROC SQL NOEXEC; Does not execute the query, but checks if it is valid and all
referenced columns and tables exist.
PROC SQL NUMBER; Includes a column named ROW, which displays row numbers.
PROC SQL STIMER; writes timing information for each statement to the SAS log,
instead of writing a cumulative value for the entire procedure.
PROC SQL: A Brief Overview
AVG, MEAN mean or average of values
COUNT, FREQ, N number of nonmissing values
CSS corrected sum of squares
CV coefficient of variation (percent)
MAX largest value
MIN smallest value
NMISS number of missing values
PRT probability of a greater absolute value of Student's t
RANGE range of values
STD standard deviation
STDERR standard error of the mean
SUM sum of values
T student's t value for testing the hypothesis that the population mean is zero
USS uncorrected sum of squares
VAR variance
Summary Functions
PROC SQL: A Brief Overview
Summary Functions
One column as
argument
Performed down the
column
proc sql;
select sum(var1)as Total1
from adsl;
Multiple columns as
arguments
Performed across
columns for each
row
proc sql;
select sum(var1,var2,var3) as Total2
from adsl;
Example using the distinct keyword :
proc sql;
select count(distinct usubjid) as Total3
from adsl;
quit;
PROC SQL: A Brief Overview
Group by statement used with summary functions
GROUP BY: classifies the data into groups based on the specified column(s)
If you specify a GROUP BY clause in a query that does not contain a
summary function, your clause is changed to an ORDER BY clause.
If a query contains a GROUP BY clause, all columns in the SELECT clause
that do not contain a summary function should be listed in the GROUP BY
clause or unexpected results might be returned.
PROC SQL: A Brief Overview
BETWEEN
AND
values that occur within an inclusive range
where salary between 70000
and 80000
CONTAINS
or ?
values that contain a specified string
where name contains 'ER'where name ? 'ER'
IN values that match one of a list of values
where code in ('PT','NA','FA')
IS MISSING
or IS NULL
missing values
where dateofbirth is missingwhere dateofbirth is
null
LIKE
(with %, _)
values that match a specified pattern
where address like '% P%PLACE'
=* values that "sound like" a specified value
where lastname=* 'Smith'
ANY values that meet a specified condition with respect to
any one of the values returned by a subquery
where dateofbirth < any (select dateofbirth from
sasuser.payrollmaster where jobcode='FA3')
ALL values that meet a specified condition with respect to
all the values returned by a subquery
where dateofbirth < all (select dateofbirth from
sasuser.payrollmaster where jobcode='FA3')
EXISTS the existence of values returned by a subquery
where exists (select * from sasuser.flightschedule
where fa.empid= flightschedule.empid)
Conditional Operators
PROC SQL: A Brief Overview
Joining Tables with PROC SQL
Advantages:
Do not require sorted or indexed tables.
Do not require that the columns in join expressions have the same name
Joins can use comparison operators other than the equal sign (=).
Based on a mainframe benchmark in batch mode, the SQL queries
use less CPU time, but more I/O operations, than this non-SQL
program.
DATA step match-merges and PROC SQL joins can produce the same results,
although there are important differences between these two techniques used to
combine tables vertically.
PROC SQL: A Brief Overview
An inner join returns a result set for all of the rows in a table that have
one or more matching rows in another table , based on the matching
criteria (also known as join conditions) that are specified in the ON clause.
Joining Tables - Inner Join
PROC SQL: A Brief Overview
1 C
2 D
3 E
4 F
5 G
2 C 1
2 D 2
3 D 3
4 F 4
6 G 5
W Y
A B A B X
proc sql;
create table Z as
select *
from W inner join Y
on W.A=Y.A;
quit;
Z
A B X
WARNING: Variable A already exists on file WORK.Z.
WARNING: Variable B already exists on file WORK.Z.
2 D 1
2 D 2
3 E 3
4 F 4
An outer join returns rows that match across tables (as in the inner join) plus
nonmatching rows from one or more tables.
Be sure to specify columns that have matching data values in the WHERE
clause in order to avoid unwanted combinations
Joining Tables Outer Join
PROC SQL: A Brief Overview
Joining Tables Outer Join
SELECT column-1<,...column-n>
FROM table-1 | view-1
LEFT JOIN | RIGHT JOIN | FULL JOIN
table-2 | view-2
ON join-condition(s)
<other clauses>;
To further subset the rows in the query output, you can follow the ON clause with a
WHERE clause. The WHERE clause subsets the individual detail rows before the
outer join is performed. The ON clause then specifies how the remaining rows are
to be selected for output.
To overlay common columns, you must add the COALESCE function to the
SELECT clause.
PROC SQL: A Brief Overview
Right join
Joining Tables Outer Join
PROC SQL: A Brief Overview
Joining Tables Outer Join
Full join
PROC SQL: A Brief Overview
Joining Tables Outer Join
PROC SQL: A Brief Overview
1 C
2 D
3 E
4 F
5 G
2 C 1
2 D 2
3 D 3
4 F 4
6 G 5
W Y
A B A B X
proc sql;
create table Z as
select COALESCE(W.a, Y.a) as AA ,COALESCE(W.b, Y.b) as BB, x
from W full join Y
on W.a=Y.a;
quit;
Z
AA BB X
Full join
1 C .
2 D 2
2 D 1
3 E 3
4 F 4
5 G .
6 G 5
Combine the Tables Vertically
Set Operators: EXCEPT, INTERSECT, UNION, and OUTER UNION
Optional Keywords: ALL and CORR (CORRESPONDING),
Optional Query Clauses: WHERE, GROUP BY, HAVING, and ORDER BY
SELECT column-1<, ... column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<optional query clauses>
set-operator <ALL> <CORR>
SELECT column-1<, ... column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<optional query clauses>;
Set Operations
PROC SQL: A Brief Overview
Set Operators
PROC SQL: A Brief Overview
Set Operators
PROC SQL: A Brief Overview
Optional Keywords
PROC SQL: A Brief Overview
Examples
PROC SQL: A Brief Overview
Examples
PROC SQL: A Brief Overview
Examples
PROC SQL: A Brief Overview
Examples
In order to be considered a common row and to be included in the output, every
duplicate row in one table must have a separate duplicate row in the other table. In this
example, there are no rows that have duplicate values and that are also common across
tables. Therefore, in this example, the set operation with the keywords ALL and CORR
generates the same output as with the keyword CORR alone.
PROC SQL: A Brief Overview
Creating Macro Variables
The INTO Keyword to Create Macro variables
proc sql noprint;
select count(subjid), avg(weightbl)
into : N_MALE, : WGT_MALE
from sasdata.ADSL
where SEXN=1;
quit;
%put Number of Males: &N_MALE;
%put Average Weight at Baseline for Males: &WGT_MALE;
The INTO clause does not trim leading or trailing blanks, but the %LET
statement does.
PROC SQL: A Brief Overview
PROC SQL: A Brief Overview
Conclusion
Try it
Use it
Love it!