The SORT Procedure (Proc Sort): Options
We can do many things using PROC SORT like
create a new data set, subset the data, rename/ drop/ keep variables, format, or label your variables etc
Options Available with Proc Sort:
- OUT= OPTION
- DESCENDING OPTION
- DROP=, KEEP=, AND RENAME= OPTIONS
- FORMAT AND LABEL STATEMENTS
- WHERE= OPTION OR WHERE STATEMENT
- FIRSTOBS= AND OBS= OPTIONS
- NODUPRECS AND NODUPKEY OPTIONS
A common interview question for SAS jobs is "What is the difference between proc sort nodup and proc sort nodupkey?". The answer the interviewer is expecting is usually "proc sort nodup gets rid of duplicate records with the same sort key but proc sort nodupkey gets rid of other records with the same sort key". However, this is not correct.
Common Programming Mistake with Proc Sort NODUPRECS -
Equivalent of NODUPKEY in PROC SQL
Ian Whitlock Explains...
NODUPKEY is like FIRST. processing. Both depend on order which is an alien
concept to SQL. SQL depends on information stored as data, not in variable
names and not in order.
If for example you had the variables GROUP and SEQ where SEQ is a sequence
number within group, then you could use a GROUP BY GROUP clause with HAVING
SEQ=1. In short there must be something in the data values that indicates
which records you want.
The suggestion to use DISTINCT works when all records having the same value
of GROUP have all other relevant variables with equal values. This would be
equivalent to the NODUP option in PROC SORT.