
Showing posts with the label Proc Sort

Separating Unique and Duplicate Observations Using PROC SORT in SAS 9.3 and Newer Versions

Today, I stumbled upon a post where the author talks about new options that are available in SAS 9.3 and later versions. These options (NOUNIQUEKEYS and UNIQUEOUT) allow sorting and then separating the duplicate records from the unique ones to be done in a single PROC SORT step. Direct link: Separating Unique and Duplicate Observations Using PROC SORT in SAS 9.3 and Newer Versions. Christopher J. Bost published a paper at SAS Global Forum 2013 on the same options: Dealing with Duplicates.
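As a minimal sketch of how the two options work together (the data set names have, dups, and singles and the variables id and visit are made up for illustration, not taken from the linked post or paper), the step below sends observations whose BY value occurs only once to the UNIQUEOUT= data set and keeps the rest in OUT=:

```sas
/* Hypothetical input: ids 1 and 3 occur once, ids 2 and 4 repeat */
data have;
    input id visit;
    datalines;
1 1
2 1
2 2
3 1
4 1
4 2
4 3
;
run;

/* NOUNIQUEKEYS drops observations with a unique sort key from OUT=;    */
/* UNIQUEOUT= names the data set that receives those dropped records.   */
proc sort data=have nouniquekeys uniqueout=singles out=dups;
    by id;
run;
```

With this input, singles should hold the rows for ids 1 and 3, and dups the rows for ids 2 and 4. Specifying OUT= matters here: without it, PROC SORT replaces the input data set with the filtered result.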

How can I count number of observations per subject in a data set?

We often have this question in mind while programming in SAS, and the answer is simple: use a sum statement together with the FIRST.variable flag that the BY statement creates in the DATA step, and then a RETAIN statement, to calculate the observation count per subject. With a minor modification we can also calculate the observation count per subject per visit: just add the visit variable to the BY variable list in PROC SORT and in the DATA step, and test its FIRST.variable instead. For example:

data dsn;
    input patid implants;
    datalines;
1 3
1 1
1 2
1 1
2 1
2 2
3 1
4 2
3 1
4 5
2 3
1 6
;
run;

proc sort data=dsn;
    by patid;
run;

/* Running count within each patid; the sum statement retains cnt automatically */
data dsn1;
    set dsn;
    by patid;
    cnt + 1;
    if first.patid then cnt = 1;
run;

/* Put the highest running count (the total) first within each patid */
proc sort data=dsn1;
    by patid descending cnt;
run;

/* Carry the total down to every observation of the subject */
data dsn2;
    set dsn1;
    by patid;
    retain totcnt;
    if first.patid then totcnt = cnt;
    output;
run;

proc print data=dsn2;
run;
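The per-visit variation mentioned above could look roughly like the following. This is only a sketch: the visit variable, its values, and the data set names dsn_v and visit_counts are assumptions added for illustration, since the original example has no visit variable.

```sas
/* Hypothetical input with an added visit variable */
data dsn_v;
    input patid visit implants;
    datalines;
1 1 3
1 1 1
1 2 2
2 1 1
2 2 2
2 2 3
;
run;

proc sort data=dsn_v;
    by patid visit;
run;

/* One row per patid/visit with the number of observations in that group */
data visit_counts;
    set dsn_v;
    by patid visit;
    if first.visit then cnt = 0;   /* reset at the start of each patid/visit group */
    cnt + 1;                       /* sum statement: cnt is retained across rows   */
    if last.visit then output;     /* cnt now holds the group total                */
    keep patid visit cnt;
run;

proc print data=visit_counts;
run;
```

Because visit is listed after patid in the BY statement, first.visit is also set whenever patid changes, so the counter restarts correctly for each subject.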

Mastering Duplicates Removal in SAS: A Comprehensive Guide to Using PROC SQL, DATA STEP, and PROC SORT

Removing Duplicate Observations in SAS: A Comprehensive Guide

In data analysis, it's common to encounter datasets with duplicate records that need to be cleaned up. SAS offers several methods to remove these duplicates, each with its own strengths and suitable scenarios. This article explores three primary methods for removing duplicate observations: PROC SQL, the DATA step, and PROC SORT. We will provide detailed examples and discuss when to use each method.

Understanding Duplicate Observations

Before diving into the methods, let's clarify what we mean by duplicate observations. Duplicates can occur in different forms:

Exact Duplicates: All variables across two or more observations have identical values.
Key-Based Duplicates: Observations are considered duplicates based on the values of specific key variables (e.g., ID, Date).

The ...
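To make the distinction between exact and key-based duplicates concrete, here is a small sketch of the three methods named in the excerpt. The article's own examples are cut off above, so the data set visits, its variables, and the output data set names are assumptions made for illustration.

```sas
/* Hypothetical data: rows 1 and 2 are exact duplicates;        */
/* rows 1-3 share the same key (id, visit_date).                */
data visits;
    input id visit_date :date9. score;
    format visit_date date9.;
    datalines;
101 01JAN2024 5
101 01JAN2024 5
101 01JAN2024 7
102 15FEB2024 3
;
run;

/* PROC SQL: remove exact duplicates (all variables identical) */
proc sql;
    create table no_exact_dups as
    select distinct *
    from visits;
quit;

/* PROC SORT: remove key-based duplicates, keeping one row per key */
proc sort data=visits out=no_key_dups nodupkey;
    by id visit_date;
run;

/* DATA step: same idea, keeping the first row of each sorted key group */
proc sort data=visits out=visits_sorted;
    by id visit_date;
run;

data first_per_key;
    set visits_sorted;
    by id visit_date;
    if first.visit_date;
run;
```

With this input, no_exact_dups should keep three rows, while no_key_dups and first_per_key should each keep two. Which row survives within a key depends on the order of the incoming data, so key-based removal is worth pairing with an explicit sort when a particular record must be kept.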