Thursday, August 21, 2008

Finding the number of observations in SAS dataset

There are a number of ways of finding out the number of observations in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter a pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
set test end=eof;
count+1;
if eof then call symput("nobs",count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets but you are still reading some data here.

proc sql noprint;
select nobs into :nobs from dictionary.tables where libname="WORK" and memname="TEST";
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
if 0 then set test nobs=nobs;
call symputx("nobs",nobs);
stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

I’ll finish with the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option and you don’t need a SAS/AF licence to do it either.

%LET DSID=%SYSFUNC(OPEN(WORK.TEST,IN));
%LET NOBS=%SYSFUNC(ATTRN(&DSID,NOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));

The first line opens the data set and the last one closes it; this is needed because you are not using data step or SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

source: technologytales.com

Another Simple way of doing this is to take help from PROC SQL automatic macro variable SQLOBS. Proc SQL automatically creates SALOBS macro variable, when it runs a program. SQLOBS macro variable will have the number of observations count of last proc SQL statement.

Here is how...

The following code will find the number of observations in the dataset AE in the Library name called SAS.

Note: Don't use noprint option in the Proc SQL statement.

libname sas 'C:\Users\Sarath Annapareddy\Desktop\*******;

proc sql;
select * from sas.ae;
quit;
%put SQLOBS=&sqlobs;



9 %put SQLOBS=&sqlobs;
SQLOBS=224



You can also get the number of observations value using Proc Contents. Here is how…

proc contents data=work.dataset out=nobs(where=(varnum=1)keep=memname nobs varnum)noprint;
run;

3 comments:

Anonymous said...

Thanks for posting this. I have been searching allover for a summary of different methods of finding the number of observations in a dataset! You'd think it would be easier to do.

Anonymous said...

Wonderful post, hightly useful one... Altogether in one place... Thanks for Sharing... :)

Anonymous said...

Thank you, you are the man, I am Brazilian and material we have here is very bad, it helped me very much.

Post a Comment

ShareThis