Discover More Tips and Techniques on This Blog

Showing posts with label NOBS. Show all posts
Showing posts with label NOBS. Show all posts

Proc Compare/Dictionary.Coulmns/Dictionary.Tables.: Program used to compare the SAS datasets in two directories

Here is the new Proc compare.sas program, I have developed ....to compare all the datasets in 2 directories(testing and production) at once and to quick check any mismatches.

Proc compare only check if there is any mismatches between the datasets in 2 directories. If any, it reports otherwise it will give us a note saying that:

Note: No unequal Values were found. All values compared are exactly equal.

See the proc compare snap shot:


What if any dataset has the length more than 8, and what if any variable length more than 40 and what if the dataset name has more than 8 characters etc... Proc Compare doesn't address this issue.

I have developed the following program to address this issue. It’s a mandatory that we need to follow certain requirements when we are preparing for an electronic submission to the FDA.

The following are some of the QC checks FDA requirements:
1) The length of a dataset name & variable name shouldn’t be more than 8 characters.
2) The length data set label and a variable label shouldn’t be more than 40 characters.

This following program will give the SAS programmer a basic idea of how to check the dataset and variable attributes using the metadata (dictionary.columns and dictionary.tables) using Proc SQL. This program will save us some critical time.


Here are the details this program will give us…
1) Compare the variable attributes and prints the differences (length, format and informats) between production and testing directories .
2) Compare the labels, no. of observations and no. of variables in the datasets and prints if there is any differences between testing and production directories.
3) Checks the data set label and its length and prints if any dataset name GT 8 and dataset label GT 40.
4) Checks the variable label and their lengths of a dataset and prints if any dataset name GT 8 and dataset label GT 40.
5) Checks length of (character)variables and prints them if any variable has GT 200 in length;

****************************************************************;
*** Program: proccompare.sas ***;
*** Version: 1.0 ***;
*** Client: ABC Pharmaceuticals, Inc. ***;
*** Protocol: ABC-2009 ***;
*** Programmer: Sarath Annapareddy ***;
*** Date: Mar 31st 2009 ***;
*** Purpose: Program used to compare the attributes *** lengths,labels,formats and ***;
*** informats) of datasets in production and testing libraries***;
*** Program also used to check the length of variables in each***; *** dataset. ***;
*****************************************************************;

libname test 'H:\company\client\Testing\#####\###########\### datasets';
libname prodn 'H:\company\client\Testing\#####\###########\ ### datasets';

*creating the proc contents like output with Proc Sql;
proc sql noprint;
create table _test as
select memname label='Dataset Name',
name label='Variable',
type label='Type',
length as length,
label,format label='Format',
informat label='Informat'
from dictionary.columns
where indexw("TEST",libname)
order by memname, name;
create table _test1 as
select distinct libname,memname,memlabel,nobs,nvar
from dictionary.tables
where (indexw("TEST",libname));
quit;

*creating the proc contents like output with Proc Sql;
proc sql noprint;
create table _prodn as
select memname label='Dataset Name',
name label='Variable',
type label='Type',
length as length,
label,format label='Format',
informat label='Informat'
from dictionary.columns
where indexw("PRODN",libname)
order by memname, name;
create table _prodnl as
select distinct libname,memname,memlabel,nobs,nvar
from dictionary.tables
where (indexw("PRODN",libname));
quit;

*Run proc compare to check variable attributes in prodn and test directories;
ods listing close;
ods rtf style=style.rtf file="Compare_vars_Out.rtf";
proc compare data=_prodn compare=_test;
id memname name label;
run;
ods rtf close;
ods listing;

*Run proc compare to check labels, no. of obs and no. of variables of the datasets;
ods listing close;
ods rtf style=style.rtf file="Compare_dataset_Out.rtf";
proc compare data=_prodnl(drop=libname) compare=_test1(drop=libname);
run;

ods rtf close;
ods listing;

*Check analysis data set name, label and their lengths;
ods listing close;
ods rtf style=style.rtf file="variable_length_check.rtf";
proc sql noprint;
create table v_length as
select memname label='Dataset Name', length(memname) as nam_lnth, memlabel label='Variable',
length(memlabel) as lab_lnth from dictionary.tables
where libname="PRODN" and (length(memname)>8 or length(memlabel)>40);
quit;
ods rtf close;
ods listing;

ods listing close;
ods rtf style=style.rtf file="label_length_check.rtf";
*Check variable name, label and their lengths;
proc sql noprint;
create table l_length as
select memname label='Dataset Name', name label='Variable', length(name) as var_lnth, label,
length(label)as lab_lnth from dictionary.columns
where libname="PRODN" and (length(name)>8 or length(label)>40);
quit;

ods rtf close;
ods listing;

*Check length of character variable values that were defined GT 200;
ods listing close;
ods rtf style=style.rtf file="variables_gt_ 200_length.rtf";
proc sql noprint;
create table longvar as
select memname, name, length
from dictionary.columns
where libname="PRODN" and length > 200;
quit;

ods rtf close;
ods listing;




TS-DOC TS-440 - How can I use PROC COMPARE to produce a report ... -

Finding the number of observations in SAS dataset

There are a number of ways of finding out the number of observations in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter a pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
set test end=eof;
count+1;
if eof then call symput("nobs",count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets but you are still reading some data here.

proc sql noprint;
select nobs into :nobs from dictionary.tables where libname="WORK" and memname="TEST";
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
if 0 then set test nobs=nobs;
call symputx("nobs",nobs);
stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

I’ll finish with the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option and you don’t need a SAS/AF licence to do it either.

%LET DSID=%SYSFUNC(OPEN(WORK.TEST,IN));
%LET NOBS=%SYSFUNC(ATTRN(&DSID,NOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));

The first line opens the data set and the last one closes it; this is needed because you are not using data step or SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

source: technologytales.com

Another Simple way of doing this is to take help from PROC SQL automatic macro variable SQLOBS. Proc SQL automatically creates SALOBS macro variable, when it runs a program. SQLOBS macro variable will have the number of observations count of last proc SQL statement.

Here is how...

The following code will find the number of observations in the dataset AE in the Library name called SAS.

Note: Don't use noprint option in the Proc SQL statement.

libname sas 'C:\Users\Sarath Annapareddy\Desktop\*******;

proc sql;
select * from sas.ae;
quit;
%put SQLOBS=&sqlobs;



9 %put SQLOBS=&sqlobs;
SQLOBS=224



You can also get the number of observations value using Proc Contents. Here is how…

proc contents data=work.dataset out=nobs(where=(varnum=1)keep=memname nobs varnum)noprint;
run;

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.