PROC COMPARE only checks whether there are any mismatches between the datasets in two directories. If there are, it reports them; otherwise it writes a note to the log saying:
NOTE: No unequal values were found. All values compared are exactly equal.
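As a minimal sketch of the kind of call that produces this note: the example below assumes the librefs prodn and test are already assigned (as in the program further down) and that a dataset named adsl exists in both libraries. The dataset name is purely hypothetical.

```sas
/* ADSL is a hypothetical dataset name, assumed to exist in both libraries */
proc compare base=prodn.adsl compare=test.adsl listall;
run;
```

The LISTALL option also lists variables and observations that are found in only one of the two datasets.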
[Snapshot of the PROC COMPARE output]
But what if a dataset name or variable name is longer than 8 characters, or a dataset label or variable label is longer than 40 characters? PROC COMPARE doesn't address these issues.
I have developed the following program to address them. When we prepare an electronic submission to the FDA, it is mandatory to follow certain requirements.
The following QC checks reflect some of those FDA requirements:
1) A dataset name or variable name shouldn't be longer than 8 characters.
2) A dataset label or variable label shouldn't be longer than 40 characters.
The following program gives the SAS programmer a basic idea of how to check dataset and variable attributes using the metadata (dictionary.columns and dictionary.tables) with PROC SQL. It can save us some critical time.
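Before querying the dictionary tables, it can help to see which columns they offer. The sketch below uses the PROC SQL DESCRIBE TABLE statement, which writes each table's column definitions to the SAS log:

```sas
/* Write the column definitions of both dictionary tables to the log */
proc sql;
  describe table dictionary.columns;
  describe table dictionary.tables;
quit;
```

The log output includes the columns used in the program, such as libname, memname, name, type, length, label, format and informat for dictionary.columns, and memlabel, nobs and nvar for dictionary.tables.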
Here is what this program gives us:
1) Compares the variable attributes (length, format and informat) between the production and testing directories and prints the differences.
2) Compares the labels, number of observations and number of variables of the datasets and prints any differences between the testing and production directories.
3) Checks each dataset name and label and prints any dataset whose name is GT 8 characters or whose label is GT 40 characters.
4) Checks each variable name and label and prints any variable whose name is GT 8 characters or whose label is GT 40 characters.
5) Checks the length of character variables and prints any variable with a length GT 200.
******************************************************************;
*** Program:    proccompare.sas                                ***;
*** Version:    1.0                                            ***;
*** Client:     ABC Pharmaceuticals, Inc.                      ***;
*** Protocol:   ABC-2009                                       ***;
*** Programmer: Sarath Annapareddy                             ***;
*** Date:       Mar 31st 2009                                  ***;
*** Purpose:    Program used to compare the attributes         ***;
***             (lengths, labels, formats and informats) of    ***;
***             datasets in production and testing libraries.  ***;
***             Program also used to check the length of       ***;
***             variables in each dataset.                     ***;
******************************************************************;
libname test 'H:\company\client\Testing\#####\###########\### datasets';
libname prodn 'H:\company\client\Testing\#####\###########\ ### datasets';
*Create PROC CONTENTS-like output for the TEST library with PROC SQL;
proc sql noprint;
  /* one row per variable in the TEST library */
  create table _test as
    select memname  label='Dataset Name',
           name     label='Variable',
           type     label='Type',
           length,
           label,
           format   label='Format',
           informat label='Informat'
    from dictionary.columns
    where libname="TEST"
    order by memname, name;
  /* one row per dataset in the TEST library */
  create table _test1 as
    select distinct libname, memname, memlabel, nobs, nvar
    from dictionary.tables
    where libname="TEST";
quit;
*Create PROC CONTENTS-like output for the PRODN library with PROC SQL;
proc sql noprint;
  /* one row per variable in the PRODN library */
  create table _prodn as
    select memname  label='Dataset Name',
           name     label='Variable',
           type     label='Type',
           length,
           label,
           format   label='Format',
           informat label='Informat'
    from dictionary.columns
    where libname="PRODN"
    order by memname, name;
  /* one row per dataset in the PRODN library */
  create table _prodnl as
    select distinct libname, memname, memlabel, nobs, nvar
    from dictionary.tables
    where libname="PRODN";
quit;
*Run PROC COMPARE to check variable attributes in the prodn and test libraries;
ods listing close;
ods rtf style=styles.rtf file="Compare_vars_Out.rtf";
proc compare data=_prodn compare=_test;
  id memname name label;
run;
ods rtf close;
ods listing;
*Run PROC COMPARE to check the labels, no. of obs and no. of variables of the datasets;
ods listing close;
ods rtf style=styles.rtf file="Compare_dataset_Out.rtf";
proc compare data=_prodnl(drop=libname) compare=_test1(drop=libname);
run;
ods rtf close;
ods listing;
*Check analysis dataset names, labels and their lengths;
ods listing close;
ods rtf style=styles.rtf file="variable_length_check.rtf";
proc sql;
  create table v_length as
    select memname  label='Dataset Name',
           length(memname) as nam_lnth,
           memlabel label='Dataset Label',
           length(memlabel) as lab_lnth
    from dictionary.tables
    where libname="PRODN" and (length(memname)>8 or length(memlabel)>40);
  /* print the flagged datasets so they appear in the RTF file */
  select * from v_length;
quit;
ods rtf close;
ods listing;
*Check variable names, labels and their lengths;
ods listing close;
ods rtf style=styles.rtf file="label_length_check.rtf";
proc sql;
  create table l_length as
    select memname label='Dataset Name',
           name    label='Variable',
           length(name) as var_lnth,
           label,
           length(label) as lab_lnth
    from dictionary.columns
    where libname="PRODN" and (length(name)>8 or length(label)>40);
  /* print the flagged variables so they appear in the RTF file */
  select * from l_length;
quit;
ods rtf close;
ods listing;
*Check for character variables defined with a length GT 200;
ods listing close;
ods rtf style=styles.rtf file="variables_gt_200_length.rtf";
proc sql;
  create table longvar as
    select memname, name, length
    from dictionary.columns
    where libname="PRODN" and type="char" and length > 200;
  /* print the flagged variables so they appear in the RTF file */
  select * from longvar;
quit;
ods rtf close;
ods listing;
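The same metadata is also exposed through the SASHELP.VCOLUMN view, which can be read outside PROC SQL. As a hedged alternative to the last check (not part of the original program), the sketch below prints the long character variables directly with PROC PRINT, assuming the prodn libref is still assigned:

```sas
/* Alternative sketch: read the column metadata through the SASHELP.VCOLUMN view */
proc print data=sashelp.vcolumn noobs;
  where libname="PRODN" and type="char" and length > 200;
  var memname name length;
run;
```

If no variable exceeds the limit, PROC PRINT produces no report and only writes a note to the log.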