How to Address PROC COMPARE Reporting Same Values as Different in SAS

Working with large datasets in SAS often requires comparing data between two tables. The PROC COMPARE procedure is an essential tool for this task, but sometimes it reports values as different even when they appear to be identical. This issue can arise from various causes, such as numeric precision differences, rounding issues, or formatting inconsistencies. In this post, we will explore common causes of this issue and how to resolve them.

1. Numeric Precision Issues

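SAS stores numeric values as double-precision floating-point numbers, so the same quantity computed in two different ways (for example, 0.1 + 0.2 versus 0.3) can differ in the last few bits. Such differences are usually invisible once a format is applied, yet PROC COMPARE, which requires exact equality by default, still reports them.

Solution: Use the CRITERION= option to define an acceptable tolerance; the METHOD= option controls how that tolerance is measured (for example, METHOD=ABSOLUTE). The FUZZ= option is different: it only suppresses very small numbers in the printed report and does not change which values are judged unequal.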


proc compare base=dataset1 compare=dataset2 criterion=0.00001;
run;
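
To see the effect in isolation, here is a minimal sketch. The dataset names base_demo and comp_demo are made up for illustration. The first comparison uses the default exact method and should flag x as unequal; the second allows an absolute difference of up to 1E-9.

```sas
/* Two one-row datasets holding "the same" number, computed differently */
data base_demo;
    x = 0.3;
run;

data comp_demo;
    x = 0.1 + 0.2;   /* not bit-for-bit identical to 0.3 in floating point */
run;

/* Default: exact comparison */
proc compare base=base_demo compare=comp_demo;
run;

/* Tolerant comparison: differences up to 1E-9 are treated as equal */
proc compare base=base_demo compare=comp_demo
             method=absolute criterion=1e-9;
run;
```

When CRITERION= is given without METHOD=, PROC COMPARE measures the difference on a relative scale rather than an absolute one, so it is worth stating the method explicitly.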
    

2. Rounding Differences

If values have been rounded differently in two datasets, PROC COMPARE may detect them as different. For example, one dataset may round to two decimal places, while the other doesn't.

Solution: Apply consistent rounding to both datasets before comparison.


data dataset1_rounded;
    set dataset1;
    value = round(value, 0.01); /* Round to two decimal places */
run;

data dataset2_rounded;
    set dataset2;
    value = round(value, 0.01); /* Same rounding precision */
run;

proc compare base=dataset1_rounded compare=dataset2_rounded;
run;
    

3. Formatting Differences

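PROC COMPARE compares stored values, not formatted values, so a format difference by itself does not make two values unequal. It is reported as a differing attribute in the variable summary, and because the report prints values through their formats, equal values can look different (or different values can look the same) in the output.

Solution: Remove or align the formats on both sides before comparing. A FORMAT _ALL_ statement with no format name strips every format.


data d1 / view=d1; set dataset1; format _all_; run;
data d2 / view=d2; set dataset2; format _all_; run;
proc compare base=d1 compare=d2;
run;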
    

4. Character Value Differences (Case Sensitivity and Whitespace)

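PROC COMPARE is case-sensitive when comparing character variables. Leading blanks, repeated blanks inside a string, and non-printing characters such as tabs also cause it to flag a difference. Trailing blanks are usually harmless, because SAS pads character values with blanks up to the variable's length.

Solution: Standardize case with UPCASE and remove leading and trailing blanks with STRIP. Avoid single-argument COMPRESS for this purpose: it deletes every blank, including the ones between words.


data dataset1_clean;
    set dataset1;
    char_var = upcase(strip(char_var)); /* Uppercase, no leading blanks */
run;

data dataset2_clean;
    set dataset2;
    char_var = upcase(strip(char_var)); /* Same cleanup on both sides */
run;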

proc compare base=dataset1_clean compare=dataset2_clean;
run;
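
When two strings look identical on screen but are still reported as different, printing them in hexadecimal can reveal hidden characters. The sketch below reuses the example names dataset1 and char_var; the 20-byte width of the $HEX40. format is an arbitrary choice for illustration.

```sas
/* Write the first few values to the log as text and as hexadecimal */
data _null_;
    set dataset1(obs=5);
    put char_var= / char_var= $hex40.;
run;
```

In the hexadecimal output, 20 is an ordinary blank and 09 is a tab, so a leading 20 or an embedded 09 points to the kind of whitespace difference that is invisible in a normal listing.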
    

5. Handling Different Variable Lengths

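Character variables with different lengths are reported by PROC COMPARE as having differing attributes, and a value that was truncated in the shorter variable will also show up as a value difference.

Solution: Ensure that corresponding variables have the same length in both datasets by using a LENGTH statement placed before the SET statement.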
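
A minimal sketch of this, assuming char_var is defined as $50 in dataset1 and is shorter in dataset2 (both the variable name and the length of 50 are illustrative):

```sas
data dataset2_len;
    length char_var $50;  /* must come before SET to take effect */
    set dataset2;
run;

proc compare base=dataset1 compare=dataset2_len;
run;
```

Choose a length at least as long as the longer of the two definitions; shortening a variable this way can truncate data.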

Conclusion

By addressing issues related to numeric precision, rounding, formatting, and character data, you can reduce or eliminate discrepancies reported by PROC COMPARE in SAS. These solutions ensure more accurate and meaningful comparisons between datasets.

Feel free to leave a comment if you have additional tips or if you’ve encountered other challenges with PROC COMPARE in SAS!

Proc Compare/Dictionary.Columns/Dictionary.Tables: Program used to compare the SAS datasets in two directories

Here is the new Proc compare.sas program I have developed to compare all the datasets in two directories (testing and production) at once and to quickly check for any mismatches.

PROC COMPARE only checks whether there are mismatches between the datasets in the two directories. If there are any, it reports them; otherwise it prints a note saying:

NOTE: No unequal values were found. All values compared are exactly equal.

[Screenshot: PROC COMPARE output]


But what if a dataset name or variable name is longer than 8 characters, or a label is longer than 40 characters? PROC COMPARE doesn't address this.

I have developed the following program to address this issue. When we prepare an electronic submission to the FDA, we must follow certain requirements.

The following are some of the QC checks based on FDA requirements:
1) The length of a dataset name or variable name shouldn't be more than 8 characters.
2) The length of a dataset label or variable label shouldn't be more than 40 characters.

The following program gives the SAS programmer a basic idea of how to check dataset and variable attributes using the metadata (dictionary.columns and dictionary.tables) with PROC SQL. It can save us some critical time.
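
If you have not used the dictionary tables before, DESCRIBE TABLE writes their column definitions to the SAS log, which is a quick way to see which attributes are available to query. A small sketch:

```sas
/* List the columns of the two dictionary tables in the log */
proc sql;
    describe table dictionary.columns;
    describe table dictionary.tables;
quit;
```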


Here are the details this program will give us:
1) Compares the variable attributes (length, format and informat) between the production and testing directories and prints the differences.
2) Compares the dataset labels, number of observations and number of variables, and prints any differences between the testing and production directories.
3) Checks each dataset name and label and prints any dataset whose name is GT 8 characters or whose label is GT 40 characters.
4) Checks each variable name and label and prints any variable whose name is GT 8 characters or whose label is GT 40 characters.
5) Checks the length of (character) variables and prints any variable whose length is GT 200.

*****************************************************************;
*** Program:    proccompare.sas                               ***;
*** Version:    1.0                                           ***;
*** Client:     ABC Pharmaceuticals, Inc.                     ***;
*** Protocol:   ABC-2009                                      ***;
*** Programmer: Sarath Annapareddy                            ***;
*** Date:       Mar 31st 2009                                 ***;
*** Purpose:    Program used to compare the attributes        ***;
***             (lengths, labels, formats and informats) of   ***;
***             datasets in production and testing libraries. ***;
***             Program also used to check the length of      ***;
***             variables in each dataset.                    ***;
*****************************************************************;

libname test 'H:\company\client\Testing\#####\###########\### datasets';
libname prodn 'H:\company\client\Testing\#####\###########\ ### datasets';

*creating the proc contents like output with Proc Sql (TEST library);
proc sql noprint;
    create table _test as
    select memname label='Dataset Name',
           name label='Variable',
           type label='Type',
           length,
           label,
           format label='Format',
           informat label='Informat'
    from dictionary.columns
    where libname = "TEST"
    order by memname, name;

    create table _test1 as
    select distinct libname, memname, memlabel, nobs, nvar
    from dictionary.tables
    where libname = "TEST";
quit;

*creating the proc contents like output with Proc Sql (PRODN library);
proc sql noprint;
    create table _prodn as
    select memname label='Dataset Name',
           name label='Variable',
           type label='Type',
           length,
           label,
           format label='Format',
           informat label='Informat'
    from dictionary.columns
    where libname = "PRODN"
    order by memname, name;

    create table _prodnl as
    select distinct libname, memname, memlabel, nobs, nvar
    from dictionary.tables
    where libname = "PRODN";
quit;

*Run proc compare to check variable attributes in prodn and test directories;
ods listing close;
ods rtf style=rtf file="Compare_vars_Out.rtf";
proc compare data=_prodn compare=_test;
    id memname name label;
run;
ods rtf close;
ods listing;

*Run proc compare to check labels, no. of obs and no. of variables of the datasets;
ods listing close;
ods rtf style=rtf file="Compare_dataset_Out.rtf";
proc compare data=_prodnl(drop=libname) compare=_test1(drop=libname);
run;
ods rtf close;
ods listing;

*Check analysis data set name, label and their lengths;
ods listing close;
ods rtf style=rtf file="variable_length_check.rtf";
proc sql;
    create table v_length as
    select memname label='Dataset Name',
           length(memname) as nam_lnth,
           memlabel label='Dataset Label',
           length(memlabel) as lab_lnth
    from dictionary.tables
    where libname="PRODN" and (length(memname)>8 or length(memlabel)>40);

    /* CREATE TABLE prints nothing, so select from it to fill the RTF file */
    select * from v_length;
quit;
ods rtf close;
ods listing;

*Check variable name, label and their lengths;
ods listing close;
ods rtf style=rtf file="label_length_check.rtf";
proc sql;
    create table l_length as
    select memname label='Dataset Name',
           name label='Variable',
           length(name) as var_lnth,
           label,
           length(label) as lab_lnth
    from dictionary.columns
    where libname="PRODN" and (length(name)>8 or length(label)>40);

    /* CREATE TABLE prints nothing, so select from it to fill the RTF file */
    select * from l_length;
quit;
ods rtf close;
ods listing;

*Check length of character variable values that were defined GT 200;
ods listing close;
ods rtf style=rtf file="variables_gt_200_length.rtf";
proc sql;
    create table longvar as
    select memname, name, length
    from dictionary.columns
    where libname="PRODN" and length > 200;

    /* CREATE TABLE prints nothing, so select from it to fill the RTF file */
    select * from longvar;
quit;
ods rtf close;
ods listing;
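
The program above compares metadata only. If you also want to compare the data values of every dataset that exists in both libraries, one possible approach is to generate a PROC COMPARE step per dataset with CALL EXECUTE. This is a sketch rather than part of the original program: the work table name _common is made up, and it assumes the TEST and PRODN librefs defined earlier and dataset names that are valid without name literals.

```sas
/* Find the datasets that exist in both libraries */
proc sql noprint;
    create table _common as
    select t.memname
    from dictionary.tables as t,
         dictionary.tables as p
    where t.libname = "TEST"
      and p.libname = "PRODN"
      and t.memname = p.memname
      and t.memtype = "DATA"
      and p.memtype = "DATA"
    order by t.memname;
quit;

/* Queue one PROC COMPARE step per common dataset */
data _null_;
    set _common;
    call execute('proc compare base=prodn.' || strip(memname) ||
                 ' compare=test.' || strip(memname) ||
                 ' listall; run;');
run;
```

The generated steps run after the DATA _NULL_ step finishes, so the comparison reports appear in the output in dataset-name order. Adding an ID statement with the key variables of each dataset would make the row matching more robust than the default match by observation number.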




