Wednesday, April 22, 2009

maxvarlen_macro: Check the Length of all character variables length is LT 200

MAXVARLEN Macro:
According to FDA released the Guidance for Industry: Providing Regulatory
Submission in Electronic Format – NDAs which was released in Jan 1999, one of the important point to comply with the agency’s regulations is : The length of any character values cannot be more than 200 characters.

Here is the macro which will give us the list of character variables whose actual length (not the assigned) is greater than 200. You can use this program to check the maximum length of all variables of any dataset at once. This macro will create a table with list of variables and their length.
***********************************************************************;
*** Program: macro_maxvarlen.sas ***;
*** Version: 1.0 ***;
*** Client: ***;
*** Protocol: ***;
*** Programmer: sarath Annapareddy ***;
*** Date: 22APR2009 ***;
*** Purpose: Macro used to check the list of variables whose ***;
*** length is GT 200. ***;
*** ***;
******************************************************************* ***;
libname prodn 'H:\Company\Client\Testing\XXXX\XXXXX\Test map datasets';

option nofmterr;

%macro maxvarlen(base,dsn);data dsn;
set sashelp.vcolumn(where=(libname=upcase("&base") and memname=upcase("&dsn") and type='char'));
call symput(cats("var",_n_),name);
id=_n_;
keep name id;
run;

*Counting number of character variables in the dataset;
proc sql noprint;
select count(*)into :nvars from dsn;
quit;

*Computing max. length of each character variable in the dataset;
%do i = 1 %to &nvars;
proc sql noprint;
create table maxlen&i as
select max(length(&&var&i)) as mlen from &base..&dsn
quit;
%end;

*Concatenating all the datasets;
data final;
id=_n_;
set %do i= 1 %to &nvars;
maxlen&i
%end;
;
run;

%mend;
%maxvarlen
(prodn,ae);

*Final dataset with list of variables whose length is GT 200;proc sql;
create table mxvarlen as
select a.name ,b.mlen
from dsn as a, final as b
where a.id=b.id and mlen gt 200;
quit;

Proc Compare/Dictionary.Coulmns/Dictionary.Tables.: Program used to compare the SAS datasets in two directories

Here is the new Proc compare.sas program, I have developed ....to compare all the datasets in 2 directories(testing and production) at once and to quick check any mismatches.

Proc compare only check if there is any mismatches between the datasets in 2 directories. If any, it reports otherwise it will give us a note saying that:

Note: No unequal Values were found. All values compared are exactly equal.

See the proc compare snap shot:


What if any dataset has the length more than 8, and what if any variable length more than 40 and what if the dataset name has more than 8 characters etc... Proc Compare doesn't address this issue.

I have developed the following program to address this issue. It’s a mandatory that we need to follow certain requirements when we are preparing for an electronic submission to the FDA.

The following are some of the QC checks FDA requirements:
1) The length of a dataset name & variable name shouldn’t be more than 8 characters.
2) The length data set label and a variable label shouldn’t be more than 40 characters.

This following program will give the SAS programmer a basic idea of how to check the dataset and variable attributes using the metadata (dictionary.columns and dictionary.tables) using Proc SQL. This program will save us some critical time.


Here are the details this program will give us…
1) Compare the variable attributes and prints the differences (length, format and informats) between production and testing directories .
2) Compare the labels, no. of observations and no. of variables in the datasets and prints if there is any differences between testing and production directories.
3) Checks the data set label and its length and prints if any dataset name GT 8 and dataset label GT 40.
4) Checks the variable label and their lengths of a dataset and prints if any dataset name GT 8 and dataset label GT 40.
5) Checks length of (character)variables and prints them if any variable has GT 200 in length;

****************************************************************;
*** Program: proccompare.sas ***;
*** Version: 1.0 ***;
*** Client: ABC Pharmaceuticals, Inc. ***;
*** Protocol: ABC-2009 ***;
*** Programmer: Sarath Annapareddy ***;
*** Date: Mar 31st 2009 ***;
*** Purpose: Program used to compare the attributes *** lengths,labels,formats and ***;
*** informats) of datasets in production and testing libraries***;
*** Program also used to check the length of variables in each***; *** dataset. ***;
*****************************************************************;

libname test 'H:\company\client\Testing\#####\###########\### datasets';
libname prodn 'H:\company\client\Testing\#####\###########\ ### datasets';

*creating the proc contents like output with Proc Sql;
proc sql noprint;
create table _test as
select memname label='Dataset Name',
name label='Variable',
type label='Type',
length as length,
label,format label='Format',
informat label='Informat'
from dictionary.columns
where indexw("TEST",libname)
order by memname, name;
create table _test1 as
select distinct libname,memname,memlabel,nobs,nvar
from dictionary.tables
where (indexw("TEST",libname));
quit;

*creating the proc contents like output with Proc Sql;
proc sql noprint;
create table _prodn as
select memname label='Dataset Name',
name label='Variable',
type label='Type',
length as length,
label,format label='Format',
informat label='Informat'
from dictionary.columns
where indexw("PRODN",libname)
order by memname, name;
create table _prodnl as
select distinct libname,memname,memlabel,nobs,nvar
from dictionary.tables
where (indexw("PRODN",libname));
quit;

*Run proc compare to check variable attributes in prodn and test directories;
ods listing close;
ods rtf style=style.rtf file="Compare_vars_Out.rtf";
proc compare data=_prodn compare=_test;
id memname name label;
run;
ods rtf close;
ods listing;

*Run proc compare to check labels, no. of obs and no. of variables of the datasets;
ods listing close;
ods rtf style=style.rtf file="Compare_dataset_Out.rtf";
proc compare data=_prodnl(drop=libname) compare=_test1(drop=libname);
run;

ods rtf close;
ods listing;

*Check analysis data set name, label and their lengths;
ods listing close;
ods rtf style=style.rtf file="variable_length_check.rtf";
proc sql noprint;
create table v_length as
select memname label='Dataset Name', length(memname) as nam_lnth, memlabel label='Variable',
length(memlabel) as lab_lnth from dictionary.tables
where libname="PRODN" and (length(memname)>8 or length(memlabel)>40);
quit;
ods rtf close;
ods listing;

ods listing close;
ods rtf style=style.rtf file="label_length_check.rtf";
*Check variable name, label and their lengths;
proc sql noprint;
create table l_length as
select memname label='Dataset Name', name label='Variable', length(name) as var_lnth, label,
length(label)as lab_lnth from dictionary.columns
where libname="PRODN" and (length(name)>8 or length(label)>40);
quit;

ods rtf close;
ods listing;

*Check length of character variable values that were defined GT 200;
ods listing close;
ods rtf style=style.rtf file="variables_gt_ 200_length.rtf";
proc sql noprint;
create table longvar as
select memname, name, length
from dictionary.columns
where libname="PRODN" and length > 200;
quit;

ods rtf close;
ods listing;




TS-DOC TS-440 - How can I use PROC COMPARE to produce a report ... -

Wednesday, April 15, 2009

Learn SAS Programming

How to determine the executing program name and path programatically:

Oftentimes, I was asked to keep the name and path of the executing program in the FOOTNOTE of the generated table or listings.I have always created a macro variable using the %let statement and, then I called the Macro variable in the footnote statement to get the name of the program. Eventhough it is simple.. it may not be suitable when we write application which need to self document...

Read more at: http://studysas.blogspot.com/2009/04/how-to-determine-executing-program-name.html
________________________________________________________________________________
How to check if a variable exist or not:
In SAS sometimes, we need to check whether the variable is exist in the dataset or not, we usually run the proc contents program and physically check if the variable exist in the dataset or not.If we want to check it programmatically, then use the following code....

Read more at: http://studysas.blogspot.com/2009/04/how-to-check-if-variable-exists-in-sas.html
_________________________________________________________________________________
How to check if the File is exist or not in SAS:
Guys… Let me try explaining how to check if the file exist in the library or directory using SAS.Here I am writing a macro to check if the file exist in the directory or not.Here is the way to

check it… http://studysas.blogspot.com/2009/04/how-to-check-if-file-is-exist-or-not-in.html
_________________________________________________________________________________
SAS programming errors we make..... can be deadly sometimes
The errors I will list here will be very few in number. They are errors that you will likely make at some time if you do not remain alert. These errors could have serious consequences so that is why I have described them as "deadly errors".

Read more at: http://studysas.blogspot.com/2009/03/sas-programming-errors-we-made-can-be.html
_________________________________________________________________________________
Proc Sort NODUP vs NODUPKEY

Somany times people from my orkut community asked me what is the real difference between the Proc sort Dodup and the Proc Sort nodupkey. I always wanted to answer the question in a better way…. and folkes… here is the answer ..... A common interview question for SAS jobs is "What is the difference between proc sort nodup and proc sort nodupkey?". The answer the interviewer is expecting is usually "proc sort nodup gets rid of duplicate records with the same sort key but proc sort nodupkey gets rid of other records with the same sort key". However, this is not correct.
___________________________________________________________________________________

PROC TRANSPOSE: How to Convert Variables(columns) into Observations(ROWS) and Observations(ROWS) into Variables(Columns)

During my early days as a SAS programmer, I get confused very easily with PROC TRANSPOSE. I mean, I get confused with what are the variables that I need to include in BY statement and ID statement as well as in VAR statement.

Read more at…:http://studysas.blogspot.com/2009/03/proc-transpose-how-to-convert.html

__________________________________________________________________________________
Displaying the Graphs (Bar charts) using PROC GCHART in SAS
Displaying the Graphs (Bar charts) using PROC GCHART in SAS :Just a day ago, I have received a question on Graphs in my orkut community, which prompted me to create following examples.Below are 5 different types of Graphs were produced (including 3d graphs) using Proc Gchart.

Read more at: http://studysas.blogspot.com/2009/03/displaying-graphs-bar-charts-using-proc.html
__________________________________________________________________________________
Change all missing values of all variables into zeros/putting zeros in place of missing values for variables

I always wondered how do I convert missing values for all the variables into zeros and In this example the I have used array to do the same. The variable list includes ID and Score1 to score6.Using simple array method we can change all the missing value for the variables score1 to score6 to 0.

Read more at:http://studysas.blogspot.com/2009/03/change-all-missing-values-of-all.html
__________________________________________________________________________________
How to Save LOG file in the required location:

Here is the simple code which allows us to save the log file in the required location. Use Proc printto procedure to save or print the log file. filename dsn ‘C:\Documents and Settings\zzzzzzzzzzzz\Desktop\LOGfile.lst'

Read more at:http://studysas.blogspot.com/2009/03/how-to-save-log-file-or-what-is-proc.html

____________________________________________________________________________________
Calculating group totals and the counts within each group
Sample 25217: Calculating group totals and the counts within each group This example uses the SUM() function to sum the AMOUNT column, creating a new column named GRPTOTAL with a COMMA10. format. The COUNT() function counts the number of occurrences of STATE within each group. The GROUP BY clause collapses multiple rows for each group into one row per group, containing STATE, GRPTOTAL and the COUNT.

Read more at: http://studysas.blogspot.com/2009/03/calculating-group-totals-and-counts.html
____________________________________________________________________________________
How to customize page numbers in RTF output

In SAS 9.1, are there easier ways to customize page numbers in RTF output?
Yes, beginning with SAS 9.1, page numbers can be customized in the RTF destination by using an escape character and the {thispage} function, {lastpage} function, {pageof} function, or all three:

Read more at: http://studysas.blogspot.com/2009/03/how-to-customize-page-numbers-in-rtf.html
____________________________________________________________________________________
How to calculate number of years and number of days between 2 dates;
How to calculate number of years and number of days between 2 dates;Exploring the yrdif and datdif functions in SAS as well as INTCK function:There are several ways to calculate the number of years between two dates and out of all the methods, YRDIF function results the most accurate value.

Read more at: http://studysas.blogspot.com/2009/03/how-to-calculate-number-of-years-and.html
____________________________________________________________________________________

How to create a comma separated file (.csv) of a SAS dataset?
IN SAS programming, we often require outputting the dataset in different formats like EXCEL and CSV etc and here are the five different ways to export the SAS dataset into .csv file.

Read more at: http://studysas.blogspot.com/2009/02/how-to-create-comma-separated-file-csv.html
____________________________________________________________________________________

How to Import Excel files into SAS

Microsoft Excel spreadsheets can be read from SAS in several ways. Two of these will be demonstrated here. First, PROC IMPORT allows direct access to Excel files through SAS/Access to PC File Formats or access to Comma-Separated (CSV) files through Base SAS. The second method uses the Excel LIBNAME engine.

Read more at: http://studysas.blogspot.com/2009/02/how-to-import-excel-files-into-sas.html
__________________________________________________________________________________
How to store a number more than 8 digits for a numeric variables
Q&A: numeric variables length more than 8? We all know that the default length of the numeric variables in SAS is 8 and if suppose I want to store a number lets say (12345678910, which has a length 11 to numeric variable) to variable total, what should I do?

Read more at: http://studysas.blogspot.com/2009/02/how-to-store-number-more-than-8-digits.html
___________________________________________________________________________________
Options VALIDVARNAME=UPCASE
VALIDVARNAME= V7 UPCASE ANYVALIDVARNAME= option is generally used in SAS whenever we want to control the SAS variable names in the dataset.

Read more at: http://studysas.blogspot.com/2009/02/options-validvarnameupcase.html
___________________________________________________________________________________
How to merge data sets with a common variable?

Here is the simple way of merging the data sets with a common variable if the datasets has the same prefix name.For example: col1-col10, dsn1-dsn 7 , or data1 to data10 with common variable of ID.Considering we have 10 datsets and all of them having the same prefix data;

Read more at: http://studysas.blogspot.com/2009/02/merging-data-sets-with-common-variable.html
_____________________________________________________________________________________
Merging the data sets with a common variable if the datasets has the same prefix name?

For example: col1-col10 dsn1-dsn 7 data1 to data6 with common variable of Usubjid.here is the example, I have 7 datasets i need to merge and each of them having the common variable(usubjid) to merge, and all the datasets having the same prefix dsn(dsn1 to dsn7).

Read more at: http://studysas.blogspot.com/2009/02/merging-data-sets-using-macro-code.html
_____________________________________________________________________________________
when to use &,&&,and &&&,how do we distinguish(Multiple Ampersands) and Diff. Between single dot and double dots in macros
Here are the 2 important questions always comes up in our minds,(& vs && vs &&& and single dot and double dots) when we are dealing with macros for the first time and here are the answers for them.I did find a very good regarding the above topics in the one of the SAS forums and IAN WHITLOCK explained it very clear.

Read more at: http://studysas.blogspot.com/2009/02/3when-to-use-do-we-distinguishmultiple.html
_____________________________________________________________________________________
How can I count number of observations per subject in a data set?

We always have this question in mind, while we do the SAS programming and here is the simple answer for that, we just need to use SUM statement and the FIRST.variable in the SET statement and then the RETAIN statement to calculate the observations count per subject.

Read more at: http://studysas.blogspot.com/2009/02/how-can-i-count-number-of-observations.html
_____________________________________________________________________________________
How to remove the duplicate observations in the dataset using PROC SQL, DATASTEP/PROC SQL/or PROC SORT etc?

Before using a particular step to remove the duplicate observations, we should understand that the duplicate observations are pertaining to the key variables like usubjid, treatment, patientno. etc, which are unique or exact duplicates( 2 or more observations has the duplicates with respect to all the variables in the dataset).If the observations are exact duplicates with respect to all the variables in the dataset, we can remove the exact duplicates by:

Read more at: http://studysas.blogspot.com/2009/02/how-to-remove-duplicate-observations.html
_____________________________________________________________________________________
How to scan more than 20 records to determine variable attributes in EFI:

In Versions 7 and 8 of the SAS System, by default the Import Wizard, PROC IMPORT and the External File Interface (EFI) scan 20 records to determine variable attributes when reading delimited text files.Changing the default setting can only be done for EFI in Version 7, Release 8 and Release 8.1. Beginning in Release 8.2 changing the default setting is applicable to the Import Wizard, PROC IMPORT and EFI.

Read more at: http://studysas.blogspot.com/2009/02/how-to-scan-more-than-20-records-to.html
_____________________________________________________________________________________
SAS Sample Programs:
____________________________________________________________________
Base SAS Certification Summary---Functions:
SAS Functions can be used to convert data and to manipulate the values of character variables. Functions are written by specifying the function name, then it's arguments in parentheses. Arguments can include variables, constants, or expressions. Although arguments are typically separated by commas, they can also be specified as variable lists or arrays.
____________________________________________________________________________________
SAS Certification Assignments
_____________________________________________________________________________
HOW TO CREATE A SAS TRANSPORT(XPT) FILE
______________________________________________________________________
How to Debug the SAS code:

Question:I have a huge SAS program that isn't working. The results I get are not right but there are no errors or warnings in the SAS log. How can I figure out where I went wrong?
_____________________________________________________________________________

Tuesday, April 7, 2009

How to determine the executing program name and path programatically

Sometimes, we need to keep the name and path of the executing program in the FOOTNOTE of the generated table or listings.

I have always created a macro variable using the %let statement and, then I called the Macro variable in the footnote statement to get the name of the program. Eventhough it is simple.. it may not be suitable when we write application which need to self document...

Here is another technique which can be used to retrieve the pathname and filename (last executed) .....

To get the last opened filename:

proc sql noprint;
select scan(xpath,-1,'\') into :progname from sashelp.vextfl
where upcase(xpath) like '%.SAS';
quit;

%put &progname;

result: filename.sas

or

data _null_;
set sashelp.vextfl(where=(upcase(xpath) like '%.SAS'));
call symput('program', scan(xpath,-1,'\'));
run;


%put &program;

result: filename.sas
To get path name ... then .. use the following code;


proc sql noprint;
select xpath into :progname
from sashelp.vextfl where upcase(xpath) like '%.SAS';
quit;

%put &progname;

Result:
H:\Company\Client\Study\ABC304\Programs\filename.sas

or

data _null_;
set sashelp.vextfl
(where=(upcase(xpath) like '%.SAS'));
call symput('program', xpath));
run;


%put &program;

Result:
H:\Company\Client\Study\ABC304\Programs\filename.sas

Here is another coolway to retrieve the filename and path of last executed program;

If you are using Windows operating system, we can access the executing program and its path using two environmental variables SAS_EXECFILENAME and SAS_EXECFILEPATH.

To access the values of these two variables use %SYSGET macro function.
The below macro calls(pathfind and filefind) gets u the name of the executing program and its path:

%macro pathfind;
%sysget(SAS_EXECFILEPATH)
%mend pathfind;
%put %pathfind;

%macro filefind;
%sysget(SAS_EXECFILENAME)
%mend filefind;
%put %filefind;

Monday, April 6, 2009

How to Check, if the variable exists in the SAS dataset or not

How to check if a variable exist or not:

In SAS sometimes, we need to check whether the variable is exist in the dataset or not, we usually run the proc contents program and physically check if the variable exist in the dataset or not.

If we want to check it programmatically, then use the following code....

Sample dataset:

data old;
input ID SCORE1 SCORE2 SCORE3 SCORE4 SCORE5 SCORE6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
60 100 97 69 100 100 100
65 100 98 100 93 90 100
70 99 97 100 100 95 100
40 97 99 98 49 100 95
190 100 99 97 100 100 90
196 100 100 99 100 100 100
210 98 85 88 90 80 95 100
;
run;

data _null_;
dset=open('old');
call symput ('chk',varnum(dset,'SCORE4'));
run;


%put &chk;
RESULT:5

Here I have used both OPEN and VARNUM functions in the program to check if SCORE4 variable exists in a 'OLD' dataset.

The OPEN function opens a SAS data set and the VARNUM function, which returns the variable's position in the data set.

If the variable does not exist then VARNUM returns to 0. and if the result is GT 0 then the variable is exist in the dataset.

In this case, the variable SCORE4 location is column 5, so the value for macrovariable CHK will be 5.

Wednesday, April 1, 2009

How to check if the File is exist or not in SAS

How to check if the File is exist or not in SAS:

Guys… Let me try explaining how to check if the file exist in the library or directory using SAS.

Here I am writing a macro to check if the file exist in the directory or not.

Here is the way to check it…

%macro check(dir= ) ;
%if %sysfunc(fileexist(&dir)) %then %do;
%put File Exists ;
%end;


%else %do ;
%put File doesn’t exist.;
%end ;

%mend ;
%check(dir="C:\Documents and Settings\sannapareddy\Desktop\Holidays 2009.docx");

%sysfunc option empowers me to use the same Data step functions outside the Data step. Almost all of the Data step functions are available to use outside the data step if we use %sysfunc.

fileexist option, verifies the existence of an external file. It can also verifies the SAS library.


Here is another simple technique to check.....

data _null_;
x=filexist('C:\Documents and Settings\sannapareddy\Desktop\Holidays 2009.docx');
call symput('myvar1',x);
run;


%put &myvar1;

Call symput is used to create a macro variable( myvar1)with the variable 'X' value assigned to it.

See the log for details,.... if the value you see in log is one ... then the file exists, if the value in the log is 0 the the file doesn't.


Friday, March 27, 2009

Importance of Warnings and Notes messages from SAS log

The errors I will list here will be very few in number. They are errors that you will likely make at some time if you do not remain alert. These errors could have serious consequences so that is why I have described them as "deadly errors".

Missing "by" statement in merge:

Using "set" instead of "merge":

Existing "in" variable in merge:

"Flag" variables in data steps:

"Fixing" data during a merge:

Averaging averages:

Merging in Key Variables:

Read more at...........


Importance of Warnings and Notes messages from SAS log



A Utility Program for Checking SAS Log Files