Wednesday, April 15, 2009

Learn SAS Programming

How to determine the executing program name and path programatically:

Oftentimes, I was asked to keep the name and path of the executing program in the FOOTNOTE of the generated table or listings.I have always created a macro variable using the %let statement and, then I called the Macro variable in the footnote statement to get the name of the program. Eventhough it is simple.. it may not be suitable when we write application which need to self document...

Read more at: http://studysas.blogspot.com/2009/04/how-to-determine-executing-program-name.html
________________________________________________________________________________
How to check if a variable exist or not:
In SAS sometimes, we need to check whether the variable is exist in the dataset or not, we usually run the proc contents program and physically check if the variable exist in the dataset or not.If we want to check it programmatically, then use the following code....

Read more at: http://studysas.blogspot.com/2009/04/how-to-check-if-variable-exists-in-sas.html
_________________________________________________________________________________
How to check if the File is exist or not in SAS:
Guys… Let me try explaining how to check if the file exist in the library or directory using SAS.Here I am writing a macro to check if the file exist in the directory or not.Here is the way to

check it… http://studysas.blogspot.com/2009/04/how-to-check-if-file-is-exist-or-not-in.html
_________________________________________________________________________________
SAS programming errors we make..... can be deadly sometimes
The errors I will list here will be very few in number. They are errors that you will likely make at some time if you do not remain alert. These errors could have serious consequences so that is why I have described them as "deadly errors".

Read more at: http://studysas.blogspot.com/2009/03/sas-programming-errors-we-made-can-be.html
_________________________________________________________________________________
Proc Sort NODUP vs NODUPKEY

Somany times people from my orkut community asked me what is the real difference between the Proc sort Dodup and the Proc Sort nodupkey. I always wanted to answer the question in a better way…. and folkes… here is the answer ..... A common interview question for SAS jobs is "What is the difference between proc sort nodup and proc sort nodupkey?". The answer the interviewer is expecting is usually "proc sort nodup gets rid of duplicate records with the same sort key but proc sort nodupkey gets rid of other records with the same sort key". However, this is not correct.
___________________________________________________________________________________

PROC TRANSPOSE: How to Convert Variables(columns) into Observations(ROWS) and Observations(ROWS) into Variables(Columns)

During my early days as a SAS programmer, I get confused very easily with PROC TRANSPOSE. I mean, I get confused with what are the variables that I need to include in BY statement and ID statement as well as in VAR statement.

Read more at…:http://studysas.blogspot.com/2009/03/proc-transpose-how-to-convert.html

__________________________________________________________________________________
Displaying the Graphs (Bar charts) using PROC GCHART in SAS
Displaying the Graphs (Bar charts) using PROC GCHART in SAS :Just a day ago, I have received a question on Graphs in my orkut community, which prompted me to create following examples.Below are 5 different types of Graphs were produced (including 3d graphs) using Proc Gchart.

Read more at: http://studysas.blogspot.com/2009/03/displaying-graphs-bar-charts-using-proc.html
__________________________________________________________________________________
Change all missing values of all variables into zeros/putting zeros in place of missing values for variables

I always wondered how do I convert missing values for all the variables into zeros and In this example the I have used array to do the same. The variable list includes ID and Score1 to score6.Using simple array method we can change all the missing value for the variables score1 to score6 to 0.

Read more at:http://studysas.blogspot.com/2009/03/change-all-missing-values-of-all.html
__________________________________________________________________________________
How to Save LOG file in the required location:

Here is the simple code which allows us to save the log file in the required location. Use Proc printto procedure to save or print the log file. filename dsn ‘C:\Documents and Settings\zzzzzzzzzzzz\Desktop\LOGfile.lst'

Read more at:http://studysas.blogspot.com/2009/03/how-to-save-log-file-or-what-is-proc.html

____________________________________________________________________________________
Calculating group totals and the counts within each group
Sample 25217: Calculating group totals and the counts within each group This example uses the SUM() function to sum the AMOUNT column, creating a new column named GRPTOTAL with a COMMA10. format. The COUNT() function counts the number of occurrences of STATE within each group. The GROUP BY clause collapses multiple rows for each group into one row per group, containing STATE, GRPTOTAL and the COUNT.

Read more at: http://studysas.blogspot.com/2009/03/calculating-group-totals-and-counts.html
____________________________________________________________________________________
How to customize page numbers in RTF output

In SAS 9.1, are there easier ways to customize page numbers in RTF output?
Yes, beginning with SAS 9.1, page numbers can be customized in the RTF destination by using an escape character and the {thispage} function, {lastpage} function, {pageof} function, or all three:

Read more at: http://studysas.blogspot.com/2009/03/how-to-customize-page-numbers-in-rtf.html
____________________________________________________________________________________
How to calculate number of years and number of days between 2 dates;
How to calculate number of years and number of days between 2 dates;Exploring the yrdif and datdif functions in SAS as well as INTCK function:There are several ways to calculate the number of years between two dates and out of all the methods, YRDIF function results the most accurate value.

Read more at: http://studysas.blogspot.com/2009/03/how-to-calculate-number-of-years-and.html
____________________________________________________________________________________

How to create a comma separated file (.csv) of a SAS dataset?
IN SAS programming, we often require outputting the dataset in different formats like EXCEL and CSV etc and here are the five different ways to export the SAS dataset into .csv file.

Read more at: http://studysas.blogspot.com/2009/02/how-to-create-comma-separated-file-csv.html
____________________________________________________________________________________

How to Import Excel files into SAS

Microsoft Excel spreadsheets can be read from SAS in several ways. Two of these will be demonstrated here. First, PROC IMPORT allows direct access to Excel files through SAS/Access to PC File Formats or access to Comma-Separated (CSV) files through Base SAS. The second method uses the Excel LIBNAME engine.

Read more at: http://studysas.blogspot.com/2009/02/how-to-import-excel-files-into-sas.html
__________________________________________________________________________________
How to store a number more than 8 digits for a numeric variables
Q&A: numeric variables length more than 8? We all know that the default length of the numeric variables in SAS is 8 and if suppose I want to store a number lets say (12345678910, which has a length 11 to numeric variable) to variable total, what should I do?

Read more at: http://studysas.blogspot.com/2009/02/how-to-store-number-more-than-8-digits.html
___________________________________________________________________________________
Options VALIDVARNAME=UPCASE
VALIDVARNAME= V7 UPCASE ANYVALIDVARNAME= option is generally used in SAS whenever we want to control the SAS variable names in the dataset.

Read more at: http://studysas.blogspot.com/2009/02/options-validvarnameupcase.html
___________________________________________________________________________________
How to merge data sets with a common variable?

Here is the simple way of merging the data sets with a common variable if the datasets has the same prefix name.For example: col1-col10, dsn1-dsn 7 , or data1 to data10 with common variable of ID.Considering we have 10 datsets and all of them having the same prefix data;

Read more at: http://studysas.blogspot.com/2009/02/merging-data-sets-with-common-variable.html
_____________________________________________________________________________________
Merging the data sets with a common variable if the datasets has the same prefix name?

For example: col1-col10 dsn1-dsn 7 data1 to data6 with common variable of Usubjid.here is the example, I have 7 datasets i need to merge and each of them having the common variable(usubjid) to merge, and all the datasets having the same prefix dsn(dsn1 to dsn7).

Read more at: http://studysas.blogspot.com/2009/02/merging-data-sets-using-macro-code.html
_____________________________________________________________________________________
when to use &,&&,and &&&,how do we distinguish(Multiple Ampersands) and Diff. Between single dot and double dots in macros
Here are the 2 important questions always comes up in our minds,(& vs && vs &&& and single dot and double dots) when we are dealing with macros for the first time and here are the answers for them.I did find a very good regarding the above topics in the one of the SAS forums and IAN WHITLOCK explained it very clear.

Read more at: http://studysas.blogspot.com/2009/02/3when-to-use-do-we-distinguishmultiple.html
_____________________________________________________________________________________
How can I count number of observations per subject in a data set?

We always have this question in mind, while we do the SAS programming and here is the simple answer for that, we just need to use SUM statement and the FIRST.variable in the SET statement and then the RETAIN statement to calculate the observations count per subject.

Read more at: http://studysas.blogspot.com/2009/02/how-can-i-count-number-of-observations.html
_____________________________________________________________________________________
How to remove the duplicate observations in the dataset using PROC SQL, DATASTEP/PROC SQL/or PROC SORT etc?

Before using a particular step to remove the duplicate observations, we should understand that the duplicate observations are pertaining to the key variables like usubjid, treatment, patientno. etc, which are unique or exact duplicates( 2 or more observations has the duplicates with respect to all the variables in the dataset).If the observations are exact duplicates with respect to all the variables in the dataset, we can remove the exact duplicates by:

Read more at: http://studysas.blogspot.com/2009/02/how-to-remove-duplicate-observations.html
_____________________________________________________________________________________
How to scan more than 20 records to determine variable attributes in EFI:

In Versions 7 and 8 of the SAS System, by default the Import Wizard, PROC IMPORT and the External File Interface (EFI) scan 20 records to determine variable attributes when reading delimited text files.Changing the default setting can only be done for EFI in Version 7, Release 8 and Release 8.1. Beginning in Release 8.2 changing the default setting is applicable to the Import Wizard, PROC IMPORT and EFI.

Read more at: http://studysas.blogspot.com/2009/02/how-to-scan-more-than-20-records-to.html
_____________________________________________________________________________________
SAS Sample Programs:
____________________________________________________________________
Base SAS Certification Summary---Functions:
SAS Functions can be used to convert data and to manipulate the values of character variables. Functions are written by specifying the function name, then it's arguments in parentheses. Arguments can include variables, constants, or expressions. Although arguments are typically separated by commas, they can also be specified as variable lists or arrays.
____________________________________________________________________________________
SAS Certification Assignments
_____________________________________________________________________________
HOW TO CREATE A SAS TRANSPORT(XPT) FILE
______________________________________________________________________
How to Debug the SAS code:

Question:I have a huge SAS program that isn't working. The results I get are not right but there are no errors or warnings in the SAS log. How can I figure out where I went wrong?
_____________________________________________________________________________

Tuesday, April 7, 2009

How to determine the executing program name and path programatically

Sometimes, we need to keep the name and path of the executing program in the FOOTNOTE of the generated table or listings.

I have always created a macro variable using the %let statement and, then I called the Macro variable in the footnote statement to get the name of the program. Eventhough it is simple.. it may not be suitable when we write application which need to self document...

Here is another technique which can be used to retrieve the pathname and filename (last executed) .....

To get the last opened filename:

proc sql noprint;
select scan(xpath,-1,'\') into :progname from sashelp.vextfl
where upcase(xpath) like '%.SAS';
quit;

%put &progname;

result: filename.sas

or

data _null_;
set sashelp.vextfl(where=(upcase(xpath) like '%.SAS'));
call symput('program', scan(xpath,-1,'\'));
run;


%put &program;

result: filename.sas
To get path name ... then .. use the following code;


proc sql noprint;
select xpath into :progname
from sashelp.vextfl where upcase(xpath) like '%.SAS';
quit;

%put &progname;

Result:
H:\Company\Client\Study\ABC304\Programs\filename.sas

or

data _null_;
set sashelp.vextfl
(where=(upcase(xpath) like '%.SAS'));
call symput('program', xpath));
run;


%put &program;

Result:
H:\Company\Client\Study\ABC304\Programs\filename.sas

Here is another coolway to retrieve the filename and path of last executed program;

If you are using Windows operating system, we can access the executing program and its path using two environmental variables SAS_EXECFILENAME and SAS_EXECFILEPATH.

To access the values of these two variables use %SYSGET macro function.
The below macro calls(pathfind and filefind) gets u the name of the executing program and its path:

%macro pathfind;
%sysget(SAS_EXECFILEPATH)
%mend pathfind;
%put %pathfind;

%macro filefind;
%sysget(SAS_EXECFILENAME)
%mend filefind;
%put %filefind;

Monday, April 6, 2009

How to Check, if the variable exists in the SAS dataset or not

How to check if a variable exist or not:

In SAS sometimes, we need to check whether the variable is exist in the dataset or not, we usually run the proc contents program and physically check if the variable exist in the dataset or not.

If we want to check it programmatically, then use the following code....

Sample dataset:

data old;
input ID SCORE1 SCORE2 SCORE3 SCORE4 SCORE5 SCORE6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
60 100 97 69 100 100 100
65 100 98 100 93 90 100
70 99 97 100 100 95 100
40 97 99 98 49 100 95
190 100 99 97 100 100 90
196 100 100 99 100 100 100
210 98 85 88 90 80 95 100
;
run;

data _null_;
dset=open('old');
call symput ('chk',varnum(dset,'SCORE4'));
run;


%put &chk;
RESULT:5

Here I have used both OPEN and VARNUM functions in the program to check if SCORE4 variable exists in a 'OLD' dataset.

The OPEN function opens a SAS data set and the VARNUM function, which returns the variable's position in the data set.

If the variable does not exist then VARNUM returns to 0. and if the result is GT 0 then the variable is exist in the dataset.

In this case, the variable SCORE4 location is column 5, so the value for macrovariable CHK will be 5.

Wednesday, April 1, 2009

How to check if the File is exist or not in SAS

How to check if the File is exist or not in SAS:

Guys… Let me try explaining how to check if the file exist in the library or directory using SAS.

Here I am writing a macro to check if the file exist in the directory or not.

Here is the way to check it…

%macro check(dir= ) ;
%if %sysfunc(fileexist(&dir)) %then %do;
%put File Exists ;
%end;


%else %do ;
%put File doesn’t exist.;
%end ;

%mend ;
%check(dir="C:\Documents and Settings\sannapareddy\Desktop\Holidays 2009.docx");

%sysfunc option empowers me to use the same Data step functions outside the Data step. Almost all of the Data step functions are available to use outside the data step if we use %sysfunc.

fileexist option, verifies the existence of an external file. It can also verifies the SAS library.


Here is another simple technique to check.....

data _null_;
x=filexist('C:\Documents and Settings\sannapareddy\Desktop\Holidays 2009.docx');
call symput('myvar1',x);
run;


%put &myvar1;

Call symput is used to create a macro variable( myvar1)with the variable 'X' value assigned to it.

See the log for details,.... if the value you see in log is one ... then the file exists, if the value in the log is 0 the the file doesn't.


Friday, March 27, 2009

Importance of Warnings and Notes messages from SAS log

The errors I will list here will be very few in number. They are errors that you will likely make at some time if you do not remain alert. These errors could have serious consequences so that is why I have described them as "deadly errors".

Missing "by" statement in merge:

Using "set" instead of "merge":

Existing "in" variable in merge:

"Flag" variables in data steps:

"Fixing" data during a merge:

Averaging averages:

Merging in Key Variables:

Read more at...........


Importance of Warnings and Notes messages from SAS log



A Utility Program for Checking SAS Log Files

Proc Sort NODUP vs NODUPKEY

The SORT Procedure (Proc Sort): Options




We can do many things using PROC SORT like

create a new data setsubset the data rename/ drop/ keep variablesformat, or label your variables etc



Options Available with Proc Sort:

  • OUT= OPTION
  • DESCENDING OPTION
  • DROP=, KEEP=, AND RENAME= OPTIONS
  •  FORMAT AND LABEL STATEMENTS
  • WHERE= OPTION OR WHERE STATEMENT
  • FIRSTOBS= AND OBS= OPTIONS
  • NODUPRECS AND NODUPKEY OPTIONS
  • DUPOUT


A common interview question for SAS jobs is "What is the difference between proc sort nodup and proc sort nodupkey?". The answer the interviewer is expecting is usually "proc sort nodup gets rid of duplicate records with the same sort key but proc sort nodupkey gets rid of other records with the same sort key". However, this is not correct.




Common Programming Mistake with Proc Sort NODUPRECS -


Equivalent of NODUPKEY in PROC SQL

Ian Whitlock Explains...

NODUPKEY is like FIRST. processing. Both depend on order which is an alien

concept to SQL. SQL depends on information stored as data, not in variable
names and not in order.

If for example you had the variables GROUP and SEQ where SEQ is a sequence
number within group, then you could use a GROUP BY GROUP clause with HAVING
SEQ=1. In short there must be something in the data values that indicates
which records you want.

The suggestion to use DISTINCT works when all records having the same value
of GROUP have all other relevant variables with equal values. This would be
equivalent to the NODUP option in PROC SORT.







Wednesday, March 18, 2009

PROC TRANSPOSE: How to Convert Variables(columns) into Observations(ROWS) and Observations(ROWS) into Variables(Columns)

During my early days as a SAS programmer, I used to get confused which statement in PROC TRANSPOSE used for what and why?

PROC TRANSPOSE syntax looks like a straightforward thing, but we need to look at important details it offers and without that we may get unexpected results.

Proc Transpose Options:
Proc Tranpose offers several options like OUT=, LABEL=, NAME=, and PREFIX=.
Each option is distinct from the others. "OUT=" option assigns an output dataset, "LABEL=" option assigns a nemae to the variable which has the label of the transposed variable. If we don’t use "LABEL=" option in the PROC TRANSPOSE syntax, the defalut “_LABEL_” will be assigned to the variable that is being transposed.

"NAME= " option works same as the "LABEL=" option, as if we use NAME=option in the TRANSPOSE syntax, which will assign the name to the variable that is being tranposed.

There is another option that we can use in the TRANSPOSE syntax that is "PREFIX=", which assigns the prefix for the transposed variables.

We have two different type of TRANSPOSE: “UP” and “DOWN”:

“UP” transpose change rows of a dataset to columns, whereas the “Down” transpose change columns to rows. We either use "UP" or "DOWN" tranpose depending upon the requirement.

Whenver we use PROC TRANSPOSE we need to ask some questions ourselves, since visualizing the PROC Transpose syntax is not that easy:

To develop the PROC TRANSPOSE syntax we need to know the answers to the above questions,

1) Which variable/variables need to get transposed?
A) Whatever the variable we mention in the ID statement.
2) What are the variables that need to stay same as in the input dataset?
A) Variables that are included in the BY statement.
3) Which variables values need to become the values for the transposed variable.
A) Variables that are included in the VAR statement.


Here I am taking a simple example:

I have a dataset called X {has 3 variables: QTR (char) mob (num) ncl (num)}

Data x;
input qtr$ mob ncl;

cards;
2006q1 0 4
2006q1 1 5
2006q1 2 4
2006q1 3 6
2006q2 0 7
2006q2 1 2
2006q2 2 8
2006q2 3 7
2006q3 0 2
2006q3 1 4
2006q3 2 8
2006q3 3 8
;

 run;


To get the required output use the following syntax: (Proc sort+Proc Tranpsoe+Arrays)

*Sorting needs to be done before we use PROC TRANSPOSE;
proc sort data=x;
by qtr mob;

run;
 

*Transposing the variable MOB using PROC TRANSPOSE; 
proc transpose data=x out=new prefix=mob;
var ncl;
by qtr mob;

id mob;
run;


*Array was used to convert all missing values to zero;
 
data new; 
set new;
array zero{4} mob0-mob3;
do i=1 to 4;
if zero(i)=. then zero(i)=0;
end;
drop i mob _name_;

run;


Output:





Example 2:

data grades;

input patid name $ class $ grade;
cards;
10 Alice E1 78
10 Alice E2 82
10 Alice E3 86
10 Alice E4 69
10 Alice P1 97
10 Alice F1 160
11 Barabara E1 88
11 Barabara E2 72
11 Barabara E3 86
11 Barabara E4 99

11 Barabara P1 100
11 Barabara F1 170
12 Jane E1 98
12 Jane E2 92
12 Jane E3 92
12 Jane E4 99
12 Jane P1 99
12 Jane F1 185
;
run;


*Using datastep;


data grades2;
set grades;
by patid;
if class='E1' then E1=grade;
else if class='E2' then E2=grade;
else if class='E3' then E3=grade;
else if class='E4' then E4=grade;
else if class='P1' then P1=grade;
else if class='F1' then F1=grade;
if last.patid then output;
retain E: P: F:;
drop class grade;
run;




*Using Arrays;

data grades1;
array t(*) e1 e2 e3 e4 p1 f1;
do i=1 to 6;
set grades;
t(i)=grade;
end;
drop i class grade;
run;




*Using Proc transpose;


proc transpose data=grades out=grades3(drop=_:);
by patid name;
var grade;
id class;
run;




*REVERSE PROCESS; Getting back the Original dataset;


*Using Arrays;

data grade;
set grades1;
array t(*) e1 e2 e3 e4 p1 f1;
do i=1 to 6;
grade=t(i);
output;
end;
drop e1 e2 e3 e4 p1 f1 i;
run;






*Using Proc transpose;


proc transpose data=grades1 out=grade(rename=(_name_=class));
by patid name;
var e1 e2 e3 e4 p1 f1;
run;






*Transposing multiple variables using ARRAYS;


*Example:; Arrays are very effective when you want to transpose multiple variables at a time.;


data test;
length petest $40 peorres $200;
set test(drop=petest);
array tra {*} peorres01-peorres12;
array tr {*} petest01-petest12;
array t {*} peclnsig01-peclnsig12;
do i=1 to 12;
peorres=tra(i);
petest=tr(i);
peclnsig=t(i);
output;
end;
drop peclnsig01-peclnsig12 peorres01-peorres12 petest01-petest12 i;
run;

****You need to use multiple proc transpose steps if you want to do the above process using Proc Transpose .