Friday, March 27, 2009

Importance of Warnings and Notes messages from SAS log

The errors I will list here will be very few in number. They are errors that you will likely make at some time if you do not remain alert. These errors could have serious consequences so that is why I have described them as "deadly errors".

Missing "by" statement in merge:

Using "set" instead of "merge":

Existing "in" variable in merge:

"Flag" variables in data steps:

"Fixing" data during a merge:

Averaging averages:

Merging in Key Variables:

Read more at...........


Importance of Warnings and Notes messages from SAS log



A Utility Program for Checking SAS Log Files

Proc Sort NODUP vs NODUPKEY

The SORT Procedure (Proc Sort): Options




We can do many things using PROC SORT like

create a new data setsubset the data rename/ drop/ keep variablesformat, or label your variables etc



Options Available with Proc Sort:

  • OUT= OPTION
  • DESCENDING OPTION
  • DROP=, KEEP=, AND RENAME= OPTIONS
  •  FORMAT AND LABEL STATEMENTS
  • WHERE= OPTION OR WHERE STATEMENT
  • FIRSTOBS= AND OBS= OPTIONS
  • NODUPRECS AND NODUPKEY OPTIONS
  • DUPOUT


A common interview question for SAS jobs is "What is the difference between proc sort nodup and proc sort nodupkey?". The answer the interviewer is expecting is usually "proc sort nodup gets rid of duplicate records with the same sort key but proc sort nodupkey gets rid of other records with the same sort key". However, this is not correct.




Common Programming Mistake with Proc Sort NODUPRECS -


Equivalent of NODUPKEY in PROC SQL

Ian Whitlock Explains...

NODUPKEY is like FIRST. processing. Both depend on order which is an alien

concept to SQL. SQL depends on information stored as data, not in variable
names and not in order.

If for example you had the variables GROUP and SEQ where SEQ is a sequence
number within group, then you could use a GROUP BY GROUP clause with HAVING
SEQ=1. In short there must be something in the data values that indicates
which records you want.

The suggestion to use DISTINCT works when all records having the same value
of GROUP have all other relevant variables with equal values. This would be
equivalent to the NODUP option in PROC SORT.







Wednesday, March 18, 2009

PROC TRANSPOSE: How to Convert Variables(columns) into Observations(ROWS) and Observations(ROWS) into Variables(Columns)

During my early days as a SAS programmer, I used to get confused which statement in PROC TRANSPOSE used for what and why?

PROC TRANSPOSE syntax looks like a straightforward thing, but we need to look at important details it offers and without that we may get unexpected results.

Proc Transpose Options:
Proc Tranpose offers several options like OUT=, LABEL=, NAME=, and PREFIX=.
Each option is distinct from the others. "OUT=" option assigns an output dataset, "LABEL=" option assigns a nemae to the variable which has the label of the transposed variable. If we don’t use "LABEL=" option in the PROC TRANSPOSE syntax, the defalut “_LABEL_” will be assigned to the variable that is being transposed.

"NAME= " option works same as the "LABEL=" option, as if we use NAME=option in the TRANSPOSE syntax, which will assign the name to the variable that is being tranposed.

There is another option that we can use in the TRANSPOSE syntax that is "PREFIX=", which assigns the prefix for the transposed variables.

We have two different type of TRANSPOSE: “UP” and “DOWN”:

“UP” transpose change rows of a dataset to columns, whereas the “Down” transpose change columns to rows. We either use "UP" or "DOWN" tranpose depending upon the requirement.

Whenver we use PROC TRANSPOSE we need to ask some questions ourselves, since visualizing the PROC Transpose syntax is not that easy:

To develop the PROC TRANSPOSE syntax we need to know the answers to the above questions,

1) Which variable/variables need to get transposed?
A) Whatever the variable we mention in the ID statement.
2) What are the variables that need to stay same as in the input dataset?
A) Variables that are included in the BY statement.
3) Which variables values need to become the values for the transposed variable.
A) Variables that are included in the VAR statement.


Here I am taking a simple example:

I have a dataset called X {has 3 variables: QTR (char) mob (num) ncl (num)}

Data x;
input qtr$ mob ncl;

cards;
2006q1 0 4
2006q1 1 5
2006q1 2 4
2006q1 3 6
2006q2 0 7
2006q2 1 2
2006q2 2 8
2006q2 3 7
2006q3 0 2
2006q3 1 4
2006q3 2 8
2006q3 3 8
;

 run;


To get the required output use the following syntax: (Proc sort+Proc Tranpsoe+Arrays)

*Sorting needs to be done before we use PROC TRANSPOSE;
proc sort data=x;
by qtr mob;

run;
 

*Transposing the variable MOB using PROC TRANSPOSE; 
proc transpose data=x out=new prefix=mob;
var ncl;
by qtr mob;

id mob;
run;


*Array was used to convert all missing values to zero;
 
data new; 
set new;
array zero{4} mob0-mob3;
do i=1 to 4;
if zero(i)=. then zero(i)=0;
end;
drop i mob _name_;

run;


Output:





Example 2:

data grades;

input patid name $ class $ grade;
cards;
10 Alice E1 78
10 Alice E2 82
10 Alice E3 86
10 Alice E4 69
10 Alice P1 97
10 Alice F1 160
11 Barabara E1 88
11 Barabara E2 72
11 Barabara E3 86
11 Barabara E4 99

11 Barabara P1 100
11 Barabara F1 170
12 Jane E1 98
12 Jane E2 92
12 Jane E3 92
12 Jane E4 99
12 Jane P1 99
12 Jane F1 185
;
run;


*Using datastep;


data grades2;
set grades;
by patid;
if class='E1' then E1=grade;
else if class='E2' then E2=grade;
else if class='E3' then E3=grade;
else if class='E4' then E4=grade;
else if class='P1' then P1=grade;
else if class='F1' then F1=grade;
if last.patid then output;
retain E: P: F:;
drop class grade;
run;




*Using Arrays;

data grades1;
array t(*) e1 e2 e3 e4 p1 f1;
do i=1 to 6;
set grades;
t(i)=grade;
end;
drop i class grade;
run;




*Using Proc transpose;


proc transpose data=grades out=grades3(drop=_:);
by patid name;
var grade;
id class;
run;




*REVERSE PROCESS; Getting back the Original dataset;


*Using Arrays;

data grade;
set grades1;
array t(*) e1 e2 e3 e4 p1 f1;
do i=1 to 6;
grade=t(i);
output;
end;
drop e1 e2 e3 e4 p1 f1 i;
run;






*Using Proc transpose;


proc transpose data=grades1 out=grade(rename=(_name_=class));
by patid name;
var e1 e2 e3 e4 p1 f1;
run;






*Transposing multiple variables using ARRAYS;


*Example:; Arrays are very effective when you want to transpose multiple variables at a time.;


data test;
length petest $40 peorres $200;
set test(drop=petest);
array tra {*} peorres01-peorres12;
array tr {*} petest01-petest12;
array t {*} peclnsig01-peclnsig12;
do i=1 to 12;
peorres=tra(i);
petest=tr(i);
peclnsig=t(i);
output;
end;
drop peclnsig01-peclnsig12 peorres01-peorres12 petest01-petest12 i;
run;

****You need to use multiple proc transpose steps if you want to do the above process using Proc Transpose .

Sunday, March 8, 2009

Displaying the Graphs (Bar charts) using PROC GCHART in SAS

Displaying the Graphs (Bar charts) using PROC GCHART in SAS :

Just a day ago, I've received a question on Graphs from one of the member of  my orkut community, which prompted me to create following example.

Below are 5 different types of Graphs were produced (including 3d graphs) using Proc Gchart.
Sample data set used for the examples:

data x;input quarter $ ncl ram;
cards;
q1 20000 2000
q1 30000 3000
q1 45000 2000
q2 23000 2003
q2 45000 4500
q3 46000 5600
q3 89000 600
q3 67000 4500
q4 45890 890
q4 46000 9800

;
run;

*Example1: Simple barchart;
/* Add a title */

title 'Sales Report';

/* Produce the bar chart */
proc gchart data=x;vbar quarter / sumvar=ncl nozero ;
format ncl dollar8.;
run;
quit;
Snapshot of the output:





*Example2: Producing 3D graphs with cylinder shape using Proc Gchart;

pattern c=blue;
axis1 minor=none label=('NCL Sum');


title 'Bar Chart Shape';

/* use the SHAPE= option to specify a cylinder */

proc gchart data=x;
vbar3d quarter / sumvar=ncl discrete raxis=axis1 shape=cylinder
cframe=yellow autoref width=5;
format ncl dollar8.;
run;
quit;


Snapshot of the output:

axis1 minor=none label=('NCL Sum') offset=(0,0);

*Example3: Producing 3D graphs with reference lines using Proc Gchart;
/* use the FRONTFREF option to place reference lines in front of the bars */

proc gchart data=x;
vbar3d quarter / sumvar=ncl discrete raxis=axis1
coutline=black cframe=yellow width=
5vref=30 inside=sum frontref;
format ncl dollar8.;
run;
quit;


Snapshot of the output:

*Example4: Producing graphs after annotating the value above each midpoint bar using Proc Gchart;
*Create annotate dataset to assign the value on each midpoint bar;


data annotate;
length function color style text $ 8;
retain function 'label' color 'black' when 'a' style 'swiss' xsys ysys '2' position '2' size 3 hsys '3' ;
set x;
by quarter;

/*calculating the cumulative totals for each quarter;*/
if first.quarter then final=ncl;
else if ^first.quarter then final+ncl;
midpoint=quarter;
if last.quarter then text=left(put(final,dollar8.));
y=ncl;
run;

/* Define colour patterns for bars */pattern1 c=blue;
pattern2 c=green;
pattern3 c=red;
pattern4 c=yellow;

/* Adding the title */
Title 'Sales per Quarter Report';
/* Produce the bar chart */proc gchart data=x;
vbar quarter /discrete sumvar=ncl
/* subgroup – Width – width of the bar...... Space – space between two bars
Annotate – write the total value on the top of each bar.*/
subgroup=quarter
width=7space=2.0annotate=anno
;
format ncl dollar8.;
run;
quit;


Snap shot of the output:



5)Here is another way to create a graph..
I've added the titles and footnotes.... to this graph..


goptions reset=global gunit=pct border cback=white


colors=(black red green blue) ftitle=swissb
ftext=swiss htitle=6 htext=4
offshadow=(1.5,1.5);



title1 'NCL vs Quarter';
footnote1 h=3 j=r 'Sarath';


axis1 label=none
origin=(24,);
axis2 label=none
minor=(number=1)
offset=(,0);
legend1 label=none
shape=bar(3,3)
cborder=black
cblock=gray
origin=(24,);



pattern1 color=lipk;
pattern2 color=cyan;
pattern3 color=lime;
pattern4 color=red;
pattern5 color=blue;


proc gchart data=x;
format ncl dollar8.;
vbar3d quarter / sumvar=ncl
subgroup=ncl
outside=sum
width=9
space=7
maxis=axis1
raxis=axis2
coutline=black
legend=legend1;
run;
quit;





Snapshot of the output:









SAS GRaph


Thursday, March 5, 2009

Change all missing values of all variables into zeros/putting zeros in place of missing values for variables

Have you been asked how to convert missing values for all the variables into zeros..... if you are.... here is the answer for that.....

In this example the I have used array to do the same.

The variable list includes ID and Score1 to score6.Using simple array method we can change all the missing value for the variables score1 to score6 to 0.

data old;
input ID SCORE1 SCORE2 SCORE3 SCORE4 SCORE5 SCORE6;
cards;
24 100 97 . 100 85 85
28 . 87 98 100 . 90
60 100 . . 100 100 100
65 100 98 100 . 90 100
70 99 97 100 100 95 100
40 97 99 98 . 100 95
190 100 . 97 100 100 90
196 100 100 . 100 100 100
210 . 85 . 90 80 95
;
 

run;

*Ist Method;
data new;

set old;
array zero score1-score6;do over zero;
if zero=. then zero=0;
end;
run;


*2nd Method;
data new;

set old;
array nums _numeric_;

do over nums;
if nums=. then nums=0;
end;
run;



proc print;
Title 'Missing values changed to zero using arrays and a do loop';

run;
Output:



ID SCORE1 SCORE2 SCORE3 SCORE4 SCORE5 SCORE6

24 100 97 0 100 85 85
28 0 87 98 100 0 90
60 100 0 0 100 100 100
65 100 98 100 0 90 100
70 99 97 100 100 95 100
40 97 99 98 0 100 95
190 100 0 97 100 100 90
196 100 100 0 100 100 100
210 0 85 0 90 80 95




 Missing values changed to zero using arrays and a do loop.


What if we don't want to convert all.. missing values in variables to zero... I mean .. some of them needs to be converted to zeros and some to 1.

Here is the sample code for that:

The following code will convert all the missing values into either 1 or 0 depending upon the value of ID. If the value of ID less than or equal to 70 then the missing value should be converted to 1 else if the ID value is greater than 70 then the missing values can be converted into 0.

data new1;
set old;
array RS(6) score1-score6 ; 
do i=1 to 6;
if ID le 70 then do;
if RS(i)=. then RS(i)=1; 

end;
else if id gt 70 then do;
if RS(i)=. then RS(i)=0; 

end;
end;
run;

 

/*Macro converts all missing values for numeric variables into 0*/
%macro replaceMissing(ds);
DATA &ds.;SET &ds.;
ARRAY ZERO _NUMERIC_;
DO OVER ZERO;
if ZERO=. then ZERO=0;
end;

run;
%mend replaceMissing;
%replacemissing(dsn);


********************************************************************;
data missing;
set sashelp.column;
array chars _character_;
do over chars;
if chars='' then chars='Missing';
end;
array nums _numeric_;
do over nums;
if nums=. then nums=0;
end;
run;
********************************************************************;
The above code converts missing values of all charcater variables in the sashelp.column dataset  to 'MISSING' . It also converts missing values of all numeric variables in the sashelp.column dataset to 0. 




Wednesday, March 4, 2009

Sending the SAS Log to an Output File

Here is the simple code which allows us to save the log file in the required location.

Use Proc printto procedure to save or print the log file.

filename dsn ‘C:\Documents and Settings\zzzzzzzzzzzz\Desktop\LOGfile.lst';

data dsn;
input patid implants;
datalines;
1 3
1 1
1 2
1 1
2 1
2 2
3 1
4 2
3 1
4 5
2 3
1 6
;
run;

proc printto log=dsn new;
run;


Example 2:
proc printto log="C:\temp\LOGLOG.log" print="C:\temp\LSTLST.lst" new;
run;


*Select only male students and age less than 16;
proc sql;
create table males as
select age, height, weight
from sashelp.class
where sex='M' and age lt 16
order by age;
quit;

*Get the descriptive statistics for height variable by age;

proc means data=males ;
by age;
var height;
output out=htstats mean=mean n=n std=sd median=med min=min max=max;
run;

PROC PRINTTO; *The second PROC PRINTTO step redirects output back to its default destination.;
run;


After PROC PRINTTO executes, all procedure output is sent to the alternate location until you execute another PROC PRINTTO statement or until your program or session ends.

SAS Date,Time and datetime functions

A Beginners Guide to SAS Date and Time Handling:

Abstract:

The SAS system provides a plethora of methods to handle date and time values. Correct date
computation became a very important subject when the world became aware of the Year 2000 issue. Computer users now realize most applications contain date and time variables. This beginning tutorial describes how the SAS system 1) stores dates, datetimes, and times; 2) reads date/time variables from "Raw Data Files" and from SAS data sets. It covers when and where to use SAS Informats and formats. Next it describes the easy methods to perform date/time arithmetic via date/time SAS functions.

The paper shows how SAS date Formats can be used in SAS Procedures such as PRINT, FREQ,
and CHART. Finally the tutorial will discuss Year 2000 issues and how SAS software helps you maintain correct date integrity and prevent future Y2k problems in the new millennium.

More can be read at: http://www.kellogg.northwestern.edu/researchcomputing/workshops/papers/finley_sugi25.pdf

How to Read, Write, and Manipulate SAS® Dates:

ABSTRACT:

No matter how long you’ve been programming in SAS, using and manipulating dates still seems to require effort.

Learn all about SAS dates, the different ways they can be presented, and how to make them useful. This paper includes excellent examples in dealing with raw input dates, functions to manage dates, and outputting SAS dates into other formats. Included is all the date information you will need: date and time functions, Informats, formats, and arithmetic operations.

WHAT IS A SAS DATE?
A date is unique within SAS programming. It is neither a character value nor a typical numeric. It is a special case of a numeric variable. In other words, a SAS date is a special representation of a calendar date. Unlike dates in many other languages, SAS has a specific numeric value assigned to each day. The starting point for all SAS dates is January 1st, 1960 and is represented as day zero (0). All previous and subsequent dates are represented with numeric values plus (+) or minus (-) from this starting point. The simplicity of the approach is there will never be a point in the past (since the start of the Gregorian calendar) or future that can not be represented by a
number.

More can be read at: http://analytics.ncsu.edu/sesug/2008/HOW-063.pdf

Working With SAS® System Date and Time Functions

SAS Date, Time, and Datetime Functions

Tuesday, March 3, 2009

Calculating group totals and the counts within each group

Sample 25217: Calculating group totals and the counts within each group

This example uses the SUM() function to sum the AMOUNT column, creating a new column named GRPTOTAL with a COMMA10. format. The COUNT() function counts the number of occurrences of STATE within each group.

The GROUP BY clause collapses multiple rows for each group into one row per group, containing STATE, GRPTOTAL and the COUNT.


data one; input state $ amount;
cards;
CA 7000
CA 6500
CA 5800
NC 4800
NC 3640
SC 3520
VA 4490
VA 8700
VA 2850
VA 1111 
 
proc sql;
create table two as select state ,sum(amount) as grptotal format=comma10. , count(*) as count from one group by state;
quit


proc print data=two noobs;
run;

 

How to customize page numbers in RTF output

Usage Note 24439: In SAS 9.1, are there easier ways to customize page numbers in RTF output?

direct link here http://support.sas.com/kb/24/439.html

Yes, beginning with SAS 9.1, page numbers can be customized in the RTF destination by using an escape character and the {thispage} function, {lastpage} function, {pageof} function, or all three:

ods escapechar='^';
ods listing close;
ods rtf file='c:\tests\test.rtf';

data test;
do i=1 to 50;
output;
end;
run;



proc print data=test noobs;
title 'Page ^{thispage} of ^{lastpage}';
footnote '^{pageof}';
run;



ods listing;
ods rtf close;





Monday, March 2, 2009

How to calculate number of years and number of days between 2 dates;

How to calculate number of years and number of days between 2 dates;

Exploring the yrdif and datdif functions in SAS as well as INTCK function:
There are several ways to calculate the number of years between two dates and out of all the methods, YRDIF function results the most accurate value.

Syntax:

ageinyrs = YRDIF(birthdate, enddate, 'act/act');
ag_indays = DATDIF(birthdate, enddate, 'act/act');


act/act” will gives us the actual interval between the two dates.

The YRDIF function returns the calculated years between two SAS date values. The returned value will be a precise calculation, including multiple decimal places. Whereas with INTCK function will just give the rounded value like 10, 11 and not like 10.2 and 11.5.

Syntax:
Using YRDIF function:
To know the interval between two dates in Years:


data _null_;
sdate="12mar1998"d;
edate="12jun2008"d;
years=yrdif(sdate,edate,'act/act');

put years;
run;


Output: 10.2535 yrs

Using DTDIF function:
To know the interval between two dates in Days:


data _null_;
sdate="12mar1998"d;
edate="12jun2008"d;
days=datdif(sdate,edate,'act/act');

put days;
run;


output: 3745 days


Using the INTCK function:

data _null_;
sdate="12mar1998"d;
edate="12jun2008"d;
years=intck('year',sdate,edate);
put years;

run;

output:10 years

The calculation for the number of years from INTCK function is different from that generated by YRDIF. This is because the INTCK function bases the interval from the start of the respective intervals.

The INTCK function returns the integer count of the number of intervals in years, months or days between two dates.

Ex:
To know the interval between 2 dates in days:


data _null_;
sdate="12mar1998"d;
edate="12jun2008"d;
days=intck(‘days’,sdate,edate);

put days;
run;

result: 3745 days


To know the interval between 2 dates in months:

data _null_;
sdate="12mar1998"d;edate="12jun2008"d;
months=intck('months',sdate,edate);

put months;
run;

result: 123 months

Learn how to view SAS dataset labels without opening the dataset directly in a SAS session. Easy methods and examples included!

Quick Tip: See SAS Dataset Labels Without Opening the Data Quick Tip: See SAS Dataset Labels With...