Friday, February 12, 2010

Replace Missing Numeric Values using Missing Option/Proc STDIZE

MISSING OPTION
Replacing missing values with the desired value like a zero is always a challenge, especially when we have a dataset with a number of columns to standardize. The OLD WAY of doing it to write a DATA step code with  if……then statements like...
 
if var=. then var=0;
to make 0 appear instead of . (dot) in the tables output.

With the SAS option called missing you can save a lot of typing.  If you place the following SAS option code before the generation of the table output:

Option missing="0";

SAS will display 0 (zero) instead of the . (dot) on the table. In fact it can display whatever the character we would like to display for missing values.  You can use a - (line) or * (star).

Always change it back the option missing setting to default as  Option Missing='';  otherwise you may endup getting unexpected results;

Warning: Since we are using this option to display zeros instead of .(dots), you shouldn't use this option before any of  SAS procedures like PROC UNIVARIATE, PROC MEANS, PROC SUMMARY and PROC FREQ because they don’t treat missing as zero. So if you use Option Missing='0' we are telling SAS to consider missing as ZERO which is not what the above procedure generally do.

**********************************************************************************;
Proc STDIZE: Proc STDIZE is very useful in processing the missing values.  It offers a simple and very effective solution when we want to process missing values.

MISSING=specifies the method or a numeric value for replacing missing values

REPLACEreplaces missing data by zero in the standardized data

REPONLYreplaces missing data by the location measure (does not standardize the data)


Example:

data miss;
input a b c;

cards;
1 . 3
. . 2
3 2 .
1 3 4
2 . .
. 2 .
run;


proc stdize data=old reponly MISSING=0 out=new;
var _numeric_;
run;

Output: 
 



Thursday, February 11, 2010

COMPRESS: SAS Function strips characters from the string

In SAS 9.1.3 , an extra argument (MODIFIER) is added to the SAS character string function COMPRESS and these modifiers modifies the action of the COMPRESS function;
Syntax: COMPRESS(<, chars><, modifiers>) ;

Following characters can be used as modifiers.

a – Compress or Delete all upper and lower case characters from String.

ak- Compress or Delete alphabetic characters(1,2,3 etc) from String.

kd- Compress or Delete characters(aplabets) from String.( Keeps only digits).

d – Compress or Delete numerical values from String.

i – Compress or Delete specified characters both upper and lower case from String.

k – keeps the specified characters in the string instead of removing them.

l – Compress or Delete lowercase characters from String.

p – Compress or Delete Punctuation characters from String.

s – Compress or delete spaces from String. This is default.

u – Compress or Delete uppercase characters from String.
See the Example below:

data _null_ ;

string='StudySAS Blog! 17752. ' ;
string1=compress(string,'') ; *Compress spaces. This is default;
string2=compress(string,'','ak');*Compress alphabetic chars(1,2etc);
string3=compress(string,'','d') ; *Compress numerical values;
string4=compress(string,'','l');*Compress  lowercase characters;
string5=compress(string,'','u');*Compress uppercase characters;
string6=compress(string,'S','k');*Keeps only specifiedcharacters;
string7=compress(string,'!.','P');*Compress Punctuations only;
string8=compress(string,'s','i');*upper/lower case specified characters;
string9=compress(string,'','a');*Compress all upper\lower case  characters ;
string10=compress(string,'','s') ; * Compress or delete spaces;
string11=compress(string,'','kd') ; *Compress alphabets (Keeps only digits);
put string1= ;
put string2= ;
put string3= ;
put string4= ;
put string5= ;
put string6= ;
put string7= ;
put string8= ;
put string9= ;
put string10=;
put string11=;
run ;

OUTPUT:

string1=StudySASBlog!17752.

string2=StudySAS Blog
string3=StudySASBlog!.
string4=SSASB!17752.
string5=tudylog!17752.
string6=SSS
string7=StudySAS Blog 17752
string8=tudyA Blog! 17752.
string9=!17752.
string10=StudySASBlog!17752.
string11=17752

Tuesday, February 9, 2010

MMDDYY+: Format that inserts a slash, space or "-" between the date, month and year

Here is another new format which is available with SAS 9 version called MMDDYY+ . (MMDDYY+ format can insert colons /slash or periods etc between date, month and year values); This new date format developed by SAS works for many other formats, for ex:

MMDDYYB.; *format inserts Blanks in between date, month and years;
MMDDYYC.; *format inserts Colons in between date, month and years;
MMDDYYD.; *format inserts Dashes in between date, month and years;
MMDDYYN.;*format inserts Nothing in between date,month & years;
MMDDYYP.; *format inserts Periods in between date, month & years;
MMDDYYS.; *format inserts Slash in between date, month and years;

When you have date, month and year as separate variables and you want to insert colons or slash or periods and even blanks between them…. MMDDYY+ is your answer.

Here I will show how?

data have;
input date month year;
cards;

25 1 2005
19 3 2006

12 2 2004
1 12 2007
;

run;


data want;
set have;
format bdate MMDDYYB.; *format inserts Blanks.'B' means Blank;
format cdate MMDDYYC.; *format inserts Colons. 'C' means Colon;
format ddate MMDDYYD.; *format inserts Dashes. 'D' means Dash;
format ndate MMDDYYN.; *format inserts Nothing.'N' means Nothing;
format pdate MMDDYYP.; *format inserts Periods. 'P' means Period;
format sdate MMDDYYS.; *format inserts Slash. 'S' means SLASH ;
bdate = mdy(month,"01",year);
cdate = mdy(month,"01",year);
ddate = mdy(month,"01",year);
ndate = mdy(month,"01",year);

pdate = mdy(month,"01",year);
sdate = mdy(month,"01",year);
run;

Output:


No more left(trim) or CAT/CATX/CATT/CATS functions..........

Sunday, February 7, 2010

ERROR: The format XXXX was not found or could not be loaded.

Whenever you try to open a SAS data set which has permanent formats, you will get the error message in the log file saying

“ERROR: The format XXXX was not found or could not be loaded.”

This happens generally when you don't have the format library where the SAS dataset located. What happens here is, SAS will not permit you to open the data set without the format library.

But, if you use options nofmterr; at the top of your program, SAS will opens the dataset for you despite the fact that it does not have the format library.

SAS doesn't treat missing format or informat as an error when the options nofmterr; statement used.


The FMTERR\NOFMTERR option determines whether or not SAS generates an error message when a variable's format cannot be found. (source: support.sas.com)
The caveat here is, you will not be able to see the formatted values for your variables, but you will be able to open or use the dataset.

OPTIONS nofmterr;
*you will not be able to see the formatted values for your variables;

If you want to see the formatted values for the variables, you need to use AUTOEXEC.sas program.
Autoexec.sas program uses fmtsearch option which will tell SAS to look for formats in the assigned Formats library. This works great if I want to first open SAS, and then open a dataset.

AUTOEXEC.sas Program:
** Location of formats catalog **;
specify location of formats folder inside the libname statement.

libname frmtdir "C:\Documents and Settings\sreddy\Desktop\studyfolder";
options fmtsearch=(WORK frmtdir library);

When you submit above code, SAS will search for formats first in the work library, then in the format directory (fmtdir) and then in the default SAS library.


Permanent formats can be accessed with LIBREF LIBRARY directly, while FMTSEARCH option needs to be specified with user-defined LIBREF.

Saturday, February 6, 2010

How to check how the dataset got sorted without looking at the code?

You can do this manually or programmatically.

Manual Approach: Right click on the data set in the explorer and go to properties, then go to the details tab and see the sorted by values (Works only if you use Windows operating system)
Programmatic approach: Use proc contents and create an output dataset of the contents. The sorted and sortedby columns will give us the details in about how the dataset got sorted in place.

Note: proc contents will give missing values if the dataset isn’t sorted.

*Example;


proc sort data=sashelp.class out=class;
by name sex age;
run;


proc contents data=class out=contents noprint;run;


proc print data=contents;
var memname name sorted sortedby;
run;



('DiggThis’)

Friday, February 5, 2010

How to add leading zeros to numeric variables

Have you ever asked to create a variable with leading zeros? I mean 1 to 001 and 2 to 002. If you were, do you know how....

SAS has a special numeric format Zw.d. which can include leading zeros.

Zw.d Format in which,
w : Width of variable.

d : Decimal Point.
Zw.d format writes standard numeric data with leading 0's. The Zw.d format is similar to the w.d format except that Zw.d pads right-aligned output with 0s instead of blanks. (Ref: SAS 9.2 Language Reference : Dictionary, 4th edition);


Let me give you a scenario where this can be very useful:

You have a variable called site with the numbers as 101, 999, 1001 and 1200. If you want to create a USUBJID variable with fixed length for all the subjects, we need to modify the site variable variables to fixed length. (length=4). You can do that using Z4.d format.

Note: As this is a numeric format, this can be applied to numeric variables only.

*z4.format is used to add the leading zeros to site variable;


data have;
     length site $4;
site='101';  subject='1001';output;
site='129';  subject='1002';output;
site='999';  subject='1003';output;
site='1000'; subject='1004';output;
site='1010'; subject='1005';output;
site='1011'; subject='1006';output;
run;

data want;
length usubjid $19;
set have(rename=(site=sitec));
studyid='ABC10049';
site=put(sitec+0,z4.); *Z4. format is applied after converting the character variable sitec to site; 
usubjid=catx('-',studyid,site,subject);
        keep usubjid site;
run;



Output:


USUBJID                         SITE
ABC10049-0101-1001 0101
ABC10049-0129-1002 0129
ABC10049-0999-1003 0999
ABC10049-1000-1004 1000
ABC10049-1010-1005 1010
ABC10049-1011-1006 1011


You could also use REPEAT function also:

Here is how...

site=catt(repeat('0',3-length(sitec)),sitec);

Please note that the 3 reference to the minimum length of site variable. I have used 3 because the minimum length in the given example is 3.