Sunday, February 14, 2010
Friday, February 12, 2010
Replace Missing Numeric Values using Missing Option/Proc STDIZE
MISSING OPTION
Replacing missing values with the desired value like a zero is always a challenge, especially when we have a dataset with a number of columns to standardize. The OLD WAY of doing it to write a DATA step code with if……then statements like...
if var=. then var=0;
to make 0 appear instead of . (dot) in the tables output.
With the SAS option called missing you can save a lot of typing. If you place the following SAS option code before the generation of the table output:
Option missing="0";
SAS will display 0 (zero) instead of the . (dot) on the table. In fact it can display whatever the character we would like to display for missing values. You can use a - (line) or * (star).
Always change it back the option missing setting to default as Option Missing=''; otherwise you may endup getting unexpected results;
Warning: Since we are using this option to display zeros instead of .(dots), you shouldn't use this option before any of SAS procedures like PROC UNIVARIATE, PROC MEANS, PROC SUMMARY and PROC FREQ because they don’t treat missing as zero. So if you use Option Missing='0' we are telling SAS to consider missing as ZERO which is not what the above procedure generally do.
**********************************************************************************;
Proc STDIZE: Proc STDIZE is very useful in processing the missing values. It offers a simple and very effective solution when we want to process missing values.
Example:
data miss;
input a b c;
cards;
1 . 3
. . 2
3 2 .
1 3 4
2 . .
. 2 .
run;
proc stdize data=old reponly MISSING=0 out=new;
var _numeric_;
run;
Output:
Replacing missing values with the desired value like a zero is always a challenge, especially when we have a dataset with a number of columns to standardize. The OLD WAY of doing it to write a DATA step code with if……then statements like...
if var=. then var=0;
to make 0 appear instead of . (dot) in the tables output.
With the SAS option called missing you can save a lot of typing. If you place the following SAS option code before the generation of the table output:
Option missing="0";
SAS will display 0 (zero) instead of the . (dot) on the table. In fact it can display whatever the character we would like to display for missing values. You can use a - (line) or * (star).
Always change it back the option missing setting to default as Option Missing=''; otherwise you may endup getting unexpected results;
Warning: Since we are using this option to display zeros instead of .(dots), you shouldn't use this option before any of SAS procedures like PROC UNIVARIATE, PROC MEANS, PROC SUMMARY and PROC FREQ because they don’t treat missing as zero. So if you use Option Missing='0' we are telling SAS to consider missing as ZERO which is not what the above procedure generally do.
**********************************************************************************;
Proc STDIZE: Proc STDIZE is very useful in processing the missing values. It offers a simple and very effective solution when we want to process missing values.
MISSING= | specifies the method or a numeric value for replacing missing values |
REPLACE | replaces missing data by zero in the standardized data |
REPONLY | replaces missing data by the location measure (does not standardize the data) |
Example:
data miss;
input a b c;
cards;
1 . 3
. . 2
3 2 .
1 3 4
2 . .
. 2 .
run;
proc stdize data=old reponly MISSING=0 out=new;
var _numeric_;
run;
Output:
Thursday, February 11, 2010
COMPRESS: SAS Function strips characters from the string
In SAS 9.1.3 , an extra argument (MODIFIER) is added to the SAS character string function COMPRESS and these modifiers modifies the action of the COMPRESS function;
Syntax: COMPRESS(
Syntax: COMPRESS(
Tuesday, February 9, 2010
MMDDYY+: Format that inserts a slash, space or "-" between the date, month and year
Here is another new format which is available with SAS 9 version called MMDDYY+ . (MMDDYY+ format can insert colons /slash or periods etc between date, month and year values); This new date format developed by SAS works for many other formats, for ex:
MMDDYYB.; *format inserts Blanks in between date, month and years;
MMDDYYC.; *format inserts Colons in between date, month and years;
MMDDYYD.; *format inserts Dashes in between date, month and years;
MMDDYYN.;*format inserts Nothing in between date,month & years;
MMDDYYP.; *format inserts Periods in between date, month & years;
MMDDYYS.; *format inserts Slash in between date, month and years;
When you have date, month and year as separate variables and you want to insert colons or slash or periods and even blanks between them…. MMDDYY+ is your answer.
Here I will show how?
data have;
input date month year;
cards;
25 1 2005
19 3 2006
12 2 2004
1 12 2007
;
run;
data want;
set have;
format bdate MMDDYYB.; *format inserts Blanks.'B' means Blank;
format cdate MMDDYYC.; *format inserts Colons. 'C' means Colon;
format ddate MMDDYYD.; *format inserts Dashes. 'D' means Dash;
format ndate MMDDYYN.; *format inserts Nothing.'N' means Nothing;
format pdate MMDDYYP.; *format inserts Periods. 'P' means Period;
format sdate MMDDYYS.; *format inserts Slash. 'S' means SLASH ;
bdate = mdy(month,"01",year);
cdate = mdy(month,"01",year);
ddate = mdy(month,"01",year);
ndate = mdy(month,"01",year);
pdate = mdy(month,"01",year);
sdate = mdy(month,"01",year);
run;
Output:
MMDDYYB.; *format inserts Blanks in between date, month and years;
MMDDYYC.; *format inserts Colons in between date, month and years;
MMDDYYD.; *format inserts Dashes in between date, month and years;
MMDDYYN.;*format inserts Nothing in between date,month & years;
MMDDYYP.; *format inserts Periods in between date, month & years;
MMDDYYS.; *format inserts Slash in between date, month and years;
When you have date, month and year as separate variables and you want to insert colons or slash or periods and even blanks between them…. MMDDYY+ is your answer.
Here I will show how?
data have;
input date month year;
cards;
25 1 2005
19 3 2006
12 2 2004
1 12 2007
;
run;
data want;
set have;
format bdate MMDDYYB.; *format inserts Blanks.'B' means Blank;
format cdate MMDDYYC.; *format inserts Colons. 'C' means Colon;
format ddate MMDDYYD.; *format inserts Dashes. 'D' means Dash;
format ndate MMDDYYN.; *format inserts Nothing.'N' means Nothing;
format pdate MMDDYYP.; *format inserts Periods. 'P' means Period;
format sdate MMDDYYS.; *format inserts Slash. 'S' means SLASH ;
bdate = mdy(month,"01",year);
cdate = mdy(month,"01",year);
ddate = mdy(month,"01",year);
ndate = mdy(month,"01",year);
pdate = mdy(month,"01",year);
sdate = mdy(month,"01",year);
run;
Output:
No more left(trim) or CAT/CATX/CATT/CATS functions..........
Sunday, February 7, 2010
ERROR: The format XXXX was not found or could not be loaded.
Whenever you try to open a SAS data set which has permanent formats, you will get the error message in the log file saying
“ERROR: The format XXXX was not found or could not be loaded.”
This happens generally when you don't have the format library where the SAS dataset located. What happens here is, SAS will not permit you to open the data set without the format library.
But, if you use options nofmterr; at the top of your program, SAS will opens the dataset for you despite the fact that it does not have the format library.
SAS doesn't treat missing format or informat as an error when the options nofmterr; statement used.
The FMTERR\NOFMTERR option determines whether or not SAS generates an error message when a variable's format cannot be found. (source: support.sas.com)
The caveat here is, you will not be able to see the formatted values for your variables, but you will be able to open or use the dataset.
OPTIONS nofmterr;
*you will not be able to see the formatted values for your variables;
If you want to see the formatted values for the variables, you need to use AUTOEXEC.sas program.
Autoexec.sas program uses fmtsearch option which will tell SAS to look for formats in the assigned Formats library. This works great if I want to first open SAS, and then open a dataset.
AUTOEXEC.sas Program:
** Location of formats catalog **;
When you submit above code, SAS will search for formats first in the work library, then in the format directory (fmtdir) and then in the default SAS library.
Permanent formats can be accessed with LIBREF LIBRARY directly, while FMTSEARCH option needs to be specified with user-defined LIBREF.
“ERROR: The format XXXX was not found or could not be loaded.”
This happens generally when you don't have the format library where the SAS dataset located. What happens here is, SAS will not permit you to open the data set without the format library.
But, if you use options nofmterr; at the top of your program, SAS will opens the dataset for you despite the fact that it does not have the format library.
SAS doesn't treat missing format or informat as an error when the options nofmterr; statement used.
The FMTERR\NOFMTERR option determines whether or not SAS generates an error message when a variable's format cannot be found. (source: support.sas.com)
The caveat here is, you will not be able to see the formatted values for your variables, but you will be able to open or use the dataset.
OPTIONS nofmterr;
*you will not be able to see the formatted values for your variables;
If you want to see the formatted values for the variables, you need to use AUTOEXEC.sas program.
Autoexec.sas program uses fmtsearch option which will tell SAS to look for formats in the assigned Formats library. This works great if I want to first open SAS, and then open a dataset.
AUTOEXEC.sas Program:
** Location of formats catalog **;
specify location of formats folder inside the libname statement.
libname frmtdir "C:\Documents and Settings\sreddy\Desktop\studyfolder";
options fmtsearch=(WORK frmtdir library);
When you submit above code, SAS will search for formats first in the work library, then in the format directory (fmtdir) and then in the default SAS library.
Permanent formats can be accessed with LIBREF LIBRARY directly, while FMTSEARCH option needs to be specified with user-defined LIBREF.
Saturday, February 6, 2010
How to check how the dataset got sorted without looking at the code?
You can do this manually or programmatically.
*Example;
proc sort data=sashelp.class out=class;
by name sex age;
run;
proc contents data=class out=contents noprint;run;
proc print data=contents;
var memname name sorted sortedby;
run;
('’)
Manual Approach: Right click on the data set in the explorer and go to properties, then go to the details tab and see the sorted by values (Works only if you use Windows operating system)
Programmatic approach: Use proc contents and create an output dataset of the contents. The sorted and sortedby columns will give us the details in about how the dataset got sorted in place.
Note: proc contents will give missing values if the dataset isn’t sorted.
proc sort data=sashelp.class out=class;
by name sex age;
run;
proc contents data=class out=contents noprint;run;
proc print data=contents;
var memname name sorted sortedby;
run;
('’)
Friday, February 5, 2010
How to add leading zeros to numeric variables
Have you ever asked to create a variable with leading zeros? I mean 1 to 001 and 2 to 002. If you were, do you know how....
SAS has a special numeric format Zw.d. which can include leading zeros.
Zw.d Format in which,
w : Width of variable.
Let me give you a scenario where this can be very useful:
You have a variable called site with the numbers as 101, 999, 1001 and 1200. If you want to create a USUBJID variable with fixed length for all the subjects, we need to modify the site variable variables to fixed length. (length=4). You can do that using Z4.d format.
Note: As this is a numeric format, this can be applied to numeric variables only.
*z4.format is used to add the leading zeros to site variable;
data have;
length site $4;
site='101'; subject='1001';output;
site='129'; subject='1002';output;
site='999'; subject='1003';output;
site='1000'; subject='1004';output;
site='1010'; subject='1005';output;
site='1011'; subject='1006';output;
run;
data want;
length usubjid $19;
set have(rename=(site=sitec));
studyid='ABC10049';
site=put(sitec+0,z4.); *Z4. format is applied after converting the character variable sitec to site;
usubjid=catx('-',studyid,site,subject);
keep usubjid site;
run;
Output:
USUBJID SITE
ABC10049-0101-1001 0101
ABC10049-0129-1002 0129
ABC10049-0999-1003 0999
ABC10049-1000-1004 1000
ABC10049-1010-1005 1010
ABC10049-1011-1006 1011
SAS has a special numeric format Zw.d. which can include leading zeros.
Zw.d Format in which,
w : Width of variable.
d : Decimal Point.
Zw.d format writes standard numeric data with leading 0's. The Zw.d format is similar to the w.d format except that Zw.d pads right-aligned output with 0s instead of blanks. (Ref: SAS 9.2 Language Reference : Dictionary, 4th edition);
Let me give you a scenario where this can be very useful:
You have a variable called site with the numbers as 101, 999, 1001 and 1200. If you want to create a USUBJID variable with fixed length for all the subjects, we need to modify the site variable variables to fixed length. (length=4). You can do that using Z4.d format.
Note: As this is a numeric format, this can be applied to numeric variables only.
*z4.format is used to add the leading zeros to site variable;
data have;
length site $4;
site='101'; subject='1001';output;
site='129'; subject='1002';output;
site='999'; subject='1003';output;
site='1000'; subject='1004';output;
site='1010'; subject='1005';output;
site='1011'; subject='1006';output;
run;
data want;
length usubjid $19;
set have(rename=(site=sitec));
studyid='ABC10049';
site=put(sitec+0,z4.); *Z4. format is applied after converting the character variable sitec to site;
usubjid=catx('-',studyid,site,subject);
keep usubjid site;
run;
Output:
USUBJID SITE
ABC10049-0101-1001 0101
ABC10049-0129-1002 0129
ABC10049-0999-1003 0999
ABC10049-1000-1004 1000
ABC10049-1010-1005 1010
ABC10049-1011-1006 1011
You could also use REPEAT function also:
Here is how...
site=catt(repeat('0',3-length(sitec)),sitec);
Please note that the 3 reference to the minimum length of site variable. I have used 3 because the minimum length in the given example is 3.
Thursday, February 4, 2010
How to extract year information from three formats of dates in a dataset? (IMPUTED Dates)
The LINKEDIN SAS Professionals group had a question on How to extract year information from three formats of dates in a dataset? (IMPUTED Dates). There were a number of good suggestions submitted. Here is a summary of the suggestions:
**************************************************************;
*Method1: Using scan or substr with length functions;
data temp;
infile datalines;
input date:$10.;
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
data temp;
set temp;
if length(date) = 4 then year = input(substr(date,1,4),best32.);
else if length(date) = 7 then year = input(substr(date,4,4),best32.);
else if length(date) = 10 then year = input(substr(date,7,4),best32.);
run;
*(or);
data temp;
set temp;
if length(date) = 4 then year = input(date,best.);
else if length(date) = 7 then year = input(scan(date,2,'-'),best.);
else if length(date) = 10 then year = input(scan(date,3,'-'),best.);
run;
*************************************************************;
*Method2:Starting with Version 8, the SCAN function has the ability to scan from the end backwards;
data temp;
infile datalines ;
length date $10 ;
input @1 date $char10. ;
year = input(scan(date,-1),best.) ;
datalines ;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
*Method3:another option would be the reverse function*/
/* first reverse the date so that the year is always in the first four characters*/
/* substr off the first four characters and reverse again */ ;
data temp;
infile datalines;
input date:$10.;
year =input(reverse(substr(reverse(compress(date)),1,4)),?? 8.);
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
*Method4: assumed date not an outputted format value. assumed date var is char ;
data temp;
infile datalines;
input date:$10.;
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
data temp;
length year $4;
set temp;
lgth =length(date); /*** possible values 1 (when blank), 4,7,10 ***/
start =lgth-3;
if start <1 then /*** when blank ***/
year =' ';
else year =substr(date,start,4);
drop lgth start;
run;
**************************************************************;
*Method 5: This solution is based on the fact that the last four characters are the year.;
data temp;
infile datalines;
input date:$10.;
year = input(substr(date,length(date)-3,4),4.);
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
('’)
**************************************************************;
*Method1: Using scan or substr with length functions;
data temp;
infile datalines;
input date:$10.;
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
data temp;
set temp;
if length(date) = 4 then year = input(substr(date,1,4),best32.);
else if length(date) = 7 then year = input(substr(date,4,4),best32.);
else if length(date) = 10 then year = input(substr(date,7,4),best32.);
run;
*(or);
data temp;
set temp;
if length(date) = 4 then year = input(date,best.);
else if length(date) = 7 then year = input(scan(date,2,'-'),best.);
else if length(date) = 10 then year = input(scan(date,3,'-'),best.);
run;
*************************************************************;
*Method2:Starting with Version 8, the SCAN function has the ability to scan from the end backwards;
data temp;
infile datalines ;
length date $10 ;
input @1 date $char10. ;
year = input(scan(date,-1),best.) ;
datalines ;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
*Method3:another option would be the reverse function*/
/* first reverse the date so that the year is always in the first four characters*/
/* substr off the first four characters and reverse again */ ;
data temp;
infile datalines;
input date:$10.;
year =input(reverse(substr(reverse(compress(date)),1,4)),?? 8.);
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
*Method4: assumed date not an outputted format value. assumed date var is char ;
data temp;
infile datalines;
input date:$10.;
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
data temp;
length year $4;
set temp;
lgth =length(date); /*** possible values 1 (when blank), 4,7,10 ***/
start =lgth-3;
if start <1 then /*** when blank ***/
year =' ';
else year =substr(date,start,4);
drop lgth start;
run;
**************************************************************;
*Method 5: This solution is based on the fact that the last four characters are the year.;
data temp;
infile datalines;
input date:$10.;
year = input(substr(date,length(date)-3,4),4.);
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
('’)
Check How easy to find ERROR/WARNING/UNINITIALIZED messages in the LOG window
Today I will tell you how easy it is to find ERROR/WARNING/UNINITIALIZED messages inside the LOG window.
What you need to do is….Create a list of shortcut keys inside your SAS Toolbar for the LOG window. This is quick and also saves time.
You can customize toolbar settings using the Customize tools dialog box.
You can open the Customize tools dialog box by
1) Enter TOOLEDIT (one word) in the command bar or
2) Go to, TOOLS ________ CUSTOMIZE or
3) Right click on the Toolbar and select CUSTOMIZE.
Adding a Tool to the Toolbar
What you need to do is….Create a list of shortcut keys inside your SAS Toolbar for the LOG window. This is quick and also saves time.
You can customize toolbar settings using the Customize tools dialog box.
You can open the Customize tools dialog box by
1) Enter TOOLEDIT (one word) in the command bar or
2) Go to, TOOLS ________ CUSTOMIZE or
3) Right click on the Toolbar and select CUSTOMIZE.
Adding a Tool to the Toolbar
To add a tool to the toolbar, just click the Add tool button to add a blank tool to the toolbar list. Enter a SAS command in the Command box as mentioned below. You can also add icon to the SAS command as I did below.
Adding these7 icons means you can look for ERROR/WARNING/UNINITIALIZED messages and the two arrow signs will help to find the next or previous ERROR/WARNING/UNINITIALIZED messages.
Subscribe to:
Posts (Atom)
Learn how to view SAS dataset labels without opening the dataset directly in a SAS session. Easy methods and examples included!
Quick Tip: See SAS Dataset Labels Without Opening the Data Quick Tip: See SAS Dataset Labels With...
-
1) What do you know about CDISC and its standards? CDISC stands for Clinical Data Interchange Standards Consortium and it is developed ke...
-
Comparing Two Approaches to Removing Formats and Informats in SAS Comparing Two Approaches to Removing Formats...
-
USE OF THE “STUDY DAY” VARIABLES The permissible Study Day variables (--DY, --STDY, and --ENDY) describe the relative day of the observ...