Saturday, April 3, 2010

Special Missing Values in SAS

Definition: A special missing value is a type of numeric missing value that lets you represent different categories of missing data by using the letters A-Z or an underscore.
Ref: SAS 9.1.3 Language Reference: Concepts, page 102

The usual symbol for a missing value in a numeric variable is the period (dot). Besides the dot, SAS can store 27 special missing values in numeric variables: the dot-underscore (._) and the dot-letter values (.A through .Z). These special values are case-insensitive; that is, .A=.a, .B=.b, .C=.c, and so on.

If you do not begin a special numeric missing value with a period, SAS identifies it as a variable name. Therefore, to use a special numeric missing value in a SAS expression or assignment statement, you must begin the value with a period, followed by the letter or underscore, as in the following example:

x=.d;

When SAS prints a special missing value, it prints only the letter. When data values contain characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters.
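
As a small sketch of the points above (the variable names x, y, and z are made up for illustration), the following DATA step assigns a few special missing values and writes them to the log. The PUT output should show only the letter or the underscore, without the leading period.

```sas
data _null_;
  x = .d;   /* special missing value D */
  y = .D;   /* same value as x: the letter is not case sensitive */
  z = ._;   /* the underscore special missing value */
  put x= y= z=;
  if x = y then put "x and y hold the same special missing value";
run;
```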

Example: Consider the following DATA step, which reads questionnaire data (three students, three questions, and three possible responses to each question: 1, 2, and 3):

data test;
/* M = multiple, U = unreadable, . = didn't answer */
/* The MISSING statement lists only the characters to treat as special missing values */
missing M U;
input student question answer;
datalines;
1 1 1
1 2 2
1 3 M
2 1 U
2 2 3
2 3 2
3 1 M
3 2 .
3 3 1
;
proc print data=test; run;

The MISSING statement is needed here so that SAS reads M and U as special missing values for the numeric variable answer instead of treating them as invalid data. In this example, M indicates multiple responses (not allowed) and U indicates an unreadable response.
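
Once the categories are stored, each one can be tested separately. The sketch below assumes the test data set from the example above was read with M and U as special missing values; the output data set name test_decoded and the variable reason are hypothetical names chosen for illustration.

```sas
data test_decoded;                 /* hypothetical output data set */
  set test;
  length reason $20;               /* hypothetical descriptive variable */
  if answer = .M then reason = "multiple responses";
  else if answer = .U then reason = "unreadable";
  else if answer = .  then reason = "no answer";
  else reason = "valid";
run;

proc print data=test_decoded; run;
```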

Order of Missing Values for Numeric Variables:

The special missing value ._ sorts lowest, followed by the numeric missing value (.), then .A through .Z. All missing values sort before any number. SAS does not distinguish between lowercase and uppercase letters when sorting special numeric missing values.
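
One way to see this ordering is to sort a few values and print them. This is a minimal sketch; the data set name order_demo and the variable value are made up for illustration. After the sort, the missing values should come first, with ._ ahead of ., then .A, then .Z, followed by the numbers.

```sas
data order_demo;                   /* hypothetical data set */
  value = 0;   output;
  value = .Z;  output;
  value = .;   output;
  value = -5;  output;
  value = .A;  output;
  value = ._;  output;
run;

proc sort data=order_demo;
  by value;                        /* ascending: missing values sort first */
run;

proc print data=order_demo; run;
```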

Checking for Missing Numeric Values:

SAS programmers often use the following code to check for a missing numeric value:

IF VALUE=. THEN PUT "*** Value is missing";

In most cases this code works as intended, but it can fail to catch some missing values. The statement assumes that only the dot, and none of the other 27 missing numeric values, is present in your data. Because dot-Z is the highest missing value in the sort order, a more inclusive way to check for missing numeric values is:

IF VALUE <= .Z THEN PUT "*** Value is missing";

Reference: http://analytics.ncsu.edu/sesug/2005/TU06_05.PDF

The latter IF statement checks for all 28 possible missing values.
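
To compare the two checks on real data, the sketch below counts how many observations each one catches. It assumes the test data set from the questionnaire example, read with M and U as special missing values; the counter names dot_only, le_z, and by_func are hypothetical. The MISSING function is included as a third option, since it returns 1 for any missing value. With that data, the dot-only check should find one observation and the other two checks should find four.

```sas
data _null_;
  set test end=last;
  dot_only + (answer = .);         /* counts only the plain dot */
  le_z     + (answer <= .Z);       /* counts all 28 missing values */
  by_func  + missing(answer);      /* MISSING function: 1 if missing, else 0 */
  if last then put dot_only= le_z= by_func=;
run;
```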

For more details on special missing values, see Malachy J. Foley's paper, "MISSING VALUES: Everything You Ever Wanted to Know".

One more thing to know: if the MISSING option is used in PROC FREQ, you get a breakdown for each type of missing value. Without it, observations with a missing value are left out of the table and counted only in the "Frequency Missing" total. For example:

* Without MISSING option;
proc freq data=test;
tables question*answer/ nopercent nocol norow;
run;

* With MISSING option;

proc freq data=test;
tables question*answer/ nopercent nocol norow missing;
run;
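
The missing categories in such a table can also be given readable labels. The sketch below assumes the same test data set; the format name ansfmt and its labels are made up for illustration. Values not listed in the format (1, 2, and 3) keep their default display.

```sas
proc format;
  value ansfmt                     /* hypothetical format name */
    .  = "No answer"
    .M = "Multiple"
    .U = "Unreadable";
run;

proc freq data=test;
  tables answer / missing;         /* one-way table including missing categories */
  format answer ansfmt.;
run;
```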




