Thursday, August 21, 2008

Options in SAS' INFILE Statement

Options in SAS' INFILE Statement

There are a number of options available for the INFILE statement. Below you will find discussion of the following options: DLM='character', DSD, MISSOVER, and FIRSTOBS=value.

DLM='character'

When I prepare a data file for list input to SAS, I use a blank space as the delimiter. The delimiter is the character which must appear between the score for one variable and that for the next variable. One can, however, choose to use a delimiter other than a blank space. For example, the comma is a commonly used delimiter. If you are going to use a delimiter other than a blank space, you must tell SAS what the delimiter is.

Here is an example of a couple of data lines in a comma delimited file:

4,2,8010,2,4,2,4,4,2,2,2,2,2,2,4,4,2,4,2,2,CDFR,22,900,5,4,1
4,2,8011,1,2,3,1,3,4,4,4,1,2,2,4,2,3,4,3,1,psychology,24,360,4,3,1

Here is the INFILE statement which identified the delimiter as being a comma:
infile 'd:\Research-Misc\Hale\Hale.csv' dlm=',' dsd;

DSD

DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank for missing data. Today, however, pc software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma delimited values) to illustrated:

Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.

MISSOVER
I was reading a data file (from mainland China) which was very messy to read with list input, as it was not only comma delimited, but some subjects were missing data at the end of the data records, without commas marking the fields with missing data. I had to use the MISSOVER option, which prevents SAS from going to a new input line when it does not find values in the current line for some of the variables declared in the input statement. With the MISSOVER option, when SAS reaches the end of the current record, variables without any values assigned are set to missing.

Here is the INFILE statement I used:
data china; infile 'D:\Chia\beij93 data *' missover dlm=',';

FIRSTOBS=value
Sometimes the data file will have nondata on the first n lines of the file. You can use the FIRSTOBS command to tell SAS where to start reading the file. For example, consider the following infile statement:
infile 'D:\StatData\dlm.txt' dlm=',' dsd firstobs=7;

The data started on the seventh line of the file, so FIRSTOBS=7 was used to skip over the first six lines. The delimiter in the original file was a tab character, but I was unable to figure out how to use INFILE to set the delimiter to the tab character, so I used Word to replace every tab with a comma.

source: www.core.ecu.edu

1 comments:

Post a Comment

ShareThis