Thursday, August 21, 2008

HOW TO USE THE SCAN FUNCTION:

USING THE SCAN FUNCTION:

SCAN(string,n,delimiters): returns the nth word from the character string string, where words are delimited by the characters in delimiters. 

It is used to extract words from a  character value when the relative order of words is known, but their starting positions are not.

NewVar=SCAN(string,n<,delimiters>); -returns the nth ‘word’ in the string
 
When the SCAN function is used:
􀂃 the length of the created variable is 200 bytes if it is not previously defined with a LENGTH statement
􀂃 delimiters before the first word have no effect When the SCAN function is used,
􀂃 any character or set of characters can serve as delimiters

Points to remember while using SCAN Function:
􀂃 a missing value is returned if there are fewer than n words in string
􀂃 two or more contiguous delimiters are treated as a single delimiter
􀂃 if  n is negative, SCAN selects the word in the character string starting from the end of string.

􀂃 If you omit delimiters , default is blank . < ( + & ! $ * ) ; ^ - / , %
Source: http://www.biostat.jhsph.edu/bstcourse/bio632/SummerInst/Class2/class2.pdf

Example of use:

Name= 'StudySAS, Blog';
Last= scan(Name,2,',');  *Note : Comma ',' is Delimiter here;

Results in Last returns to Blog

Using the SCAN function from SAS Functions by Example:

Suppose you want to produce an alphabetical list by last name, but your NAME variable contains FIRST, possibly a middle initial, and LAST name. The SCAN function makes quick work of this. Note that the LAST_NAME variable in PROC REPORT has the attribute of ORDER and NOPRINT, so that the list is in alphabetical order of last name but all that shows up is the original NAME variable in First, Middle, and Last name order.

DATA FIRST_LAST;
INPUT @1 NAME $20.@21 PHONE $13.;

***Extract the last name from NAME;


LAST_NAME = SCAN(NAME,-1,' '); /* Scans from the right */
DATALINES;
Jeff W. Snoker (908)782-4382
Raymond Albert (732)235-4444
Steven J. Foster (201)567-9876
Jose Romerez (516)593-2377
;

PROC REPORT DATA=FIRST_LAST NOWD;
TITLE "Names and Phone Numbers in Alphabetical Order (by Last Name)";
COLUMNS NAME PHONE LAST_NAME;
DEFINE LAST_NAME / ORDER NOPRINT WIDTH=20;
DEFINE NAME / DISPLAY 'Name' LEFT WIDTH=20;
DEFINE PHONE / DISPLAY 'Phone Number' WIDTH=13 FORMAT=$13.;
RUN;

source: www.support.sas.com

How to use the PROPCASE function

Using the PROPCASE function

The "old" way to capitalize the first letter of words was to use LOWCASE, UPCASE, and the SUBSTR function, like this:

DATA CAPITALIZE;
INFORMAT FIRST LAST $30.;
INPUT FIRST LAST;
FIRST = LOWCASE(FIRST);
LAST = LOWCASE(LAST);


SUBSTR(FIRST,1,1) = UPCASE(SUBSTR(FIRST,1,1));
SUBSTR(LAST,1,1) = UPCASE(SUBSTR(LAST,1,1));

DATALINES;
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=CAPITALIZE NOOBS;
TITLE "Listing of Data Set CAPITALIZE";
RUN;

With the PROPCASE function in SAS 9.1, it's much easier.

DATA PROPER;
INPUT NAME $60.;
NAME = PROPCASE(NAME);
DATALINES;
ronald cODy
THomaS eDISON
albert einstein

;
PROC PRINT DATA=PROPER NOOBS;
TITLE "Listing of Data Set PROPER";
RUN;

source:www.support.sas.com


Example:

data test;
x=lowcase('MY NaMe iS SARaTh');
y=propcase(x);
z=propcase(lowcase('I AM A PROGRAMMER'));
a=lowcase(z);
b=upcase(x);
run;

How to capitalize the first letter of every word in a string



Capitalize the first letter of every word in a string

Convert a text string into mixed case.
Note: Beginning in SAS 9.1, this task is easily accomplished with the PROPCASE function. See Sample 2 on the Full Code tab.

/* Sample 1: COMPBL, LOWCASE, SCAN, INDEX, UPCASE, SUBSTR */

data sample;
input name $char50.;
/* Lowercase the entire string, remove consecutive blanks */
newname=compbl(lowcase(name));
length next $ 20;
i=0;
next=scan(newname,1,' ');
do while(next ne ' ');
i+1;

/* Scan off each 'word' based upon a space, locate the position */
/* of the first letter in the original string, UPCASE the first */
/* letter and use SUBSTR to replace the byte. */


pos=indexw(newname,trim(next));
substr(newname,pos,1)=upcase(substr(newname,pos,1));
next=scan(newname,i,' ');
end;
keep name newname;
datalines;
Jane DOE
min ning chou
HENRIK HANSSON
D ETCHEVERRY, Charo B
;

proc print;
run;


/* Sample 2: PROPCASE (available in SAS 9.1) */
data sample;
input name $char50.;
mixed=propcase(name," ',");
datalines;
tom o'shea
min ning chou
LENA HANSSON
murder, she wrote
;
proc print;
run;


OUTPUT:


source:support.sas.com

SOUNDEX function

Combine data sets based upon similar values

Encode character strings using SOUNDEX to aid in combining the data based upon similar but not exact values.

Note: The SOUNDEX algorithm is English-biased. For more details about SOUNDEX, please refer to the SAS Language Reference, Dictionary under Functions.













RESULT:


source: support.sas.com

Options in SAS' INFILE Statement

Options in SAS' INFILE Statement

There are a number of options available for the INFILE statement. Below you will find discussion of the following options: DLM='character', DSD, MISSOVER, and FIRSTOBS=value.

DLM='character'

When I prepare a data file for list input to SAS, I use a blank space as the delimiter. The delimiter is the character which must appear between the score for one variable and that for the next variable. One can, however, choose to use a delimiter other than a blank space. For example, the comma is a commonly used delimiter. If you are going to use a delimiter other than a blank space, you must tell SAS what the delimiter is.

Here is an example of a couple of data lines in a comma delimited file:

4,2,8010,2,4,2,4,4,2,2,2,2,2,2,4,4,2,4,2,2,CDFR,22,900,5,4,1
4,2,8011,1,2,3,1,3,4,4,4,1,2,2,4,2,3,4,3,1,psychology,24,360,4,3,1

Here is the INFILE statement which identified the delimiter as being a comma:
infile 'd:\Research-Misc\Hale\Hale.csv' dlm=',' dsd;

DSD

DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank for missing data. Today, however, pc software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma delimited values) to illustrated:

Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.

MISSOVER
I was reading a data file (from mainland China) which was very messy to read with list input, as it was not only comma delimited, but some subjects were missing data at the end of the data records, without commas marking the fields with missing data. I had to use the MISSOVER option, which prevents SAS from going to a new input line when it does not find values in the current line for some of the variables declared in the input statement. With the MISSOVER option, when SAS reaches the end of the current record, variables without any values assigned are set to missing.

Here is the INFILE statement I used:
data china; infile 'D:\Chia\beij93 data *' missover dlm=',';

FIRSTOBS=value
Sometimes the data file will have nondata on the first n lines of the file. You can use the FIRSTOBS command to tell SAS where to start reading the file. For example, consider the following infile statement:
infile 'D:\StatData\dlm.txt' dlm=',' dsd firstobs=7;

The data started on the seventh line of the file, so FIRSTOBS=7 was used to skip over the first six lines. The delimiter in the original file was a tab character, but I was unable to figure out how to use INFILE to set the delimiter to the tab character, so I used Word to replace every tab with a comma.

source: www.core.ecu.edu

Finding the number of observations in SAS dataset

There are a number of ways of finding out the number of observations in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter a pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
set test end=eof;
count+1;
if eof then call symput("nobs",count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets but you are still reading some data here.

proc sql noprint;
select nobs into :nobs from dictionary.tables where libname="WORK" and memname="TEST";
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
if 0 then set test nobs=nobs;
call symputx("nobs",nobs);
stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

I’ll finish with the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option and you don’t need a SAS/AF licence to do it either.

%LET DSID=%SYSFUNC(OPEN(WORK.TEST,IN));
%LET NOBS=%SYSFUNC(ATTRN(&DSID,NOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));

The first line opens the data set and the last one closes it; this is needed because you are not using data step or SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

source: technologytales.com

Another Simple way of doing this is to take help from PROC SQL automatic macro variable SQLOBS. Proc SQL automatically creates SALOBS macro variable, when it runs a program. SQLOBS macro variable will have the number of observations count of last proc SQL statement.

Here is how...

The following code will find the number of observations in the dataset AE in the Library name called SAS.

Note: Don't use noprint option in the Proc SQL statement.

libname sas 'C:\Users\Sarath Annapareddy\Desktop\*******;

proc sql;
select * from sas.ae;
quit;
%put SQLOBS=&sqlobs;



9 %put SQLOBS=&sqlobs;
SQLOBS=224



You can also get the number of observations value using Proc Contents. Here is how…

proc contents data=work.dataset out=nobs(where=(varnum=1)keep=memname nobs varnum)noprint;
run;

Convert values from character to numeric or from numeric to character.

Convert values from character to numeric or from numeric to character\Convert variable values using either the INPUT or PUT function.
Convert a character value to a numeric value by using the INPUT function. Specify a numeric informat that best describes how to Read the data value into the numeric variable. When changing types a new variable name is required. If you need to keep the original variable name for the new type, use the RENAME= option as illustrated in Sample 2.


data char;
input string :$8. date :$6.;
numeric=input(string,8.);
sasdate=input(date,mmddyy6.);
format sasdate mmddyy10.;
datalines;
1234.56 031704
3920 123104;


proc print;
run;








data now_num;
input num date: mmddyy6.;
datalines;
123456 110204
1000 120504;
 

run;




data now_char;
set now_num (rename=(num=oldnum date=olddate));
num=put(oldnum,6. -L);
date=put(olddate,date9.);
run;

proc print;
run;



Source: support.sas.com


Here is how to convert Numeric date with  yymmdd8. format to mmddyy10. format.

data datec;

input numdate;
date =input(put(numdate,z8.),yymmdd8.);
format date mmddyy10.;
datalines;
20040625
20081219
;
run;