Sometimes it is very handy to have a macro variable contanining the variables names of the dataset. Here are the 2 different ways you can create a macro variable with list of variables names ...
*Method1: Using Proc Contents and Proc SQL;
proc contents data=sashelp.class out=class;
run;
proc sql noprint;
select distinct(name) into:vars separated by " " from class;
quit;
%put &vars;
*Method2: Using SASHELP tables and Proc SQL;
data class;
set sashelp.vcolumn(where=(libname="SASHELP" and memname="CLASS"));
keep name;
run;
proc sql noprint;
select distinct(name) into:vars separated by " " from class;
quit;
%put &vars;
Wednesday, March 24, 2010
Friday, March 12, 2010
PRXMATCH Function
Prxmatch () function is very useful in locating the matching strings. Prxmatch() function has 2 parameters, the first parameter is the regular expression ID (i.e what you are looking in a string for a match) and the second parameter is the character string to be searched. PRXMATCH () function returns the start position of the matching string.
Syntax:
PRXMATCH (perl-regular-expression, source);
Even though PRXMATCH function can be used when....
1) When you want to identify if there is alphanumeric (has any letter from A to Z) in a variable.
2) If you need to search a character variable for multiple different substrings.
Here is how PRXMATCH works in the Ist case.
*Prxmatch () function is very useful in locating the matching strings;
DATA finda2z;
INPUT ID $ 1-3 string $ 5-10;
prxmatch=prxmatch("/[a-zA-Z]/",string);
DATALINES;
001 ACBED
002 11
003 12
004 zx
005 11 2c
006 abc123
;
run;
proc print;
run;
Output:
*Here PRXMATCH function will return the start position of matching string. In this case, a to z or A to Z.
If you want to find out which observation has matching string for a specified variable.
Use the following code.
prxmatch=prxmatch("/[a-zA-Z]/",string)>0;
*If match found, the value returned is 1 or else 0.
To keep those records that do not match this pattern you will look for those records where PRXMATCH returns a zero.
PRXMATCH () function is very helpful If you need to search a character variable for multiple different substrings in a variable.
Problem: Select observations where AETERM has substrings ‘ nausea’, ‘vomiting’ and ‘fever’.
The old method is to combine several INDEX function statements together with OR conditions like as …..
if index(aeterm,'nausea') > 0 or
if index(aeterm,'vomiting') > 0 or
if index(aeterm,'fever') > 0 ;
The PRXMATCH function can do this all in one statement. Less typing….few lines of code.
if prxmatch ("m/nausea|vomiting|fever/i",aeterm) > 0 ;
The 'm' option in perl-regular-expression means, PRXMATCH is going to start a matching operation.
The 'i' option tells SAS not to worry about the case, i.e, consider "NAUSEA" as same as "nausea" while searching for a match.
Another advantage of using this 'i' modifier is we can make parts of a string case sensitive and insensitive using ( i:) or (-i:).
( i:) turns ON the case insensitive search
(-i:) - turn OFF the case insensitive search
Pipes ‘|’ should be used to separate the search strings.
Please refer PRXMATCH in the Functions section of the SAS Language Reference: Dictionary in the Online SAS Documentation for more information.
PRXMATCH special characters and it's meaning:
^ - start with
$ - end with
\D - any non digits
\d - digits
? - may or may not have?
| - or
* - repeating
( i:) - turns ON the case insensitive search
(-i:) - turn OFF the case insensitive search
Syntax:
PRXMATCH (perl-regular-expression, source);
Even though PRXMATCH function can be used when....
1) When you want to identify if there is alphanumeric (has any letter from A to Z) in a variable.
2) If you need to search a character variable for multiple different substrings.
Here is how PRXMATCH works in the Ist case.
*Prxmatch () function is very useful in locating the matching strings;
DATA finda2z;
INPUT ID $ 1-3 string $ 5-10;
prxmatch=prxmatch("/[a-zA-Z]/",string);
DATALINES;
001 ACBED
002 11
003 12
004 zx
005 11 2c
006 abc123
;
run;
proc print;
run;
Output:
*Here PRXMATCH function will return the start position of matching string. In this case, a to z or A to Z.
If you want to find out which observation has matching string for a specified variable.
Use the following code.
prxmatch=prxmatch("/[a-zA-Z]/",string)>0;
*If match found, the value returned is 1 or else 0.
To keep those records that do not match this pattern you will look for those records where PRXMATCH returns a zero.
PRXMATCH () function is very helpful If you need to search a character variable for multiple different substrings in a variable.
Problem: Select observations where AETERM has substrings ‘ nausea’, ‘vomiting’ and ‘fever’.
The old method is to combine several INDEX function statements together with OR conditions like as …..
if index(aeterm,'nausea') > 0 or
if index(aeterm,'vomiting') > 0 or
if index(aeterm,'fever') > 0 ;
The PRXMATCH function can do this all in one statement. Less typing….few lines of code.
if prxmatch ("m/nausea|vomiting|fever/i",aeterm) > 0 ;
The 'm' option in perl-regular-expression means, PRXMATCH is going to start a matching operation.
The 'i' option tells SAS not to worry about the case, i.e, consider "NAUSEA" as same as "nausea" while searching for a match.
Another advantage of using this 'i' modifier is we can make parts of a string case sensitive and insensitive using ( i:) or (-i:).
( i:) turns ON the case insensitive search
(-i:) - turn OFF the case insensitive search
Pipes ‘|’ should be used to separate the search strings.
Please refer PRXMATCH in the Functions section of the SAS Language Reference: Dictionary in the Online SAS Documentation for more information.
PRXMATCH special characters and it's meaning:
^ - start with
$ - end with
\D - any non digits
\d - digits
? - may or may not have?
| - or
* - repeating
( i:) - turns ON the case insensitive search
(-i:) - turn OFF the case insensitive search
Wednesday, March 10, 2010
$UPCASEw. format
We all know the importance of UPCASE function in handling the character case strings. But do you know that a format can let you do the same as the UPCASE function (upcasing the variables).
$UPCASEw. format works similar to the UPCASE Function. It also does one more thing which UPCASE function doesn’t. i.e: $UPCASEw. format let you apply length to the variable.
Remember that w specifies the width of the output field.
Example:
*********************************************************;
data new;
data new;
*convert it to uppercase;
name="studysas blog";
format name $upcase.;
newname=put(name, $upcase32.);
*Put function let you apply $upcase format;
run;
**********************************************;
*The length of the new variable newname will be 32.
Sunday, February 14, 2010
Friday, February 12, 2010
Replace Missing Numeric Values using Missing Option/Proc STDIZE
MISSING OPTION
Replacing missing values with the desired value like a zero is always a challenge, especially when we have a dataset with a number of columns to standardize. The OLD WAY of doing it to write a DATA step code with if……then statements like...
if var=. then var=0;
to make 0 appear instead of . (dot) in the tables output.
With the SAS option called missing you can save a lot of typing. If you place the following SAS option code before the generation of the table output:
Option missing="0";
SAS will display 0 (zero) instead of the . (dot) on the table. In fact it can display whatever the character we would like to display for missing values. You can use a - (line) or * (star).
Always change it back the option missing setting to default as Option Missing=''; otherwise you may endup getting unexpected results;
Warning: Since we are using this option to display zeros instead of .(dots), you shouldn't use this option before any of SAS procedures like PROC UNIVARIATE, PROC MEANS, PROC SUMMARY and PROC FREQ because they don’t treat missing as zero. So if you use Option Missing='0' we are telling SAS to consider missing as ZERO which is not what the above procedure generally do.
**********************************************************************************;
Proc STDIZE: Proc STDIZE is very useful in processing the missing values. It offers a simple and very effective solution when we want to process missing values.
Example:
data miss;
input a b c;
cards;
1 . 3
. . 2
3 2 .
1 3 4
2 . .
. 2 .
run;
proc stdize data=old reponly MISSING=0 out=new;
var _numeric_;
run;
Output:
Replacing missing values with the desired value like a zero is always a challenge, especially when we have a dataset with a number of columns to standardize. The OLD WAY of doing it to write a DATA step code with if……then statements like...
if var=. then var=0;
to make 0 appear instead of . (dot) in the tables output.
With the SAS option called missing you can save a lot of typing. If you place the following SAS option code before the generation of the table output:
Option missing="0";
SAS will display 0 (zero) instead of the . (dot) on the table. In fact it can display whatever the character we would like to display for missing values. You can use a - (line) or * (star).
Always change it back the option missing setting to default as Option Missing=''; otherwise you may endup getting unexpected results;
Warning: Since we are using this option to display zeros instead of .(dots), you shouldn't use this option before any of SAS procedures like PROC UNIVARIATE, PROC MEANS, PROC SUMMARY and PROC FREQ because they don’t treat missing as zero. So if you use Option Missing='0' we are telling SAS to consider missing as ZERO which is not what the above procedure generally do.
**********************************************************************************;
Proc STDIZE: Proc STDIZE is very useful in processing the missing values. It offers a simple and very effective solution when we want to process missing values.
MISSING= | specifies the method or a numeric value for replacing missing values |
REPLACE | replaces missing data by zero in the standardized data |
REPONLY | replaces missing data by the location measure (does not standardize the data) |
Example:
data miss;
input a b c;
cards;
1 . 3
. . 2
3 2 .
1 3 4
2 . .
. 2 .
run;
proc stdize data=old reponly MISSING=0 out=new;
var _numeric_;
run;
Output:
Thursday, February 11, 2010
COMPRESS: SAS Function strips characters from the string
In SAS 9.1.3 , an extra argument (MODIFIER) is added to the SAS character string function COMPRESS and these modifiers modifies the action of the COMPRESS function;
Syntax: COMPRESS(
Syntax: COMPRESS(
Tuesday, February 9, 2010
MMDDYY+: Format that inserts a slash, space or "-" between the date, month and year
Here is another new format which is available with SAS 9 version called MMDDYY+ . (MMDDYY+ format can insert colons /slash or periods etc between date, month and year values); This new date format developed by SAS works for many other formats, for ex:
MMDDYYB.; *format inserts Blanks in between date, month and years;
MMDDYYC.; *format inserts Colons in between date, month and years;
MMDDYYD.; *format inserts Dashes in between date, month and years;
MMDDYYN.;*format inserts Nothing in between date,month & years;
MMDDYYP.; *format inserts Periods in between date, month & years;
MMDDYYS.; *format inserts Slash in between date, month and years;
When you have date, month and year as separate variables and you want to insert colons or slash or periods and even blanks between them…. MMDDYY+ is your answer.
Here I will show how?
data have;
input date month year;
cards;
25 1 2005
19 3 2006
12 2 2004
1 12 2007
;
run;
data want;
set have;
format bdate MMDDYYB.; *format inserts Blanks.'B' means Blank;
format cdate MMDDYYC.; *format inserts Colons. 'C' means Colon;
format ddate MMDDYYD.; *format inserts Dashes. 'D' means Dash;
format ndate MMDDYYN.; *format inserts Nothing.'N' means Nothing;
format pdate MMDDYYP.; *format inserts Periods. 'P' means Period;
format sdate MMDDYYS.; *format inserts Slash. 'S' means SLASH ;
bdate = mdy(month,"01",year);
cdate = mdy(month,"01",year);
ddate = mdy(month,"01",year);
ndate = mdy(month,"01",year);
pdate = mdy(month,"01",year);
sdate = mdy(month,"01",year);
run;
Output:
MMDDYYB.; *format inserts Blanks in between date, month and years;
MMDDYYC.; *format inserts Colons in between date, month and years;
MMDDYYD.; *format inserts Dashes in between date, month and years;
MMDDYYN.;*format inserts Nothing in between date,month & years;
MMDDYYP.; *format inserts Periods in between date, month & years;
MMDDYYS.; *format inserts Slash in between date, month and years;
When you have date, month and year as separate variables and you want to insert colons or slash or periods and even blanks between them…. MMDDYY+ is your answer.
Here I will show how?
data have;
input date month year;
cards;
25 1 2005
19 3 2006
12 2 2004
1 12 2007
;
run;
data want;
set have;
format bdate MMDDYYB.; *format inserts Blanks.'B' means Blank;
format cdate MMDDYYC.; *format inserts Colons. 'C' means Colon;
format ddate MMDDYYD.; *format inserts Dashes. 'D' means Dash;
format ndate MMDDYYN.; *format inserts Nothing.'N' means Nothing;
format pdate MMDDYYP.; *format inserts Periods. 'P' means Period;
format sdate MMDDYYS.; *format inserts Slash. 'S' means SLASH ;
bdate = mdy(month,"01",year);
cdate = mdy(month,"01",year);
ddate = mdy(month,"01",year);
ndate = mdy(month,"01",year);
pdate = mdy(month,"01",year);
sdate = mdy(month,"01",year);
run;
Output:
No more left(trim) or CAT/CATX/CATT/CATS functions..........
Subscribe to:
Posts (Atom)