Monday, April 26, 2010

WARNING: You may have unbalanced quotation marks.

SAS allows character strings up to 32,767 characters long, but when you assign a quoted string longer than 262 characters to a variable, SAS sometimes writes this message to the log: WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks. Going back through the SAS code to hunt for unbalanced quotes is tedious, especially when the quotes are in fact balanced.
An example makes this clearer.

I want to assign a 263-character string to a variable (x) using a simple Data step, and when I do that the WARNING message appears in the log.

data TEST;
x="(SEE DOCTOR'S LETTER)3RD ADMINISTRATION OF MTX WAS DELAYED BY 14 DAYS AND WAS REDUCED TO 1G/M2 INSTEAD OF 5G/M2, PROBLEMS, E.COLI SEPSIS WITH HEART INSUFFICIENCY WITH SINUS TACHYCARDY, PARALYTIC ILEUS, TACHYPNEA , PATIENT DIED ON 21.04.98 FROM MULTIORGAN FAILURE.";
y=length(x);
put x;
run;

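LOG FILE:

WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.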

There is a SAS option (NOQUOTELENMAX) which will take care of the WARNING message.

Options noQuoteLenMax; *Just before the Data step;

Don’t forget to change back to Options QuoteLenMax; after the end of the Data step.

Options QuoteLenMax; *Just after the Data step;

Saturday, April 17, 2010

CALL EXECUTE: Easy way to print or sort multiple files.

When printing multiple files, or sorting multiple datasets, the traditional method is to write multiple steps as below.

Proc print data=libref.ae; var _all_; run;
Proc print data=libref.conmed; var _all_; run;
Proc print data=libref.demog; var _all_; run;
Proc print data=libref.lab; var _all_; run;

Proc print data=libref.medhist; var _all_; run;

If, like me, you prefer to simplify traditional SAS code, here is a tip: CALL EXECUTE comes to the rescue.

*Using Dictionary Tables and Call Execute;
proc sql;
create table dsn as select distinct memname from dictionary.tables
where libname="LIBREF" and memtype="DATA";
quit;

*Sorts all the datasets using Call Execute;
data _null_;
set dsn;
call execute("proc sort data=libref." || trim(memname) || "; by usubjid; run;");
run;

*Prints all the datasets using Call Execute;
data _null_;
set dsn;
call execute("proc print data=libref." || trim(memname) || "; var _all_; run;");
run;

*Using Proc Contents and Call Execute;
proc contents data=libref._all_ out=contents(keep=memname);
run;

*Create a macro variable memname with a space-separated list of all the datasets;
proc sql noprint;
select distinct memname into :memname separated by ' ' from contents;
quit;
%put &memname;

*Sorts all the datasets using Call Execute;
data _null_;
set dsn;
call execute("proc sort data=libref." || trim(memname) || "; by usubjid; run;");
run;

*Prints all the datasets using Call Execute;
data _null_;
set dsn;
call execute("proc print data=libref." || trim(memname) || "; var _all_; run;");
run;

*Using SASHELP Views and Call Execute to sort the dataset by usubjid;

*Sorts all the datasets using Call Execute;
data _null_;
set sashelp.vtable (where=(libname="LIBREF" and memtype="DATA"));
call execute("proc sort data=libref." || trim(memname) || "; by usubjid; run;");
run;

*Prints all the datasets using Call Execute;
data _null_;
set sashelp.vtable (where=(libname="LIBREF" and memtype="DATA"));
call execute("proc print data=libref." || trim(memname) || "; var _all_; run;");
run;

*If you are not printing/sorting all the datasets in the library, here is code for that. The following code prints only 5 datasets (AE, Conmed, Demog, Lab and Medhist);

data _null_;
length dsname $7; *long enough for the longest name, otherwise the first value sets the length to 2;
do dsname='ae', 'conmed', 'demog', 'lab', 'medhist';
call execute("proc print data=libref." || trim(dsname) || "; var _all_; run;");
end;
run;
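
When the generated statements get longer than a single PROC call, it can help to build the code in a variable and write it to the log before handing it to CALL EXECUTE. The following is only a minimal sketch of that idea, not part of the original tip: it reads from the SASHELP library so that it can run anywhere, and the variable name code and the OBS=5 limit are assumptions made for illustration.

```sas
data _null_;
  /* two sample datasets that ship with SAS; swap in your own libname and member names */
  set sashelp.vtable(where=(libname="SASHELP" and memname in ("CLASS", "CARS")));
  length code $200;
  /* CATS strips leading and trailing blanks from each piece before joining */
  code = cats("proc print data=", libname, ".", memname, "(obs=5); run;");
  put code=;            /* show the generated statement in the log */
  call execute(code);   /* queue it to run after this Data step ends */
run;
```

Writing the string to the log first makes quoting mistakes visible in the generated code before they turn into confusing errors.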

Saturday, April 10, 2010

Write a Letter using SAS/ Emailing with SAS

SAS can do many things that most of us don’t know about. Here is one example.

Writing a letter:

filename formltr 'C:\Documents and Settings\sreddy\Desktop\formltr.rtf';

data address;
infile datalines;
input @ 1 stno
 @ 6 lane $12.
@19 aptno $7.
@27 city $9.
@37 state $2.
@40 zip ;
datalines;
2550 Augusta Blvd Apt#203 Fairfield OH 45014
;
run;

data _null_;
retain lm 5;
set address;
file formltr;* print notitles;
put _page_;
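adr1 = strip(put(stno, 8.)) || ' ' || trim(lane); * stno is numeric, so convert it explicitly;
put @lm adr1;
adr2 = trim(aptno);
put @lm adr2;
adr3 = trim(city) || ', ' || trim(state) || ' ' || put(zip, z5.); * zip is numeric, z5. keeps 5 digits;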
put @lm adr3;
adr4 = trim('Dear')|| ' ' ||trim('SAS') || ' ' || trim('Users,');
put / @lm adr4;
put / @lm 'StudySAS Blog offers a lot of information regarding tips and tutorials on various topics ' ;
put @lm 'in SAS. It covers basics to get started to more in-depth topics like Macros and Proc SQL.';
put @lm 'It is a great site to browse to help broaden and deepen your SAS knowledge in a variety';
put @lm 'of areas.';
put / @lm 'Thanks for visiting StudySAS Blog. ';
put //// @lm 'Sarath Annapareddy';
run;
  
  • lm: represents the left margin.
  • / : the forward slash symbol ( / ) skips a line.
  • If you want to skip N lines, use N slashes after the PUT statement.
  • The trim function and concatenation operator (||) are important here because without them you get extra spaces, which we probably don't want to see in our letter.

The above SAS program creates an RTF file in the specified location containing the letter.



Emailing with SAS




How to send an email using SAS:

filename mymail email "sastest@abc.com" subject="Sending Email using SAS" from="abctest@gmail.com" attach="C:\Documents and Settings\sreddy\Desktop\formltr.rtf";


data _null_;
file mymail;
put 'Hello there, Please review the attached letter.';
put 'Thanks,';
put 'Sarath';
run;

Saturday, April 3, 2010

Special Missing Values in SAS

Definition: A special missing value is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore.
Ref: SAS 9.1.3 language reference: concepts page no: 102

The symbol usually used to represent a missing value for a numerical variable is the period or dot. Aside from the dot, there are 27 special missing values SAS can store in numerical variables. They are the dot-underscore (._), and dot-letter (.A thru .Z). Note that these special values are case insensitive. That is, .A=.a .B=.b .C=.c etc.

If you do not begin a special numeric missing value with a period, SAS identifies it as a variable name. Therefore, to use a special numeric missing value in a SAS expression or assignment statement, you must begin the value with a period, followed by the letter or underscore, as in the following example:

x=.d;

When SAS prints a special missing value, it prints only the letter. When data values contain characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters.

Example: Consider the following Data step, which reads questionnaire data (three students, three questions, and three possible responses to each question: 1, 2 and 3):

data test;
/* M = multiple, U = unreadable, . = didn't answer */
/* the MISSING statement lists only the characters, not the variable name */
missing M U;
input student question answer;
datalines;
1 1 1
1 2 2
1 3 M
2 1 U
2 2 3
2 3 2
3 1 M
3 2 .
3 3 1
;
Proc print data=test; run;

The MISSING statement is needed here to keep special missing values for the numeric variable answer. In the above example, M is used to indicate multiple responses (not allowed) and U is used to indicate an unreadable response.

Order of Missing Values for Numeric Variables:

The numeric missing value (.) is sorted before the special numeric missing value .A, and both are sorted before the special missing value .Z. The dot-underscore (._) is the smallest of all and is sorted before the dot. SAS does not distinguish between lowercase and uppercase letters when sorting special numeric missing values.

Checking for Missing Numeric Values:

Often the SAS programmer uses the following SAS code to check for a missing numeric value:

IF VALUE=. THEN PUT "*** Value is missing";

While in most instances the above code works as intended, there are occasions where it does not catch some missing values. The statement assumes that only the dot is present in your data and none of the other 27 missing numeric values. As noted above, dot-Z (.Z) is the highest missing value. So a better, more inclusive way to check for missing numeric values is:

IF VALUE <=.Z THEN PUT "*** Value is missing";

The latter IF statement checks for all 28 possible missing values.

Reference: http://analytics.ncsu.edu/sesug/2005/TU06_05.PDF

For more details on special missing values, please refer to Malachy J. Foley's paper, MISSING VALUES: Everything You Ever Wanted to Know.
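
To see the difference between the two checks, here is a small sketch (the dataset name check and the flag variable names are made up for illustration). It also uses the MISSING function, which is not mentioned above but is another way to test for any kind of missing value.

```sas
missing A Z _;   /* let A, Z and _ in the raw data be read as .A, .Z and ._ */

data check;
  input value;
  dot_only = (value = .);      /* 1 only for the ordinary missing value */
  le_z     = (value <= .Z);    /* 1 for every kind of numeric missing value */
  by_func  = missing(value);   /* MISSING function: 1 for any missing value */
  datalines;
.
A
Z
_
0
-5
;
run;

proc print data=check; run;
```

Only the first observation is flagged by dot_only, while le_z and by_func flag all four missing observations and leave 0 and -5 alone, because every missing value sorts below every number.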

One more thing you should know: if the MISSING option is used in PROC FREQ, you get a breakdown for each type of missing value. Compare the results without and with the option:

*Without MISSING option;
proc freq data=test;
tables question*answer/ nopercent nocol norow;
run;

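Output: the missing answers (., M and U) are left out of the table and reported only as a Frequency Missing count.

*With MISSING option;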


proc freq data=test;
tables question*answer/ nopercent nocol norow missing;
run;

Output: ., M and U each appear as their own answer category in the table.




Wednesday, March 24, 2010

How to create a macro variable containing a list of variables in a DATA set

Sometimes it is very handy to have a macro variable containing the variable names of a dataset. Here are 2 different ways to create a macro variable with the list of variable names.

*Method1: Using Proc Contents and Proc SQL;



proc contents data=sashelp.class out=class;
run;

proc sql noprint;
select distinct(name) into:vars separated by " " from class;
quit;


%put &vars;


*Method2: Using SASHELP views and Proc SQL;


data class;
set sashelp.vcolumn(where=(libname="SASHELP" and memname="CLASS"));
keep name;
run;


proc sql noprint;
select distinct(name) into:vars separated by " " from class;
quit;

%put &vars;
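
Both methods typically return the names in alphabetical order, since DISTINCT usually sorts the values. If the original column order matters, a possible third variation (a sketch, not from the methods above) is to query DICTIONARY.COLUMNS directly and order by VARNUM. The PROC PRINT step at the end is only there to show the macro variable in use.

```sas
proc sql noprint;
  select name into :vars separated by ' '
    from dictionary.columns
    where libname="SASHELP" and memname="CLASS"
    order by varnum;   /* keep the order in which the variables are stored */
quit;

%put &vars;

/* use the list wherever a variable list is expected */
proc print data=sashelp.class(obs=3);
  var &vars;
run;
```

Note that the libname and memname values in DICTIONARY.COLUMNS are stored in uppercase, so the WHERE clause has to use uppercase too.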

Friday, March 12, 2010

PRXMATCH Function

The PRXMATCH function is very useful for locating matching strings. It takes 2 parameters: the first is the Perl regular expression (or the ID of one compiled with PRXPARSE), i.e. what you are looking for, and the second is the character string to be searched. PRXMATCH returns the start position of the matching string, or 0 if there is no match.
Syntax:

PRXMATCH (perl-regular-expression, source);

PRXMATCH is especially handy when:
1) You want to identify whether a variable contains any alphabetic character (any letter from A to Z).
2) You need to search a character variable for several different substrings.

Here is how PRXMATCH works in the first case.

*Prxmatch () function is very useful in locating the matching strings;

DATA finda2z;
INPUT ID $ 1-3 string $ 5-10;
prxmatch=prxmatch("/[a-zA-Z]/",string);
DATALINES;
001 ACBED
002 11
003 12
004 zx
005 11 2c
006 abc123
;
run;


proc print;
run;

Output:

Obs    ID     string    prxmatch
 1     001    ACBED         1
 2     002    11            0
 3     003    12            0
 4     004    zx            1
 5     005    11 2c         5
 6     006    abc123        1


Here the PRXMATCH function returns the start position of the matching string, in this case the first letter a to z or A to Z.

If you only want to flag which observations have a match for the specified variable, use the following code.

prxmatch=prxmatch("/[a-zA-Z]/",string)>0;

If a match is found, the value returned is 1; otherwise it is 0.



To keep the records that do not match this pattern, look for those records where PRXMATCH returns a zero.

The PRXMATCH function is also very helpful if you need to search a character variable for several different substrings.

Problem: Select observations where AETERM contains any of the substrings ‘nausea’, ‘vomiting’ or ‘fever’.

The old method is to combine several INDEX function calls with OR conditions, like this:

if index(aeterm,'nausea') > 0 or
index(aeterm,'vomiting') > 0 or
index(aeterm,'fever') > 0 ;

The PRXMATCH function can do all of this in one statement: less typing and fewer lines of code.

if prxmatch ("m/nausea|vomiting|fever/i",aeterm) > 0 ;

The 'm' before the first slash stands for match: it tells PRXMATCH to perform a matching operation, and it is optional when the delimiter is a slash.
The 'i' option tells SAS to ignore case, i.e. to treat "NAUSEA" the same as "nausea" while searching for a match.
Case sensitivity can also be switched for just part of a pattern using (?i:) or (?-i:).
(?i:) - turns ON the case insensitive search
(?-i:) - turns OFF the case insensitive search

Pipes ‘|’ should be used to separate the search strings.

Please refer to PRXMATCH in the Functions section of the SAS Language Reference: Dictionary in the Online SAS Documentation for more information.

PRXMATCH special characters and their meanings:
^ - starts with
$ - ends with
\D - any non-digit
\d - any digit
? - optional (the preceding item may or may not be present)
| - or
* - the preceding item repeated zero or more times
(?i:) - turns ON the case insensitive search
(?-i:) - turns OFF the case insensitive search
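
A few of these characters in action, as a small sketch (the dataset ae, its three sample terms and the flag variable names are invented for illustration):

```sas
data ae;
  input aeterm $40.;
  /* | separates alternatives, the i option ignores case */
  any_term     = prxmatch("/nausea|vomiting|fever/i", aeterm) > 0;
  /* ^ anchors the match to the start of the value */
  starts_fever = prxmatch("/^fever/i", aeterm) > 0;
  /* \d matches a digit: position of the first digit, 0 if there is none */
  first_digit  = prxmatch("/\d/", aeterm);
  datalines;
Severe NAUSEA
fever grade 2
Headache
;
run;

proc print data=ae; run;
```

The $ anchor is left out on purpose: SAS character values are padded with trailing blanks, so a pattern that ends in $ usually needs TRIM around the source or \s*$ at the end of the pattern.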




Wednesday, March 10, 2010

$UPCASEw. format

We all know the importance of the UPCASE function in handling the case of character strings. But did you know that a format can do the same job as the UPCASE function (upcasing the values)?
The $UPCASEw. format works like the UPCASE function. It also does one thing the UPCASE function doesn’t: used in a PUT function, the $UPCASEw. format lets you set the length of the new variable.

Remember that w specifies the width of the output field.

Example:
*********************************************************;
data new;
*convert it to uppercase;
name="studysas blog";
format name $upcase.;
newname=put(name, $upcase32.);
*Put function let you apply $upcase format;
run;
**********************************************;

The length of the new variable newname will be 32.