/* Sample data set */
data missing;
input n1 n2 n3 n4 n5 n6 n7 n8 c1 $ c2 $ c3 $ c4 $;
datalines;
1 . 1 . 1 . 1 4 a . c .
1 1 . . 2 . . 5 e . g h
1 . 1 . 3 . . 6 . . k i
1 . . . . . . . . . . .
1 . . . . . . . c . . .
. . . . . . . . . . . .
;
run;
*If you want to delete observation if the data for every variable is missing then use the following code;
*Approach 1: Using the coalesce option inside the datastep;
data drop_misobs;
set missing;
if missing(coalesce(of _numeric_)) and missing(coalesce(of _character_)) then delete;
run;
Pros:
*Simple code
Cons;
*This code doesn't work if we want to delete observation based on specific variables and not all of them.
*Approach 2:Using N/NMISS option inside the datastep;
data drop_missing;
set missing;
*Checks the Non missing values using ;
if n(n1, n2, n3, n4, n5, n6, n7, n8, c1, c2, c3, c4)=0 then delete;
run;
data drop_missing;
set missing;
*Checks the missing values using nmiss option;
if nmiss(n1, n2, n3, n4, n5, n6, n7, n8, c1, c2, c3, c4)=12 then delete; *12 is the total number of variables in the dataset missing.;
run;
*If you want to delete records based on few variables and don't want to type all the variable names in the IF-THEN clause use the following code;
*Task: Delete observations from the dataset if all variables in the dataset except (N1 and C1) has missing data;
proc contents data=missing out=contents(keep=memname name);
run;
*Create a macro variable names with list of variable names in the dataset;
proc sql;
select distinct name into:names separated by ','
from contents(where=(upcase(name) ^in ('N1','C1'))) where memname='MISSING'; *Excluding 2 variables in the dataset;
quit;
data remove_missing;
set missing;
if n(&names) lt 1 then delete;
run;
('’)
Monday, July 19, 2010
Saturday, July 10, 2010
Diffrence Between RUN and QUIT statements
Folkes, Here is the answer from Andrew Karp..... Direct link
****************************************************************************************************;
Allow me to weigh in on this topic. It comes up alot when I give SAS training classes. First, RUN and QUIT are both "explicit step boundaries" in the SAS Programming Language. PROC and DATA are "implied step boundaries."
Example 1: Two explicit step boundaries.
DATA NEW;
SET OLD:
C = A + B;
RUN;
PROC PRINT DATA=NEW;
RUN;
In this example, both the data and the proc steps are explicitly "ended" by their respective RUN statements.
Example 2: No explicit step boundaries.
DATA NEW;
SET OLD;
C = A + B;
PROC PRINT DATA=NEW;
In this example, the data step is implicitly terminated by the PROC statement. But, there is no step boundary for the PROC PRINT step/task, so it will not terminate unless/until the SAS supervisor "receives" a step boundary.
Some PROCS support what is called RUN group processing. These include PROCs DATASETS and PLOT in the BASE Module, PROCs REG and GLM in the STAT module and ARIMA in the ETS module.
Example 3: PROC DATASETS with RUN group processing.
PROC DATASETS LIBRARY = MYLIB;
MODIFY SET1;
FORMAT VAR1 DOLLAR12.2;
LABEL VAR1 "Dollars Paid";
RUN;
DELETE SET2;
RUN;
CHANGE SET3 = SET2;
RUN;
QUIT;
In this example, three separate data mangement tasks are carried out in the order they were written/coded. First a label and format are added to the descriptor portion of SET1, then SET 2 is deleted, and then SET3 is renamed to SET2. The RUN ends each command in the PROC DATASETS step (MODIFY, DELETE, CHANGE) and QUIT ends the step. If the explicit step boundary had been omitted, the step would have been implicity terminated by a subsequent PROC or DATA statement. If there were no implied step boundary following the last RUN command then the PROC DATASETS step would not terminate.
The same holds true with other RUN-group enabled PROCs in SAS Software. The ARIMA procedure in the ETS module, for example, implements what is called the Box-Jenkins methodology to analyze a time series and then generate future forecasted values from the existing series. There are three parts to this methodology, which are implmented in PROC ARIMA using (in this order), the IDENTIFY, ESTIMATE and FORECAST statements. The output from each statement is needed by PROC ARIMA to move to the next step in the process, and an experienced forecaster can look at the output generated by the IDENTIFY statement and then write the appropriate ESTIMATE statement syntax, and then do the same thing with the output generated by the ESTIMATE statement to write the proper FORECAST statement syntax. Once the analyst is satisfied with their model, they can terminate the PROC ARIMA step with a QUIT statement and move on to the next part of their project.
I hope this has been helpful.
Andrew Karp
Sierra Information Services
*******************************************************************************************************;
Tuesday, May 18, 2010
Sending the LOG and OUTPUT from PC SAS to a seperate file
Here is how to direct the SAS LOG file and or SAS Output to a seperate file.
Approach 1: Using Display Manager Statements;
filename log 'C:\temp\logfile.log';
filename out 'C:\temp\output.lst';
*Select only male students and age less than 16;
proc sql;
create table males as
select age, height, weight
from sashelp.class
where sex='M' and age lt 16
order by age;
quit;
*Get the descriptive statistics for height variable by age;
proc means data=males ;
by age;
var height;
output out=htstats mean=mean n=n std=sd median=med min=min max=max;
run;
DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;';
Information about Display Manager Commands:
DEXPORT and DIMPORT: DISPLAY MANAGER commands used to IMPORT and EXPORT the Tab delimited (Excel and .CSV) files;
SAS Display Manager Commands
Approach 2: Using Proc PRINTTO procedure;
Refer: How to save the log file or what is PROC PRINTTO procedure
('’)
Approach 1: Using Display Manager Statements;
filename log 'C:\temp\logfile.log';
filename out 'C:\temp\output.lst';
*Select only male students and age less than 16;
proc sql;
create table males as
select age, height, weight
from sashelp.class
where sex='M' and age lt 16
order by age;
quit;
*Get the descriptive statistics for height variable by age;
proc means data=males ;
by age;
var height;
output out=htstats mean=mean n=n std=sd median=med min=min max=max;
run;
DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;';
Information about Display Manager Commands:
DEXPORT and DIMPORT: DISPLAY MANAGER commands used to IMPORT and EXPORT the Tab delimited (Excel and .CSV) files;
SAS Display Manager Commands
Approach 2: Using Proc PRINTTO procedure;
Refer: How to save the log file or what is PROC PRINTTO procedure
('’)
Sunday, May 9, 2010
Random Sample Selection
Last week my manager asked me to randomly pick 10%observations from a large data set and then create a listing so that the Data management programmers can QC the data. I want to share some thoughts here … how easy and simple to do random sampling.
Approach 1:
Data step Approach: In this approach, the observations are shuffled using the RANUNI function which assigns a random number to each observation.
Step1: Generating the Random Vector (shuffling) using the RANUNI function;
The RANUNI function generates a random number from a continuous uniform distribution (the interval (0, 1).
Step2: After assigning a random number to each record, the records can then be sorted in ascending or descending order of the random numbers.;
Proc SQL Approach:
In this approach also, the observations are shuffled using the RANUNI function which assigns a random number to each observation.
*Proc Surveyselect Approach;
Important Note:
The value generated by the RANUNI function depends on a seed. The seed should be a non-negative integer from 1 to 2,147,483,646 in order to replicate the results of the RANUNI function. That is, given the same seed, the function produces the same result. If no seed, zero, or negative integers are specified as the seed, the computer clock sets the seed and results are not replicable.
Source: SESUG-2000 (P-404).pdf on Random Sample Selection by Imelda C. Go, Richland County School District One, Columbia, SC
If you use a positive number as a seed then you can replicate the random sample records as long as you don’t change the seed number. If you use the negative number (as in the above programs) you can’t replicate the records. Every time you submit the program random samples generated will be different.
('’)
Approach 1:
Data step Approach: In this approach, the observations are shuffled using the RANUNI function which assigns a random number to each observation.
Step1: Generating the Random Vector (shuffling) using the RANUNI function;
The RANUNI function generates a random number from a continuous uniform distribution (the interval (0, 1).
Step2: After assigning a random number to each record, the records can then be sorted in ascending or descending order of the random numbers.;
data randsamp ;
input patno @@;
random=RANUNI(-1);
* RANUNI function to assign a random number to each record.;
* Here the seed is negative integer (-1) so the results are not replicable.;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
;
run;
*Sort the records by increasing shuffling order;
proc sort data=randsamp; by random; run;
*CREATE MACRO VARIABLE FOR 9 (10% of 83 Subjects) RANDOM PATIENTS LIST;
proc sql;
select distinct(patno) into:random separated by "," from randsamp where monotonic() le 9;
quit;
%put &random;
Proc SQL Approach:
In this approach also, the observations are shuffled using the RANUNI function which assigns a random number to each observation.
*RANDOMLY SELECTING 10% OF A LARGE DATASET using Proc SQl and RANUNI function.;
*The following Proc SQL code will create a table called rands consisting of approximately 10% of the records randomly selected from dataset randsamp;
DATA randsamp ;
input patno @@;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
;
run;
proc sql;
create table rands(where= (monotonic() eq 9)) as
select *, RANUNI(-1) as random from randsamp order by random ;
quit;
*The following Proc Survey select code will create a table called sample consisting of approximately 10% of the records randomly selected from dataset randsamp;
DATA randsamp;
input patno @@;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
;
run;
PROC SURVEYSELECT DATA=randsamp2 OUT=sample SAMPSIZE=9 SEED=-1;
RUN;
The value generated by the RANUNI function depends on a seed. The seed should be a non-negative integer from 1 to 2,147,483,646 in order to replicate the results of the RANUNI function. That is, given the same seed, the function produces the same result. If no seed, zero, or negative integers are specified as the seed, the computer clock sets the seed and results are not replicable.
Source: SESUG-2000 (P-404).pdf on Random Sample Selection by Imelda C. Go, Richland County School District One, Columbia, SC
If you use a positive number as a seed then you can replicate the random sample records as long as you don’t change the seed number. If you use the negative number (as in the above programs) you can’t replicate the records. Every time you submit the program random samples generated will be different.
('’)
Monday, April 26, 2010
WARNING: You may have unbalanced quotation marks.
SAS can allow the strings up to 32,767 characters long but some times SAS will write a Warning message ‘WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.’, when you try to keep a character string longer than 262 characters to a variable. It is hard to look back at the SAS code to search for unbalanced quotes.
To make it more clearly I am going to show an example.
I want to add a 263 characters long name to a variable (longvar) and to do that I will simply use a data step… and when I do that I will see the WARNING message in Log.
data TEST;
x="(SEE DOCTOR'S LETTER)3RD ADMINISTRATION OF MTX WAS DELAYED BY 14 DAYS AND WAS REDUCED TO 1G/M2 INSTEAD OF 5G/M2, PROBLEMS, E.COLI SEPSIS WITH HEART INSUFFICIENCY WITH SINUS TACHYCARDY, PARALYTIC ILEUS, TACHYPNEA , PATIENT DIED ON 21.04.98 FROM MULTIORGAN FAILURE.";
y=length(x);
put x;
run;
LOG FILE:
There is a SAS option (NOQUOTELENMAX) which will take care of the WARNING message.
Options noQuoteLenMax; *Just before the Data step;
Don’t forget to change back to Options QuoteLenMax; after the end of the Data step.
Options noQuoteLenMax;
To make it more clearly I am going to show an example.
I want to add a 263 characters long name to a variable (longvar) and to do that I will simply use a data step… and when I do that I will see the WARNING message in Log.
data TEST;
x="(SEE DOCTOR'S LETTER)3RD ADMINISTRATION OF MTX WAS DELAYED BY 14 DAYS AND WAS REDUCED TO 1G/M2 INSTEAD OF 5G/M2, PROBLEMS, E.COLI SEPSIS WITH HEART INSUFFICIENCY WITH SINUS TACHYCARDY, PARALYTIC ILEUS, TACHYPNEA , PATIENT DIED ON 21.04.98 FROM MULTIORGAN FAILURE.";
y=length(x);
put x;
run;
LOG FILE:
There is a SAS option (NOQUOTELENMAX) which will take care of the WARNING message.
Options noQuoteLenMax; *Just before the Data step;
Don’t forget to change back to Options QuoteLenMax; after the end of the Data step.
Options noQuoteLenMax;
Saturday, April 17, 2010
CALL EXECUTE: Easy way to print or sort multiple files.
When printing multiple files, or sorting multiple datasets, the traditional method is to write multiple steps as below.
Proc print data=libref.ae; var _all_; run;
Proc print data=libref.conmed; var _all_; run;
Proc print data=libref.demog; var _all_; run;
Proc print data=libref.lab; var _all_; run;
Proc print data=libref.medhist; var _all_; run;
If you are like me who likes to simplify the traditional SAS code here is the tip. CALL EXECUTE comes to rescue here.
*Using Disctionary Tables and Call Execute;
proc sql;
create table dsn as select distinct memname from dictionary.tables
where libname="LIBREF" and memtype="DATA";
quit;
*Sorts all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc sort data=final.||'memname||';by usubjid; run;");
run;
*Prints all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc print data=final.||'trim(memname)||';var _all_; run;");
run;
*Using Proc Contents and Call Execute;
proc contents data=libref._all_ out=contents(keep=memname);
run;
*Create a macro variable memname with list of all the datasets;
proc sql;
select distinct memname into:memname from contents;
quit;
%put memname;
*Sorts all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc sort data=libref.||'trim(memname)||';by usubjid; run;");
run;
*Prints all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc print data=libref.||'trim(memname)||';var _all_; run;");
run;
*Using SASHELP Views and Call Execute to sort the dataset by usubjid;
*Sorts all the datasets using Call Execute;
data _null_;
set sashelp.vtable (where=(libname="LIBREF"));
call execute("proc sort data=libref.||'trim(memname)||';by usubjid;run;");
run;
*Prints all the datasets using Call Execute;
data _null_;
set sashelp.vtable (where=(libname="LIBREF"));
call execute("proc print data=libref.||'trim(memname)||';by _all_;run;");
run;
*If you are not printing/sorting all the datasets in the library here is code for that.The Following code only prints 4 datasets (AE, Conmed, Demog , Lab and Medhist);
data _null_;
do dsname='ae', 'conmed', 'demog', 'lab', 'medhist';
call execute("Proc print data=libref.||'trim(dsname))||'; var _all_; run;");
end;
run;
Proc print data=libref.ae; var _all_; run;
Proc print data=libref.conmed; var _all_; run;
Proc print data=libref.demog; var _all_; run;
Proc print data=libref.lab; var _all_; run;
Proc print data=libref.medhist; var _all_; run;
If you are like me who likes to simplify the traditional SAS code here is the tip. CALL EXECUTE comes to rescue here.
*Using Disctionary Tables and Call Execute;
proc sql;
create table dsn as select distinct memname from dictionary.tables
where libname="LIBREF" and memtype="DATA";
quit;
*Sorts all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc sort data=final.||'memname||';by usubjid; run;");
run;
*Prints all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc print data=final.||'trim(memname)||';var _all_; run;");
run;
*Using Proc Contents and Call Execute;
proc contents data=libref._all_ out=contents(keep=memname);
run;
*Create a macro variable memname with list of all the datasets;
proc sql;
select distinct memname into:memname from contents;
quit;
%put memname;
*Sorts all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc sort data=libref.||'trim(memname)||';by usubjid; run;");
run;
*Prints all the datasets using Call Execute;
data _null_;
set dsn;
call execute ("proc print data=libref.||'trim(memname)||';var _all_; run;");
run;
*Using SASHELP Views and Call Execute to sort the dataset by usubjid;
*Sorts all the datasets using Call Execute;
data _null_;
set sashelp.vtable (where=(libname="LIBREF"));
call execute("proc sort data=libref.||'trim(memname)||';by usubjid;run;");
run;
*Prints all the datasets using Call Execute;
data _null_;
set sashelp.vtable (where=(libname="LIBREF"));
call execute("proc print data=libref.||'trim(memname)||';by _all_;run;");
run;
*If you are not printing/sorting all the datasets in the library here is code for that.The Following code only prints 4 datasets (AE, Conmed, Demog , Lab and Medhist);
data _null_;
do dsname='ae', 'conmed', 'demog', 'lab', 'medhist';
call execute("Proc print data=libref.||'trim(dsname))||'; var _all_; run;");
end;
run;
Saturday, April 10, 2010
Write a Letter using SAS/ Emailing with SAS
SAS can do many things which most of us don’t have a clue. Here is one example….
Writing a letter:
filename formltr 'C:\Documents and Settings\sreddy\Desktop\formltr.rtf';
data address;
infile datalines;
input @ 1 stno
@ 6 lane $12.
@19 aptno $7.
@27 city $9.
@37 state $2.
@40 zip ;
datalines;
2550 Augusta Blvd Apt#203 Fairfield OH 45014
;
run;
data _null_;
retain lm 5;
set address;
file formltr;* print notitles;
put _page_;
adr1 = trim(stno) ' ' trim(lane);
put @lm adr1;
adr2 = trim(aptno);
put @lm adr2;
adr3 = trim(city) ||', '|| trim(state) ||' '|| trim(zip);
put @lm adr3;
adr4 = trim('Dear')|| ' ' ||trim('SAS') || ' ' || trim('Users,');
put / @lm adr4;
put / @lm 'StudySAS Blog offers a lot of information regarding tips and tutorials on various topics ' ;
put @lm 'in SAS. It covers basics to get started to more in-depth topics like Macros and Proc SQL.';
put @lm 'It is a great site to browse to help broaden and deepen your SAS knowledge in a variety';
put @lm 'of areas.';
put / @lm 'Thanks for visiting StudySAS Blog. ';
put //// @lm 'Sarath Annapareddy';
run;
Emailing with SAS
How to send an email using SAS:
filename mymail email sastest@abc.com subject="Sending Email using SAS" from=abctest@gmail.com' attach="C:\Documents and Settings\sreddy\Desktop\formltr.rtf";
data _null_;
file mymail;
put 'Hello there, Please review the attached letter.';
put 'Thanks,';
put 'Sarath';
run;
quit;
Writing a letter:
filename formltr 'C:\Documents and Settings\sreddy\Desktop\formltr.rtf';
data address;
infile datalines;
input @ 1 stno
@ 6 lane $12.
@19 aptno $7.
@27 city $9.
@37 state $2.
@40 zip ;
datalines;
2550 Augusta Blvd Apt#203 Fairfield OH 45014
;
run;
data _null_;
retain lm 5;
set address;
file formltr;* print notitles;
put _page_;
adr1 = trim(stno) ' ' trim(lane);
put @lm adr1;
adr2 = trim(aptno);
put @lm adr2;
adr3 = trim(city) ||', '|| trim(state) ||' '|| trim(zip);
put @lm adr3;
adr4 = trim('Dear')|| ' ' ||trim('SAS') || ' ' || trim('Users,');
put / @lm adr4;
put / @lm 'StudySAS Blog offers a lot of information regarding tips and tutorials on various topics ' ;
put @lm 'in SAS. It covers basics to get started to more in-depth topics like Macros and Proc SQL.';
put @lm 'It is a great site to browse to help broaden and deepen your SAS knowledge in a variety';
put @lm 'of areas.';
put / @lm 'Thanks for visiting StudySAS Blog. ';
put //// @lm 'Sarath Annapareddy';
run;
- lm: represents left margin
- / : forward slash symbol ( / ) skips a line.
- If you want to skip ‘N’ number of lines use ‘N’ number of flashes after the PUT statement.
- The trim function and concatenation operator (||) are important here because without these you will get extra spaces which we probably don't want see in our letter.
Emailing with SAS
How to send an email using SAS:
filename mymail email sastest@abc.com subject="Sending Email using SAS" from=abctest@gmail.com' attach="C:\Documents and Settings\sreddy\Desktop\formltr.rtf";
data _null_;
file mymail;
put 'Hello there, Please review the attached letter.';
put 'Thanks,';
put 'Sarath';
run;
quit;
Subscribe to:
Posts (Atom)