Tuesday, May 18, 2010

Sending the LOG and OUTPUT from PC SAS to a separate file

Here is how to direct the SAS log and/or the SAS output to a separate file.

Approach 1: Using Display Manager Statements;

filename log 'C:\temp\logfile.log';

filename out 'C:\temp\output.lst';

*Select only male students and age less than 16;
proc sql;
create table males as
select age, height, weight
from sashelp.class
where sex='M' and age lt 16 
order by age;
quit;


*Get the descriptive statistics for height variable by age;
proc means data=males ;
by age;
var height;
output out=htstats mean=mean n=n std=sd median=med min=min max=max;
run;


DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;';

Information about Display Manager Commands:
DEXPORT and DIMPORT: DISPLAY MANAGER commands used to IMPORT and EXPORT the Tab delimited (Excel and .CSV) files;
SAS Display Manager Commands



Approach 2: Using the PRINTTO procedure;

Refer to: How to save the log file, or what is the PROC PRINTTO procedure
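
As a rough illustration of this approach, here is a minimal PROC PRINTTO sketch. The file paths simply reuse the ones from Approach 1 and are only examples, and the PROC MEANS step is a stand-in for whatever code you want to capture. It assumes the LISTING destination is open, since PRINT= captures listing output.

```sas
* Route the log and the procedure output to external files;
* NEW replaces the contents of the files instead of appending to them;
proc printto log='C:\temp\logfile.log' print='C:\temp\output.lst' new;
run;

* Any steps submitted here write their log and output to the files above;
proc means data=sashelp.class;
var height;
run;

* Restore the default log and output destinations;
proc printto;
run;
```

The final PROC PRINTTO with no options matters: without it, everything submitted later in the session keeps going to the files.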


Sunday, May 9, 2010

Random Sample Selection

Last week my manager asked me to randomly pick 10% of the observations from a large data set and then create a listing so that the data management programmers can QC the data. I want to share some thoughts here on how easy and simple random sampling can be.
Approach 1:

Data step Approach: In this approach, the observations are shuffled using the RANUNI function, which assigns a random number to each observation.

Step 1: Generate the random vector (shuffling) using the RANUNI function.
The RANUNI function generates a random number from a continuous uniform distribution on the interval (0, 1).

Step 2: After assigning a random number to each record, sort the records in ascending or descending order of the random numbers.

data randsamp ;
input patno @@;
random=RANUNI(-1);
* The RANUNI function assigns a random number to each record.;
* Here the seed is a negative integer (-1), so the results are not replicable.;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
;
run;

*Sort the records by increasing shuffling order;
proc sort data=randsamp; by random; run;

*CREATE MACRO VARIABLE LISTING 9 RANDOM PATIENTS (ABOUT 10% OF THE SUBJECTS);
*The OBS=9 data set option keeps the first 9 records of the shuffled data set;
proc sql;
select distinct patno into :random separated by "," from randsamp(obs=9);
quit;
%put &random;

Proc SQL Approach: 
In this approach too, the observations are shuffled using the RANUNI function, which assigns a random number to each observation.

*RANDOMLY SELECTING 10% OF A LARGE DATASET using PROC SQL and the RANUNI function.;
*The following PROC SQL code will create a table called rands consisting of approximately 10% of the records randomly selected from dataset randsamp;

DATA randsamp ;
input patno @@;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
;
run;
*OUTOBS=9 keeps only the first 9 rows of the randomly ordered result;
proc sql outobs=9;
create table rands as
select *, RANUNI(-1) as random from randsamp order by random;
quit;

Proc Surveyselect Approach:

*The following PROC SURVEYSELECT code will create a table called sample consisting of approximately 10% of the records randomly selected from dataset randsamp;

DATA randsamp;
input patno @@;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
;
run;

PROC SURVEYSELECT DATA=randsamp OUT=sample SAMPSIZE=9 SEED=-1;
RUN;

Important Note:
The value generated by the RANUNI function depends on a seed. The seed should be a positive integer from 1 to 2,147,483,646 in order to replicate the results of the RANUNI function. That is, given the same seed, the function produces the same result. If no seed, zero, or a negative integer is specified as the seed, the computer clock sets the seed and the results are not replicable.
Source: SESUG-2000 (P-404).pdf on Random Sample Selection by Imelda C. Go, Richland County School District One, Columbia, SC

If you use a positive number as the seed, you can replicate the random sample as long as you don't change the seed. If you use a negative number (as in the programs above), you can't replicate the sample: every time you submit the program, a different random sample is generated.
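
For comparison, here is a small sketch of a replicable version. The seed value 20100509 and the data set names randsamp_fixed and sample_fixed are arbitrary choices for illustration, and the code assumes the randsamp data set created earlier already exists.

```sas
* Data step shuffle with a fixed positive seed (20100509 is an arbitrary example);
data randsamp_fixed;
set randsamp;
random = ranuni(20100509);
run;

* Sorting by the random number gives the same order on every run;
proc sort data=randsamp_fixed; by random; run;

* PROC SURVEYSELECT with a fixed positive seed selects the same 9 records each time;
proc surveyselect data=randsamp out=sample_fixed sampsize=9 seed=20100509;
run;
```

Resubmitting this code should select the same records each time, as long as the input data set and the seed stay unchanged.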
