@

Thursday, October 13, 2011

Counting the number of missing and non-missing values for each variable in a data set.

/* create sample data */
data one;
input a $ b $ c $ d e;
cards;
a . a 1 3
. b . 2 4
a a a . 5
. . b 3 5
a a a . 6
a a a . 7
a a a 2 8
;
run;

/* create a format to group missing and non-missing */
proc format;
value $missfmt ' '='missing'
other='non-missing';
value missfmt .='missing'
other='non-missing';
run;

%macro lst(dsn);
/** open dataset **/
%let dsid=%sysfunc(open(&dsn));

/** cnt will contain the number of variables in the dataset passed in **/
%let cnt=%sysfunc(attrn(&dsid,nvars));

%do i = 1 %to &cnt;
/** create a different macro variable for each variable in dataset **/
%let x&i=%sysfunc(varname(&dsid,&i));
/** list the type of the current variable **/
%let typ&i=%sysfunc(vartype(&dsid,&i));
%end;

/** close dataset **/
%let rc=%sysfunc(close(&dsid));

%do i = 1 %to &cnt;
/* loop through each variable in PROC FREQ and create */
/* a separate output data set */
proc freq data=&dsn noprint;
tables &&x&i / missing out=out&i(drop=percent rename=(&&x&i=value));
format &&x&i %if &&typ&i = C %then %do; $missfmt. %end;
%else %do; missfmt. %end;;
run;

data out&i;
set out&i;
varname="&&x&i";
/* create a new variable that is character so that */
/* the data sets can be combined */
%if &&typ&i=N %then %do;
value1=put(value, missfmt.);
%end;
%else %if &&typ&i=C %then %do;
value1=put(value, $missfmt.);
%end;
drop value;
rename value1=value;
run;

%end;

data combine;
set %do i=1 %to &cnt;
out&i
%end;;
run;

proc print data=combine;
run;

%mend lst;
%lst(one)

/* another way to reshape the COMBINE data set */
proc transpose data=combine out=out(drop=_:);
by varname;
id value;
var count;
run;

proc print data=out;
run;

Original output:

COUNT varname value
2 a missing
5 a non-missing
2 b missing
5 b non-missing
1 c missing
6 c non-missing
3 d missing
4 d non-missing
7 e non-missing

Transposed output:

varname missing non_missing
a 2 5
b 2 5
c 1 6
d 3 4
e . 7

Source: http://support.sas.com/kb/44/124.html

Wednesday, August 17, 2011

When do I use a WHERE statement instead of an IF statement to subset a data set?

When programming in SAS, there is almost always more than one way to accomplish a task. Beginning programmers may think that there is no difference between using the WHERE statement and the IF statement to subset your data set. Knowledgeable programmers know that depending on the situation, sometimes one statement is more appropriate than the other. For example, if your subset condition includes automatic variables or new variables created within the DATA step, then you must use the IF statement instead of the WHERE statement. This tip shows you how and when to apply the WHERE and IF statements to get correct and reliable results. It also reviews the similarities as well as the differences between these two SAS programming approaches. Detail differences in program efficiency between the two approaches will not be covered in this tip.

For more details refer to http://support.sas.com/kb/24/286.html

Monday, July 4, 2011

Transporting SAS Files using Proc Copy and or Proc Cport/Proc Cimport

When moving SAS datasets /catalogs from one type of computer to another, there are several things to be considered, such as the operating systems of the two computers, the versions of SAS and the type of communication link between the computers.

The easiest way to move SAS datasets from one system to another system is to:

Create a transport file using any SAS version.

Move the transport file to the new system.

Import the transport file on the new system.

Transport datasets are 80-byte length binary files made from SAS datasets. PROC COPY or PROC CPORT can create Transport datasets but they both create different types of transport files. Transport files can be created and read using either PROC COPY or PROC CPORT & PROC CIMPORT, but you cannot mix and match. Transport files created with PROC COPY must be read with PROC COPY; those created by PROC CPORT must be read with PROC CIMPORT.

PROC COPY uses an engine (i.e. XPORT) to create a SAS transport file. PROC COPY is used to transport SAS datasets only. It is version independent, but when used in version 8 will make only short variable names and table names (<= 8 characters).

PROC COPY is likely to be the best choice for transporting SAS datasets (only SAS datasets).

PROC CPORT creates a different type of SAS transport file, an 80-byte binary file. PROC CPORT can transport catalogs as well as tables, but not views. It cannot transport a file to an earlier SAS version. PROC CIMPORT is used to import transport files created with PROC CPORT.

The best choice for transporting datasets and catalogs simultaneously is to use PROC CPORT/PROC CIMPORT.

Proc COPY vs. Proc CPORT/CIMPORT

PROC CPORT/CIMPORT can be used to transport both SAS datasets and SAS catalogs. Proc CPORT and Proc CIMPORT only allow file transport from earlier version to a newer version (i.e. from SAS 6 to SAS 9) and not the opposite (i.e. from SAS 9 to SAS 8.2).

PROC COPY can be used to transfer files from newer version of SAS to an earlier release (i.e. from SAS 9 to SAS 6.0) and vice versa without any trouble. Proc Copy will not transport SAS catalogs. If you must move catalogs with PROC COPY SAS catalogs have to be converted to a SAS dataset using PROC FORMAT with the CNTLOUT option.

Note: When moving files from newer version (ex: SAS 9) to older version (ex: SAS 8.0), the long variable names in SAS 9 will get truncated to 8 bytes.

SAS Member Type	XPORT Engine with either DATA step or PROC COPY	PROC CPORT and PROC CIMPORT
Dataset	Yes	Yes
Catalogs	No	Yes

Now, here is the example about how to create transport (.xpt) files from SAS datasets.

/********************************************************************* Create a transport(.xpt) file and convert back the SAS transport (.xpt) file to SAS dataset

*********************************************************************/

%let libname=C:\Users\Sarath Annapareddy\Desktop\Transport;

* Create sample dataset;

libname sasfile "&libname";

data sasfile.test;

input var1 var2 var3;
datalines;
1 26 31
1 28 28
1 30 31
2 32 31
2 34 29

;

run;

/*******************************************************
*Create a .xpt file from a SAS dataset using Proc Copy;
/*********************************************************************/

libname sasfile "&libname"; *Location of SAS dataset; xptfile xport "&libname\test.xpt";*Location of .xpt file created;

libname

proc copy in=sasfile out=xptfile memtype=data;
select test;
run;

*Convert the .xpt file back to a SAS dataset using Proc copy;

libname xptfile xport "&libname\test.xpt";
libname sasfile2 "&libname\new\";

proc copy in=xptfile out=sasfile2 memtype=data;
run;

*Convert the .xpt file back to a sas dataset using data step;

libname datain xport "&libname\test.xpt" /*directory path where file is located/SAS export file name*/;

data xptdata;
set datain.test;
run;

************************************************************/
*Create a .xpt file from a SAS dataset using Proc Cport;
***************************************************************************/ *Proc Cport/Proc Cimport;

libname sasfile "&libname";

data sasfile.test2;
input var1 var2 var3 var4;
datalines;
1 26 31 1
1 28 28 2
1 30 31 3
2 32 31 4
2 34 29 5
;
run;

libname sasfile "&libname"; *Location of SAS dataset created;
libname xptfile xport "&libname\test2.xpt";

proc cport data=sasfile.test2 file="&libname\test2.xpt";

run;

*Convert the .xpt file back to a SAS dataset;

libname sasfile2 "&libname\new";*Location of SAS dataset created
libname xptfile xport "&libname\test2.xpt";*Location of the .xpt file;

proc cimport infile=xptfile library=sasfile2;
run;

REFERENCES:

http://www.umass.edu/statdata/software/handouts/SASTransport.pdf
http://www.ts.vcu.edu/kb/2074.html
SAS Documentation regarding Traditional Methods for creating and Importing Files in Transport files.

Tuesday, June 14, 2011

How to generate the month name from a numeric date value

Task: I have a SAS date and wanted to create a variable with the month name.
Here is how to do it......

Use MONNAMEw. format which is simple and easy. You need to be using SAS 9.X versions to make it work.

/*Use MONNAMEw. format*/
data month;
input date:mmddyy8.;
month_name=put(date,monname3.);
datalines;
01/15/04
02/29/04
07/04/04
08/18/04

12/31/04

;

run;

proc print;

run;

ERROR: The MS Excel table (worksheetname) has been opened for OUTPUT.

I happend to stumbleupon a post from SAS support blog regarding the ERROR message in the LOG file when trying to output a SAS dataset in the form of Excel sheet.

Direct link:

ERROR: The MS Excel table (worksheetname) has been opened for OUTPUT.

This table already exists, or there is a name conflict with an existing object. This table will not be replaced. This engine does not support the REPLACE option.
ERROR: Export unsuccessful. See SAS Log for details.

When you use the EXPORT procedure on an Excel workbook, the workbook might be corrupted and the following error message generated:

This problem can occur if a previous EXPORT procedure attempts to export a SAS data set in the workbook that does not contain any observations. The following example illustrates an export procedure on such a data set:

%macro blowup;

data a;
a=1;
stop;
run;

%do i=1 %to 2;

proc export data=a outfile="c:\temp\test.xls"
dbms=excel2000 replace;
run;

%end;
%mend;
%blowup;

The problem occurs because the SAS data set does not contain any data to export. As a result, a corrupted structure is created.

To circumvent the problem, do one of the following:

•Use the SQL procedure with a DROP TABLE statement to drop the empty data set before replacing it, as shown in the following example:

%macro blowup;

data a;
a=1;
stop;
run;

%do i=1 %to 2;

libname test excel 'c:\sastest\test2.xls';

proc sql;
drop table test.a;
quit;

libname test clear;

proc export data=a outfile="c:\sastest\test2.xls"
dbms=excel2000 replace;
run;

%end;
%mend;
%blowup;

Sunday, February 6, 2011

How to read next record while working on the current record. (LEAD FUNCTION)

Even though there is no function is available in SAS to do exactly the opposite work of the LAG function (i.e: reading the next record while working on the current one), there are few things you can do to do exactly that.
Here are few simple techniques which are proved to work without any problem.

*SAMPLE DATASET;

data test;
input id age grp ;
datalines;
1 10 1
2 20 1
3 30 1
4 40 1
5 50 1
1 10 2
2 20 2
3 30 2
4 40 2
5 50 2
;
run;

*1) Using the POINT feature along with automatic variable _N_;
This solution was suggested by Paul M. Dorfman;

data leads;
_n_ ++ 1;
if _n_ le n then do;
set one point=_n_;
leadage=age;
end;
set one nobs=n;
run;

*OR*;

data leads;
_n_ ++ _n_ lt n;
set one point=_n_;
leadage=age;
set one nobs=n end=end;
if end then leadage=.;
run;

By using the above techniques, you can jump a bit higher and even look values of two, three or any numbers of observations in advance by advancing the value of automatic variable _N_ by appropriate number (in general _N_ ++ n ). Only thing you have to keep in mind while doing so is before applying the POINT processing, you have to apply appropriate lag (lagn) to grouping variable.

References:
SAS-L. http://www.listserv.uga.edu/cgi-bin/wa?A2=ind9904E&L=sas-l&P=R6185

http://www.nesug.org/Proceedings/nesug09/cc/cc26.pdf

Option3:

option mergeNoBy=nowarn; *Supress the warning message in the log. WARNING: No BY statement was specified for a MERGE statement.;

data lead;
merge one one(firstobs = 2 rename=(age=leadage));
run;

Saturday, January 22, 2011

STUDY 'DAY' CLCULATION (ONE-LINER)

Recently I stumbled upon a SUGI-Paper SAS 1-Liners by Stephen Hunt. I liked the way Stephen developed the 1-liner for STUDY DAY calculation.

Direct link:

One of the most common calculation used across all types of programming is determining a relative 'day' based on 2 date fields. In clinical trials the initial 'Study Day' is generally considered to begin at either randamization or dosing, thus assessments made prior to this starting point require a slight variation in the calculating in order to preserve the typical 'no day 0' concept.

SUGI proceedings10/054-2010.pdf

When it comes to calculating such, some programmers opt to both with evaluating whether a visit date occurred on or /after randomization.

if visdt > randdt >.z then stydy=visdt-randdt;
else stydt=visdt-randdt+1;

However, this is unnecessary, sine the 1-liner will suffice:

if visdt >.z & randdt >.z then stydy=visdt-randdt+(visdt>=randdt);

Thanks to Stephen for his code.

Here is the code I use from now to compute the study day.

if nmiss(visidt,randdt)=0 then stydy=visdt-randdt+(visdt>=randdt);