Friday, February 24, 2012

Transcoding Problem: Option (correctencoding=wlatin1)

Have you ever tried to convert the default encoding to Wlatin1 (Windows SAS Session Encoding)?

Let me tell you the story behind writing this post….

Today I was asked to send SAS datasets to one of our clients. I transferred the datasets, and immediately afterwards I got an email from the client saying that the encoding of the SAS datasets was different this time compared with the last transfer. He said it was causing problems in the Proc Compare process.

Oops... bummer. The client's email got me a little worried.
I checked the Proc Contents details and saw the change in the encoding. I investigated the issue and found out that Unicode SAS with UTF-8 encoding uses 1 to 4 bytes to store each character. An 8-bit (single-byte) character can expand to 2 or 3 bytes when transcoding occurs, and if the variable is not long enough to hold the extra bytes, a truncation error results.

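Because of the truncation problem, I was asked to change the encoding back to WLATIN1 so that the character data in the SAS datasets represents US and Western European characters on Windows.

Here is the code to do that. Note that CORRECTENCODING= only changes the encoding recorded in the dataset header; it does not transcode the data values themselves.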

proc datasets lib=SDTM nolist;
modify supplb / correctencoding=wlatin1;
run;
quit;
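
Before and after changing the encoding, it can help to confirm what SAS actually reports. The following is a minimal sketch, assuming the same SDTM libref and SUPPLB dataset used above. It writes the session encoding to the log and lists the dataset attributes, which include the encoding recorded in the dataset header.

```sas
/* Show the encoding of the current SAS session in the log */
%put NOTE: Session encoding is %sysfunc(getoption(encoding));

/* Show the encoding recorded in the header of the dataset      */
/* (assumes the SDTM libref from the example above is assigned) */
proc contents data=SDTM.supplb;
run;
```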

Unicode Basics:

Unicode is the universal character encoding that supports the interchange, processing, and display of
characters and symbols found in the world’s writing systems. Other character encodings are limited to
subsets of all languages. 

Transcoding problem:

Currently, SAS/ACCESS Interface to ODBC cannot support Unicode. DBCS data and single-byte
non-ASCII characters cannot be correctly processed.

In a SAS UTF-8 session, imported data appear as question marks. Although the encoding property of
the imported data is UTF-8, the actual encoding of the data has not been changed to UTF-8.

Workaround:

/* Set the correct encoding to 'WLATIN1' (changes the header only) */
proc datasets lib=stagings nolist;
modify custtype_w1 / correctencoding=wlatin1;
run;
quit;

/* Transcode data from 'WLATIN1' to 'UTF-8' (run in a UTF-8 session) */
data stagings.custtype_u8;
set stagings.custtype_w1;
run;
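
If the transcoding step above truncates values because the character variables are too short for the expanded UTF-8 bytes, one option is to read the source data through the CVP (character variable padding) engine. This is only a sketch under assumptions: the folder path is hypothetical, the SRC libref is a name chosen for illustration, and the multiplier of 1.5 is an example value that may need to be larger for your data.

```sas
/* Hypothetical path: point this at the folder that holds custtype_w1. */
/* The CVP engine is read-only and pads character variable lengths.    */
libname src cvp 'C:\data\stagings' cvpmultiplier=1.5;

/* Assumes the STAGINGS libref is already assigned in a UTF-8 session */
data stagings.custtype_u8;
set src.custtype_w1;
run;
```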



Sunday, November 13, 2011

SDTM Compliance Checks

Validation checks or tools to check the compliance of SDTM data

JANUS is a standard database model based on CDISC's SDTM standard. The FDA uses JANUS to store submitted SDTM clinical data. As part of a submission, pharmaceutical companies have to submit SAS datasets in transport file (.xpt) format along with the annotated CRF and the Define.xml file. The reason is so that the clinical data can be loaded properly into the JANUS database, which is maintained by the FDA. Once the data is loaded into JANUS, it is much easier for FDA reviewers to review it; they can even produce ad-hoc reports and perform cross-study reviews. The FDA runs compliance checks on the submitted data to make sure it conforms to the SDTM standard. It does this by running WebSDM™ (developed by Phase Forward), an SDTM compliance validation tool that performs a set of SDTM compliance checks on clinical data before it gets loaded into the JANUS database.

SDTM VALIDATION TOOLS:

SDTM validation tools verify that the clinical data complies with the standards and assumptions of the SDTM Implementation Guide v3.1.1 or v3.1.2. They also verify that the Define file complies with ODM v1.2.

1) LINCOLN TECHNOLOGIES - WebSDM™:

WebSDM™ is an application that tests the compliance of submission-ready files (in SAS V5 Transport format or Oracle® views) against the SDTM IG. The FDA has been using WebSDM™ since 2004 to review clinical data. Users load SDTM-compliant files into WebSDM™, which can then check for errors and/or inconsistencies in the structure and content of the data.

The checks available include detection of structural and consistency errors rated by severity (high, medium and low). Details of the compliance checks performed by WebSDM™ are available from the Phase Forward website.

It is very useful for the sponsor to run the WebSDM™ checks on the clinical data because the FDA reviewers use the same application to review the data. The one drawback of this application is that it cannot be used for near-SDTM-compliant or client-specific standard datasets.

2) SAS - Proc CDISC:


SAS provides Proc CDISC to perform SDTM compliance checks on clinical data. Proc CDISC is already included as a procedure if you use a newer version of SAS (Version 9.1.3 Service Pack 3 and above).


Proc CDISC supports compliance checks for only a few domains (15 out of the 23 domains outlined in CDISC SDTM version 3.1). It supports the following classes of domains:

Interventions (CM, EX, SU),
Events (AE, DS, MH),
Findings (EG, IE, LB, PE, QS, SC, VS), and
Special purpose (DM, CO).

Proc CDISC does not support the trial design domains, custom-defined domains or other new SDTM domains like MB, MC, PC and PK etc.

3) OpenCDISC Validator:

OpenCDISC Validator is a new application released by OpenCDISC to run compliance checks on clinical data. It is an open source, Java-based project that validates datasets against the SDTM standard. The first advantage of this application is that it is open source software, so it can be downloaded for free. The second advantage is that it combines the WebSDM and Janus checks. It can also check compliance with the ADaM standard.

OpenCDISC can check the compliance of SAS transport files as well as delimited files (.CSV etc.) against the SDTM v3.1.1 or SDTM v3.1.2 standards. One major advantage of this tool is that it can perform compliance checks on all sorts of datasets (e.g. SDTM compliant and SDTM non-compliant).

Here are a few other nice SUGI papers that have lots of information about OpenCDISC SDTM compliance checks:

Open CDISC Plus
A Standard SAS® Program for Corroborating OpenCDISC Error Message

Ref:


Study Data Specifications document V 1.6 by FDA.
In-Depth Review of Validation Tools to Check Compliance of CDISC SDTM-Ready Clinical Datasets http://www.lexjansen.com/pharmasug/2010/cd/cd13.pdf
SAS and Open Source Tools for CDISC SDTM Compliance Checks for Regulatory Submissions http://www.nesug.org/Proceedings/nesug10/ph/ph04.pdf
Validating CDISC SDTM-Compliant Submission-Ready Clinical Datasets with an In-House SAS® Macro-Based Solution http://www.lexjansen.com/pharmasug/2008/rs/rs07.pdf
Max Kanevsky (2008), “Validating SDTM, an open source solution”, proceeding of the CDISC Interchange 2008

Thursday, October 27, 2011

How to remove carriage return and linefeed characters within quoted strings.

HANDLING SPECIAL EMBEDDED CHARACTERS

Managing and reporting data from a DBMS that contains very long text fields is not easy. It can be especially frustrating if the text field has special embedded characters such as tabs, carriage returns ('0D'x), line feeds ('0A'x) and page breaks. But here is some simple SAS code which takes care of those issues.

The normal line end for Windows text files is a carriage return character followed by a line feed character, so the syntax for taking out all carriage return ('0D'x) and line feed ('0A'x) characters is
comment=compress(comment,'0D0A'x);
                             or
comment=tranwrd(comment,'0D0A'x,' ');

Note that COMPRESS removes every carriage return and every line feed, whereas TRANWRD only replaces the two-character sequence '0D0A'x, and it replaces it with a single blank rather than removing it (TRANWRD substitutes a blank when the replacement string has zero length).

If you just want to replace the carriage return, use this code:
comment=tranwrd(comment,'0D'x,' ');

You could also try this one:

comment=compress(comment, ,"kw"); *k is for keep, w is for "write-able" (printable) characters;
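
To see how these functions behave, here is a small self-contained sketch. The dataset name DEMO and the variable names are made up for illustration. It builds a string with an embedded carriage return and line feed and then cleans it three ways.

```sas
data demo;
length comment no_crlf blank_crlf printable $40;
/* Build a test string with an embedded carriage return + line feed */
comment = cats('line one', '0D0A'x, 'line two');
/* Remove every CR and LF character */
no_crlf = compress(comment, '0D0A'x);
/* Replace each CR+LF pair with a single blank */
blank_crlf = tranwrd(comment, '0D0A'x, ' ');
/* Keep only printable ("write-able") characters */
printable = compress(comment, , 'kw');
run;

proc print data=demo;
var no_crlf blank_crlf printable;
run;
```

With COMPRESS the two lines run together ("line oneline two"), while the TRANWRD version keeps a blank between them ("line one line two"), so pick the one that suits your report.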


Thursday, October 13, 2011

Counting the number of missing and non-missing values for each variable in a data set.

/* create sample data */
data one;
input a $ b $ c $ d e;
cards;
a . a 1 3
. b . 2 4
a a a . 5
. . b 3 5
a a a . 6
a a a . 7
a a a 2 8
;
run;


/* create a format to group missing and non-missing */
proc format;
value $missfmt ' '='missing'
other='non-missing';
value missfmt .='missing'
other='non-missing';
run;


%macro lst(dsn);
/** open dataset **/
%let dsid=%sysfunc(open(&dsn));


/** cnt will contain the number of variables in the dataset passed in **/
%let cnt=%sysfunc(attrn(&dsid,nvars));


%do i = 1 %to &cnt;
/** create a different macro variable for each variable in dataset **/
%let x&i=%sysfunc(varname(&dsid,&i));
/** list the type of the current variable **/
%let typ&i=%sysfunc(vartype(&dsid,&i));
%end;


/** close dataset **/
%let rc=%sysfunc(close(&dsid));


%do i = 1 %to &cnt;
/* loop through each variable in PROC FREQ and create */
/* a separate output data set */
proc freq data=&dsn noprint;
tables &&x&i / missing out=out&i(drop=percent rename=(&&x&i=value));
format &&x&i %if &&typ&i = C %then %do; $missfmt. %end;
%else %do; missfmt. %end;;
run;


data out&i;
set out&i;
varname="&&x&i";
/* create a new variable that is character so that */
/* the data sets can be combined */
%if &&typ&i=N %then %do;
value1=put(value, missfmt.);
%end;
%else %if &&typ&i=C %then %do;
value1=put(value, $missfmt.);
%end;
drop value;
rename value1=value;
run;


%end;


data combine;
set %do i=1 %to &cnt;
out&i
%end;;
run;


proc print data=combine;
run;


%mend lst;
%lst(one)


/* another way to reshape the COMBINE data set */
proc transpose data=combine out=out(drop=_:);
by varname;
id value;
var count;
run;


proc print data=out;
run;


Original output:

COUNT varname value
2 a missing
5 a non-missing
2 b missing
5 b non-missing
1 c missing
6 c non-missing
3 d missing
4 d non-missing
7 e non-missing

Transposed output:

varname missing non_missing
a 2 5
b 2 5
c 1 6
d 3 4
e . 7
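
If you only need to look at the counts and do not need them in a dataset, a shorter alternative is to apply the same two formats to all variables in a single PROC FREQ step. This is a sketch, not part of the original SAS note; it assumes the dataset ONE created above and repeats the format definitions so that it can be run right after the sample data step.

```sas
/* Same formats as above: group values into missing / non-missing */
proc format;
value $missfmt ' '='missing' other='non-missing';
value missfmt .='missing' other='non-missing';
run;

/* One frequency table per variable, with missing values counted */
proc freq data=one;
format _character_ $missfmt. _numeric_ missfmt.;
tables _all_ / missing nocum nopercent;
run;
```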


Source: http://support.sas.com/kb/44/124.html

Wednesday, August 17, 2011

When do I use a WHERE statement instead of an IF statement to subset a data set?

When programming in SAS, there is almost always more than one way to accomplish a task. Beginning programmers may think that there is no difference between using the WHERE statement and the IF statement to subset a data set. Knowledgeable programmers know that, depending on the situation, one statement is sometimes more appropriate than the other. For example, if your subset condition includes automatic variables or new variables created within the DATA step, then you must use the IF statement instead of the WHERE statement. The tip linked below shows how and when to apply the WHERE and IF statements to get correct and reliable results. It also reviews the similarities as well as the differences between these two SAS programming approaches. Detailed differences in program efficiency between the two approaches are not covered.
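
Here is a small sketch of the difference. The dataset SCORES and the variables NAME, SCORE and BONUS are made up for illustration. WHERE is applied to variables that already exist in the input dataset, while the subsetting IF can also use variables created in the step and automatic variables such as _N_.

```sas
data scores;
input name $ score;
datalines;
Ann 82
Bob 65
Cara 91
Dev 70
;
run;

/* WHERE: condition on a variable that exists in the input dataset */
data where_subset;
set scores;
where score >= 70;
run;

/* IF: condition on a new variable (bonus) and an automatic variable (_N_). */
/* Using WHERE here would fail because bonus is not in the input dataset.   */
data if_subset;
set scores;
bonus = score + 5;
if bonus >= 75 and _n_ <= 3;
run;
```

In this example WHERE_SUBSET keeps Ann, Cara and Dev, and IF_SUBSET keeps Ann and Cara.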


For more details refer to http://support.sas.com/kb/24/286.html

Monday, July 4, 2011

Transporting SAS Files using Proc Copy and or Proc Cport/Proc Cimport

When moving SAS datasets or catalogs from one type of computer to another, there are several things to consider, such as the operating systems of the two computers, the versions of SAS, and the type of communication link between the computers.

The easiest way to move SAS datasets from one system to another system is to:

Create a transport file using any SAS version.
Move the transport file to the new system.
Import the transport file on the new system.

Transport files are binary files made from SAS datasets and written in 80-byte records. Both PROC COPY and PROC CPORT can create transport files, but they create different types of transport file, and you cannot mix and match: transport files created with PROC COPY must be read with PROC COPY; those created by PROC CPORT must be read with PROC CIMPORT.

PROC COPY uses an engine (i.e. XPORT) to create a SAS transport file. PROC COPY is used to transport SAS datasets only. It is version independent, but the transport file holds only short variable names and table names (<= 8 characters).

PROC COPY is likely to be the best choice for transporting SAS datasets (only SAS datasets).

PROC CPORT creates a different type of SAS transport file, an 80-byte binary file. PROC CPORT can transport catalogs as well as tables, but not views. It cannot transport a file to an earlier SAS version. PROC CIMPORT is used to import transport files created with PROC CPORT.

The best choice for transporting datasets and catalogs simultaneously is to use PROC CPORT/PROC CIMPORT.

Proc COPY vs. Proc CPORT/CIMPORT

PROC CPORT/CIMPORT can be used to transport both SAS datasets and SAS catalogs. Proc CPORT and Proc CIMPORT only allow files to be transported from an earlier version to a newer version (e.g. from SAS 6 to SAS 9) and not the opposite (e.g. from SAS 9 to SAS 8.2).

PROC COPY can be used to transfer files from a newer version of SAS to an earlier release (e.g. from SAS 9 to SAS 6.0) and vice versa without any trouble. Proc Copy will not transport SAS catalogs. If you must move format catalogs with PROC COPY, they first have to be converted to a SAS dataset using PROC FORMAT with the CNTLOUT= option.

Note: When moving files from a newer version (e.g. SAS 9) to an older version (e.g. SAS 8.0), the long variable names in SAS 9 will get truncated to 8 bytes.

SAS Member Type    XPORT Engine with either DATA step or PROC COPY    PROC CPORT and PROC CIMPORT
Dataset            Yes                                                Yes
Catalogs           No                                                 Yes

Now, here is an example of how to create transport (.xpt) files from SAS datasets.
/*********************************************************************
Create a transport (.xpt) file and convert the SAS transport (.xpt)
file back to a SAS dataset
*********************************************************************/
%let libname=C:\Users\Sarath Annapareddy\Desktop\Transport;
* Create sample dataset;
libname sasfile "&libname";

data sasfile.test;
input var1 var2 var3;
datalines;
1 26 31
1 28 28
1 30 31
2 32 31
2 34 29
;
run;

/*********************************************************************
Create a .xpt file from a SAS dataset using Proc Copy
*********************************************************************/
libname sasfile "&libname"; *Location of SAS dataset;
libname xptfile xport "&libname\test.xpt"; *Location of .xpt file created;

proc copy in=sasfile out=xptfile memtype=data;
select test;
run;
*Convert the .xpt file back to a SAS dataset using Proc copy;

libname xptfile xport "&libname\test.xpt";
libname sasfile2 "&libname\new\";

proc copy in=xptfile out=sasfile2 memtype=data;
run;

*Convert the .xpt file back to a sas dataset using data step;
libname datain xport "&libname\test.xpt" /*directory path where file is located/SAS export file name*/;

data xptdata;
set datain.test;
run;
/*********************************************************************
Create a transport file from a SAS dataset using Proc Cport
and read it back using Proc Cimport
*********************************************************************/
libname sasfile "&libname";

data sasfile.test2;
input var1 var2 var3 var4;
datalines;
1 26 31 1
1 28 28 2
1 30 31 3
2 32 31 4
2 34 29 5
;
run;

libname sasfile "&libname"; *Location of SAS dataset created;
filename xptfile "&libname\test2.xpt"; *Location of the transport file (a fileref, not an XPORT libref);

proc cport data=sasfile.test2 file=xptfile;
run;

*Convert the transport file back to a SAS dataset;
libname sasfile2 "&libname\new"; *Location of SAS dataset created;
filename xptfile "&libname\test2.xpt"; *Location of the transport file;

proc cimport infile=xptfile library=sasfile2;
run;

REFERENCES:

http://www.umass.edu/statdata/software/handouts/SASTransport.pdf
http://www.ts.vcu.edu/kb/2074.html
SAS Documentation regarding Traditional Methods for creating and Importing Files in Transport files.

Tuesday, June 14, 2011

How to generate the month name from a numeric date value

Task: I have a SAS date and want to create a variable with the month name.
Here is how to do it...

Use the MONNAMEw. format, which is simple and easy. You need to be using a SAS 9.x version to make it work.

/*Use MONNAMEw. format*/
data month;
input date:mmddyy8.;
month_name=put(date,monname3.);
datalines;
01/15/04
02/29/04
07/04/04
08/18/04
12/31/04
;
run;

proc print;
run;
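
If you want the full month name rather than the three-letter abbreviation, a wider format width can be used. This is a small sketch with made-up dataset and variable names (MONTH2, MONTH_FULL, MONTH_NUM). The -L alignment specification left-aligns the result in case the format pads it with leading blanks, and MONTH() is shown alongside in case the month number is also needed for sorting.

```sas
data month2;
input date : mmddyy8.;
format date date9.;
length month_full $9;
/* Full month name, left-aligned */
month_full = put(date, monname9. -l);
/* Month number (1-12), handy for sorting */
month_num = month(date);
datalines;
01/15/04
07/04/04
12/31/04
;
run;

proc print data=month2;
run;
```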
