Thursday, November 29, 2012

Creating Custom or Non-Standard CDISC SDTM Domains

Here is a nice article about creating custom SDTM domains.


Within the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM), standard domains are split into four main types: special purpose, relationships, trial design and general observation classes. General observation classes cover the majority of observations collected during a study and can be divided among three general classes:
  • The Interventions class captures investigational, therapeutic and other treatments that are administered to the subject (with some actual or expected physiological effect) either as specified by the study protocol (e.g., “exposure”), coincident with the study assessment period (e.g., “concomitant medications”), or other substances self-administered by the subject (such as alcohol, tobacco, or caffeine).
  • The Events class captures planned protocol milestones such as randomization and study completion, and occurrences, conditions, or incidents independent of planned study evaluations occurring during the trial (e.g., adverse events) or prior to the trial (e.g., medical history).
  • The Findings class captures the observations resulting from planned evaluations to address specific tests or questions such as laboratory tests, ECG testing, and questions listed on questionnaires.
When creating a custom domain, one should first confirm that there is no published domain into which the data fit. This can be done by checking against the reserved domain codes listed in the appendices of the SDTM implementation guide or checking the CDISC website for any recently published domains. The following are not acceptable reasons for creating a custom domain:
  • If there is a common topic where the nature of the data is the same as another published domain.
  • If the custom domain is due to separation based on time.
  • If the data have been collected or are going to be used for different reasons. For example, if a lab parameter is collected for efficacy purposes the data must be represented in the LB domain and not in a custom ‘efficacy’ domain.
  • If the data were collected on separate CRF modules or pages but may fit into an existing domain.
  • If it is necessary to represent relationships between data that are hierarchical in nature. RELREC can be used instead.
Once it is confirmed that the data do not fit into any published domain, it should be determined which of the three general observation classes best fits the topic of the data. The custom domain must fit into one of the three general observation classes. The next step is to determine a two-letter domain code for the custom domain. This code should not be the same as any domain code that is already published or under discussion. The domain codes X-, Y- and Z- are reserved for sponsor use, where the hyphen may be replaced by any letter or number. This domain code will be the name of the domain and will also replace the "--" prefix of the variables taken from the class. The following steps can then be followed to create the custom domain:
  1. Select and include the required Identifier variables (STUDYID, DOMAIN, USUBJID and --SEQ) and any permissible Identifier variables (--GRPID, --REFID and --SPID).
  2. Include the Topic variable from the identified general observation class (--TRT for interventions, --TERM for events and --TESTCD for Findings).
  3. Select and include the relevant Qualifier variables from the identified general observation class only. These can be found in Section 2.2.1, 2.2.2 and 2.2.3 of the Study Data Tabulation Model.
  4. Select and include the applicable Timing variables. These can be found in Section 2.2.5 of the Study Data Tabulation Model and relate to all general observation classes.
  5. Set the order of the variables within the domain: identifiers must be followed by topic variables, qualifiers and finally timing variables. The variables must then be ordered within these roles to match the order of variables given in Sections 2.2.1, 2.2.2, 2.2.3, 2.2.4 and 2.2.5 of the Study Data Tabulation Model. The variable order in the define.xml must also match the order of the variables within the domain.
  6. Adjust the labels of the variables only as appropriate to properly convey the meaning in the context of the data being submitted in the newly created domain. Use title case for all labels.
  7. Ensure that appropriate standard variables are being properly applied by comparing the use of variables in standard domains.
  8. Ensure that there are no sponsor-defined variables added to the domain. Any sponsor-defined variables should be in a Supplemental Qualifier dataset.
  9. Variable attributes within the domain and Supplemental Qualifier dataset must conform to the SAS Version 5 transport file conventions. For example, variable names must be no longer than 8 characters, variable labels must be no longer than 40 characters and data value lengths must be no longer than 200 characters. Also, where possible the domain should be less than 400 MB in size; otherwise one should contact the review division before splitting domains, as they may accept domains with a larger file size.
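
The steps above can be sketched in SAS roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the domain code XA, the study ABC123, the test code SCORE and the input records are all hypothetical, and the labels are illustrative rather than copied from the SDTM. The ATTRIB statement comes first so that it fixes the variable order (identifiers, topic, qualifiers, timing) and keeps names within 8 characters and labels within 40.

```sas
/* Hypothetical custom Findings domain XA; all names and values are made up */
data xa;
   /* ATTRIB first: fixes variable order as identifiers, topic, qualifiers, timing */
   attrib STUDYID  length=$12  label='Study Identifier'
          DOMAIN   length=$2   label='Domain Abbreviation'
          USUBJID  length=$20  label='Unique Subject Identifier'
          XASEQ    length=8    label='Sequence Number'
          XATESTCD length=$8   label='Short Name of Test'
          XATEST   length=$40  label='Name of Test'
          XAORRES  length=$200 label='Result or Finding in Original Units'
          XADTC    length=$19  label='Date/Time of Collection';
   infile datalines dsd truncover;
   /* Variables are already typed by ATTRIB, so list input reads them as character */
   input STUDYID USUBJID XATESTCD XATEST XAORRES XADTC;
   DOMAIN = 'XA';
   /* --SEQ restarts at 1 for each subject (the input is sorted by USUBJID) */
   prev_id = lag(USUBJID);
   if USUBJID ne prev_id then XASEQ = 0;
   XASEQ + 1;
   drop prev_id;
   datalines;
ABC123,ABC123-001,SCORE,Total Score,12,2012-11-01
ABC123,ABC123-001,SCORE,Total Score,15,2012-11-15
ABC123,ABC123-002,SCORE,Total Score,9,2012-11-02
;
run;

/* VARNUM lists the variables in their stored order for a quick visual check */
proc contents data=xa varnum;
run;
```

Any sponsor-defined variable would go into a separate Supplemental Qualifier dataset rather than into XA itself.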

References:
Study Data Tabulation Model, Version 1.2; CDISC Submission Data Standards Team.
Study Data Tabulation Model Implementation Guide: Human Clinical Trials, Version 3.1.2; CDISC Submission Data Standards Team.

Source: www.Info.quanticate.com


Wednesday, June 20, 2012

ENCODING=Dataset Option

Let me explain the reason for writing this post.

My coworker was having a problem reading a SAS dataset that he received from the sponsor. The dataset was encoded in UTF-8.
He tried to read the raw data using a LIBNAME statement:

libname rawdata '/sas/SAS913/SASDATA/CLIENT /ABC123/raw';
data datasetname;
set rawdata.datasetname;
run;

When he ran the SAS code above, SAS stopped at that step and returned an error that looks like this:

ERROR: Some character data was lost during transcoding in the dataset RAWDATA.DATSETNAME.
NOTE: The data step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements. This may cause NOTE: No observations in data set.
NOTE: There were 20314 observations read from the data set RAWDATA.DATSETNAME.
WARNING: The data set WORK.DATASETNAME may be incomplete.  When this step was stopped there were 20314 observations and 67 variables.
NOTE: DATA statement used (Total process time):
      real time           0.53 seconds
      cpu time            0.46 seconds

When he asked me why SAS stopped in the middle of the step, we quickly turned to Google, because neither of us had seen this error message in a log before. The search returned many links full of technical detail. We tried a few of the options they suggested and none of them worked. After many trials, we stumbled upon a solution: specifying ASCIIANY as the encoding in the LIBNAME statement.
 

libname rawdata '/sas/SAS913/SASDATA/CLIENT /ABC123/raw' inencoding=asciiany;



If you have only one dataset to read, or you know which dataset has the encoding problem, you can use the ENCODING= dataset option in a simple DATA step instead. Here is how:

data datasetname;
set rawdata.datasetname (encoding='asciiany');
run;

If you refer to the SAS reference Guide, you will see this, which explains how this option works….



ENCODING= ANY | ASCIIANY | EBCDICANY | encoding-value

ANY
specifies that no transcoding occurs.
Note: ANY is a synonym for binary. Because the data is binary, the actual encoding is irrelevant.
ASCIIANY
specifies that no transcoding occurs when the mixed encodings are ASCII encodings.
Transcoding normally occurs when SAS detects that the session encoding and data set encoding are different. ASCIIANY enables you to create a data set that SAS will not transcode if the SAS session that accesses the data set has a session encoding value of ASCII. If you transfer the data set to a machine that uses EBCDIC encoding, transcoding occurs.
EBCDICANY
specifies that no transcoding occurs when the mixed encodings are EBCDIC encodings.
The value for ENCODING= indicates that the SAS data set has a different encoding from the current session encoding. When you read data from a data set, SAS transcodes the data from the specified encoding to the session encoding. When you write data to a data set, SAS transcodes the data from the session encoding to the specified encoding.

For more details refer to the documentation….
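
Before choosing an ENCODING= value, it can help to compare the session encoding with the encoding recorded in the data set. Here is a minimal sketch, using SASHELP.CLASS only as a stand-in for the problem data set (substitute your own libref and member name):

```sas
/* Show the encoding of the current SAS session */
proc options option=encoding;
run;

/* The same value, written to the log through a macro function call */
%put Session encoding: %sysfunc(getoption(encoding));

/* The Encoding attribute in the PROC CONTENTS header shows the data set encoding */
proc contents data=sashelp.class;
run;
```

If the two values differ, SAS transcodes the data when it reads the data set, and that is the point at which the error above is raised.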

Tuesday, June 12, 2012

Create a .CSV file of SAS dataset without column names or header row?



SAS places the variable names in row 1 when you create an Excel or .CSV file from a SAS dataset. Here are some ways to tell SAS not to write the variable names to row 1 of the .CSV file.
A sascommunity.org page has put together nice information on how to do this:

  1. Run PROC EXPORT with PUTNAMES=NO
  2. Run PROC EXPORT and recall and edit the code
  3. Run PROC EXPORT and use a DATA step to rewrite the file without the first row
  4. DATA _NULL_ with a PUT statement
  5. DATA _NULL_ with a PUT statement, all fields quoted
  6. ODS CSV and PROC REPORT with suppressed column headers
  7. The %ds2csv SAS Institute utility macro
  8. The CSV tagset and the table_headers="NO" option

Run PROC EXPORT with PUTNAMES=NO
Sample program 
proc export data=data_to_export  outfile='C:\data_exported.csv'
        dbms=csv
        replace;
        putnames=no;
run;

Run PROC EXPORT and use a DATA step to rewrite the file without the first row

Sample program

filename exTemp temp;

proc export data=sashelp.class outfile=exTemp dbms=csv;
run;
 
data _null_;
   infile extemp firstobs=2;
   file '.\class.csv';
   input;
   put _infile_;
run;

DATA _NULL_ with a PUT statement

Sample program. The LRECL= option in the FILE statement may be necessary if the records are long.

Data _null_;   
   file '.\class.csv' dsd dlm=',';
   set sashelp.class ;
   put (_all_) (+0);
run;

DATA _NULL_ with a PUT statement, all fields quoted

This example uses the ~ format modifier to quote all the fields in the CSV file.
Sample program

Data _null_;   
   file '.\class.csv' dsd dlm=',';
   set sashelp.class ;
   put ( _all_ ) (~);
run;

ODS CSV and PROC REPORT with suppressed column headers

To create CSV files (i.e., this technique won't work for other types of delimited files), ODS CSV can be used with PROC REPORT. The '  ' in the DEFINE statement specifies that no column header text is to be included. Since none of the columns have header text, the header row is suppressed.
Sample program

ods csv file = '\class.csv';
proc report data = sashelp.class nowd;
 define _all_ / display ' ';
run;
ods csv close;

The CSV tagset and the table_headers="NO" option

Sample program by Richard A. DeVenezia
%let output_folder = %sysfunc(pathname(WORK));
 
ods listing close;
ods tagsets.csv 
  file="&output_folder.\class-noheader.csv"
  options(table_headers="no")
;
 
proc print noobs data=sashelp.class;
  where name < 'J';
run;
 
ods tagsets.csv close;
ods listing;
 
Direct Link: http://www.sascommunity.org/wiki/Create_a_CSV_file_without_column_names/headers_in_row_1 
 

Thursday, June 7, 2012

ERROR 29-185: Width Specified for format ---- is invalid








You see the "ERROR 29-185: Width Specified for format ---- is invalid" message in the log when you specify the DATE format with an invalid width. The DATE format cannot produce a date if the width is too long or too short. Valid widths are 5-9 in SAS 9.1.x. In SAS 9.2 the DATE format accepts widths up to 11, so you won't see this error message for DATE11. in that version.
If you are using SAS 9.1.x (on either Windows or Unix), use format date9. instead of date11.




data _null_;
date ='23-SEP-2004'd;
put date date11.;*This statement gives you error in SAS 9.1.2/9.1.3 versions;
put date date9.;
run;
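
If a dd-MMM-yyyy value is still needed in SAS 9.1.x, one possible workaround is to build the text from the DATE9. output. This is only a sketch and not the only way to do it; the variable names d9 and d11 are made up for illustration.

```sas
data _null_;
   date = '23SEP2004'd;
   d9   = put(date, date9.);   /* 23SEP2004 */
   /* Insert hyphens between the day, month and year pieces */
   d11  = catx('-', substr(d9, 1, 2), substr(d9, 3, 3), substr(d9, 6, 4));
   put d11=;                   /* writes d11=23-SEP-2004 to the log */
run;
```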

Saturday, May 19, 2012

My 5 Important reasons to use Proc SQL

• Proc SQL often requires fewer lines of SAS code than the equivalent data step and/or Proc steps.
• Frequency counts can be produced in no time, which is very helpful during QC or validation.
• Proc SQL can join datasets on key variables that have different names, unlike the data step Merge.
• Proc SQL can join more than two datasets in the same step, on different variables and at different levels.
• A Proc SQL join does not overlay same-named columns, whereas the Merge statement of the data step does.


Data step vs Proc SQL



•  Data step Merge
– The datasets must be pre-sorted by the by-variables before the merge
– Requires common variable names
– May need a few more lines of code than Proc SQL

•  Proc SQL join (works differently from the typical data step Merge)
– Duplicate matching columns are not overlaid
– Can join more than two datasets at once, on different levels (the datasets don’t all have to be joined on the same variable)
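
The comparison above can be illustrated with a small, self-contained sketch. The datasets DM and AE and their values are hypothetical demo data; the key is deliberately named USUBJID in one dataset and SUBJID in the other to show the difference in handling.

```sas
/* Hypothetical demo data: the key has a different name in each dataset */
data dm;
   input usubjid $ age;
   datalines;
001 34
002 51
003 47
;
run;

data ae;
   input subjid $ aeterm $;
   datalines;
001 Headache
001 Nausea
003 Rash
;
run;

/* Proc SQL: no pre-sorting, and the key names may differ */
proc sql;
   create table dm_ae_sql as
   select d.usubjid, d.age, a.aeterm
   from dm as d
        left join ae as a
        on d.usubjid = a.subjid
   order by d.usubjid, a.aeterm;

   /* Quick frequency count, handy during QC */
   select aeterm, count(*) as n
   from ae
   group by aeterm;
quit;

/* Data step: sort both inputs and rename the key so the BY variable matches */
proc sort data=dm out=dm_s;
   by usubjid;
run;

proc sort data=ae out=ae_s(rename=(subjid=usubjid));
   by subjid aeterm;
run;

data dm_ae_merge;
   merge dm_s(in=in_dm) ae_s;
   by usubjid;
   if in_dm;
run;

/* Both approaches should give the same rows for this one-to-many case */
proc compare base=dm_ae_sql compare=dm_ae_merge;
run;
```

The two approaches agree here because the join is one-to-many. For many-to-many matches a Proc SQL join and a data step Merge behave differently, so the results should not be assumed to be interchangeable.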





Friday, February 24, 2012

Transcoding Problem: Option (correctencoding=wlatin1)

Have you ever tried to convert the default encoding to Wlatin1 (Windows SAS Session Encoding)?

Let me tell you the story behind this post.

Today I was asked to send SAS datasets to one of our clients. Immediately after the transfer, I got an email from the client saying that the encoding of the SAS datasets was different from the last transfer, and that it was causing problems in their Proc Compare process.

Oops, bummer. The client’s email got me a little worried.
I checked the Proc Contents output and saw the change in the encoding. I investigated the issue and found out that Unicode SAS with UTF-8 encoding uses 1 to 4 bytes per character. An 8-bit character can expand to 2 or 3 bytes when transcoding occurs, which causes the truncation error.

Because of the truncation problem, I was asked to change the encoding back to WLATIN1 so that the character data in the SAS datasets represent US and European characters on Windows.

Here is the code to do that.

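proc datasets lib=SDTM;
/* CORRECTENCODING= changes the encoding recorded in the file header; it does not transcode the data */
modify supplb/correctencoding=wlatin1;
run;
quit;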

Unicode Basics:

Unicode is the universal character encoding that supports the interchange, processing, and display of
characters and symbols found in the world’s writing systems. Other character encodings are limited to
subsets of all languages. 

Transcoding problem:

Currently, SAS/ACCESS Interface to ODBC cannot support Unicode. DBCS data and single-byte
non-ASCII characters cannot be correctly processed.

In a SAS UTF-8 session, data imported appear as question marks. Although the encoding property of
imported data is UTF-8, the real encoding of the data is not changed to UTF-8.

Workaround:

/* Set the correct encoding to ‘WLATIN1’ */
proc datasets lib=stagings;
modify custtype_w1/correctencoding=wlatin1;
run;
quit;

/* Transcode data from ‘Wlatin1’ to ‘UTF-8’ */
data stagings.custtype_u8;
set stagings.custtype_w1;
run;



Sunday, November 13, 2011

SDTM Compliance Checks

Validation checks or tools to check the compliance of SDTM data

JANUS is a standard database model based on CDISC’s SDTM standard. The FDA uses JANUS to store submitted SDTM clinical data. As part of a submission, pharmaceutical companies have to submit SAS datasets in transport file (.xpt) format along with the annotated CRF and the Define.xml file, so that the clinical data can be loaded properly into the JANUS database maintained by the FDA. Once the data are loaded, FDA reviewers can review them easily, produce ad hoc reports and perform cross-study reviews. The FDA runs compliance checks on the submitted data to make sure they conform to the SDTM standard, using WebSDM™ (developed by Phase Forward). WebSDM™ is an SDTM validation tool that performs a set of compliance checks on clinical data before they are loaded into the JANUS database.

SDTM VALIDATION TOOLS:

SDTM validation tools verify that the clinical data comply with the standards and assumptions of the SDTM implementation guide v3.1.1 or v3.1.2. They also verify that the Define file complies with ODM v1.2.
1) LINCOLN TECHNOLOGIES - WebSDM™:

WebSDM™ is an application that tests the compliance of submission-ready files (in SAS V5 Transport format or Oracle® views) against the SDTM IG. The FDA has been using WebSDM™ since 2004 to review clinical data. Users load SDTM-compliant files into the WebSDM™ tool, which then checks for errors and inconsistencies in the structure and content of the data.

The available checks detect structural and consistency errors, rated by severity (high, medium and low). Details of the compliance checks performed by WebSDM™ are available from the Phase Forward website.

It is very useful for the sponsor to run the WebSDM checks on the clinical data because the FDA reviewers use the same application to review the data. The one negative point of this application is that it cannot be used for near-SDTM-compliant or client-specific standard datasets.

2) SAS - Proc CDISC:


SAS provides Proc CDISC to perform SDTM compliance checks on clinical data. Proc CDISC is included in the software as a procedure if you use SAS Version 9.1.3 Service Pack 3 or later.

Proc CDISC supports compliance checks for only a few domains (15 of the 23 domains outlined in CDISC SDTM version 3.1). It supports the following classes of domains:

Interventions (CM, EX, SU),
Events (AE, DS, MH),
Findings (EG, IE, LB, PE, QS, SC, VS), and
Special purpose (DM, CO).

Proc CDISC does not support the trial design domains, custom-defined domains or other newer SDTM domains such as MB, MC, PC and PK.

3) OpenCDISC Validator:

OpenCDISC Validator is a new application released by OpenCDISC to run compliance checks on clinical data. It is an open source, Java-based project that validates datasets against SDTM. One advantage of this application is that it is open source software and can be downloaded for free. A second advantage is that it combines the WebSDM and Janus checks. It can also check compliance with the ADaM standards.


OpenCDISC can check SAS transport files as well as delimited files (.CSV etc.) against the SDTM v3.1.1 or SDTM v3.1.2 standards. One major advantage of this tool is that it can perform compliance checks on all sorts of datasets (e.g., SDTM-compliant and SDTM non-compliant).

Here are a few other nice SUGI papers that have lots of information about OpenCDISC SDTM compliance checks:

Open CDISC Plus
A Standard SAS® Program for Corroborating OpenCDISC Error Message

Ref:


Study Data Specifications document V 1.6 by FDA.
In-Depth Review of Validation Tools to Check Compliance of CDISC SDTM-Ready Clinical Datasets http://www.lexjansen.com/pharmasug/2010/cd/cd13.pdf
SAS and Open Source Tools for CDISC SDTM Compliance Checks for Regulatory Submissions
http://www.nesug.org/Proceedings/nesug10/ph/ph04.pdf
Validating CDISC SDTM-Compliant Submission-Ready Clinical Datasets with an In-House SAS® Macro-Based Solution http://www.lexjansen.com/pharmasug/2008/rs/rs07.pdf
Max Kanevsky (2008), “Validating SDTM, an open source solution”, proceeding of the CDISC Interchange 2008