Thursday, October 15, 2009

Dummy Dataset or SAS Options: Which Is Better for Inserting a Zero Row?

Programmers routinely need to summarize demographic data and present it in a table, and they usually use the Proc Freq procedure to do so. Although Proc Freq calculates frequencies exactly, it may not be the right procedure in every case, especially when some data values do not exist.

Sometimes the statistician wants to see every data value listed on the CRF in the final table, even when no such combination exists in the dataset. In this case we have to insert observations with a count of 0.

Here I will present the different methods for inserting a zero row:

1) Create a dummy dataset and concatenate it with the input dataset.
2) Proc Freq SPARSE option
3) Proc Means COMPLETETYPES option
4) Proc Means COMPLETETYPES option with the PRELOADFMT option

Dummy Dataset:
Adv: Simple and doesn’t need any formats.
Caveat: The programmer has to know all the possible combinations.
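The dummy-dataset method has no code in this post, so here is a minimal sketch of it. It reuses the six sample records from the example later in this post. The site list and the race list ('Asian', 'Black', 'White') typed into the DUMMY step are assumptions standing in for the CRF; having to type that list by hand is exactly the caveat above. The dataset names OBSERVED, DUMMY, STACKED and FINAL are hypothetical.

```sas
/* Sample records taken from the example later in this post */
data demo;
  input siteid $ sex $ race $ age;
  datalines;
SITE1 M White 23
SITE1 F White 43
SITE1 M White 34
SITE2 M Black 21
SITE2 M White 56
SITE2 F Black 33
;
run;

/* Step 1: count only the combinations that actually occur */
proc freq data=demo noprint;
  tables siteid*race / out=observed(drop=percent);
run;

/* Step 2: dummy dataset with one zero row per site/race combination.
   The site and race lists are assumptions standing in for the CRF. */
data dummy;
  length siteid $8 race $8;   /* same lengths as in DEMO */
  count = 0;
  do siteid = 'SITE1', 'SITE2';
    do race = 'Asian', 'Black', 'White';
      output;
    end;
  end;
run;

/* Step 3: concatenate observed counts with the zero rows ... */
data stacked;
  set observed dummy;
run;

/* ... and add them up: an observed combination keeps its real count,
   a missing combination ends up with 0 */
proc sql;
  create table final as
  select siteid, race, sum(count) as count
  from stacked
  group by siteid, race
  order by siteid, race;
quit;
```

With this data, FINAL has six rows: SITE1/Asian, SITE1/Black and SITE2/Asian come out as 0, while the observed counts (SITE1/White 3, SITE2/Black 2, SITE2/White 1) are unchanged. Summing the stacked rows, rather than merging, avoids having to decide which dataset's COUNT should win.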

Sparse Option:
Lists all possible combinations of variable levels even when a combination does not occur.

Syntax:

proc freq data=demo noprint;
table sitec*race /sparse out=freq (drop=percent);
run;

With the SPARSE option, Proc Freq outputs one record for each possible combination of the observed levels of the variables listed in the TABLES statement.


Adv: Convenient and simple.
Disadv: Sometimes the CRF lists more categories than actually appear in the dataset. If the statistician wants one record for each category on the CRF, the SPARSE option in Proc Freq doesn’t work as expected, because SAS has no way of knowing about values that never occur in the dataset.

Caveat: There must be at least one occurrence of a value for SPARSE to summarize appropriately.

Proc Means Using the COMPLETETYPES Option:
Syntax:

proc means data=demo completetypes noprint nway;
class sitec race;
output out =race(rename=(_freq_=count) drop=_type_);
run;

Adv: Simple and easy to write. Proc Means with the COMPLETETYPES option works similarly to the Proc Freq SPARSE option.

Caveat: There must be at least one occurrence of a value for COMPLETETYPES option to summarize appropriately.

Proc Means using COMPLETETYPES and the PRELOADFMT option:
The PRELOADFMT option tells SAS to load every value of the format assigned to a class variable (as defined in Proc Format) into memory before Proc Means processes the CLASS statement.

One important thing to know is how to use this option: if you specify PRELOADFMT in the CLASS statement, you must also use the COMPLETETYPES, EXCLUSIVE, or ORDER=DATA option; otherwise PRELOADFMT has no effect.

When you use the PRELOADFMT option in combination with the COMPLETETYPES option, SAS creates output containing every possible combination of formatted values, even combinations that never appear in the input dataset.


Syntax:

proc format;
VALUE $RACEF
'Asian'=3
'Black'=2
'White'=1
'American Indian or Alaska Native'=4
'Native Hawaiian or Other Pacific Islander'=5;
run;

data demo;
set demo;
format race $racef.;
run;

proc means data=demo completetypes noprint nway;
class sitec race/preloadfmt;
output out =race(rename=(_freq_=count) drop=_type_);
run;

Adv: Simple to use. There is no requirement to have at least one occurrence of a value in the data.

Caveat: This method works only if formats are assigned to the data. You don’t necessarily need to know what the format values are, but you have to make sure a format is assigned to every variable you are trying to summarize.

You can use the PRELOADFMT option in Proc Means, Proc Summary, and Proc Tabulate.
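As a sketch of the Proc Tabulate case, the code below preloads a race format so that an 'Asian' column appears with a count of 0 even though no Asian subjects exist in the data. It reuses the six sample records from the example in this post and a $RACEF format like the one used there. In Proc Tabulate, PRELOADFMT needs PRINTMISS (or EXCLUSIVE or ORDER=DATA) to take effect; MISSTEXT='0' is an assumption about how you want empty cells displayed (the default would be a period).

```sas
/* Sample records from the example in this post */
data demo;
  input siteid $ sex $ race $ age;
  datalines;
SITE1 M White 23
SITE1 F White 43
SITE1 M White 34
SITE2 M Black 21
SITE2 M White 56
SITE2 F Black 33
;
run;

/* Format listing every race of interest, including 'Asian',
   which never occurs in DEMO */
proc format;
  value $racef
    'Asian' = 'Asian'
    'Black' = 'Black'
    'White' = 'White';
run;

proc tabulate data=demo;
  class siteid;              /* site needs no format */
  class race / preloadfmt;   /* preload every $RACEF value */
  format race $racef.;
  /* PRINTMISS prints headings for all class levels, even empty ones;
     MISSTEXT= shows the empty cells as 0 instead of a period */
  table siteid, race*n / printmiss misstext='0';
run;
```

Using two CLASS statements keeps PRELOADFMT on the formatted variable only, so the site variable is summarized from the data as usual.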


Example:

data demo;
input siteid $ sex $ race $ age;
cards;
SITE1 M White 23
SITE1 F White 43
SITE1 M White 34
SITE2 M Black 21
SITE2 M White 56
SITE2 F Black 33
;
run;

proc sort data=demo;
by siteid;
run;


*Without any options in proc freq;
proc freq data=demo noprint;
table siteid*race /out=nooptions (drop=percent);
run;

*With Sparse option in proc freq;

proc freq data=demo noprint;
table siteid*race /sparse out=_sparse (drop=percent);
run;

*With Completetypes option in proc means;
proc means data=demo completetypes noprint nway;
class siteid race;
output out =comptyp(where=(_stat_='N')rename=(_freq_=count) keep=siteid race _freq_ _stat_);
run;

*With Completetypes and preloadfmt options in proc means;
proc format;
VALUE $RACEF
'Asian'='Asian'
'Black'='Black'
'White'='White';

run;


data demo;
set demo;
format race $racef.;
run;

proc means data=demo completetypes noprint nway;
class siteid race/preloadfmt;
output out =race(where=(_stat_='N')rename=(_freq_=count) keep=siteid race _freq_ _stat_);
run;

With PRELOADFMT in the CLASS statement and the COMPLETETYPES option in the PROC MEANS statement, SAS includes every possible combination of the classification variables in the output, including zero rows for combinations with no observations.

Wednesday, October 7, 2009

Sunday, September 20, 2009

Case Report Tabulations for the FDA Submission

The Case Report Tabulation (CRT) is the collection of the annotated case report form (CRF), SAS® datasets, metadata, and source programs that make up a portion of the NDA package submitted to the FDA. The FDA uses it when reviewing submissions. Review starts with the Define document, which contains metadata describing the datasets, variables, and values. Everything is tied together with internal and external hyperlinks, bookmarks, and destinations to make the CRT easy to navigate [1].

The CRT is essentially a collection of data and documentation for a study. It contains features such as bookmarks and links that let reviewers navigate the submission easily. For consistency, the FDA [1] and CDISC provide guidelines defining the components, though these guidelines are limited in scope.

We need to create the Define document (define.pdf or define.xml) as part of the CRT.
  • Each dataset is a single SAS transport file and, in general, includes a combination of raw and derived data.

  • Each CRF domain (e.g., demographics, vital signs, adverse events) should be provided as a single dataset.

  • In addition, datasets suitable for reproducing and confirming analyses may also be needed.

  • Patient profiles can also be provided as PDF files.
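Since each dataset goes to the FDA as a single SAS transport file, here is a minimal sketch of producing one with the SAS XPORT engine, which writes version 5 transport files (dataset and variable names of at most 8 characters, labels of at most 40). The DM dataset, its variables and values, and writing dm.xpt into the WORK directory are all assumptions made so the example runs on its own; a real submission would point the libref at the submission folder.

```sas
/* Hypothetical demographics domain; names kept to 8 characters
   and labels to 40 characters for the version 5 transport format */
data dm;
  length studyid $8 usubjid $12 sex $1;
  input studyid $ usubjid $ sex $ age;
  label studyid = 'Study Identifier'
        usubjid = 'Unique Subject Identifier'
        sex     = 'Sex'
        age     = 'Age';
  datalines;
STUDY01 STUDY01-001 M 23
STUDY01 STUDY01-002 F 43
;
run;

/* XPORT-engine libref pointing at the transport file; the WORK
   directory is used here only so the sketch runs anywhere */
libname xptout xport "%sysfunc(pathname(work))/dm.xpt";

/* Copy the dataset into the transport file, one domain per file */
proc copy in=work out=xptout memtype=data;
  select dm;
run;

libname xptout clear;
```

Copying one domain per libref keeps to the one-dataset-per-file convention described above.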


    In 2003, the FDA interpreted 21 CFR 314.50(f)(1) as defining CRTs to include [2]:


  • Study Data Tabulations
  • Statistical Analysis Datasets
  • Data Listings
  • Patient Profiles

    Draft eCTD Guidance: Case Report Tabulations

  • Data tabulations
      o Data tabulation datasets
      o Data definitions

  • Data listings
      o Data listing datasets
      o Data definitions

  • Analysis datasets
      o Analysis datasets
      o Analysis programs
      o Data definitions

  • Subject profiles

  • IND safety reports

Data Tabulations:

  • “Data tabulations are datasets in which each record is a single observation for a subject.”

  • Specifications are located in the Study Data Tabulation Model (SDTM) developed by CDISC at www.cdisc.org/models/sds/v3.1/index.html.

  • Each dataset is provided as a SAS Transport (XPORT) file.

Data Listings:

  • “Data listings are datasets in which each record is a series of observations collected for each subject during a study or for each subject for each visit during the study organized by domain.”

  • Currently, there are no further specifications for organizing data listing datasets.

  • General information about creating datasets can be found in the SDTM implementation guides referenced in the data tabulation dataset specifications.

  • Each dataset is provided as a SAS Transport (XPORT) file.

Analysis Datasets:

  • “Analysis datasets are datasets created to support specific analyses. Programs are scripts used with selected software to produce reported analyses based on these datasets.”

  • Each dataset is provided as a SAS Transport (XPORT) file.

  • Programs should be provided as both ASCII text and PDF files and should include sufficient documentation to allow a reviewer to understand the submitted programs.

  • It is not necessary to provide analysis datasets and programs that will enable the reviewer to directly reproduce reported results using agency hardware and software. Currently, there are no additional specifications for creating analysis datasets.

Subject Profiles:
  • “Subject profiles are displays of study data of various modalities collected for an individual subject and organized by time.”

  • Each individual patient’s complete patient profile is in a single PDF file or a book-marked section of a single PDF file for all patients.


References:
http://www.lexjansen.com/pharmasug/2009/rs/rs08.pdf
http://www.amstat.org/meetings/fdaworkshop/presentations/2005/P07_Christiansen_CDISC.ppt
http://www.cdisc.org/stuff/contentmgr/files/0/f56015f6c1c01e6aa55767d9d25bddb5/misc/officeofbusinessprocesssupport.pdf

DLP (Data LifeCycle Plan)


The DLP (Data LifeCycle Plan) guides an organization and serves as a blueprint for how to create every type of data across all therapeutic areas and functional specialties. Howard describes the DLP as "an overall document that says here are the things you need to think about." In some DLPs, there might be more than 15 chapters, each controlled by a group of domain experts.


Standard operating procedures (SOPs) typically cover process. DLPs, in contrast, are technical specifications about what happens to the data. Both SOPs and DLPs should be subject to similar governance. The DLP creates a framework for discussions that would happen anyway, but it moves them to an earlier stage of the process.



Here's a sample chapter of a DLP for demographic data from Kestrel Consultants, Inc.

The Future of ODM, SDTM and CDISC


Setting The Record Straight: The Future of ODM, SDTM and CDISC
November 05, 2008
By Rebecca Kush, Frank Newby, David Iberson-Hurst, & Amanda J de Montjoie, CDISC


“When rumors increase, and when there is an abundance of noise and clamor, believe the second report.” Alexander Pope, 1688–1744


In recent weeks, there have been a number of rumors about Clinical Data Interchange Standards Consortium (CDISC) standards and their continued use by the Food and Drug Administration (FDA). These whispers suggest that the Study Data Tabulation Model (SDTM) standard is being replaced and that the Operational Data Model (ODM) should no longer be used. The rumors have caused confusion at best and fear of standards abandonment at worst. This is a second report: an attempt to clarify what’s happened and how the industry should move forward.


CDISC, to state the obvious, is a global standards organization. While the FDA plays a vital role in the drug development industry in the U.S., there are other global stakeholders that use CDISC standards. CDISC’s influence extends as far east as Japan, Australia and China—and is well established in Europe.


An International Agenda

Just as biopharmaceutical communities are expanding globally, CDISC is being asked to meet with regulatory authorities in China and with academic representatives in Singapore, India and Brazil. CDISC has Liaison A status with the International Organization for Standardization (ISO) Technical Committee 215 and works closely on global standards harmonization with CEN, ISO and HL7 as a member of the Joint Initiative Council.


Friday, September 18, 2009

Define.XML

Define.xml describes, in a machine-readable format, what we are submitting to the FDA as part of the Case Report Tabulations: the SAS datasets in XPT format, the annotated CRFs, and the variables (metadata). In simple words, it is data and metadata in a machine-readable format. It is considered the standard format for submitting metadata to the FDA and other regulatory bodies.