Sunday, October 18, 2009
Saturday, October 17, 2009
SAS Tip_less code: Assigning 1 or 0 to flag variable
*Creating a flag variable when a test variable meets certain criteria is a very common task for SAS programmers.
Many SAS programmers use the code below to assign a flag of 1 or 0 depending on whether the test variable meets the criteria.;
*Ex:;
*Create a test dataset;
data test;
input id age sex $;
cards;
1 25 Male
2 35 Female
3 29 Female
4 37 Male
5 32 Male
;
run;
*Most programmers use the following code to assign a value of 1 or 0 to the flag variable;
data test1;
set test;
if sex='Male' then flag=1;
else flag=0;
run;
*Some programmers use the following code to do the same task;
data test;
set test;
flag=ifn(sex='Male',1,0);
run;
*You can write even simpler code than the two DATA step methods above.;
data test2;
set test;
flag='Male'=sex;
run;
*Or;
data test3;
set test;
flag=sex='Male';
run;
*Note: The above code does the same thing as the 1st and 2nd methods;
*Caveat: This shortcut works only when the flag variable should take the values 1 and 0;
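The shortcut works because SAS evaluates a comparison to 1 when it is true and 0 when it is false, so the result can be assigned directly. As a small sketch of the same idea with a compound condition (the dataset name test4 and the variable male_over30 are made up for illustration, and the TEST dataset from above is assumed):

```sas
data test4;
  set test;
  /* The whole expression evaluates to 1 (true) or 0 (false).   */
  /* Parentheses are optional but make the intent easier to see. */
  male_over30 = (sex = 'Male' and age > 30);
run;
```

With the five test records above, only ids 4 and 5 (males aged 37 and 32) would get male_over30=1.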
Thursday, October 15, 2009
Dummy Dataset or SAS Options: Which is better to insert a Zero Row?
Programmers often need to summarize demographic data in a table, and they typically use PROC FREQ to do so. Although PROC FREQ counts frequencies correctly, it may not be the right procedure in every case, especially when a category has no data.
Sometimes the statistician wants to see every value listed on the CRF in the final table, even when a combination does not occur in the dataset. In that case we have to insert observations with a count of 0.
Here I present the different methods to insert a zero row.
1) Create a dummy dataset and concatenate it with the input dataset.
2) Proc Freq SPARSE option
3) Proc Means COMPLETETYPES Option
4) Proc Means COMPLETETYPES Option with PRELOADFMT option.
Dummy Dataset:
Adv: Simple and doesn’t need any formats.
Caveat: The programmer has to know all the possible combinations in advance.
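A minimal sketch of the dummy dataset idea, assuming the DEMO dataset from the example further down (variables siteid and race, both $8). The list of expected sites and races is hypothetical, and this variant merges the dummy rows with the PROC FREQ output instead of concatenating them with the raw input:

```sas
/* Every site/race combination the table should show (assumed list) */
data dummy;
  length siteid $8 race $8;
  do siteid = 'SITE1', 'SITE2';
    do race = 'Asian', 'Black', 'White';
      count = 0;   /* default count for a zero row */
      output;
    end;
  end;
run;

/* Observed counts only */
proc freq data=demo noprint;
  table siteid*race / out=freq(drop=percent);
run;

proc sort data=dummy; by siteid race; run;
proc sort data=freq;  by siteid race; run;

/* FREQ is listed last, so its COUNT replaces the dummy 0 wherever */
/* the combination actually occurs in the data.                    */
data final;
  merge dummy freq;
  by siteid race;
run;
```

Combinations found only in DUMMY keep count=0, which is what produces the zero rows.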
Sparse Option:
Lists all possible combinations of variable levels even when a combination does not occur.
Syntax:
proc freq data=demo noprint;
table sitec*race /sparse out=freq (drop=percent);
run;
With the SPARSE option, PROC FREQ outputs one record for each possible combination of the variables named in the TABLES statement.
Adv: Convenient and simple.
Disadv: Sometimes the CRF lists more categories than actually occur in the dataset. If the statistician wants one record for every category on the CRF, SPARSE does not help, because SAS only knows the values that are present in the data.
Caveat: Each value must occur at least once somewhere in the data for SPARSE to include it.
Proc Means using Complete Types Option:
Syntax:
proc means data=demo completetypes noprint nway;
class sitec race;
output out=race(rename=(_freq_=count) drop=_type_);
run;
Adv: Simple and easy to write. PROC MEANS with the COMPLETETYPES option works much like the PROC FREQ SPARSE option.
Caveat: Each value must occur at least once somewhere in the data for COMPLETETYPES to include it.
Proc Means using COMPLETETYPES and the PRELOADFMT option:
The PRELOADFMT option tells SAS to load all the values of the format assigned to a class variable (as defined in PROC FORMAT) into memory before it processes the PROC MEANS CLASS statement.
One important point is how to use this option.
If you want to use PRELOADFMT in the CLASS statement, you must also use one of the COMPLETETYPES, EXCLUSIVE, or ORDER=DATA options.
When you use PRELOADFMT together with COMPLETETYPES, SAS creates output with all the possible combinations, even those that are not seen in the input dataset.
Syntax:
proc format;
value $racef
'Asian'=3
'Black'=2
'White'=1
'American Indian or Alaska Native'=4
'Native Hawaiian or Other Pacific Islander'=5;
run;
data demo;
set demo;
format race $racef.;
run;
proc means data=demo completetypes noprint nway;
class sitec race/preloadfmt;
output out=race(rename=(_freq_=count) drop=_type_);
run;
With PRELOADFMT in the CLASS statement and the COMPLETETYPES option in the PROC MEANS statement, SAS includes every possible combination of the classification variables in the output, including zero rows (combinations with 0 observations).
Adv: Simple to use.
There is no requirement to have at least one occurrence of a value in the data.
Caveat: This method only works if formats are used with the data. You don’t necessarily need to know what the format values are, but you have to make sure formats are assigned to all the variables you are trying to summarize.
You can use the PRELOADFMT option in PROC MEANS, PROC SUMMARY, and PROC TABULATE.
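As a rough sketch of the same technique in PROC SUMMARY, assuming the DEMO dataset from the example below already has the $RACEF format assigned to race (the output dataset name race_sum is made up here):

```sas
/* Without a VAR statement, PROC SUMMARY outputs only the     */
/* observation count (_FREQ_) for each class combination, so  */
/* no filter on _STAT_ is needed.                             */
proc summary data=demo completetypes nway;
  class siteid race / preloadfmt;
  output out=race_sum(drop=_type_ rename=(_freq_=count));
run;
```

PROC SUMMARY does not print by default, so the NOPRINT option is not needed.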
Example:
data demo;
input siteid $ sex $ race $ age;
cards;
SITE1 M White 23
SITE1 F White 43
SITE1 M White 34
SITE2 M Black 21
SITE2 M White 56
SITE2 F Black 33
;
run;
proc sort data=demo;
by siteid;
run;
*Without any options in proc freq;
proc freq data=demo noprint;
table siteid*race /out=nooptions (drop=percent);
run;
*With Sparse option in proc freq;
proc freq data=demo noprint;
table siteid*race /sparse out=sparse (drop=percent);
run;
*With Completetypes option in proc means;
proc means data=demo completetypes noprint nway;
class siteid race;
output out =comptyp(where=(_stat_='N')rename=(_freq_=count) keep=siteid race _freq_ _stat_);
run;
*With Completetypes and preloadfmt options in proc means;
proc format;
VALUE $RACEF
'Asian'='Asian'
'Black'='Black'
'White'='White';
run;
data demo;
set demo;
format race $racef.;
run;
proc means data=demo completetypes noprint nway;
class siteid race/preloadfmt;
output out =race(where=(_stat_='N')rename=(_freq_=count) keep=siteid race _freq_ _stat_);
run;
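Output: the RACE dataset contains one row for every SITEID and RACE combination defined in the $RACEF format, including zero-count rows for combinations such as SITE1 and Asian that never occur in the input data.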
Wednesday, October 7, 2009
Saturday, October 3, 2009
Sunday, September 20, 2009
Case Report Tabulations for The FDA Submission
The Case Report Tabulation (CRT) is the collection of the annotated case report form (CRF), SAS® datasets, metadata, and source programs that comprise a portion of the NDA package submitted to the FDA. The FDA uses it when reviewing submissions. Review starts with the Define document, which contains metadata describing the datasets, variables, and values. It is all tied together using internal and external hyperlinks, bookmarks, and destinations to make it easily navigable [1].
The CRT is essentially a collection of data and documentation for a study. It contains features such as bookmarks and links to allow reviewers to easily navigate the submission. For consistency, there are guidelines from the FDA [1] and CDISC defining the components, though the guidelines are limited in scope.
We need to create the Define document (define.pdf or define.xml) as part of the CRT.
- Each dataset is a single SAS transport file and, in general, includes a combination of raw and derived data.
- Each CRF domain (e.g., demographics, vital signs, adverse events) should be provided as a single dataset.
- In addition, datasets suitable for reproducing and confirming analyses may also be needed.
- Patient profiles can also be provided as PDF files.
In 2003, the FDA interpreted 21 CFR 314.50(f)(1) as defining CRTs to include [2]:
- Study Data Tabulations
- Statistical Analysis Datasets
- Data Listings
- Patient Profiles
Draft eCTD Guidance: Case Report Tabulations
- Data tabulations
  - Data tabulations datasets
  - Data definitions
- Data listings
  - Data listing datasets
  - Data definitions
- Analysis datasets
  - Analysis datasets
  - Analysis programs
  - Data definitions
- Subject profiles
- IND safety reports
Data Tabulations:
- “Data tabulations are datasets in which each record is a single observation for a subject.”
- Specifications are located in the Study Data Tabulation Model (SDTM) developed by CDISC at www.cdisc.org/models/sds/v3.1/index.html.
- Each dataset is provided as a SAS Transport (XPORT) file.
Data Listings:
- “Data listings are datasets in which each record is a series of observations collected for each subject during a study or for each subject for each visit during the study organized by domain.”
- Currently, there are no further specifications for organizing data listing datasets.
- General information about creating datasets can be found in the SDTM implementation guides referenced in the data tabulation dataset specifications.
- Each dataset is provided as a SAS Transport (XPORT) file.
Analysis Datasets:
- “Analysis datasets are datasets created to support specific analyses. Programs are scripts used with selected software to produce reported analyses based on these datasets.”
- Each dataset is provided as a SAS Transport (XPORT) file.
- Programs should be provided as both ASCII text and PDF files and should include sufficient documentation to allow a reviewer to understand the submitted programs.
- It is not necessary to provide analysis datasets and programs that will enable the reviewer to directly reproduce reported results using agency hardware and software. Currently, there are no additional specifications for creating analysis datasets.
Subject Profiles:
- “Subject profiles are displays of study data of various modalities collected for an individual subject and organized by time.”
- Each individual patient’s complete patient profile is in a single PDF file or a book-marked section of a single PDF file for all patients.
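Several items above say that each dataset is provided as a SAS Transport (XPORT) file. One common way to create such a file is the XPORT LIBNAME engine with PROC COPY. The sketch below assumes a dataset WORK.DM already exists; the libref xptout and the file name dm.xpt are made up for illustration:

```sas
/* Point a libref at the transport file to be created (hypothetical path) */
libname xptout xport 'dm.xpt';

/* Copy the dataset into the transport file */
proc copy in=work out=xptout memtype=data;
  select dm;
run;

/* Release the libref when done */
libname xptout clear;
```

The XPORT engine writes the version 5 transport format, which limits dataset and variable names to 8 characters.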
References:
http://www.lexjansen.com/pharmasug/2009/rs/rs08.pdf
http://www.amstat.org/meetings/fdaworkshop/presentations/2005/P07_Christiansen_CDISC.ppt
http://www.cdisc.org/stuff/contentmgr/files/0/f56015f6c1c01e6aa55767d9d25bddb5/misc/officeofbusinessprocesssupport.pdf
DLP (Data LifeCycle Plan)
The DLP (Data LifeCycle Plan) guides an organization and serves as a blueprint for how to create every type of data across all therapeutic areas and functional specialties. Howard describes the DLP as "an overall document that says here are the things you need to think about." In some DLPs, there might be more than 15 chapters, each controlled by a group of domain experts.
Standard operating procedures (SOPs) typically cover process. DLPs, in contrast, are technical specifications about what happens to the data. Both SOPs and DLPs should be subject to similar governance. The DLP creates a framework for discussions that do occur on their own, but it forces them to an earlier stage of the process.
Here's a sample chapter of a DLP for demographic data from Kestrel Consultants, Inc.