IMPLEMENTATION OF CDISC STANDARDS
Presented By Sandeep Raj Juneja, ASG Inc....
CDISC accomplishments and Strategy
CDISC and Standards for Clinical Research
by Rebecca D. Kush, Ph.D., Founder & President, CDISC
CDISC SDTM and related initiatives
CDISC submission standard : CDISC SDTM_Basics
Supporting The CDISC Standards
By Mark Lambrecht, PhD, Principal Consultant, Life Sciences, SAS
Case Report Tabulation Data Definition Specification (define.xml)
CDISC Study Data Tabulation Model SDTM Implementation Guide V3.1.1
http://www.cdisc.org/models/sdtm/v1.1/index.html
Clinical Data Integration:
SAS Clinical Data Integration
By Dave Smith, SAS UK
Industry Standards for the electronic submission of Data to the FDA
by Michael A. Walega
CDISC SDTM Basics
Friday, October 24, 2008
Tuesday, September 30, 2008
LEARN SAS within 7 weeks:
LEARN SAS within 7 weeks: Part1
LEARN SAS within 7 weeks: Part2 (Introduction to SAS – The Data Step)
LEARN SAS within 7 weeks: Part3 (Introduction to SAS – SET, MERGE, and Multiple Operations)
LEARN SAS within 7 weeks: Part4 (More on Manipulating Data)
LEARN SAS within 7 weeks: Part5 (Procedures to Summarize Data)
LEARN SAS within 7 weeks: Part6 (Producing Graphics and Using SAS Analyst)
Tuesday, September 23, 2008
SAS Interview Questions and Answers: CDISC, SDTM and ADAM etc
1) What do you know about CDISC and its standards?
CDISC stands for the Clinical Data Interchange Standards Consortium. It was formed to bring efficiency to the entire drug development process by improving data quality and speeding up development. To do that, CDISC has developed a series of standards, including the Operational Data Model (ODM), the Study Data Tabulation Model (SDTM), and the Analysis Data Model (ADaM).
2) Why are people talking more about CDISC these days, and what advantages does it bring to the pharmaceutical industry?
A) Generally speaking, only about 30% of programming time is used to generate statistical results with SAS®. The rest is spent becoming familiar with the data structure, checking data accuracy, and tabulating or listing raw data and statistical results in particular formats. Implementing the CDISC standards should significantly reduce this non-statistical programming time.
3) As a SAS programmer, what challenges do you think you will face when you first implement CDISC standards in your company?
A) With the new requirements for electronic submission, CRT datasets need to conform to a set of standards that facilitate the review process; they are no longer created solely for the programmer's convenience. The SDS will be treated as the specification for the datasets to be submitted, and potentially as a reference for CRF design, so statistical programming may need to start from this common ground. Existing programs and macros may also need to be remapped to CDISC so that submission information can be validated with the same tools reviewers may use, and so that the review can be accelerated without providing unnecessary data, tables, and listings. Given the changing requirements for electronic submission and CDISC implementation, knowing only SAS® may not be enough to produce the final deliverables. It is time to expand job skills in several directions so that SAS® programmers can gain a competitive advantage and continue to play a main role in both statistical analysis and reporting for drug development.
References:
Pharmasug/2007/fc/fc05
pharmasug/2003/fda compliance/fda055
1) What do you understand about SDTM and its importance?
SDTM stands for Study Data Tabulation Model. It defines a standard structure for study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA). *2
In July 2004 the Clinical Data Interchange Standards Consortium (CDISC) published standards on the design and content of clinical trial tabulation data sets, known as the Study Data Tabulation Model (SDTM). According to the CDISC standard, there are four ways to represent a subject in a clinical study: tabulations, data listings, analysis datasets, and subject profiles. *6
Before SDTM:
Domains had different names and no standard structure, and there was no standard list of variables for each domain.
Because of this, FDA reviewers had to spend considerable effort familiarizing themselves with the different data, domain names, and variable names in each dataset. Reviewers spent most of their valuable time cleaning the data into a standard format rather than reviewing it for accuracy, which delayed the drug development process.
After SDTM:
There are standard domain names and a standard structure for each domain, along with a list of standard variables and names for each dataset. This makes the data easier to find and understand, so reviewers need less time than they would for data without SDTM standards. It also makes reviews more consistent and more time efficient.
The purpose of creating SDTM domain data sets is to provide Case Report Tabulation (CRT) data to the FDA in a standardized format. Following these standards can greatly reduce the effort necessary for data mapping. Improper use of CDISC standards, such as using a valid domain or variable name incorrectly, can slow the metadata mapping process and should be avoided. *4
2) What is the PROC CDISC syntax for the SDTM 3.1 format? *2
The PROC CDISC syntax for CDISC SDTM is presented below. The DATA= parameter specifies the location of your SDTM-conforming data source.
PROC CDISC MODEL=SDTM;
   SDTM SDTMVersion = "3.1";
   DOMAINDATA DATA = results.AE DOMAIN = AE CATEGORY = EVENT;
RUN;
3) What are the capabilities of PROC CDISC? *2
PROC CDISC performs the following checks on domain content of the source:
Verifies that all required variables are present in the data set
Reports as an error any variables in the data set that are not defined in the domain
Reports a warning for any expected domain variables that are not in the data set
Notes any permitted domain variables that are not in the data set
Verifies that all domain variables are of the expected data type and proper length
Detects any domain variables that are assigned a controlled terminology specification by the domain and do not have a format assigned to them.
The procedure also performs the following checks on domain data content of the source on a per observation basis:
Verifies that all required variable fields do not contain missing values
Detects occurrences of expected variable fields that contain missing values
Checks the conformance of all values assigned an ISO 8601 specification, including date, time, datetime, duration, and interval types
Notes the correctness of yes/no and yes/no/null responses
4) What are the different approaches for creating the SDTM? *3
There are 3 general approaches to create the SDTM datasets:
a) Build the SDTM entirely in the CDMS,
b) Build the SDTM entirely on the “back-end” in SAS,
c) or take a hybrid approach and build the SDTM partially in the CDMS and partially in SAS.
BUILD THE SDTM ENTIRELY IN THE CDMS
It is possible to build the SDTM entirely within the CDMS. If the CDMS allows for broad structural control of the underlying database, then you could build your eCRF or CRF based clinical database to SDTM standards.
Advantages:
• Your “raw” database is equivalent to your SDTM which provides the most elegant solution.
• Your clinical data management staff will be able to converse with end-users/sponsors about the data easily since your clinical data manager and the end-user/sponsor will both be looking at SDTM datasets.
• As soon as the CDMS database is built, the SDTM datasets are available.
Disadvantages:
• This approach may be cost prohibitive. Forcing the CDMS to create the SDTM structures may simply be too cumbersome to do efficiently.
• Forcing the CDMS to adapt to the SDTM may cause problems with the operation of the CDMS which could reduce data quality.
BUILD THE SDTM ENTIRELY ON THE “BACK-END” IN SAS
Assuming that SAS is not your CDMS solution, another approach is to take the clinical data from your CDMS and manipulate it into the SDTM with SAS programming.
Advantages:
• The great flexibility of SAS will let you transform any proprietary CDMS structure into the SDTM. You do not have to work around the rigid constraints of the CDMS.
• Changes could be made to the SDTM conversion without disturbing clinical data management processes.
• The CDMS is allowed to do what it does best which is to enter, manage, and clean data.
Disadvantages:
• There would be additional cost to transform the data from your typical CDMS structure into the SDTM. Specifications, programming, and validation of the SAS programming transformation would be required.
• Once the CDMS database is up, there would then be a subsequent delay while the SDTM is created in SAS. This delay would slow down the production of analysis datasets and reporting. This assumes that you follow the linear progression of CDMS -> SDTM -> analysis datasets (ADaM).
• Since the SDTM is a derivation of the “raw” data, there could be errors in translation from the “raw” CDMS data to the SDTM.
• Your clinical data management staff may be at a disadvantage when speaking with end-users/sponsors about the data since the data manager will likely be looking at the CDMS data and the sponsor will see SDTM data.
BUILD THE SDTM USING A HYBRID APPROACH
Again, assuming that SAS is not your CDMS solution, you could build some of the SDTM within the confines of the CDMS and do the rest of the work in SAS. Some things can be done easily in the CDMS, such as naming data tables the same as SDTM domains, using SDTM variable names in the CDMS, and performing simple derivations (such as age) in the CDMS. More complex SDTM derivations and manipulations can then be performed in SAS.
Advantages:
• The changes to the CDMS are easy to implement.
• The SDTM conversions to be done in SAS are manageable and much can be automated.
Disadvantages:
• There would still be some additional cost needed to transform the data from the SDTM-like CDMS structure into the SDTM. Specifications, programming, and validation of the transformation would be required.
• There would be some delay while the SDTM-like CDMS data is converted to the SDTM.
• Your clinical data management staff may still have a slight disadvantage when speaking with end-users/sponsors about the data since the clinical data manager will be looking at the SDTM-like data and the sponsor will see the true SDTM data.
5) What do you know about SDTM domains?
A basic understanding of the SDTM domains, their structure and their interrelations is vital to determining which domains you need to create and in assessing the level to which your existing data is compliant. The SDTM consists of a set of clinical data file specifications and underlying guidelines. These different file structures are referred to as domains. Each domain is designed to contain a particular type of data associated with clinical trials, such as demographics, vital signs or adverse events.
The CDISC SDTM Implementation Guide provides specifications for 30 domains. The SDTM domains are divided into six classes.
The 21 clinical data domains are contained in three of these classes:
Interventions,
Events and
Findings.
The trial design class contains seven domains and the special-purpose class contains two domains (Demographics and Comments).
The trial design domains provide the reviewer with information on the criteria, structure and scheduled events of a clinical trial. The only required domain is demographics.
There are two other special-purpose relationship data sets, the Supplemental Qualifiers (SUPPQUAL) data set and the Related Records (RELREC) data set. SUPPQUAL is a highly normalized data set that allows you to store virtually any type of information related to one of the domain data sets. SUPPQUAL also accommodates text longer than 200 characters: the first 200 characters should be stored in the domain variable and the remainder in SUPPQUAL. *5
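The 200-character rule above can be sketched in SAS as follows. This is only an illustration under stated assumptions: the raw data set RAW_MH, the variable LONGTERM, and the QNAM/QLABEL values are hypothetical, and a real study would follow the naming set out in its own mapping specification.

```sas
/* Hypothetical raw record with a long free-text medical history term. */
data raw_mh;
  length studyid $12 usubjid $20 longterm $400;
  studyid  = 'ABC123';
  usubjid  = '101-0001';
  mhseq    = 1;
  longterm = repeat('x', 249);   /* REPEAT adds 249 copies: 250 characters */
run;

/* Parent domain keeps the first 200 characters. */
data mh;
  set raw_mh;
  length domain $2 mhterm $200;
  domain = 'MH';
  mhterm = substr(longterm, 1, 200);
  keep studyid domain usubjid mhseq mhterm;
run;

/* Overflow text goes to a supplemental qualifier record that points */
/* back to the parent row through IDVAR / IDVARVAL.                  */
data suppmh;
  set raw_mh;
  length rdomain $2 idvar $8 idvarval $200 qnam $8 qlabel $40 qval $200;
  if lengthn(longterm) > 200;          /* only rows that overflow */
  rdomain  = 'MH';
  idvar    = 'MHSEQ';
  idvarval = strip(put(mhseq, best12.));
  qnam     = 'MHTERM1';                /* assumed name for the overflow text */
  qlabel   = 'Medical History Term 1'; /* assumed label */
  qval     = substr(longterm, 201);
  keep studyid rdomain usubjid idvar idvarval qnam qlabel qval;
run;
```

Keeping the split in two separate steps makes it easy to validate the parent domain and the supplemental data set independently.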
6) What are the general guidelines to SDTM variables?
Each of the SDTM domains has a collection of variables associated with it.
There are five roles that a variable can have: Identifier, Topic, Timing, Qualifier, and, for trial design domains, Rule.
Using lab data as an example, the subject ID, domain ID and sequence (e.g. visit) are identifiers. The name of the lab parameter is the topic, the date and time of sample collection are timing variables, the result is a result qualifier, and the variable containing the units is a variable qualifier.
Variables that are common across domains include the basic identifiers study ID (STUDYID), a two-character domain ID (DOMAIN) and unique subject ID (USUBJID).
In studies with multiple sites that are allowed to assign their own subject identifiers, the site ID and the subject ID must be combined to form USUBJID.
All other variable names are generally formed by prefixing a standard variable name fragment with the two-character domain ID (for example, AE plus TERM gives AETERM).
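As a minimal sketch of the USUBJID rule described above, the DATA step below joins the site ID and the subject ID. The input data set DM_RAW, the variable lengths, and the hyphen separator are assumptions made for illustration; the source only says that the two IDs must be combined.

```sas
/* Hypothetical raw demographics: the same subject number is reused at two sites. */
data dm_raw;
  length studyid $12 siteid $4 subjid $6;
  input studyid $ siteid $ subjid $;
  datalines;
ABC123 101 0001
ABC123 102 0001
;
run;

data dm;
  set dm_raw;
  length domain $2 usubjid $20;
  domain  = 'DM';
  /* CATX strips blanks and inserts the separator between the parts. */
  usubjid = catx('-', siteid, subjid);
run;
```

Without the site ID the two subjects above would collide on SUBJID alone, which is the situation the rule is meant to prevent.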
The SDTM specifications do not require all of the variables associated with a domain to be included in a submission. For compliance with the SDTM standards, the implementation guide assigns each variable to one of three categories:
Required, Expected, and Permissible. *4
REQUIRED – These variables are necessary for the proper functioning of standard software tools used by reviewers. They must be included in the data set structure and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set structure; however it is permissible to have missing values.
PERMISSIBLE – These variables are not a required part of the domain and they should not be included in the data set structure if the information they were designed to contain was not collected.
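One way to check the Required rule by hand is sketched below. It is an assumption-laden illustration, not the logic of any validation tool: the AE data set is made up, and the variables treated as Required here are only an illustrative subset, so the real list should be taken from the implementation guide.

```sas
/* Hypothetical AE data; the second record is missing AETERM. */
data ae;
  length studyid $12 domain $2 usubjid $20 aeterm $40 aesev $10;
  infile datalines dsd truncover;
  input studyid $ domain $ usubjid $ aeseq aeterm $ aesev $;
  datalines;
ABC123,AE,101-0001,1,HEADACHE,MILD
ABC123,AE,101-0002,1,,MODERATE
;
run;

/* Write one row per missing value found in a Required variable. */
data required_missing;
  set ae;
  length varname $8;
  array reqc{*} studyid domain usubjid aeterm;  /* character variables to check */
  do i = 1 to dim(reqc);
    if missing(reqc{i}) then do;
      varname = upcase(vname(reqc{i}));
      output;
    end;
  end;
  if missing(aeseq) then do;                    /* numeric variable to check */
    varname = 'AESEQ';
    output;
  end;
  keep usubjid aeseq varname;
run;
```

An empty REQUIRED_MISSING data set means no Required value is missing. The same pattern with a different variable list could be used to report, rather than reject, missing Expected values.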
7) Can you tell me more about SDTM domains? *5
SDTM Domains are grouped by classes, which is useful for producing more meaningful relational schemas. Consider the following domain classes and their respective domains.
• Special Purpose Class – Pertains to unique domains concerning detailed information about the subjects in a study.
Demography (DM), Comments (CO)
• Findings Class – Collected information resulting from a planned evaluation to address specific questions about the subject, such as whether a subject is suitable to participate or continue in a study.
Electrocardiogram (EG)
Inclusion / Exclusion (IE)
Lab Results (LB)
Physical Examination (PE)
Questionnaire (QS)
Subject Characteristics (SC)
Vital Signs (VS)
• Events Class – Incidents independent of the study that happen to the subject during the lifetime of the study.
Adverse Events (AE)
Patient Disposition (DS)
Medical History (MH)
• Interventions Class – Treatments and procedures that are intentionally administered to the subject, such as treatment coincident with the study period, per protocol, or self-administered (e.g., alcohol and tobacco use).
Concomitant Medications (CM)
Exposure to Treatment Drug (EX)
Substance Usage (SU)
• Trial Design Class – Information about the design of the clinical trial (e.g., crossover trial, treatment arms) including information about the subjects with respect to treatment and visits.
Subject Elements (SE)
Subject Visits (SV)
Trial Arms (TA)
Trial Elements (TE)
Trial Inclusion / Exclusion Criteria (TI)
Trial Visits (TV)
7) Can you tell me how to do the Mapping for existing Domains?
The first step is to compare the source metadata with the SDTM domain metadata. If the data received from data management is broadly compliant with the SDTM metadata, use automated mapping as the first step.
If the data management metadata is not compliant with SDTM, avoid automated mapping. Instead, manually map the datasets to SDTM datasets and then map each variable to the appropriate domain.
The whole mapping process includes:
• Read the corporate data standards into a database table.
• Assign a CDISC domain prefix to each database module.
• Attach a combo box containing the SDTM variables for the selected domain to a new mapping variable field.
• Search each module, and within each module select the most appropriate CDISC variable.
• Then search for variables mapped to the wrong type (character mapped to numeric, or numeric mapped to character).
• Review the mapping to see if any conflicts are resolvable by mapping to a more appropriate variable.
• Verify that the mapped variable is appropriate for each role.
• Finally, ensure all 'required' variables are present in the domain. *6
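The type check in the steps above can be approximated in SAS by comparing the source metadata with a mapping table. The sketch below assumes a hand-built MAPPING data set and a raw data set VITALS_RAW; both, and all names in them, are hypothetical.

```sas
/* Hypothetical mapping table: source variable, target SDTM variable, target type. */
data mapping;
  length srcvar $32 sdtmvar $8 sdtmtype $4;
  input srcvar $ sdtmvar $ sdtmtype $;
  datalines;
PATID   USUBJID  char
VISITNO VISITNUM num
SBP     VSORRES  char
;
run;

/* Hypothetical raw data: SBP is stored as a number. */
data vitals_raw;
  length patid $10;
  patid   = '0001';
  visitno = 1;
  sbp     = 120;
run;

/* PROC CONTENTS writes TYPE = 1 for numeric and 2 for character variables. */
proc contents data=vitals_raw out=src_meta(keep=name type) noprint;
run;

/* List mappings whose source type differs from the target SDTM type. */
proc sql;
  create table type_conflicts as
  select m.srcvar, m.sdtmvar, m.sdtmtype,
         case s.type when 1 then 'num' else 'char' end as srctype length=4
  from mapping as m
       inner join src_meta as s
       on upcase(s.name) = upcase(m.srcvar)
  where calculated srctype ne m.sdtmtype;
quit;
```

Here the only conflict is SBP, a numeric source variable mapped to a character result variable, so that mapping would need an explicit conversion step.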
8) What do you know about the SDTM Implementation Guide? Have you used it, and if so, which versions have you used so far?
The SDTM Implementation Guide documents the metadata (data about data) for the domain datasets, including file names, variable names, variable types, and labels. I have used SDTM Implementation Guide versions 3.1.1 and 3.1.2.
9) Can you identify which variables should we have to include in each domain?
A) SDTM Implementation Guide V3.1.1/V3.1.2 assigns each variable to one of three categories:
REQUIRED –They must be included in the data set structure and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set; however it is permissible to have missing values.
PERMISSIBLE – These variables are not a required part of the domain and they should not be included in the data set structure if the information they were designed to contain was not collected.
10) Can you give some examples for MAPPING *6?
Here are some examples for SDTM mapping:
• Character variables defined as Numeric
• Numeric Variables defined as Character
• Variables collected without an obvious corresponding domain in the CDISC SDTM mapping; these must go into SUPPQUAL.
• Several corporate modules that map to one corresponding domain in CDISC SDTM.
• Core SDTM is a subset of the existing corporate standards
• Vertical versus Horizontal structure, (e.g. Vitals)
• Dates – combining date and times; partial dates.
• Data collapsing issues e.g. Adverse Events and Concomitant Medications.
• Adverse Events maximum intensity
• Metadata needed for laboratory data standardization.
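The vertical versus horizontal item in the list above is often the first structural change a programmer meets. The sketch below turns one row per visit into one row per test; the raw data set VITALS_WIDE and its column names are hypothetical, and using the raw column name as the test code is a simplification, since a real mapping would translate it to controlled terminology.

```sas
/* Hypothetical horizontal vitals: one row per subject and visit. */
data vitals_wide;
  length usubjid $20;
  input usubjid $ visitnum sysbp diabp pulse;
  datalines;
101-0001 1 120 80 72
101-0001 2 118 . 70
;
run;

/* Vertical structure: one row per subject, visit, and test. */
data vs;
  set vitals_wide;
  length domain $2 vstestcd $8 vsorres $20;
  array vsin{*} sysbp diabp pulse;
  domain = 'VS';
  do i = 1 to dim(vsin);
    if not missing(vsin{i}) then do;
      vstestcd = upcase(vname(vsin{i}));         /* raw column name used as test code */
      vsorres  = strip(put(vsin{i}, best12.));   /* original result stored as character */
      output;
    end;
  end;
  keep domain usubjid visitnum vstestcd vsorres;
run;
```

A DATA step with an array is used here instead of PROC TRANSPOSE so that the result can be converted to character and missing measurements skipped in the same pass.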
10) Explain the Process of SDTM Mapping?
A list of basic variable mappings is given below *4.
DIRECT: a CDM variable is copied directly to a domain variable without any changes other than assigning the CDISC standard label.
RENAME: only the variable name and label may change but the contents remain the same.
STANDARDIZE: mapping reported values to standard units or standard terminology
REFORMAT: the actual value being represented does not change; only the format in which it is stored changes, such as converting a SAS date to an ISO 8601 format character string.
COMBINING: directly combining two or more CDM variables to form a single SDTM variable.
SPLITTING: a CDM variable is divided into two or more SDTM variables.
DERIVATION: creating a domain variable based on a computation, algorithm, series of logic rules or decoding using one or more CDM variables.
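As an illustration of the REFORMAT and COMBINING mappings listed above, the sketch below converts a SAS date and a SAS time into a single ISO 8601 character value. The raw variables AESTDT and AESTTM are hypothetical, and the YYMMDD10. and TOD8. formats are one possible choice; SAS also provides dedicated ISO 8601 formats.

```sas
/* Hypothetical raw AE record with separate numeric date and time. */
data ae_raw;
  format aestdt date9. aesttm time8.;
  aestdt = '05MAR2008'd;
  aesttm = '09:05:00't;
run;

data ae_dtc;
  set ae_raw;
  length aestdtc $19;
  if not missing(aestdt) then do;
    /* REFORMAT: numeric SAS date to YYYY-MM-DD text. */
    aestdtc = put(aestdt, yymmdd10.);
    /* COMBINING: append the time as Thh:mm:ss only when it was collected. */
    if not missing(aesttm) then
      aestdtc = catt(aestdtc, 'T', put(aesttm, tod8.));
  end;
  keep aestdtc;
run;
```

When the time is missing, the value stays at date precision instead of being padded with zeros, which is how ISO 8601 represents a partial date and time.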
11) What are the Common Issues in Mapping Dummy corporate standards to CDISC (SDTM) Standards?
• Character variables defined as Numeric
• Numeric Variables defined as Character
• Variables collected without an obvious corresponding domain in the CDISC SDTM mapping; these must go into SUPPQUAL.
• Several corporate modules that map to one corresponding domain in CDISC SDTM.
• Dictionary codes not in SDTM parent module, so if needed must be collected in SUPPQUAL.
• Core SDTM is a subset of the existing corporate standards
• Different structure of Lab CDISC Domain e.g. baseline flag.
• Vertical versus Horizontal structure, (e.g. Vitals)
• Additional Metadata needed to describe the source in SUPPQUAL
• Dates – combining date and times; partial dates.
• Data collapsing issues e.g. Adverse Events and Concomitant Medications.
• Adverse Events maximum intensity
• Metadata needed for laboratory data standardization.
Ref: Mapping Corporate Data Standards to the CDISC Model (SAS Paper) by David Parker, AstraZeneca, Manchester, United Kingdom
12) Can you explain ADaM or ADaM datasets? *7
The Analysis Data Model describes the general structure, metadata, and content typically found in analysis datasets and accompanying documentation. It describes, with examples, the three types of metadata associated with analysis datasets: analysis dataset metadata, analysis variable metadata, and analysis results metadata. (Source: CDISC Analysis Data Model: Version 2.0)
Analysis datasets (ADs) are typically developed from the collected clinical trial data and used to create statistical summaries of efficacy and safety data. They are characterized by the creation of derived analysis variables and/or records. These derived data may represent a statistical calculation of an important outcome measure, such as change from baseline, or may represent the last observation for a subject while under therapy. As such, these datasets are one of the types of data sent to a regulatory agency such as the FDA.
The CDISC Analysis Data Model (ADaM) defines a standard for analysis datasets to be submitted to the regulatory agency. It makes clear the content, source, and quality of the datasets submitted in support of the statistical analysis performed by the sponsor.
In ADaM, the descriptions of the ADs build on the nomenclature of the SDTM, adding the attributes, variables, and data structures needed for statistical analyses. Achieving the principle of clear and unambiguous communication relies on clear AD documentation, which provides the link between the general description of the analysis found in the protocol or statistical analysis plan and the source data.
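A derived analysis variable such as change from baseline could be sketched as below. The input LB data and the use of a baseline flag to pick the baseline record are assumptions made for illustration; AVAL, BASE, and CHG follow common ADaM naming, but the actual derivation rules belong in the statistical analysis plan.

```sas
/* Hypothetical SDTM-like lab data; the baseline record is flagged with LBBLFL = 'Y'. */
data lb;
  length usubjid $20 lbtestcd $8 lbblfl $1;
  infile datalines truncover;
  input usubjid $ lbtestcd $ visitnum lbstresn lbblfl $;
  datalines;
101-0001 ALT 1 30 Y
101-0001 ALT 2 36
101-0001 ALT 3 27
;
run;

proc sort data=lb;
  by usubjid lbtestcd visitnum;
run;

/* Carry the baseline value forward and derive change from baseline. */
data adlb;
  set lb;
  by usubjid lbtestcd;
  retain base;
  if first.lbtestcd then base = .;          /* reset for each subject and test */
  if lbblfl = 'Y' then base = lbstresn;
  aval = lbstresn;
  if not missing(base) and not missing(aval) then chg = aval - base;
run;
```

Because BASE is retained and reset at the start of each subject and test, records before the flagged baseline get a missing CHG instead of a value borrowed from another subject.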
CDISC stands for Clinical Data Interchange Standards Consortium and it is developed keeping in mind to bring great deal of efficiency in the entire drug development process. CDISC brings efficiency to the entire drug development process by improving the data quality and speed-up the whole drug development process and to do that CDISC developed a series of standards, which include Operation data Model (ODM), Study data Tabulation Model (SDTM) and the Analysis Data Model ADaM).
2) Why people these days are more talking about CDSIC and what advantages it brings to the Pharmaceutical Industry?
A) Generally speaking, Only about 30% of programming time is used to generate statistical results with SAS®, and the rest of programming time is used to familiarize data structure, check data accuracy, and tabulate/list raw data and statistical results into certain formats. This non-statistical programming time will be significantly reduced after implementing the CDISC standards.
3) What are the challenges as SAS programmer you think you will face when you first implement CDISC standards in you company?
A) With the new requirements of electronic submission, CRT datasets need to conform to a set of standards for facilitating reviewing process. They no longer are created solely for programmers convenient. SDS will be treated as specifications of datasets to be submitted, potentially as reference of CRF design. Therefore, statistical programming may need to start from this common ground. All existing programs/macros may also need to be remapped based on CDISC so one can take advantage to validate submission information by using tools which reviewer may use for reviewing and to accelerate reviewing process without providing unnecessary data, tables and listings. With the new requirements from updating electronic submission and CDISC implementation, understanding only SAS® may not be good enough to fulfill for final deliverables. It is a time to expand and enhance the job skills from various aspects under new change so that SAS® programmers can take a competitive advantage, and continue to play a main role in both statistical analysis and reporting for drug development.
References:
Pharmasug/2007/fc/fc05
pharmasug/2003/fda compliance/fda055
1) What do you understand about SDTM and its importance?
SDTM stands for Standard data Tabulation Model, which defines a standard structure for study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA) 2.
In July 2004 the Clinical Data Interchange Standards Consortium (CDISC) published standards on the design and content of clinical trial tabulation data sets, known as the Study Data Tabulation Model (SDTM). According to the CDISC standard, there are four ways to represent a subject in a clinical study: tabulations, data listings, analysis datasets, and subject profiles6.
Before SDTM:
There are different names for each domain and domains don’t have a standard structure. There is no standard variables list for each and every domain.
Because of this FDA reviewers always had to take so much pain in understanding themselves with different data, domain names and name of the variable in each analysis dataset. Reviewers will have spent most of the valuable time in cleaning up the data into a standard format rather than reviewing the data for the accuracy. This process will delay the drug development process as such.
After SDTM:
There will be standard domain names and standard structure for each domain. There will be a list of standard variables and names for each and every dataset. Because of this, it will become easy to find and understand the data and reviewers will need less time to review the data than the data without SDTM standards. This process will improve the consistency in reviewing the data and it can be time efficient.
The purpose of creating SDTM domain data sets is to provide Case Report Tabulation (CRT) data FDA, in a standardized format. If we follow these standards it can greatly reduce the effort necessary for data mapping. Improper use of CDISC standards, such as using a valid domain or variable name incorrectly, can slow the metadata mapping process and should be avoided4.
2) PROC CDISC for SDTM 3.1 Format 2?
Syntax The PROC CDISC syntax for CDISC SDTM is presented below. The DATA= parameter specifies the location of your SDTM conforming data source.PROC CDISC MODEL=SDTM;SDTM SDTMVersion = "3.1";DOMAINDATA DATA = results. AE DOMAIN = AE CATEGORY = EVENT;RUN;
3) What are the capabilities of PROC CDISC 2?
PROC CDISC performs the following checks on domain content of the source:
Verifies that all required variables are present in the data set
Reports as an error any variables in the data set that are not defined in the domain
Reports a warning for any expected domain variables that are not in the data set
Notes any permitted domain variables that are not in the data set
Verifies that all domain variables are of the expected data type and proper length
Detects any domain variables that are assigned a controlled terminology specification by the domain and do not have a format assigned to them.
The procedure also performs the following checks on domain data content of the source on a per observation basis:
Verifies that all required variable fields do not contain missing values
Detects occurrences of expected variable fields that contain missing values
Detects the conformance of all ISO-8601 specification assigned values; including date, time, date time, duration, and interval types
Notes correctness of yes/no and yes/no/null responses,
4) What are the different approaches for creating the SDTM 3?
There are 3 general approaches to create the SDTM datasets:
a) Build the SDTM entirely in the CDMS,
b) Build the SDTM entirely on the “back-end” in SAS,
c) or take a hybrid approach and build the SDTM partially in the CDMS and partially in SAS.
BUILD THE SDTM ENTIRELY IN THE CDMS
It is possible to build the SDTM entirely within the CDMS. If the CDMS allows for broad structural control of the underlying database, then you could build your eCRF or CRF based clinical database to SDTM standards.
Advantages:
• Your “raw” database is equivalent to your SDTM which provides the most elegant solution.
• Your clinical data management staff will be able to converse with end-users/sponsors about the data easily since your clinical data manager and the und-user/sponsor will both be looking at SDTM datasets.
• As soon as the CDMS database is built, the SDTM datasets are available.
Disadvantages:
• This approach may be cost prohibitive. Forcing the CDMS to create the SDTM structures may simply be too cumbersome to do efficiently.
• Forcing the CDMS to adapt to the SDTM may cause problems with the operation of the CDMS which could reduce data quality.
BUILD THE SDTM ENTIRELY ON THE “BACK-END” IN SAS
Assuming that SAS is not your CDMS solution, another approach is to take the clinical data from your CDMS and manipulate it into the SDTM with SAS programming.
Advantages:
• The great flexibility of SAS will let you transform any proprietary CDMS structure into the SDTM. You do not have to work around the rigid constraints of the CDMS.
• Changes could be made to the SDTM conversion without disturbing clinical data management processes.
• The CDMS is allowed to do what it does best which is to enter, manage, and clean data.
Disadvantages: • There would be additional cost to transform the data from your typical CDMS structure into the SDTM.
Specifications, programming, and validation of the SAS programming transformation would be required.
• Once the CDMS database is up, there would then be a subsequent delay while the SDTM is created in SAS.
This delay would slow down the production of analysis datasets and reporting. This assumes that you follow the linear progression of CDMS -> SDTM -> analysis datasets (ADaM).
• Since the SDTM is a derivation of the “raw” data, there could be errors in translation from the “raw” CDMS data to the SDTM.
• Your clinical data management staff may be at a disadvantage when speaking with end-users/sponsors about the data since the data manager will likely be looking at the CDMS data and the sponsor will see SDTM data.
BUILD THE SDTM USING A HYBRID APPROACH
Again, assuming that SAS is not your CDMS solution, you could build some of the SDTM within the confines of the CDMS and do the rest of the work in SAS. There are things that could be done easily in the CDMS such as naming data tables the same as SDTM domains, using SDTM variable names in the CTMS, and performing simple derivations (such as age) in the CDMS. More complex SDTM derivations and manipulations can then be performed in SAS.
Advantages:
• The changes to the CDMS are easy to implement.
• The SDTM conversions to be done in SAS are manageable and much can be automated.
Disadvantages:
• There would still be some additional cost needed to transform the data from the SDTM-like CDMS structure into the SDTM. Specifications, programming, and validation of the transformation would be required.
• There would be some delay while the SDTM-like CDMS data is converted to the SDTM.
• Your clinical data management staff may still have a slight disadvantage when speaking with endusers/ sponsors about the data since the clinical data manager will be looking at the SDTM-like data and the sponsor will see the true SDTM data.
5) What do you know about SDTM domains?
A basic understanding of the SDTM domains, their structure and their interrelations is vital to determining which domains you need to create and in assessing the level to which your existing data is compliant. The SDTM consists of a set of clinical data file specifications and underlying guidelines. These different file structures are referred to as domains. Each domain is designed to contain a particular type of data associated with clinical trials, such as demographics, vital signs or adverse events.
The CDISC SDTM Implementation Guide provides specifications for 30 domains. The SDTM domains are divided into six classes.
The 21 clinical data domains are contained in three of these classes:
Interventions,
Events and
Findings.
The trial design class contains seven domains and the special-purpose class contains two domains (Demographics and Comments).
The trial design domains provide the reviewer with information on the criteria, structure and scheduled events of a clinical trial. The only required domain is Demographics.
There are two other special-purpose relationship data sets, the Supplemental Qualifiers (SUPPQUAL) data set and the Related Records (RELREC) data set. SUPPQUAL is a highly normalized data set that allows you to store virtually any type of information related to one of the domain data sets. SUPPQUAL also accommodates text values longer than 200 characters: the first 200 characters should be stored in the domain variable and the remainder in SUPPQUAL *5.
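The 200-character rule described above can be sketched in a few lines. This is only an illustration in Python (not SAS), and the function names and the QNAM numbering scheme (stem plus 1, 2, ...) are assumptions made for the example, not something the implementation guide text quoted here prescribes.

```python
def split_long_value(value, width=200):
    """Return (parent_value, overflow_chunks) for a text value.

    The first `width` characters stay in the parent domain variable;
    the remainder is cut into chunks of at most `width` characters.
    """
    parent = value[:width]
    rest = value[width:]
    overflow = [rest[i:i + width] for i in range(0, len(rest), width)]
    return parent, overflow


def to_suppqual_rows(usubjid, rdomain, seq, qnam_stem, overflow):
    """Build SUPPQUAL-style rows for the overflow chunks.

    The QNAM values (stem + 1, 2, ...) are a hypothetical naming
    convention chosen for this sketch.
    """
    return [
        {
            "USUBJID": usubjid,
            "RDOMAIN": rdomain,
            "IDVAR": f"{rdomain}SEQ",
            "IDVARVAL": str(seq),
            "QNAM": f"{qnam_stem}{n}",
            "QVAL": chunk,
        }
        for n, chunk in enumerate(overflow, start=1)
    ]


text = "x" * 450  # made-up 450-character value
parent, overflow = split_long_value(text)
rows = to_suppqual_rows("001-0042", "AE", 3, "AETERM", overflow)
print(len(parent), [len(r["QVAL"]) for r in rows])
```

Joining the parent value and the QVAL chunks in order gives back the original text, which is a simple check to run after the split.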
6) What are the general guidelines to SDTM variables?
Each of the SDTM domains has a collection of variables associated with it.
There are five roles that a variable can have: Identifier, Topic, Timing, Qualifier and, for trial design domains, Rule.
Using lab data as an example:
• the subject ID, domain ID and sequence number are identifiers,
• the name of the lab parameter is the topic,
• the date and time of sample collection are timing variables,
• the result is a result qualifier and the variable containing the units is a variable qualifier.
Variables that are common across domains include the basic identifiers study ID (STUDYID), a two-character domain ID (DOMAIN) and unique subject ID (USUBJID).
In studies with multiple sites that are allowed to assign their own subject identifiers, the site ID and the subject ID must be combined to form USUBJID.
Prefixing a standard variable name fragment with the two-character domain ID generally forms all other variable names.
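As a rough illustration of the two naming rules above, the sketch below builds a unique subject ID from a site ID and a subject ID, and a variable name from a domain ID and a name fragment. It is written in Python rather than SAS; the function names and the "-" separator are assumptions for the example only.

```python
def make_usubjid(siteid, subjid, sep="-"):
    """Combine site ID and subject ID into one subject identifier."""
    return f"{siteid}{sep}{subjid}"


def make_varname(domain, fragment):
    """Prefix a variable name fragment with the two-character domain ID."""
    if len(domain) != 2:
        raise ValueError("domain ID must be two characters")
    return (domain + fragment).upper()


print(make_usubjid("011", "0042"))   # 011-0042
print(make_varname("LB", "ORRES"))   # LBORRES
```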
The SDTM specifications do not require all of the variables associated with a domain to be included in a submission. In regard to complying with the SDTM standards, the implementation guide specifies each variable as being included in one of three categories:
Required, Expected, and Permissible *4.
REQUIRED – These variables are necessary for the proper functioning of standard software tools used by reviewers. They must be included in the data set structure and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set structure; however it is permissible to have missing values.
PERMISSIBLE – These variables are not a required part of the domain and they should not be included in the data set structure if the information they were designed to contain was not collected.
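The three categories translate naturally into a small structural check. The sketch below is a Python illustration, not a validated tool: the `CORE` table is a hypothetical subset for a demographics-like data set, and the real category for each variable should be taken from the implementation guide.

```python
# Hypothetical core categories for a few variables (illustration only).
CORE = {
    "STUDYID": "Req",
    "DOMAIN": "Req",
    "USUBJID": "Req",
    "AGE": "Exp",
    "DMDTC": "Perm",
}


def check_core(rows, core):
    """Return findings for a data set given as a list of dicts."""
    findings = []
    columns = set().union(*(r.keys() for r in rows))
    for var, category in core.items():
        if category in ("Req", "Exp") and var not in columns:
            # Required and Expected variables must exist in the structure.
            findings.append(f"{var}: {category} variable missing from structure")
        elif category == "Req" and any(r.get(var) in (None, "") for r in rows):
            # Only Required variables may not have missing values.
            findings.append(f"{var}: Required variable has missing values")
    return findings


rows = [
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "011-0042", "AGE": 34},
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "", "AGE": None},
]
print(check_core(rows, CORE))
```

In this made-up data the missing AGE value is acceptable (Expected), the absent DMDTC column is acceptable (Permissible), and only the blank USUBJID is reported.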
7) Can you tell me more about SDTM domains *5?
SDTM Domains are grouped by classes, which is useful for producing more meaningful relational schemas. Consider the following domain classes and their respective domains.
• Special Purpose Class – Pertains to unique domains concerning detailed information about the subjects in a study.
Demographics (DM), Comments (CO)
• Findings Class – Collected information resulting from a planned evaluation to address specific questions about the subject, such as whether a subject is suitable to participate or continue in a study.
Electrocardiogram (EG)
Inclusion / Exclusion (IE)
Lab Results (LB)
Physical Examination (PE)
Questionnaire (QS)
Subject Characteristics (SC)
Vital Signs (VS)
• Events Class – Incidents independent of the study that happen to the subject during the lifetime of the study.
Adverse Events (AE)
Patient Disposition (DS)
Medical History (MH)
• Interventions Class – Treatments and procedures that are intentionally administered to the subject, such as treatment coincident with the study period, per protocol, or self-administered (e.g., alcohol and tobacco use).
Concomitant Medications (CM)
Exposure to Treatment Drug (EX)
Substance Usage (SU)
• Trial Design Class – Information about the design of the clinical trial (e.g., crossover trial, treatment arms) including information about the subjects with respect to treatment and visits.
Subject Elements (SE)
Subject Visits (SV)
Trial Arms (TA)
Trial Elements (TE)
Trial Inclusion / Exclusion Criteria (TI)
Trial Visits (TV)
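When programming against these classes it can help to hold the grouping above in a lookup table. The sketch below simply transcribes the list into a Python dictionary; the names `DOMAIN_CLASSES` and `class_of` are made up for the example.

```python
# Domain classes and domains exactly as listed above.
DOMAIN_CLASSES = {
    "Special Purpose": ["DM", "CO"],
    "Findings": ["EG", "IE", "LB", "PE", "QS", "SC", "VS"],
    "Events": ["AE", "DS", "MH"],
    "Interventions": ["CM", "EX", "SU"],
    "Trial Design": ["SE", "SV", "TA", "TE", "TI", "TV"],
}

# Reverse lookup: two-character domain ID -> class name.
CLASS_OF = {d: c for c, domains in DOMAIN_CLASSES.items() for d in domains}


def class_of(domain):
    """Return the class for a domain ID, or 'Unknown' if it is not listed."""
    return CLASS_OF.get(domain.upper(), "Unknown")


print(class_of("lb"))  # Findings
print(class_of("XX"))  # Unknown
```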
7) Can you tell me how to do the Mapping for existing Domains?
The first step is to compare the existing metadata with the SDTM domain metadata. If the data received from data management is largely compliant with the SDTM metadata, use automated mapping as the first step.
If the data management metadata is not compliant with the SDTM, avoid automated mapping. Instead, manually map the datasets to SDTM datasets and then map each variable to the appropriate domain.
The whole mapping process includes the following steps:
• Read the corporate data standards into a database table.
• Assign a CDISC domain prefix to each database module.
• Attach a combo box containing the SDTM variables for the selected domain to a new mapping variable field.
• Search each module, and within each module select the most appropriate CDISC variable.
• Search for variables mapped to the wrong type (character mapped to numeric, or numeric mapped to character).
• Review the mapping to see if any conflicts are resolvable by mapping to a more appropriate variable.
• Verify that the mapped variable is appropriate for each role.
• Finally, ensure that all ‘required’ variables are present in the domain *6.
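The type-conflict search in the steps above could look something like the following sketch. It is a Python illustration under stated assumptions: the module names, source variable names and the `sdtm_types` table are all made up for the example, and a real implementation would read them from the mapping table and the SDTM metadata.

```python
# Hypothetical mapping table: one entry per corporate-standard variable.
source_vars = [
    {"module": "VITALS", "name": "SYSBP", "type": "Num", "mapped_to": "VSORRES"},
    {"module": "VITALS", "name": "WEIGHT", "type": "Num", "mapped_to": "VSSTRESN"},
    {"module": "ADVERSE", "name": "AESTART", "type": "Num", "mapped_to": "AESTDTC"},
]

# Assumed target types for the SDTM variables used in this example.
sdtm_types = {"VSORRES": "Char", "VSSTRESN": "Num", "AESTDTC": "Char"}


def find_type_conflicts(source_vars, sdtm_types):
    """List variables whose source type differs from the target type."""
    conflicts = []
    for var in source_vars:
        target = var["mapped_to"]
        expected = sdtm_types.get(target)
        if expected is not None and expected != var["type"]:
            conflicts.append((var["module"], var["name"], var["type"], target, expected))
    return conflicts


for conflict in find_type_conflicts(source_vars, sdtm_types):
    print(conflict)
```

Each reported conflict is then reviewed by hand: either the variable is converted during mapping or it is mapped to a more appropriate target.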
8) What do you know about SDTM Implementation Guide, Have you used it, if you have can you tell me which version you have used so far?
The SDTM Implementation Guide provides documentation on the metadata (data about data) for the domain datasets, including file names, variable names, variable types and labels. I have used SDTM Implementation Guide versions 3.1.1 and 3.1.2.
9) Can you identify which variables should we have to include in each domain?
A) The SDTM Implementation Guide (V3.1.1/V3.1.2) assigns each variable to one of three categories.
REQUIRED – These variables must be included in the data set structure and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set; however it is permissible to have missing values.
PERMISSIBLE – These variables are not a required part of the domain and they should not be included in the data set structure if the information they were designed to contain was not collected.
10) Can you give some examples for MAPPING *6?
Here are some examples for SDTM mapping:
• Character variables defined as Numeric
• Numeric Variables defined as Character
• Variables collected without an obvious corresponding domain in the CDISC SDTM mapping; these must go into SUPPQUAL.
• Several corporate modules that map to one corresponding domain in CDISC SDTM.
• Core SDTM is a subset of the existing corporate standards
• Vertical versus Horizontal structure, (e.g. Vitals)
• Dates – combining date and times; partial dates.
• Data collapsing issues e.g. Adverse Events and Concomitant Medications.
• Adverse Events maximum intensity
• Metadata needed for laboratory data standardization.
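The vertical versus horizontal issue listed above is easiest to see with a small example. The sketch below, in Python rather than SAS, turns one horizontal vitals record (one column per measurement) into vertical VS-style records (one row per measurement). The source column names, units and the `VS_TESTS` table are assumptions made for the illustration.

```python
# Hypothetical source columns -> (test code, test name, unit).
VS_TESTS = {
    "SYSBP": ("SYSBP", "Systolic Blood Pressure", "mmHg"),
    "DIABP": ("DIABP", "Diastolic Blood Pressure", "mmHg"),
    "PULSE": ("PULSE", "Pulse Rate", "beats/min"),
}


def vitals_to_vs(record, tests=VS_TESTS):
    """Transpose one horizontal vitals record into vertical rows."""
    rows = []
    for column, (testcd, test, unit) in tests.items():
        value = record.get(column)
        if value is None:
            continue  # nothing collected, so no row is created
        rows.append({
            "USUBJID": record["USUBJID"],
            "VSSEQ": len(rows) + 1,
            "VSTESTCD": testcd,
            "VSTEST": test,
            "VSORRES": str(value),
            "VSORRESU": unit,
            "VISIT": record["VISIT"],
        })
    return rows


horizontal = {"USUBJID": "011-0042", "VISIT": "WEEK 1",
              "SYSBP": 120, "DIABP": 80, "PULSE": None}
for row in vitals_to_vs(horizontal):
    print(row["VSSEQ"], row["VSTESTCD"], row["VSORRES"], row["VSORRESU"])
```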
10) Explain the Process of SDTM Mapping?
A list of basic variable mappings is given below *4.
DIRECT: a CDM variable is copied directly to a domain variable without any changes other than assigning the CDISC standard label.
RENAME: only the variable name and label may change but the contents remain the same.
STANDARDIZE: mapping reported values to standard units or standard terminology
REFORMAT: the actual value being represented does not change, only the format in which it is stored changes, such as converting a SAS date to an ISO8601 format character string.
COMBINING: directly combining two or more CDM variables to form a single SDTM variable.
SPLITTING: a CDM variable is divided into two or more SDTM variables.
DERIVATION: creating a domain variable based on a computation, algorithm, series of logic rules or decoding using one or more CDM variables.
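A few of the mapping types above can be shown as one-line transformations. This is a Python sketch for illustration only (the actual work would normally be done in SAS); the function names, the "120/80" source format and the sex decode table are assumptions for the example.

```python
from datetime import date, time


def reformat_date(d):
    """REFORMAT: same value, stored as an ISO 8601 character string."""
    return d.isoformat()


def combine_datetime(d, t):
    """COMBINING: separate date and time variables become one variable."""
    return f"{d.isoformat()}T{t.strftime('%H:%M')}"


def split_bp(text):
    """SPLITTING: one 'systolic/diastolic' value becomes two variables."""
    systolic, diastolic = text.split("/")
    return int(systolic), int(diastolic)


def standardize_sex(raw):
    """STANDARDIZE: reported values mapped to standard terminology."""
    return {"male": "M", "female": "F"}.get(raw.strip().lower(), "U")


print(reformat_date(date(2008, 9, 23)))                   # 2008-09-23
print(combine_datetime(date(2008, 9, 23), time(14, 5)))   # 2008-09-23T14:05
print(split_bp("120/80"))                                 # (120, 80)
print(standardize_sex("Male"))                            # M
```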
11) What are the Common Issues in Mapping Dummy corporate standards to CDISC (SDTM) Standards?
• Character variables defined as Numeric
• Numeric Variables defined as Character
• Variables collected without an obvious corresponding domain in the CDISC SDTM mapping; these must go into SUPPQUAL.
• Several corporate modules that map to one corresponding domain in CDISC SDTM.
• Dictionary codes not in SDTM parent module, so if needed must be collected in SUPPQUAL.
• Core SDTM is a subset of the existing corporate standards
• Different structure of Lab CDISC Domain e.g. baseline flag.
• Vertical versus Horizontal structure, (e.g. Vitals)
• Additional Metadata needed to describe the source in SUPPQUAL
• Dates – combining date and times; partial dates.
• Data collapsing issues e.g. Adverse Events and Concomitant Medications.
• Adverse Events maximum intensity
• Metadata needed for laboratory data standardization.
Ref: Mapping Corporate Data Standards to the CDISC Model (SAS Paper) by David Parker, AstraZeneca, Manchester, United Kingdom
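The partial dates issue in the list above comes up in almost every study. The sketch below shows one way to build an ISO 8601 string from whatever date components were collected. It is a Python illustration with a hypothetical function name, and it deliberately handles only the common case where missing parts are on the right (day missing, or month and day missing).

```python
def iso8601_partial(year=None, month=None, day=None):
    """Build an ISO 8601 date string, truncated at the first missing part."""
    if year is None:
        return ""
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        if day is not None:
            parts.append(f"{day:02d}")
    # A day supplied without a month is ignored in this simplified sketch.
    return "-".join(parts)


print(iso8601_partial(2008, 9, 23))  # 2008-09-23
print(iso8601_partial(2008, 9))      # 2008-09
print(iso8601_partial(2008))         # 2008
```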
12) Can you explain ADaM or ADaM datasets *7?
The Analysis Data Model describes the general structure, metadata, and content typically found in analysis datasets and accompanying documentation. The three types of metadata associated with analysis datasets (analysis dataset metadata, analysis variable metadata, and analysis results metadata) are described and examples provided. (source: CDISC Analysis Data Model: Version 2.0)
Analysis datasets (ADs) are typically developed from the collected clinical trial data and used to create statistical summaries of efficacy and safety data. These ADs are characterized by the creation of derived analysis variables and/or records. These derived data may represent a statistical calculation of an important outcome measure, such as change from baseline, or may represent the last observation for a subject while under therapy. As such, these datasets are one of the types of data sent to a regulatory agency such as the FDA.
The CDISC Analysis Data Model (ADaM) defines a standard for analysis datasets to be submitted to the regulatory agency. This makes clear the content, source, and quality of the datasets submitted in support of the statistical analysis performed by the sponsor.
In ADaM, the descriptions of the ADs build on the nomenclature of the SDTM with the addition of attributes, variables and data structures needed for statistical analyses. Achieving the principle of clear and unambiguous communication relies on clear AD documentation. This documentation provides the link between the general description of the analysis found in the protocol or statistical analysis plan and the source data.
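Change from baseline, mentioned above as a typical derived analysis variable, can be sketched as follows. This is a Python illustration rather than SAS; the variable names (PARAMCD, AVAL, ABLFL, BASE, CHG) follow common ADaM naming but are assumptions here, and the records are made up. For simplicity the sketch computes CHG on every record, including the baseline record itself.

```python
def add_change_from_baseline(records):
    """Return copies of the records with BASE and CHG added."""
    # Baseline value per subject and parameter, taken from flagged records.
    baselines = {
        (r["USUBJID"], r["PARAMCD"]): r["AVAL"]
        for r in records
        if r.get("ABLFL") == "Y"
    }
    out = []
    for r in records:
        base = baselines.get((r["USUBJID"], r["PARAMCD"]))
        chg = None if base is None else r["AVAL"] - base
        out.append(dict(r, BASE=base, CHG=chg))
    return out


records = [
    {"USUBJID": "011-0042", "PARAMCD": "SYSBP", "AVISIT": "BASELINE", "AVAL": 120, "ABLFL": "Y"},
    {"USUBJID": "011-0042", "PARAMCD": "SYSBP", "AVISIT": "WEEK 4", "AVAL": 112, "ABLFL": ""},
]
for r in add_change_from_baseline(records):
    print(r["AVISIT"], r["AVAL"], r["BASE"], r["CHG"])
```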
References:
1) http://support.sas.com/rnd/base/xmlengine/proccdisc/cdiscsdtm.html
2) http://www.fda.gov
3) pharmasug/2005/fdacompliance/fc01.pdf
4) http://www2.sas.com/proceedings/forum2008/207-2008.pdf
5) http://analytics.ncsu.edu/sesug/2006/PO08_06.PDF
6) http://www.lexjansen.com/phuse/2005/cd/cd11.pdf
7) http://www.pharmasug.org/2005/FC03.pdf
Apart from those, you may also need to prepare for these questions:
Robert Stemplinger:
1) How many years of experience do you have working with CDISC standards?
2) What have you done as per CDISC standards?
(Tell me the usual process flow or the procedure you have followed when implementing CDISC standards.)
3) For how many studies have you done SDTM mapping so far?
4) Have you ever been asked to create specifications for SDTM mapping?
If yes, how do you create the specification document for mapping?
5) Do you have experience doing the mapping as per sponsor standards?
6) a) Tell me a few details about the databases you have worked with so far.
b) Which database do you think you had the most trouble with? (InForm, Rave, Clintrial or Oracle Clinical)
7) How do you validate
a) the annotated CRF
b) the specification document
c) SDTM datasets
d) Case Report Tabulations (CRT-DDS)
8) a) How do you verify that all the standards have been maintained as per the SDTM Implementation Guide?
b) How do you perform validation checks on SDTM v3.1.1 or v3.1.2 datasets (WebSDM/OpenCDISC or PROC CDISC)?
9) What will you do when you find a problem as part of the validation process?
10) What kind of macros have you developed that are useful in creating SDTM standard datasets?
11) Do you prefer to create a single program for each domain and then include them in a batch program, or
just one big program for all the domains?
12) Do you have any experience talking to the client on a regular basis? If yes, share your experience with me.
13) Do you have experience working with people in different time zones?
14) Do you have experience with or knowledge of WebSDM checks or OpenCDISC?
15) Do you know PROC CDISC?
16) How do you create the Define file (XML or PDF), if you already have experience creating one?
17) If you are working as a validator, how do you communicate with the main programmer?
18) How many weeks do you think you need to finish creating the SDTM datasets (just for programming)?
How many weeks, if you have also been asked to develop the specifications?
19) Is there any sample program you can write or show that will give us an idea of your SAS programming skills?
20) What is the challenging part of the whole SDTM mapping process?
21) For which domain do you think you always need to be very careful, and why?
22) If I ask you to create an SDTM mapping specification document, what documents or files do you need, and why?
23) Do you know anything about splitting domains? (Or can you split the domains rather than creating one big domain?)
24) What is value-level metadata?
25) What do you know about controlled terminology, and for which domains do you need controlled terminology?
26) What are the RELREC and SUPPQUAL domains?
27) Can you share with me any differences you know between Implementation Guide v3.1.1 and v3.1.2?
28) How do you determine the timeline if the client asks you to provide one for the SDTM mapping conversion process?
29) Is there any way to apply attributes to the SDTM variables other than manually typing all the details (length/label/format/informat etc.) in an ATTRIB statement?
30) You have been asked to create a domain (not included in the Implementation Guide) for a CRF. What will you do, or how do you create one?
Here are a few more questions exclusive to SDTM mapping:
CDISC SDTM questions you might be asked in an interview
1) Have you ever used the --STAT variable? If yes, why, and in what kind of domain did you use it?
2) I see in your CV that you have experience developing SDTM domains based on IG V3.1.1, V3.1.2 and V3.1.3. Can you share some of the differences between the versions of the Implementation Guide? (Differences between SDTM IG V3.1.1 and V3.1.2, and between V3.1.2 and V3.1.3.)
3) Can you give me an example of a variable that can be used to group records?
4) Tell me about your experience using the --SPEC variable.
5) What is the significance of the --PRESP variable, and what do you know about the --OCCUR variable?
6) Can you give me an example of a topic variable in:
a) Interventions domains
b) Events domains
c) Findings domains
7) What is your experience creating the Related Records (RELREC) domain? Can you give me a few examples of the domains you have used to create a RELREC SDTM domain?
8) What is your experience creating the Findings About (FA) and Clinical Events (CE) domains? What is the difference between the FA and CE domains?
9) Can you give me a few examples of the kind of data you would map to the FA and CE domains?
10) Why can’t we include Clinical Events data in the AE domain?
11) What is your experience creating custom domains? How do you create a custom domain?
12) What do you do if you have a CRF page and the information collected on it isn’t related to any specific SDTM domain?
13) When do you create a SUPPQUAL or custom domain?
14) If you have any experience creating a custom domain, can you share what kind of data it was and what prefix you used for the domain name?
15) Tell me about the most difficult thing you have to do or manage when you work as an SDTM standards implementer.
16) Have you used the --OBJ variable? If so, in which domain, and what is its significance?
17) Tell me about Required, Expected and Permissible variables in SDTM domains.
18) Have you created any tumor domains? Can you give us a few examples of the tumor domains you have created?
Friday, September 12, 2008
What you should know about the ISS/ISE (ISR)
There are many reasons to integrate and to summarize all the data from a clinical trial program. Each clinical trial in the program is unique in its objective and design. Some are small safety studies among normal volunteers, while others are efficacy trials in a large patient population.
The primary reason to create an integrated summary is to compare and to contrast all the various study results and to arrive at one consolidated review of the benefit/risk profile.
A second and important reason is to reach a defensible statistical conclusion, through an exploration of the integrated data, that no competing alternative hypothesis can reasonably account for the observed findings.
Third, pooling the data from various studies enables the examination of trends in rare subgroups of patients, such as the elderly, those with differing disease states (mild vs. severe), and those with comorbidities at baseline. Last, providing such a summary in the new drug application is required by the Food and Drug Administration (FDA) and other international authorities.
The ISS will have all the clinical trial data, collected from normal volunteers (Phase I studies) and from patients (all other studies).
The ISE will have the clinical trial data only from Phase II, Phase III and Phase IV studies, not from Phase I. The reason is that a Phase I study is conducted to assess the safety, not the efficacy, of the drug, so Phase I data is not included in the ISE.
The ISE should have a description of the demographics and baseline characteristics of the entire efficacy database.
The ISS should include the extent of patients' exposure to the drug, the characteristics of the patients enrolled in the studies, a listing of deaths that occurred during the studies, the number of patients who dropped out, and potential SAEs, other AEs and lab results.
The ISS is considered one of the most important documents required for filing an NDA (new drug application). The safety data from different trials can be integrated by pooling it all together, which makes it possible to identify AEs that are rare. The data integration approaches for the ISS and ISE are entirely different: pooling the efficacy data from different studies is not required, although pooled data gives more information about the efficacy of the drug, whereas pooling all the safety data is necessary for the ISS. The ISS needs thorough research because it deals with safety, and safety is considered more important than efficacy in a clinical trial, because a study should always benefit patients.
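To make the pooling idea concrete, the sketch below adds up adverse event counts across studies so that a rare event, seen only a handful of times in any single study, becomes visible in the pooled data. It is a Python illustration only; the study names, event terms and all counts are made up, and a real ISS pools subject-level data rather than summary counts.

```python
from collections import defaultdict

# Made-up summary data: subjects per study and subjects with each event.
studies = {
    "STUDY-A": {"n_subjects": 200, "ae_subjects": {"HEADACHE": 20, "HEPATITIS": 0}},
    "STUDY-B": {"n_subjects": 300, "ae_subjects": {"HEADACHE": 33, "HEPATITIS": 1}},
    "STUDY-C": {"n_subjects": 500, "ae_subjects": {"HEADACHE": 47, "HEPATITIS": 2}},
}


def pool_adverse_events(studies):
    """Return {term: (subjects_with_event, total_subjects, incidence)}."""
    total = sum(s["n_subjects"] for s in studies.values())
    counts = defaultdict(int)
    for s in studies.values():
        for term, n in s["ae_subjects"].items():
            counts[term] += n
    return {term: (n, total, n / total) for term, n in counts.items()}


for term, (n, total, rate) in pool_adverse_events(studies).items():
    print(f"{term}: {n}/{total} ({rate:.1%})")
```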
ISR (integrated summary report):
It is a compilation of all the information collected from the safety and efficacy analyses in all the studies. The ISS and ISE are different parts of the ISR. Both the ISS and ISE reports are necessary for all new drug applications (NDAs) in the United States.
Every clinical trial is different, because each one is conducted for a specific purpose (Phase I for safety in a normal population and all others for efficacy in patients).
The reason for creating the ISR is to produce an integrated report that compares and contrasts all the study results and reaches one conclusion after reviewing the patient benefit/risk profile. Another reason is that the FDA requires it. Last but not least, it allows a definite conclusion to be reached through a thorough check of all the integrated data.
source: Encyclopedia of Biopharmaceutical Statistics, pages 486-489
INTEGRATED SUMMARIES OF SAFETY AND EFFICACY (ISS AND ISE):
INTEGRATED SUMMARY of SAFETY REPORT
CTD – ISS/ISE: Introduction and Summary of Issues
Robert J. Temple, M.D.
Associate Director for Medical Policy
Center for Drug Evaluation and Research
U.S. Food and Drug Administration
Tips for Producing Comprehensive Integrated Summaries
Integrated Summaries of Effectiveness and Safety: Location Within the Common Technical Document
Thursday, September 4, 2008
SAS in Clinical trials:
Clinical trials:
Clinical Trials
Clinical Trials Terminology for SAS Programmers
A Simple Solution for Managing the Validation of SAS Programs
Electronic Clinical Data Capture
Pharmaceutical Programming: From CRFs to Tables, Listings and Graphs
SAS Programming in the Pharmaceutical Industry
SAS® Programming Career Choices In The Health Care Industry
Some Statistical Programming Considerations for e-Submission
The Changing Nature of SAS Programming in the Pharmaceuticals Industry
Managing Clinical Trials Data using SAS® Software
Quality Control and Quality Assurance in Clinical Research: SAS
CDISC:
An Introduction to CDISC:
CDISC: Why SAS® Programmers Need to Know
CDISC Implementation Step by Step: A Real World Example
CDISC standards
Supporting the CDISC standards
How to test CDISC Operational Data Model (ODM) in SAS
The Use of CDISC Standards in SAS from Data Capture to Reporting
Clinical Data Model and FDA/CDISC Submissions
Creating Case Report Tabulations (CRTs) for an NDA Electronic Submission to the FDA
SDTM-annotated CRFs
Data Integrity through DEFINE.PDF and DEFINE.XML
SAS® and the CDISC (Clinical Data Interchange Standards Consortium)
Implementing an Audit Trail within a Clinical Reporting Tool
The CDISC ODM Study Designer: User Manual
XML Basics for SAS Programmers
Annotation of CRFs:
Trial eCRF Pages
Using SAS to Speed up Annotating Case Report Forms in PDF Format
ANNOTATED CASE REPORT FORM AUTOMATION SYSTEM
Annotated CRF 1: CTN0008_SDTM_annotation_20070413.pdf (2179Kb)
Annotated CRF 2: CTN001_SDTM_ANNOTATION_20070330.pdf (564Kb)
Annotated CRF 3: CTN002_SDTM_ANNOTATION_20070403.pdf (560Kb)
Study Protocol 1: NIDA-CTN-0001_Bup_Nx_vs_Clonidine_Inpatient_Protocol_v.5b_112700.pdf (192Kb)
Online Study materials:
Fundamentals of Using SAS (part I)
Introduction to SAS
Descriptive information and statistics
An overview of statistical tests in SAS
Exploring data with graphics
Fundamentals of Using SAS (part II)
Using where with SAS procedures
Missing values in SAS
Common SAS options
Overview of SAS syntax of SAS procedures
Common error messages in SAS
Reading Raw Data into SAS
Inputting raw data into SAS
Reading dates into SAS and using date variables
Basic Data Management in SAS
Creating and recoding variables
Using SAS functions for making/recoding variables
Subsetting variables and observations
Labeling data, variables, and values
Using PROC SORT and the BY statement
Making and using permanent SAS data files (version 8)
Data Management:
How do I make unique anonymous ID variables for my data?
How can I create an enumeration variable by groups?
How can I see the number of missing values and patterns of missing values in my data file?
How can I count the number of missing values for a character variable?
How can I increment dates in SAS?
How can I find things in a character variable in SAS?
How do I standardize variables (make them have a mean of 0 and sd of 1)?
Is there a quick way to create dummy variables?
Reading/Writing Data Files
How do I read a file that uses commas, tabs or spaces as delimiters to separate variables?
How do I read a delimited file with missing values?
How do I read a delimited file that has delimiters embedded in the data?
What are some common infile options for reading a raw data file?
How do I read raw data files compressed with gzip (.gz files) in SAS?
How do I write a data file that uses commas, tabs or spaces as delimiters between variables?
How do I read/write Excel files in SAS version 8?
Reading/Writing SAS Files with Formats
How do I use a SAS data file with a format library?
How do I use a SAS data file when I don't have its format library?
Other:
How can I change the way variables are displayed in proc freq?
How can I put a value from a data file to a macro variable?
How can I create tables using proc tabulate?
My SAS Manuals: 1. Basic and 2. Applications (Preliminary Version) ZIP file (about 400meg)
source:www.estat.com
Procedures
PROC MEANS More than just your average procedure(PDF) by Peter R. Welbrock
The power of PROC FORMAT(PDF) by Jonas V. Bilenas
Ten Things You Should Know About PROC FORMAT(PDF) by Jack Shoemaker
PROC SQL for DATA Step Die-Hards(PDF) by Christianna S. Williams
An Introduction to the SQL Procedure(PDF) by Chris Yindra
Alternatives to Merging SAS Data Sets … But Be Careful(PDF) by Michael J. Wieczkowski
Handling Missing Values in the SQL Procedure(PDF) by Danbo Yi & MA Lei Zhang
Creating and using indexes in SAS
Creating and using formats and format libraries in SAS
Using multidimensional arrays
Good Programming Practices
Bulletproofing Your SAS Results(PDF) by Vanessa Hayden
Clean-up, Comments and Code - Making it Maintainable(PDF) by Clay and Lori Martin
SAS Program Efficiency for Beginners(PDF) by Bruce Gilsen
Coding for Posterity(PDF) by Rick Aster
Output Delivery System(ODS)
ODS, YES! Odious, NO! – An Introduction to the SAS Output Delivery System(PDF) by Lara Bryant, Sally Muller & Ray Pass
ODS for Data Analysis: Output As-You-Like-It in Version 7(PDF) by Christopher R. Olinger and Randall D. Tobias, from SUGI Proceedings, 1998, courtesy of SAS.
Making the SAS Output Delivery System (ODS) work for you(PDF) by William Fehlner, from SUGI Proceedings, 1999, courtesy of SAS.
Twisty Little Passages All Alike, Output Delivery System (ODS) Templates Exposed(PDF) by Chris Olinger, from SUGI Proceedings, 1999, courtesy of SAS.
Converting Multiple SAS Output Files to Rich Text Format Automatically without Using ODS
SAS Macros
Getting Started with Macros(PDF) by Ian Whitlock
Moving from Macro Variables to Macros(PDF) by Lisa Sanbonmatsu
Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility(PDF) by Michael G. Sadof
An Introduction to Macro Variables and Macro Programs(PDF) by Mike S. Zdeb
Creating Macro Variables via PROC SQL(PDF) by Mike S. Zdeb
More About “INTO:Host-Variable” in PROC SQL: Examples(PDF) by John Q. Zhang
Macro Quoting Functions, Other Special Character Masking Tools, and How To Use Them(PDF) by Arthur L. Carpenter
Secrets of Macro Quoting Functions – How and Why(PDF) by Susan O’Connor
&&&, ;;, and Other Hieroglyphics Advanced Macro Topics(PDF) by Chris Yindra, C. Y
Developing, Managing, and Evaluating a Standard Macro System by Albert Mo
PROC SQL:
An Introduction to Proc Sql
Top Ten Reasons to Use PROC SQL
ENTERPRISE GUIDE:
SAS Enterprise Guide for SAS Programmers
Using SAS® Enterprise Guide® to Code When You’re Not a Programmer
The New World of SAS®: Programming with SAS® Enterprise Guide®
SAS Enterprise Guide: Data Manipulation, Reports, & Statistical Procedures
Introduction to Using SAS® Enterprise Guide® for Statistical Analysis
SAS Graph:
Improving Your Graphics Using SAS/GRAPH® Annotate Facility
A Powerful Macro to Control Title Appearance in SAS/GRAPH® Output
SAS/GRAPH® 101
Using ODS Styles with SAS/GRAPH®
ODS Statistical Graphics for Clinical Research
Know Your AREA! Creating Professional SAS® Graphics in Clinical Safety Data by Using the AREAS Option in PROC GPLOT.
Other
Debugging 101(PDF) by Peter Knapp
Those Missing Values in Questionnaires(PDF) by John R. Gerlach & Cindy Garra
Avoiding Mayhem in the New Millennium: Working with Missing Data(PDF) by JoAnn Matthews
Simplifying Complex Character Comparisons by Using the IN Operator and the Colon (:) Operator Modifier(PDF) by Paul Grant
Arrays: In and Out and All About(PDF) by Marge Scerbo
Complex Arrays Made Simple(PDF) by Mary McDonald, PaineWebber Incorporated
You Could Look It Up: An Introduction to SASHELP Dictionary Views(PDF) by Michael Davis
The 'SKIP' Statement(PDF) by Paul Grant
Indexing and Compressing SAS Data Sets: How, Why, and Why Not(PDF) by Andrew H. Karp
Automating the Creation of a Single Bookmarked PDF Document from Multiple SAS® ASCII and PostScript® Output Files
Be Careful When You Merge SAS Datasets!
courtesy of NESUG
SAS free study tutorials
Data step:
getting started 1: windows SAS code
getting started 2: data step SAS code
automatic _N_ variable SAS code
drop & delete SAS code
formatting: dates and numbers SAS code date sal.txt
(also see the format procedure below to create your own formats)
functions SAS code
import: Bringing in data from Excel SAS code
Excel import file Excel export file text file
input:
length statement SAS code infile options.txt
long SAS code long.txt
missing data SAS code
output option SAS code
pointers SAS code ex7.txt ex8.txt ex9.txt
more about pointers SAS code pointers.SAS ex10.txt
missover & delimiter SAS code delimiter.txt
more on the delimiter SAS code
retain SAS code
set SAS code
simulations:
random numbers SAS code
sum SAS code
statistical functions SAS code
Logic:
do loops SAS code
more about do loops SAS code
nested do loops SAS code
if then statements SAS code score.txt
Combining Data sets:
concatenating and interleaving SAS code
one-to-one merging SAS code
match merging SAS code
updating SAS code
Character functions:
substring function SAS code
trim and left functions SAS code
compress and index functions SAS code record.txt
indexc and indexw functions SAS code
implicit character-to-numeric conversion SAS code
explicit character-to-numeric conversion SAS code
implicit and explicit numeric-to-character conversion SAS code
Arrays:
introduction to arrays SAS code
using arrays to count SAS code
using arrays to order observations SAS code
using arrays to transpose data SAS code ratsdose.txt
two dimensional arrays SAS code temp.txt fin.txt
Permanent SAS Data sets:
(great for large data sets)
introduction: using libname SAS code
put and file statements SAS code survey.dat data1.dat data2.dat data3.txt fruit.dat data4.dat data5.dat income.dat
Procedures:
ANOVA SAS code incommed.dat
analysis of equal vars: B-P for anova SAS code
contents: Great for large data sets SAS code sheep.dat
correlation SAS code
import: Bringing in data from Excel SAS code Excel import file Excel export file text file
format SAS code incommed.dat (also see formatting above for SAS' date and number formats)
frequency tables SAS code incommed.dat freq.xls
means SAS code incommed.dat
more about means SAS code incommed.dat
gcharts: Bar and Pie charts SAS code incommed.dat
gplot: a prettier plot SAS code
more about gplot SAS code
plot SAS code
print SAS code account.txt
sort SAS code account.txt
more about sorting SAS code compt.txt
t-test SAS code incommed.dat
transpose SAS code
univariate SAS code test.dat
Programming outside the Data step or Procedures:
Getting started 3: options SAS code
more options SAS code
Macros:
introduction:
macro variables (%let statement) SAS code number.dat contest.dat
%put statement SAS code score.dat
basic macros SAS code
macros with parameters SAS code ranks.dat
macro do loops SAS code
macro if/then/else statements SAS code makeup.dat
nested macros SAS code
simulations example SAS code reg.dat
Index to Statistics Tutorials (source:www.stattutorials.com)
PROC MEANS Tutorial (Descriptive statistics)
PROC UNIVARIATE Tutorial (Distribution analysis)
New: PROC UNIVARIATE - Advanced Tutorial
PROC CORR Tutorial (Correlation)
PROC FREQ Tutorial 1 (Frequency Tables/Goodness of Fit)
PROC FREQ Tutorial 2 (Two-way tables)
PROC TTEST Tutorial (Two sample and paired t-tests)
New: A comparison of Paired & Independent Sample t-tests
PROC ANOVA & GLM Tutorial (One-Way ANOVA)
PROC GLM Tutorial (Repeated measures ANOVA using PROC GLM)
New: Survival Analysis & comparison of groups using PROC LIFEREG
Bland-Altman Analysis (Comparing two measures)
Inter-Rater Reliability, Kappa, Weighted Kappa (PROC FREQ)
New: SAS Functions (2-part tutorial)
Special SAS Topics
New: Setting the SAS Initial Folder (default directory)
Using SAS ODS Output, Styles, Graphics, Data
Data files and SAS code for tutorials
General Statistical Tutorials
Interpreting p-values
Understanding hypothesis testing
Statistical comparison of two groups