Welcome to StudySAS, your ultimate guide to clinical data management using SAS. We cover essential topics like SDTM, CDISC standards, and Define.XML, alongside advanced PROC SQL and SAS Macros techniques. Whether you're enhancing your programming efficiency or ensuring compliance with industry standards, StudySAS offers practical tips and insights to elevate your clinical research expertise. Join us and stay ahead in the evolving world of clinical data.
Define.xml and cSDRG QC Checklist for FDA and PMDA Submissions
Comprehensive QC Checklist for Define.xml and cSDRG
Ensuring Quality and Compliance for FDA and PMDA SDTM Submissions
Introduction
The **Define.xml** and **Clinical Study Data Reviewer’s Guide (cSDRG)** are critical components of SDTM submissions to regulatory agencies like the FDA and PMDA. These documents help reviewers understand the structure, content, and traceability of the datasets submitted.
A robust QC process ensures compliance with agency requirements, minimizes errors, and enhances submission success. This blog outlines a detailed manual QC checklist for both Define.xml and cSDRG, emphasizing key differences between FDA and PMDA requirements.
Define.xml QC Checklist
1. Metadata Verification
Verify all datasets listed in Define.xml are included in the submission package.
Check that all variable metadata (e.g., variable names, labels, types, and lengths) matches the SDTM datasets.
Ensure consistency between controlled terminology values and the CDISC Controlled Terminology files.
Confirm all mandatory fields (e.g., Origin, Value Level Metadata, Comments) are correctly populated.
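The first check above — confirming that every dataset named in Define.xml actually ships in the package — can be partly automated. Below is a minimal sketch, assuming the dataset names have already been extracted from Define.xml into a dataset DEFINE_DS with a DSNAME column, and that the transport files live in a hypothetical /submission/sdtm folder:

```sas
/* List the .xpt files present in the (assumed) submission folder */
data xpt_files;
   length fname $64 dsname $32;
   rc  = filename("subdir", "/submission/sdtm");
   did = dopen("subdir");
   do i = 1 to dnum(did);
      fname = lowcase(dread(did, i));
      if scan(fname, -1, '.') = 'xpt' then do;
         dsname = upcase(scan(fname, 1, '.'));
         output;
      end;
   end;
   rc = dclose(did);
   keep dsname;
run;

/* Datasets described in Define.xml with no matching transport file */
proc sql;
   select dsname
   from define_ds
   where upcase(dsname) not in (select dsname from xpt_files);
quit;
```

The reverse check (files present but not documented) is the same query with the two tables swapped.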
2. Controlled Terminology
Ensure variables like AEDECOD, LBTESTCD, and CMTRT align with the latest CDISC Controlled Terminology.
Check NCI Codelist codes for correctness and proper linkage to variables.
Verify that SUPPQUAL domains reference appropriate `QNAM` and `QVAL` values.
3. Links and Traceability
Ensure all hyperlinks in Define.xml (e.g., links to codelists, Value Level Metadata, and external documents) are functional.
Verify traceability for derived variables to source data or algorithms.
4. Value Level Metadata
Check that Value Level Metadata is used for variables with differing attributes (e.g., QVAL in SUPPQUAL).
Validate metadata application to specific values, ensuring alignment with dataset content.
5. Technical Validation
Run Define.xml through Pinnacle 21 or a similar validation tool to identify errors or warnings.
Validate XML structure against the CDISC Define-XML schema (e.g., UTF-8 encoding).
6. Documentation
Ensure accurate descriptions in the Comments section for clarity and traceability.
Check consistency between Define.xml and cSDRG descriptions.
cSDRG QC Checklist
1. Content Consistency
Ensure alignment with Define.xml in terms of datasets, variables, and controlled terminology.
Verify consistency with CDISC guidelines for SDRG structure and content.
2. Document Structure
Ensure all required sections are present:
Study Design Overview
Dataset-Specific Considerations
Traceability and Data Processing
Controlled Terminology
Verify the inclusion of Acronyms and Abbreviations.
3. Dataset-Level Review
Check that all datasets referenced in cSDRG are included in the Define.xml and the submission package.
Ensure documentation of traceability from raw data to SDTM datasets.
Validate derivation rules for key variables.
4. Controlled Terminology
Ensure controlled terminology usage aligns with Define.xml.
Document any deviations or extensions to standard controlled terminology.
5. Reviewer-Focused Content
Provide explanations for unusual scenarios (e.g., partial/missing dates, adverse event relationships).
Tailor descriptions to a reviewer’s perspective for clarity and usability.
6. Formatting and Usability
Ensure consistent fonts, headings, and numbering throughout the document.
Verify hyperlinks and table of contents functionality in the PDF format.
FDA vs. PMDA Considerations
While FDA and PMDA share many requirements, there are some critical differences:
| Aspect | FDA | PMDA |
|---|---|---|
| Encoding | UTF-8 | UTF-8 (focus on Japanese character encoding) |
| Validation Tools | Pinnacle 21 Community/Enterprise | Pinnacle 21 with PMDA-specific rules |
| Trial Summary (TS) | Focus on mandatory fields | Greater emphasis on PMDA-specific fields |
| Language | English | English and Japanese |
Conclusion
Ensuring high-quality Define.xml and cSDRG documents is crucial for successful regulatory submissions to FDA and PMDA. Adhering to the detailed QC checklists outlined above will help identify and address issues early, saving time and reducing the risk of rejection. Tailoring your approach to the specific requirements of each agency ensures a smooth review process and enhances submission success rates.
Data Quality Checks for SDTM Datasets: FDA vs. PMDA
Understanding Regulatory Requirements for Submission Success
Introduction
Submitting SDTM datasets to regulatory authorities like the FDA (U.S. Food and Drug Administration) and PMDA (Japan's Pharmaceuticals and Medical Devices Agency) involves rigorous data quality checks.
While both agencies adhere to CDISC standards, their submission guidelines and expectations differ in certain aspects. This blog explores the key differences in data quality checks for FDA and PMDA submissions.
Similarities in Data Quality Checks
Both FDA and PMDA share several common expectations for SDTM datasets:
Adherence to CDISC Standards: Both agencies require compliance with the SDTM Implementation Guide (SDTM-IG).
Controlled Terminology (CT): Variables such as AEDECOD and LBTESTCD must align with CDISC CT.
Traceability: Ensures that derived datasets and analysis results can be traced back to the raw data.
Define.xml Validation: Both agencies expect a complete and validated Define.xml file for metadata documentation.
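The Controlled Terminology expectation above lends itself to an automated spot-check. A minimal sketch, assuming the relevant CDISC CT codelist has been read into a dataset CT_TERMS with a SUBMISSION_VALUE column (names are illustrative):

```sas
/* Flag LBTESTCD values that do not appear in the CT extract */
proc sql;
   create table ct_deviations as
   select distinct lbtestcd
   from sdtm.lb
   where upcase(lbtestcd) not in
         (select upcase(submission_value) from ct_terms);
quit;
```

Any rows in CT_DEVIATIONS are candidates for correction or for documentation as sponsor extensions.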
Differences in Data Quality Checks
The FDA and PMDA have distinct preferences and requirements that need careful attention.
| Aspect | FDA | PMDA |
|---|---|---|
| Validation Tools | Primarily uses Pinnacle 21 Community or Enterprise; emphasis on "Reject" and "Error" findings. | Relies on Pinnacle 21, but PMDA-specific validation rules are stricter; additional checks on Japanese language and character encoding (e.g., UTF-8). |
| Validation Rules | Focuses on U.S.-specific regulatory rules; requires adherence to SDTM-IG versions commonly used in the U.S. | Requires alignment with Japanese-specific validation rules; more emphasis on Trial Summary (TS) and demographic consistency. |
| Trial Summary (TS) Domain | Expects a complete TS domain but is less stringent on content beyond mandatory fields. | Places greater importance on the TS domain, especially for regulatory codes specific to Japan. |
| Japanese Subjects | Less emphasis on Japanese-specific requirements. | Requires additional checks for Japanese subjects, such as proper handling of kanji characters. |
Practical Tips for Submission Readiness
FDA Submissions: Focus on Pinnacle 21 validation findings marked as "Reject" and address controlled terminology issues.
PMDA Submissions: Pay extra attention to Japanese language and encoding. Validate the TS domain rigorously to include PMDA-specific codes.
Cross-validation: Run your datasets through both FDA and PMDA validation rulesets to ensure global compliance.
Conclusion
While FDA and PMDA share a common foundation in CDISC standards, their data quality expectations have nuanced differences. Understanding these distinctions is critical for ensuring smooth submissions.
By tailoring your SDTM programming and validation processes to address these unique requirements, you can enhance your submission success rate and streamline regulatory review.
Advanced SDTM Programming Tips
Streamline Your SDTM Development with Expert Techniques
Tip 1: Automating SUPPQUAL Domain Creation
The SUPPQUAL (Supplemental Qualifiers) domain can be automated using SAS macros to handle additional variables in a systematic way.
Refer to the macro example later in this post to simplify your SUPPQUAL generation process.
Tip 2: Handling Date Imputation
Many SDTM domains require complete ISO 8601 dates, but raw data often contains partial or missing dates. The following snippet imputes a character date and then converts it to a numeric SAS date (the original version applied a numeric format directly to the character variable, which would fail):

```sas
data imputed_dates;
   set raw_data;
   length date_c $10;
   date_c = strip(date);
   /* Impute a missing day to the first day of the month */
   if length(date_c) = 7 then date_c = cats(date_c, '-01');
   /* Impute a missing month and day to January 1st */
   else if length(date_c) = 4 then date_c = cats(date_c, '-01-01');
   /* Convert the completed ISO 8601 text to a numeric SAS date */
   impdt = input(date_c, yymmdd10.);
   format impdt yymmdd10.;
run;
```
Tip: Always document the imputation logic and ensure it aligns with the study protocol.
Tip 3: Dynamic Variable Label Assignment
Avoid hardcoding labels when creating SDTM domains. Use metadata-driven programming for consistency:
```sas
data AE;
   set raw_ae;
   attrib
      AESTDTC label="Start Date/Time of Adverse Event"
      AEENDTC label="End Date/Time of Adverse Event";
run;
```
Tip: Store labels in a metadata file (e.g., Excel or CSV) and read them dynamically in your program.
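The tip above can be sketched in a few lines. Assuming the metadata file has been imported into a dataset VARMETA with DOMAIN, VARNAME, and VARLABEL columns (names are illustrative), PROC SQL can assemble the label assignments and PROC DATASETS can apply them without rewriting the data:

```sas
proc sql noprint;
   /* Build "VAR=label" pairs, e.g. AESTDTC="Start Date/Time of Adverse Event" */
   select catx('=', varname, quote(strip(varlabel)))
      into :label_list separated by ' '
   from varmeta
   where upcase(domain) = 'AE';
quit;

proc datasets lib=work nolist;
   modify ae;
      label &label_list;
quit;
```

Updating the spreadsheet then updates every program that reads it, which keeps labels consistent across domains.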
Tip 4: Efficient Use of Pinnacle 21 Outputs
Pinnacle 21 validation reports can be overwhelming. Focus on the following key areas:
Major Errors: Address structural and required variable issues first.
Traceability: Ensure SUPPQUAL variables and parent records are linked correctly.
Controlled Terminology: Verify values against the CDISC CT library to avoid deviations.
Tip: Use Excel formulas or conditional formatting to prioritize findings in Pinnacle 21 reports.
Tip 5: Debugging Complex Mapping Issues
When debugging mapping logic, use PUTLOG statements strategically:
```sas
data SDTM_AE;
   set raw_ae;
   if missing(AEDECOD) then
      putlog "WARNING: Missing AEDECOD for USUBJID=" USUBJID;
run;
```
Tip: Use PUTLOG with conditions to reduce unnecessary log clutter.
Tip 6: Mapping RELREC Domain
The RELREC domain is used to define relationships between datasets. Automate its creation using a data-driven approach:
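As a sketch of that data-driven approach, suppose AE and CM records are linked through a sponsor-defined RELID variable carried in both domains (an assumption for illustration; real studies may link through other keys):

```sas
/* Build RELREC from the link variable shared by AE and CM */
data relrec;
   length STUDYID $20 RDOMAIN $2 USUBJID $40
          IDVAR $8 IDVARVAL $200 RELTYPE $8 RELID $10;
   set ae(in=in_ae) cm(in=in_cm);
   where not missing(relid);
   if in_ae then do;
      RDOMAIN  = "AE";
      IDVAR    = "AESEQ";
      IDVARVAL = strip(put(aeseq, best.));
   end;
   else do;
      RDOMAIN  = "CM";
      IDVAR    = "CMSEQ";
      IDVARVAL = strip(put(cmseq, best.));
   end;
   keep STUDYID RDOMAIN USUBJID IDVAR IDVARVAL RELTYPE RELID;
run;
```

Adding another domain to the relationship is then a matter of extending the SET statement and the IF/ELSE ladder.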
Tip 7: Modifying Attributes with PROC DATASETS
Tip: Use PROC DATASETS to modify attributes like labels, formats, and lengths without rewriting the dataset.
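For example, attaching a label and a format in place (library and variable names are illustrative):

```sas
/* Modify attributes without re-reading the data */
proc datasets lib=sdtm nolist;
   modify ae;
      label  AESTDY = "Study Day of Start of Adverse Event";
      format AESTDY best8.;
quit;
```

Because no DATA step runs, this is effectively instantaneous even on very large datasets.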
Tip 8: Deriving Epoch Variables
EPOCH is a critical variable in SDTM domains, representing the study period during which an event occurred. Automate its derivation as follows:
```sas
data AE;
   set AE;
   length EPOCH $10;
   /* Full ISO 8601 character dates compare correctly as strings */
   if TRTSDTC <= AESTDTC <= TRTEDTC then EPOCH = "TREATMENT";
   else if AESTDTC < TRTSDTC then EPOCH = "SCREENING";
   else if AESTDTC > TRTEDTC then EPOCH = "FOLLOW-UP";
run;
```
Tip: Ensure EPOCH values are consistent with the study design and align with other SDTM domains like EX and SV.
Tip 9: Validating VISITNUM and VISIT Variables
VISITNUM and VISIT are critical for aligning events with planned visits. Use a reference table for consistency:
```sas
proc sql;
   create table validated_data as
   select a.*, b.VISIT
   from raw_data a
   left join visit_reference b
      on a.VISITNUM = b.VISITNUM;
quit;
```
Tip: Cross-check derived VISITNUM and VISIT values against the Trial Design domains (e.g., TV and TA).
Tip 10: Generating Define.XML Annotations
Define.XML is a crucial deliverable for SDTM datasets. Use metadata to dynamically create annotations.
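One way to sketch this idea: write ItemDef fragments directly from a metadata dataset with a DATA _NULL_ step. This assumes a dataset VARMETA with DOMAIN, VARNAME, VARLABEL, and DATATYPE columns, and it produces simplified fragments, not a complete, schema-valid Define-XML file:

```sas
/* Emit one <ItemDef> fragment per variable in the metadata */
data _null_;
   set varmeta;
   file "itemdefs.xml";
   put '<ItemDef OID="IT.' domain +(-1) '.' varname +(-1)
       '" Name="' varname +(-1)
       '" DataType="' datatype +(-1) '">';
   put '  <Description><TranslatedText xml:lang="en">' varlabel +(-1)
       '</TranslatedText></Description>';
   put '</ItemDef>';
run;
```

In practice, tools such as Pinnacle 21 generate the full Define.xml from a metadata spreadsheet, but the same metadata-driven principle applies.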
Optimize Your SDTM Workflows with Efficient Automation Techniques
Introduction to SUPPQUAL Automation
The SUPPQUAL (Supplemental Qualifiers) domain is used to store additional information that cannot fit within a standard SDTM domain.
Manually creating the SUPPQUAL domain can be time-consuming and error-prone, especially for large datasets. In this article, we’ll explore an advanced tip to automate its creation using SAS macros.
Use Case: Adding Supplemental Qualifiers to a Domain
Imagine you have an SDTM AE domain (Adverse Events) and need to capture additional details like the investigator’s comments or assessment methods that are not part of the standard AE domain.
Code Example: Automating SUPPQUAL Domain
```sas
/* Macro to create a SUPPQUAL domain */
%macro create_suppqual(domain=, idvar=, qnam_list=);
   %let domain_upper = %upcase(&domain);
   %let suppqual = SUPP&domain_upper;

   data &suppqual;
      set &domain;
      length RDOMAIN $2 IDVAR $8 IDVARVAL $200
             QNAM $8 QLABEL $40 QVAL $200;
      array qvars{*} &qnam_list;
      do i = 1 to dim(qvars);
         if not missing(qvars{i}) then do;
            RDOMAIN  = "&domain_upper";
            IDVAR    = "&idvar";
            IDVARVAL = strip(vvalue(&idvar));
            QNAM     = upcase(vname(qvars{i}));
            QLABEL   = vlabel(qvars{i});
            QVAL     = strip(vvalue(qvars{i}));
            output;
         end;
      end;
      drop i &qnam_list;
   run;

   /* Sort SUPPQUAL for submission readiness */
   proc sort data=&suppqual;
      by USUBJID RDOMAIN IDVAR IDVARVAL QNAM;
   run;
%mend create_suppqual;

/* Example usage: automating SUPPAE */
%create_suppqual(domain=AE, idvar=AETERM, qnam_list=AECOMMENT AEASSESS);
```
Explanation of the Code
RDOMAIN: Captures the parent domain name (e.g., AE).
array qvars{*}: Iterates through the list of supplemental qualifiers provided as macro parameters.
IDVAR: Represents the key variable in the parent domain (e.g., AETERM).
QLABEL: Automatically assigns a label to the qualifier variable.
QVAL: Stores the actual value of the supplemental qualifier.
Advantages of This Approach
Eliminates manual effort in creating SUPPQUAL domains.
Highly reusable and scalable across different domains.
Ensures consistency in handling supplemental qualifiers.
Pro Tip: Validation and Quality Control
Always validate the output SUPPQUAL dataset against CDISC compliance rules using tools like Pinnacle 21. Ensure that all required columns and relationships are correctly populated.
Unlock the Power of SAS for Efficient Data Manipulation
Introduction to HASH Objects
In SAS, HASH objects provide an efficient way to perform in-memory data lookups and merge operations, especially when dealing with large datasets.
Unlike traditional joins using PROC SQL or the MERGE statement, HASH objects can significantly reduce computational overhead.
Use Case: Matching and Merging Large Datasets
Suppose you have two datasets: a master dataset containing millions of records and a lookup dataset with unique key-value pairs.
The goal is to merge these datasets without compromising performance.
Code Example: Using HASH Objects
```sas
/* Define the master and lookup datasets */
data master;
   input ID $ Value1 $ Value2 $;
   datalines;
A001 X1 Y1
A002 X2 Y2
A003 X3 Y3
;
run;

data lookup;
   input ID $ LookupValue $;
   datalines;
A001 L1
A002 L2
A003 L3
;
run;
```
```sas
/* Use a HASH object to merge the datasets */
data merged;
   if 0 then set lookup;            /* adds LookupValue to the PDV */
   if _n_ = 1 then do;
      declare hash h(dataset: "lookup");
      h.defineKey("ID");
      h.defineData("LookupValue");
      h.defineDone();
   end;
   set master;
   if h.find() = 0 then output;     /* keep matching rows (inner join) */
run;

/* Display the merged data */
proc print data=merged;
run;
```

Note the `if 0 then set lookup;` line: the hash object's data variables must already exist in the program data vector, and this never-executed SET statement declares them without reading any rows.
Explanation of the Code
declare hash h: Creates a HASH object and loads the lookup dataset into memory.
h.defineKey: Specifies the key variable (ID) for the lookup.
h.defineData: Identifies the variable to retrieve from the lookup dataset.
h.find(): Searches for a match in the HASH object and retrieves the data if found.
Advantages of HASH Objects
Faster lookups compared to traditional joins, especially with large datasets.
In-memory operations reduce I/O overhead.
Provides greater flexibility for advanced operations.
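One caveat worth knowing: when FIND() fails, the data variables keep the value retrieved for the previous row. A left-join-style variant of the step above clears them explicitly so unmatched master rows are kept with a missing lookup value (reusing the same master and lookup datasets):

```sas
/* Left-join behavior: keep every master row */
data merged_all;
   if 0 then set lookup;                 /* adds LookupValue to the PDV */
   if _n_ = 1 then do;
      declare hash h(dataset: "lookup");
      h.defineKey("ID");
      h.defineData("LookupValue");
      h.defineDone();
   end;
   set master;
   if h.find() ne 0 then call missing(LookupValue);  /* clear on no match */
run;
```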
Advanced SAS Programming Tip: Mastering Macro Variables
Advanced SAS Programming Tip: Mastering Macro Variables
Unleash the power of SAS with this advanced technique.
Introduction
Macro variables are a powerful tool in SAS that allow you to dynamically generate code. By understanding and effectively using macro variables, you can write more efficient and flexible SAS programs.
The Basics of Macro Variables
A macro variable is a placeholder that is replaced with its value during macro processing. You define a macro variable with the %LET statement and reference it by prefixing its name with an ampersand (e.g., &table_name).
Advanced Techniques
1. Conditional Logic
You can use the %IF-%THEN-%ELSE statements to create conditional logic within your macro code. This allows you to dynamically generate code based on specific conditions.
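As a small sketch (macro and dataset names are illustrative), this macro prints a sample only when the input has enough rows, using %IF-%THEN-%ELSE around open-code generation:

```sas
%macro print_if_large(ds=, cutoff=100);
   %local dsid nobs rc;
   %let nobs = 0;
   %let dsid = %sysfunc(open(&ds));
   %if &dsid %then %do;
      %let nobs = %sysfunc(attrn(&dsid, nlobs));
      %let rc   = %sysfunc(close(&dsid));
   %end;
   %if &nobs > &cutoff %then %do;
      proc print data=&ds(obs=10);
      run;
   %end;
   %else %put NOTE: &ds has only &nobs rows - skipping the listing.;
%mend print_if_large;

%print_if_large(ds=sashelp.class, cutoff=5);
```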
2. Iterative Processing
The %DO loop can be used to iterate over a range of values or a list of items. This is useful for repetitive tasks, such as generating multiple datasets or reports.
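For example, a %DO loop can generate one dataset per year (the input dataset and year range here are placeholders):

```sas
%macro split_by_year(start=, end=);
   %local yr;
   %do yr = &start %to &end;
      data class_&yr;
         set sashelp.class;   /* placeholder input */
         year = &yr;
      run;
   %end;
%mend split_by_year;

%split_by_year(start=2021, end=2023);
```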
3. Custom Macro Functions
You can create your own custom macro functions to encapsulate complex logic and reuse it throughout your code. This can help to improve code readability and maintainability.
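A function-style macro returns text in place of its call, so it can be used anywhere a value is expected. A minimal example:

```sas
/* Returns today's date as ISO 8601 text (yyyy-mm-dd) */
%macro iso_today();
   %sysfunc(today(), yymmdd10.)
%mend iso_today;

%put Extraction date: %iso_today();
```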
Example: Dynamically Generating SQL Queries
Here's a simple example of how to use macro variables to dynamically generate SQL queries:
```sas
%let table_name = my_data;
%let where_clause = age > 30;
proc sql;
select *
from &table_name
where &where_clause;
quit;
```
Conclusion
By mastering macro variables, you can take your SAS programming skills to the next level. Experiment with these techniques to create more powerful and efficient SAS programs.
In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers.
My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.