Developing the DC (Demographics as Collected) SDTM Domain: Tips, Techniques, Challenges, and Best Practices
Introduction
The DC (Demographics as Collected) domain is a specialized SDTM (Study Data Tabulation Model) domain designed to capture demographic data as it was originally collected in clinical trials. This domain is particularly valuable in studies that involve multiple screenings or re-enrollments, where maintaining the integrity of the collected data is crucial. Unlike the DM (Demographics) domain, which provides a standardized summary of demographic data, the DC domain preserves the raw, unstandardized data, ensuring accurate data representation.
This article delves into the key aspects of developing the DC domain, including the challenges, best practices, and the critical differences between the DC and DM domains. We also integrate insights from industry papers and guidelines, including an in-depth look at the FDA's recommendations on handling multiple screenings, the implications of SDTMIG 3.3 DM assumptions, and linking SUBJID to USUBJID, to provide a comprehensive guide.
Understanding the DC Domain Structure
The DC domain captures raw demographic data with variables such as:
- DCSEQ: Sequence Number
- DCTESTCD: Collected Test Code
- DCTEST: Collected Test Name
- DCORRES: Original Result
- DCORRESU: Original Units
- DCSTRESC: Standardized Result in Character
- DCSTRESN: Standardized Result in Numeric
- DCSTRESU: Standardized Units
- VISITNUM: Visit Number
- VISIT: Visit Name
- DCDTC: Date/Time of Collection
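As a rough illustration, the variables above can be held in a simple record type. This is a minimal Python sketch, not a definitive specification: `STUDYID`, `USUBJID`, and `DOMAIN` are assumed identifier variables that do not appear in the list above, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DCRecord:
    """One collected demographic result, using the variables listed above."""
    STUDYID: str                       # assumed identifier, not in the list above
    USUBJID: str                       # assumed identifier, not in the list above
    DCSEQ: int                         # sequence number within the subject
    DCTESTCD: str                      # collected test code
    DCTEST: str                        # collected test name
    DCORRES: str                       # result exactly as collected
    DCORRESU: Optional[str] = None     # original units
    DCSTRESC: Optional[str] = None     # standardized character result
    DCSTRESN: Optional[float] = None   # standardized numeric result
    DCSTRESU: Optional[str] = None     # standardized units
    VISITNUM: Optional[float] = None
    VISIT: Optional[str] = None
    DCDTC: Optional[str] = None        # collection date/time as ISO 8601 text
    DOMAIN: str = "DC"                 # assumed domain abbreviation


# Hypothetical example: age as collected at the screening visit.
age = DCRecord(
    STUDYID="STUDY01", USUBJID="STUDY01-1001", DCSEQ=1,
    DCTESTCD="AGE", DCTEST="Age", DCORRES="45", DCORRESU="Years",
    VISITNUM=1, VISIT="Screening", DCDTC="2024-01-10",
)
```

Keeping the standardized fields optional reflects the idea that the collected value is always present, while standardized values may be filled in later or not at all.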
Challenges in Developing the DC Domain
1. Handling Rescreened Subjects
Challenge: Allowing subjects to undergo multiple screenings or re-enrollments in a study can complicate data management, especially when deciding how to represent each instance of participation.
Solution: The FDA recommends that the primary enrollment be included in the DM domain, while additional screenings or enrollments be recorded in a custom domain with a structure similar to DM, such as the DC domain. The DC domain should capture each screening or enrollment instance under its own subject identifier (SUBJID), while a single unique subject identifier (USUBJID) is maintained across all instances of the same person.
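The split between DM and DC might be sketched as follows in Python. The input layout is hypothetical: `person_key` (a link between attempts by the same person), `primary` (a sponsor-decided flag marking the primary attempt), and the rule that USUBJID is built from the first SUBJID assigned are all assumptions made for illustration, not rules taken from the FDA guidance.

```python
from collections import defaultdict

STUDYID = "STUDY01"  # hypothetical study identifier

# Hypothetical screening log, one row per screening or enrollment attempt.
# "person_key" and "primary" are assumed inputs, not SDTM variables.
attempts = [
    {"person_key": "P1", "SUBJID": "1001", "consent_date": "2024-01-10", "primary": False},
    {"person_key": "P1", "SUBJID": "1044", "consent_date": "2024-03-02", "primary": True},
    {"person_key": "P2", "SUBJID": "1002", "consent_date": "2024-01-12", "primary": True},
]


def split_dm_dc(attempts):
    """Return (dm_rows, dc_rows) with one shared USUBJID per person."""
    by_person = defaultdict(list)
    for attempt in attempts:
        by_person[attempt["person_key"]].append(attempt)

    dm_rows, dc_rows = [], []
    for person, rows in by_person.items():
        if sum(r["primary"] for r in rows) != 1:
            raise ValueError(f"{person}: expected exactly one primary attempt")
        rows = sorted(rows, key=lambda r: r["consent_date"])
        # Assumed convention: USUBJID is built from the first SUBJID assigned.
        usubjid = f"{STUDYID}-{rows[0]['SUBJID']}"
        for r in rows:
            record = {"STUDYID": STUDYID, "USUBJID": usubjid, "SUBJID": r["SUBJID"]}
            (dm_rows if r["primary"] else dc_rows).append(record)
    return dm_rows, dc_rows


dm_rows, dc_rows = split_dm_dc(attempts)
```

Here person P1 contributes SUBJID 1044 to DM and SUBJID 1001 to DC, and both rows carry the same USUBJID, so every attempt can be traced back to one subject.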
2. Inconsistent Data Collection Methods
Challenge: Variations in data collection methods across sites or time points can lead to inconsistencies in the collected data.
Solution: Implement standardized protocols and training across sites to ensure consistency. The DC domain can capture these variations without standardization, preserving the raw data for future analysis.
3. Mapping Raw Data to SDTM Variables
Challenge: Accurately mapping raw demographic data to SDTM variables can be complex, particularly when data is collected in non-standard formats.
Solution: Utilize automated mapping tools and validate mappings through manual review. The DC domain should capture the data as collected, with minimal transformations.
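A minimal sketch of such a mapping step, assuming the raw data arrives as one wide row per screening visit, is shown below. The raw column names (`age`, `sex`, `race`) and the `TEST_MAP` lookup are hypothetical; the point is only that values pass into DCORRES untouched.

```python
# Hypothetical raw CRF export: one wide row per screening visit.
raw_rows = [
    {"USUBJID": "STUDY01-1001", "VISITNUM": 1, "VISIT": "Screening",
     "DCDTC": "2024-01-10", "age": "45", "sex": "M", "race": "Caucasian"},
]

# Assumed mapping from raw column name to (DCTESTCD, DCTEST, DCORRESU).
TEST_MAP = {
    "age": ("AGE", "Age", "Years"),
    "sex": ("SEX", "Sex", None),
    "race": ("RACE", "Race", None),
}


def to_dc(raw_rows):
    """Pivot wide raw rows into one DC record per collected value."""
    dc = []
    next_seq = {}  # running DCSEQ per USUBJID
    for row in raw_rows:
        for column, (testcd, test, unit) in TEST_MAP.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # nothing was collected, so no record is created
            usubjid = row["USUBJID"]
            next_seq[usubjid] = next_seq.get(usubjid, 0) + 1
            dc.append({
                "USUBJID": usubjid,
                "DCSEQ": next_seq[usubjid],
                "DCTESTCD": testcd,
                "DCTEST": test,
                "DCORRES": value,      # copied as collected, no recoding
                "DCORRESU": unit,
                "VISITNUM": row["VISITNUM"],
                "VISIT": row["VISIT"],
                "DCDTC": row["DCDTC"],
            })
    return dc


dc_records = to_dc(raw_rows)
```

A manual review can then compare each DC record against the source row it came from.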
4. Managing Validation Issues
Challenge: Custom domains like DC may trigger warnings in SDTM conformance checks, especially when SUBJID is used in domains other than DM.
Solution: Document the rationale for using the DC domain in the clinical Study Data Reviewer’s Guide (cSDRG) and prepare to address any validation warnings with clear explanations.
FDA Recommendations on Multiple Screenings
- Inclusion in DM and Custom Domains: The FDA recommends that the DM domain should include only the primary screening or enrollment of a subject. If a subject undergoes multiple screenings or enrollments, the primary instance should be captured in the DM domain, while additional screenings or enrollments should be included in a custom domain like the DC domain, which has a similar structure to DM.
- Handling Screen Failures: For subjects who fail the initial screening and are subsequently rescreened, the primary screening failure should be included in the DM domain, while the rescreening attempts are recorded in the DC domain. This ensures that all screening attempts are documented and available for analysis, while maintaining a clear distinction between successful enrollments and failures.
- Use of SUBJID Across Domains: The FDA also recommends using the SUBJID variable in related domains beyond DM, even if this causes validation warnings. This approach is crucial for linking all participation instances of a subject, especially in cases of multiple screenings or enrollments. It allows for a comprehensive view of the subject's participation history within the study.
- Alignment with Global Standards: The FDA's recommendations may differ from those of other regulatory bodies, such as Japan's Pharmaceuticals and Medical Devices Agency (PMDA) and China's National Medical Products Administration (NMPA). This discrepancy can present challenges when preparing submissions for multiple regulatory authorities. In such cases, careful documentation and clear communication with the relevant regulatory bodies are essential.
SDTMIG 3.3 DM Assumptions
- Primary Enrollment in DM: The DM domain should include only the primary enrollment of a subject. For subjects with multiple enrollments, additional records should be included in a custom domain, such as DC.
- RFICDTC Correspondence: The variable RFICDTC (Date/Time of Informed Consent) in the DM domain should correspond to the date of the first informed consent protocol milestone recorded in the DS domain. If there are multiple informed consents, the first one is used in DM.
- RFXSTDTC and RFXENDTC Usage: The variables RFXSTDTC (Date/Time of First Study Treatment) and RFXENDTC (Date/Time of Last Study Treatment) represent the date/time of the first and last study exposure, respectively. These are used in the DM domain to accurately reflect the subject’s exposure timeline.
- Handling Multiple Screenings: For subjects who undergo multiple screenings but are not subsequently enrolled, the primary screening should be included in DM, with additional screenings captured in a custom domain like DC. This approach ensures that DM reflects only the most relevant participation instance.
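The reference-date assumptions above could be sketched in Python as follows. This is an illustration under stated assumptions: the DS and EX records are hypothetical, informed consent is assumed to be identified by `DSDECOD == "INFORMED CONSENT OBTAINED"`, and all dates are assumed to be complete ISO 8601 values of the same precision, so that plain string comparison orders them correctly.

```python
# Hypothetical DS and EX records for one subject.
ds = [
    {"USUBJID": "STUDY01-1001", "DSDECOD": "INFORMED CONSENT OBTAINED", "DSSTDTC": "2024-03-02"},
    {"USUBJID": "STUDY01-1001", "DSDECOD": "INFORMED CONSENT OBTAINED", "DSSTDTC": "2024-01-10"},
    {"USUBJID": "STUDY01-1001", "DSDECOD": "COMPLETED", "DSSTDTC": "2024-09-30"},
]
ex = [
    {"USUBJID": "STUDY01-1001", "EXSTDTC": "2024-03-10", "EXENDTC": "2024-03-10"},
    {"USUBJID": "STUDY01-1001", "EXSTDTC": "2024-04-07", "EXENDTC": "2024-04-07"},
]


def derive_reference_dates(usubjid, ds, ex):
    """Derive RFICDTC, RFXSTDTC and RFXENDTC for one subject.

    Assumes complete ISO 8601 dates of equal precision, so that
    min() and max() on the strings match chronological order.
    """
    consents = [
        r["DSSTDTC"] for r in ds
        if r["USUBJID"] == usubjid
        and r["DSDECOD"] == "INFORMED CONSENT OBTAINED"  # assumed decode value
        and r.get("DSSTDTC")
    ]
    starts = [r["EXSTDTC"] for r in ex if r["USUBJID"] == usubjid and r.get("EXSTDTC")]
    ends = [r["EXENDTC"] for r in ex if r["USUBJID"] == usubjid and r.get("EXENDTC")]
    return {
        "RFICDTC": min(consents) if consents else "",   # first informed consent
        "RFXSTDTC": min(starts) if starts else "",      # first study exposure
        "RFXENDTC": max(ends) if ends else "",          # last study exposure
    }


ref_dates = derive_reference_dates("STUDY01-1001", ds, ex)
```

Partial dates or mixed date and date/time values would need proper parsing before comparison, which this sketch deliberately leaves out.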
Contrast Between DC and DM Domains
Understanding the distinction between the DC and DM domains is crucial for correctly mapping data:
- DC Domain (Demographics as Collected):
  - Purpose: Captures demographic data exactly as it was collected, without standardization or imputation. It is particularly useful for studies involving multiple screenings or enrollments.
  - Data Types: Raw, unprocessed data that reflects the original data entry, including all collected demographic characteristics such as age, sex, race, and ethnicity.
  - Example: If age was collected as "45" and sex as "M," these values would be recorded exactly as they are in the DC domain, with appropriate units and codes.
- DM Domain (Demographics):
  - Purpose: Provides a standardized, baseline snapshot of demographic data for each subject, used in analysis and reporting. The DM domain is typically a derived subset of the DC domain.
  - Data Types: Standardized data, often derived or transformed from raw data. It may include derived variables such as age calculated at the screening date, or standardized values for sex and race.
  - Example: In the DM domain, AGE might be presented as 45, calculated relative to a reference date, and sex would be represented by the CDISC controlled terminology submission value "M" (for example, a collected value of "Male" would be converted to "M").
Variable | DC Domain | DM Domain |
---|---|---|
Age | DCORRES = "45", DCORRESU = "Years" | AGE = 45 (derived from birthdate) |
Sex | DCORRES = "M" | SEX = "M" (CDISC controlled terminology) |
Race | DCORRES = "Caucasian" | RACE = "WHITE" (CDISC controlled terminology) |
Visit Name | VISIT = "Screening" | Not applicable |
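The step from a collected value to a DM value can be illustrated with a small lookup in Python. The lookup tables here are hypothetical and incomplete; they assume the CDISC submission values "M", "F", and "WHITE", and a real study would take its mappings from the sponsor's controlled terminology specification.

```python
# Hypothetical, incomplete lookups from collected text to CDISC submission values.
SEX_CT = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}
RACE_CT = {"CAUCASIAN": "WHITE", "WHITE": "WHITE"}


def to_dm_value(collected, lookup):
    """Map a collected value to its standardized DM value.

    Unmapped values raise an error instead of passing through silently,
    so that each new collected term gets a documented mapping decision.
    """
    key = collected.strip().upper()
    if key not in lookup:
        raise ValueError(f"No controlled terminology mapping for {collected!r}")
    return lookup[key]


dm_sex = to_dm_value("M", SEX_CT)
dm_race = to_dm_value("Caucasian", RACE_CT)
```

The collected text itself stays in DCORRES, so the mapping can be reviewed or revised later without losing the original entry.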
Best Practices for Developing the DC Domain
- Ensure Accurate Mapping of Source Data: Validate that raw data is accurately mapped to DC domain variables, paying particular attention to variable types and units.
- Use Controlled Terminology Where Applicable: Ensure DCTESTCD and DCTEST align with CDISC controlled terminology. If terms are missing or ambiguous, document any decisions made.
- Handle Missing Data Appropriately: Follow SDTM conventions for representing missing data. Document any assumptions or imputations made in the process.
- Implement Proper Version Control: Track changes to the DC domain throughout the study with clear versioning and documentation.
- Visualize Data with Tables and Graphs: Use tools like SAS to visualize demographic data, allowing for easier identification of errors and outliers.
- Validate the DC Domain: Regularly validate your domain using tools like Pinnacle 21 and manual checks to ensure compliance with SDTM standards.
- Document Everything: Maintain thorough documentation for every step, from data collection to final SDTM mapping.
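Some of the manual checks mentioned above can be scripted. The following Python sketch shows a few illustrative checks on DC records held as dictionaries; the set of required variables is an assumption, and the script is not a substitute for a conformance tool such as Pinnacle 21.

```python
# Assumed set of variables that every DC record should populate.
REQUIRED = ("USUBJID", "DCSEQ", "DCTESTCD", "DCTEST")


def check_dc(records):
    """Return a list of human-readable issues found in DC records."""
    issues = []
    seen_keys = set()
    for i, rec in enumerate(records):
        for var in REQUIRED:
            if rec.get(var) in (None, ""):
                issues.append(f"record {i}: {var} is missing")
        # DCSEQ should be unique within a subject.
        key = (rec.get("USUBJID"), rec.get("DCSEQ"))
        if key in seen_keys:
            issues.append(f"record {i}: duplicate USUBJID/DCSEQ {key}")
        seen_keys.add(key)
        # SDTM test codes are limited to 8 characters.
        testcd = rec.get("DCTESTCD") or ""
        if len(testcd) > 8:
            issues.append(f"record {i}: DCTESTCD {testcd!r} exceeds 8 characters")
    return issues


# Hypothetical records with two deliberate problems in the second one.
sample = [
    {"USUBJID": "STUDY01-1001", "DCSEQ": 1, "DCTESTCD": "AGE", "DCTEST": "Age", "DCORRES": "45"},
    {"USUBJID": "STUDY01-1001", "DCSEQ": 1, "DCTESTCD": "SEX", "DCTEST": "", "DCORRES": "M"},
]
issues = check_dc(sample)
```

Running checks like these after each data refresh gives early warning of problems before the full validation run.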
Conclusion
The development of the DC domain is not just a routine task—it is a critical step in ensuring the integrity and accuracy of your study’s demographic data. By understanding the challenges and differences between the DC and DM domains, and by implementing the tips and techniques discussed, you can ensure that your DC domain is accurate, compliant, and ready for submission.
Next Steps:
- Assess your current processes for developing the DC domain.
- Implement the strategies outlined to enhance accuracy and consistency.
- Train your team on the distinctions between the DC and DM domains to avoid common pitfalls.
References
- Matta, V., Jajam, S., & Peddibhotla, L. (2021). Rescreened Subjects, Data Collection and Standard Domains Mapping. Covance Inc.
- Zhou, X., Xie, L., Hu, Q., & Ma, S. (2023). Exploration on Demographic as Collected (DC) Domain to Handle Multiple Screenings in SDTM. BeiGene, Inc.