SDTM (Study Data Tabulation Model) programming is a crucial aspect of clinical trial data management, ensuring that data is standardized, traceable, and ready for regulatory submission. Below are some practical tips for SDTM programming, complete with specific examples and code snippets to help you manage your clinical data more efficiently and effectively.

1. Understand the SDTM Implementation Guide (IG)

The SDTM IG is your primary reference when working with SDTM datasets. It provides detailed guidelines on how to structure and standardize your data. Familiarize yourself with the requirements for each domain, including the use of controlled terminology, dataset structures, and relationships between domains.

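Example: When creating the AE (Adverse Events) domain, ensure you include the Required variables (STUDYID, DOMAIN, USUBJID, AESEQ, AETERM, and AEDECOD) and populate Expected variables such as AESTDTC. AESEV is Permissible, so include it when severity was collected. Reference the IG to determine how these variables should be populated and linked to other domains.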

2. Use Controlled Terminology Consistently

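Controlled terminology is essential for consistency across datasets. Use the CDISC controlled terminology version specified for your study, and check for newer releases before finalizing your specifications. Variables such as AESEV and SEX take their values from CDISC codelists, while dictionary-derived variables come from external dictionaries: AEDECOD is coded with MedDRA, and CMDECOD (Standardized Medication Name) with WHODrug.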

Example: If coding a concomitant medication, ensure that CMTRT (reported term for the treatment) and CMDECOD are aligned with the WHO Drug dictionary to maintain consistency.


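data cm;
   length CMDECOD $40;
   set raw_cm;
   /* Illustration only: in practice CMDECOD comes from WHODrug coding, not hard-coded rules */
   /* Compare case-insensitively so 'Aspirin' and 'ASPIRIN' also match */
   if upcase(strip(cmtrt)) = 'ASPIRIN' then cmdecod = 'Aspirin';
   else if upcase(strip(cmtrt)) = 'ACETAMINOPHEN' then cmdecod = 'Acetaminophen';
   /* Additional coding logic here */
run;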

3. Leverage the Power of SAS Macros

SAS macros can automate repetitive tasks and ensure consistency across datasets. For example, you can create a macro to standardize date variables across all domains or to generate common SUPPQUAL datasets.

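Example: The macro below converts a numeric SAS date into a character --DTC variable in ISO 8601 format (YYYY-MM-DD), so the same logic can be reused across domains.


%macro standardize_dates(data=, invar=, dtcvar=);
   data &data.;
      length &dtcvar. $10;
      set &data.;
      /* --DTC variables are character: write the numeric date as YYYY-MM-DD text */
      if not missing(&invar.) then &dtcvar. = put(&invar., yymmdd10.);
   run;
%mend standardize_dates;

/* Example usage: aestdt and cmstdt are assumed to be numeric SAS dates */
%standardize_dates(data=ae, invar=aestdt, dtcvar=aestdtc);
%standardize_dates(data=cm, invar=cmstdt, dtcvar=cmstdtc);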

4. Validate Your Data Early and Often

Validation is key to ensuring that your SDTM datasets are compliant with regulatory standards. Use tools like Pinnacle 21 to validate your datasets against the SDTM IG. Validate early in the process to catch errors before they become embedded in your datasets.

Example: Use Pinnacle 21 to run a validation report on your SDTM datasets to identify and correct issues such as missing variables, incorrect data types, or violations of controlled terminology.
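
Before running a full validation report, a few quick checks in SAS can catch obvious problems. The sketch below is only an illustration and not a substitute for a validator. It assumes a WORK.AE dataset containing USUBJID, AESEQ, and AETERM; the output dataset names are made up for this example.

```sas
/* Assumption: WORK.AE exists and contains USUBJID, AESEQ, and AETERM */

/* 1. Records that share the same USUBJID and AESEQ (keys should be unique) */
proc sql;
   create table ae_dup_keys as
   select usubjid, aeseq, count(*) as n_records
   from ae
   group by usubjid, aeseq
   having count(*) > 1;
quit;

/* 2. Records with a missing identifier or topic variable */
data ae_missing_req;
   set ae;
   if missing(usubjid) or missing(aeseq) or missing(aeterm);
run;
```

If both output datasets are empty, the basic key structure is sound and the validator report can focus on the finer rules.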

5. Document Your Work Thoroughly

Good documentation is essential for traceability and reproducibility. Keep detailed records of your data transformations, including the source data, the steps taken to convert it into SDTM format, and any issues encountered.

Example: Use comments in your SAS code to document complex logic or assumptions. Also, ensure that your define.xml file includes all necessary metadata.


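/* Example: Documenting a derived working variable from AE dates */
data ae;
   set raw_ae;
   /* Deriving AE duration in days into a working variable (aedurn).     */
   /* SDTM AEDUR is an ISO 8601 duration and is populated only when      */
   /* collected, so a derived numeric duration usually belongs in ADaM.  */
   /* Assuming events with missing end date are ongoing: duration stays missing */
   if not missing(aestdt) and not missing(aendt) then aedurn = aendt - aestdt;
run;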

6. Pay Attention to Date/Time Variables

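Date and time variables can be tricky to handle, especially when dealing with partial or missing data. SDTM --DTC variables are character variables in ISO 8601 format (e.g., YYYY-MM-DD or YYYY-MM-DDThh:mm), not numeric SAS dates with a display format. Be consistent in how you handle missing components: a partial date is represented by leaving off the unknown parts (e.g., 2024-03 when the day is unknown), not by imputing them in the --DTC variable.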

Example: When creating the DM (Demographics) domain, ensure that birth date (BRTHDTC) and informed consent date (RFICDTC) are formatted according to ISO 8601 standards.


data dm;
   set raw_dm;
   length brthdtc rficdtc $10;
   /* birth_date and consent_date are assumed to be numeric SAS dates */
   if not missing(birth_date) then brthdtc = put(birth_date, yymmdd10.);
   if not missing(consent_date) then rficdtc = put(consent_date, yymmdd10.);
run;
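
Partial dates need their own handling because a numeric SAS date cannot hold a missing day or month. The sketch below builds AESTDTC from separate year, month, and day components and keeps only the parts that are known. The raw variable names st_year, st_month, and st_day are hypothetical, and the case of a known day with an unknown month is simply reduced to the year here.

```sas
data ae_dates;
   set raw_ae;
   length aestdtc $10;
   /* st_year, st_month, st_day: hypothetical numeric raw variables */
   if not missing(st_year) then do;
      aestdtc = put(st_year, z4.);                      /* YYYY       */
      if not missing(st_month) then do;
         aestdtc = catx('-', aestdtc, put(st_month, z2.));   /* YYYY-MM    */
         if not missing(st_day) then
            aestdtc = catx('-', aestdtc, put(st_day, z2.));  /* YYYY-MM-DD */
      end;
   end;
run;
```

With this approach a record with only year 2024 and month 3 yields 2024-03, and a record with no year leaves AESTDTC blank.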

7. Use RELREC to Maintain Relationships Between Records

The RELREC domain is crucial for maintaining relationships between different datasets, such as linking adverse events with concomitant medications.

Example: To link an adverse event to a concomitant medication that was administered at the same time, you would use the RELREC domain.


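/* Pair AE and CM records for the same subject with the same start date. */
/* Illustrative rule only: real relationships should come from collected data. */
proc sql;
   create table ae_cm_pairs as
   select a.studyid, a.usubjid, a.aeseq, b.cmseq
   from ae as a
   inner join cm as b
   on a.usubjid = b.usubjid and a.aestdtc = b.cmstdtc
   where not missing(a.aestdtc)
   order by a.usubjid, a.aeseq, b.cmseq;
quit;

data relrec (keep=STUDYID RDOMAIN USUBJID IDVAR IDVARVAL RELTYPE RELID);
   set ae_cm_pairs;
   length RDOMAIN $2 IDVAR $8 IDVARVAL $20 RELTYPE $4 RELID $20;
   /* Both records of a pair share one RELID */
   relid = cats("AECM", _n_);
   /* RELTYPE is left blank for record-level relationships; */
   /* ONE/MANY is used only for dataset-level relationships */
   reltype = "";

   rdomain = "AE";
   idvar = "AESEQ";
   idvarval = strip(put(aeseq, best.));
   output;

   rdomain = "CM";
   idvar = "CMSEQ";
   idvarval = strip(put(cmseq, best.));
   output;
run;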

8. Handle SUPPQUAL Carefully

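Supplemental qualifier (SUPP--) datasets hold non-standard variables that cannot be represented in the parent domain. Ensure that IDVAR and IDVARVAL correctly reference the parent record and that the supplemental data is necessary and compliant with the SDTM IG. A variable that already exists in the standard domain, such as AETOXGR in AE, belongs in the parent dataset and not in SUPPAE.

Example: The code below shows how to add a supplemental qualifier for the AE domain to capture a treatment-emergent flag (AETRTEM), assuming that flag exists as a variable in your working AE dataset.


data suppae (keep=STUDYID RDOMAIN USUBJID IDVAR IDVARVAL QNAM QLABEL QVAL QORIG QEVAL);
   set ae;
   length RDOMAIN $2 IDVAR $8 IDVARVAL $20 QNAM $8 QLABEL $40 QVAL $200 QORIG $20 QEVAL $40;
   if not missing(aetrtem) then do;
      rdomain = "AE";
      idvar = "AESEQ";
      idvarval = strip(put(aeseq, best.));
      qnam = "AETRTEM";
      qlabel = "Treatment Emergent Flag";
      qval = aetrtem;
      qorig = "Derived";   /* assumed origin for a sponsor-derived flag */
      qeval = "SPONSOR";   /* assumed evaluator for a sponsor-derived flag */
      output;
   end;
run;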

9. Stay Updated with CDISC Guidelines

CDISC guidelines and controlled terminology are periodically updated. Make sure you stay informed about these updates and apply them to your SDTM datasets.

Example: Regularly check the CDISC website for updates on controlled terminology. Implement these updates in your SAS programs to ensure your datasets remain compliant.
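
One way to apply a new terminology release is to load it into a lookup dataset and compare your data against it. The sketch below assumes a hypothetical dataset ct_terms with the columns codelist and submission_value, loaded from the CDISC controlled terminology file you are using; neither the dataset nor its column names come from a standard SAS library.

```sas
/* ct_terms is a hypothetical lookup dataset: one row per codelist term */
proc sql;
   create table aesev_not_in_ct as
   select distinct aesev
   from ae
   where not missing(aesev)
     and aesev not in
         (select submission_value
          from ct_terms
          where codelist = 'AESEV');
quit;
```

Any value listed in aesev_not_in_ct needs to be remapped or queried before the dataset is finalized.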

10. Test Your Code with Small, Representative Datasets

Before applying your code to the full dataset, test it on a smaller, representative sample. This helps identify any potential issues without processing the entire dataset.

Example: Create a small dataset with representative data and run your SDTM conversion code on it. Verify that the output matches your expectations before processing the entire dataset.


data ae_sample;
   set ae(obs=100); /* Testing with the first 100 records */
run;

/* Apply your SDTM conversion code to ae_sample */
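
To verify that the output matches your expectations, you can compare it against a small reference dataset that you have checked by hand. In the sketch below, ae_expected and ae_sample_out are hypothetical dataset names for the reference data and the output of your conversion code.

```sas
/* ae_expected: hypothetical hand-checked reference dataset          */
/* ae_sample_out: hypothetical output of the conversion on ae_sample */
proc sort data=ae_expected;
   by usubjid aeseq;
run;

proc sort data=ae_sample_out;
   by usubjid aeseq;
run;

/* Match records on the key variables and list every difference */
proc compare base=ae_expected compare=ae_sample_out listall;
   id usubjid aeseq;
run;
```

The PROC COMPARE report lists variables and records that exist in only one dataset, as well as values that differ, which makes it easy to rerun after each code change.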

11. Generate Define.xml from Metadata

The define.xml file is critical for your SDTM submission. SAS's ODS (Output Delivery System) has no destination that writes define.xml; the file is normally generated from a metadata specification with a dedicated tool such as the SAS Clinical Standards Toolkit or Pinnacle 21 Community. What SAS code can do reliably is supply the dataset and variable metadata that feeds that specification.

Example: The code below collects variable-level metadata for every dataset in an SDTM library (the libref SDTM is an assumption) so it can be reviewed and loaded into your define.xml specification.


proc sql;
   create table define_vars as
   select memname as dataset, name as variable, label, type, length, varnum
   from dictionary.columns
   where libname = 'SDTM'   /* librefs are stored in uppercase */
   order by memname, varnum;
quit;

12. Maintain Traceability from Raw Data to SDTM

Traceability is essential for demonstrating the accuracy and integrity of your SDTM datasets. Ensure that every variable in your SDTM datasets can be traced back to the original raw data.

Example: Document each transformation step in your SAS code and ensure it is clearly explained in the define.xml file.


/* Example: Documenting the derivation of visit number in VS domain */
data vs;
   set raw_vs;
   /* Deriving VISITNUM from raw visit name */
   if visit = 'Baseline' then visitnum = 1;
   else if visit = 'Week 1' then visitnum = 2;
   /* Additional visit logic here */
run;

/* Ensure that the define.xml file includes details about this derivation */

13. Manage Version Control Effectively

Use version control software like Git to track changes to your SDTM datasets and SAS programs. This allows you to revert to previous versions if necessary and ensures that you have a complete history of all changes made.

Example: Set up a Git repository for your SDTM project and commit changes regularly. Include clear commit messages that describe the changes made.


git init
git add .
git commit -m "Initial commit of SDTM conversion programs"
git commit -am "Updated AE domain to include new controlled terminology"

14. Optimize Performance for Large Datasets

When working with large datasets, performance can become an issue. Optimize your SAS code by using efficient data step and PROC SQL techniques.

Example: Minimize the number of data passes by combining multiple operations in a single data step or PROC SQL query. Avoid unnecessary sorting or merging.


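proc sql;
   create table ae_final as
   select a.*, b.cmtrt
   from ae as a
   left join cm as b
   /* Filter CM in the ON clause: a WHERE on b.cmtrt would drop AEs with */
   /* no matching CM record and turn the left join into an inner join    */
   on a.usubjid = b.usubjid and b.cmtrt = 'Steroid'
   where a.aesev = 'SEVERE';
quit;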

15. Collaborate with Team Members

SDTM programming is often a collaborative effort. Regularly communicate with your team members to ensure that everyone is aligned on the standards and processes being used.

Example: Use shared code repositories, regular meetings, and clear documentation to facilitate collaboration and ensure consistency across the team’s work.

16. Prepare for Audits

Regulatory audits can happen at any time, so it's important to be prepared. Ensure that all your datasets, programs, and documentation are organized and accessible.

Example: Regularly review your work to ensure compliance with SDTM standards. Create a checklist of key compliance points and review it before submitting any data.

17. Utilize PROC CDISC

PROC CDISC is a legacy SAS procedure that imports and exports CDISC ODM XML and can check a dataset against the SDTM 3.1 domain definitions. It does not create SDTM datasets, and it does not cover later SDTM IG versions, so treat it as a supplementary check and not as a replacement for a current validator such as Pinnacle 21.

Example: Use PROC CDISC to check the structure of an AE dataset against the SDTM 3.1 definition (the libref sdtm is an assumption).


proc cdisc model=sdtm;
   sdtm sdtmversion="3.1";
   domaindata data=sdtm.ae
              domain=AE
              category=events;
run;

18. Stay Organized with Project Files

Keep your project files well-organized by maintaining separate directories for raw data, SDTM datasets, programs, logs, and outputs.

Example: Use a clear directory structure like the one below to keep your files organized and easy to find:


/SDTM_Project
   /raw_data
   /sdtm_datasets
   /programs
   /logs
   /outputs

19. Understand the Importance of Domain-Specific Variables

Each SDTM domain has specific variables that must be included. Ensure that you understand the purpose of these variables and that they are correctly populated.

Example: In the LB (Laboratory) domain, ensure that variables like LBTESTCD, LBORRES, and LBNRIND (Reference Range Indicator) are accurately populated.


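data lb;
   set raw_lb;
   length lborres $20 lborresu $10 lbnrind $8;
   /* Skip missing results: a missing value would otherwise compare as LOW */
   if lbtestcd = 'GLUC' and not missing(glucose_value) then do;
      lborres = strip(put(glucose_value, best.));   /* LBORRES is character */
      lborresu = 'mg/dL';                           /* units go in LBORRESU */
      /* Illustrative limits only: use the lab's reference ranges in practice */
      if glucose_value > 110 then lbnrind = 'HIGH';
      else if glucose_value < 70 then lbnrind = 'LOW';
      else lbnrind = 'NORMAL';
   end;
run;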

20. Engage in Continuous Learning

The field of clinical data management is constantly evolving. Engage in continuous learning by attending webinars, participating in CDISC workshops, and networking with other SDTM programmers.

Example: Subscribe to newsletters from CDISC and SAS, attend industry conferences, and join professional organizations to stay informed about the latest trends and best practices in SDTM programming.

By following these SDTM programming tips, you can ensure that your clinical trial data is well-structured, compliant with regulatory standards, and ready for submission. Effective SDTM programming not only facilitates smoother regulatory review but also contributes to the overall success of the clinical trial.

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.