StudySAS Blog: Mastering Clinical Data Management with SAS

Welcome to StudySAS, your ultimate guide to clinical data management using SAS. We cover essential topics like SDTM, CDISC standards, and Define.XML, alongside advanced PROC SQL and SAS Macros techniques. Whether you're enhancing your programming efficiency or ensuring compliance with industry standards, StudySAS offers practical tips and insights to elevate your clinical research expertise. Join us and stay ahead in the evolving world of clinical data.

Advanced SDTM Mapping Pitfalls and How to Avoid Them

Introduction

Mapping clinical data to SDTM domains is a complex process involving many technical and logical challenges. For experienced programmers, common issues often revolve around proper handling of controlled terminology, managing derived variables, ensuring consistency between domains, and maintaining relational integrity. This article explores some of the most common SDTM mapping pitfalls, with advanced solutions and SAS code examples, to help avoid regulatory submission errors.

Pitfall 1: Handling Derived Variables Incorrectly

One of the most common issues in SDTM mapping is incorrectly handling derived variables, which can lead to inaccuracies in key datasets such as EX (Exposure) and VS (Vital Signs).

Example:

A derived variable such as EXDUR (Exposure Duration) in the EX domain may not be properly calculated or may fail to align with the duration of the actual treatment.

Solution:

Use a SAS data step to derive exposure duration based on start and end dates (EXSTDTC and EXENDTC), ensuring proper ISO 8601 formatting and alignment with other datasets.


data ex_derived;
    set ex;
    /* Calculate the difference in days between EXSTDTC and EXENDTC */
    if not missing(EXSTDTC) and not missing(EXENDTC) then do;
        EXDUR = intck('day', input(EXSTDTC, yymmdd10.), input(EXENDTC, yymmdd10.));
    end;
    else EXDUR = .;
run;

proc print data=ex_derived;
    title "Derived EXDUR Based on Start and End Dates";
run;

Pitfall 2: Inconsistent Mapping of Medical History (MH) and Adverse Events (AE)

A common challenge is ensuring consistency between the AE (Adverse Events) and MH (Medical History) domains when both involve related clinical data. Programmers often struggle to map similar events across these domains, resulting in duplicated or inconsistent information.

Example:

A medical history event such as "Diabetes" may appear in both the MH and AE domains, but without proper mapping, this can create discrepancies.

Solution:

Create validation checks that ensure consistency between MH and AE, flagging records where conditions in MH appear without an associated AE or where the same event is inconsistently coded.


proc sql;
    create table check_ae_mh as 
    select a.USUBJID, a.MHTERM as Event, b.AETERM as AE_Event
    from mh as a
    left join ae as b
    on a.USUBJID = b.USUBJID and a.MHTERM = b.AETERM
    where missing(b.AETERM);
quit;

proc print data=check_ae_mh;
    title "Medical History Terms Without Corresponding Adverse Events";
run;

Pitfall 3: Issues with Handling Visit Windows

Assigning the correct visit window (such as VISITNUM, VISIT, or VISITDY) is critical in SDTM domains such as VS (Vital Signs), LB (Laboratory Tests), and EG (ECG). A common issue is assigning observations to the wrong visit window when the visit dates don’t match the planned visit windows.

Example:

If a patient visit happens on Day 15, but the expected window is Day 14, misalignment in visit windows can occur, leading to incorrect visit assignment.

Solution:

Create visit windowing rules using advanced date logic, ensuring that observations are correctly assigned to the proper visit window.


data vs_visit_window;
    set vs;
    if abs(VISITDY - planned_day) <= 2 then VISITWINDOW = 'On Time';
    else if VISITDY < planned_day then VISITWINDOW = 'Early';
    else VISITWINDOW = 'Late';
run;

proc print data=vs_visit_window;
    where VISITWINDOW ne 'On Time';
    title "Vital Signs with Early or Late Visits";
run;

Pitfall 4: Incorrect Handling of Controlled Terminology (CT)

Another common issue is failing to properly implement and regularly update controlled terminology for variables such as AEDECOD (Adverse Event Dictionary Term) or CMDECOD (Concomitant Medication Dictionary Term). Regulatory submissions often fail due to outdated or incorrect terminology.

Example:

Incorrectly mapped terms in the Concomitant Medications (CM) domain to outdated controlled terminology (e.g., WHO Drug) can result in a failed submission.

Solution:

Use a dynamic lookup to validate the controlled terms against the latest version of the dictionaries, flagging any mismatches or outdated terms.


proc sql;
    create table cm_term_check as 
    select a.*, b.whodrug_term
    from cm as a
    left join whodrug as b
    on a.CMDECOD = b.whodrug_term
    where missing(b.whodrug_term);
quit;

proc print data=cm_term_check;
    title "Mismatched Terms in Concomitant Medications Domain";
run;

Pitfall 5: Incorrect RELREC (Related Records) Relationships

Establishing correct relationships between records in different domains is essential when creating the RELREC domain. Misaligned or incomplete relationships can cause inconsistencies in your submission and prevent accurate linking between datasets.

Example:

A common issue is incorrectly linking the AE (Adverse Events) domain to the LB (Laboratory Tests) domain, where the related records aren't properly structured.

Solution:

Validate the relational integrity of records between AE and LB using the RDOMAIN and IDVARVAL variables in the RELREC domain.


proc sql;
    create table relrec_check as 
    select a.*, b.LBSEQ
    from relrec as a
    left join lb as b
    on a.USUBJID = b.USUBJID and a.IDVARVAL = b.LBSEQ
    where missing(b.LBSEQ);
quit;

proc print data=relrec_check;
    title "Mismatched RELREC Relationships Between AE and LB Domains";
run;

Pitfall 6: Handling Variables in Supplemental Qualifiers (SUPPxx)

Supplemental qualifiers (SUPPxx) domains are often used for storing non-standard variables, but incorrect or incomplete linking between the main domain and the SUPPxx domain can lead to issues during submission.

Example:

In the AE domain, supplemental qualifiers such as additional severity information may not be properly linked, causing discrepancies.

Solution:

Check for missing or improperly linked records between AE and SUPPAE domains, ensuring that each supplemental qualifier is accurately referenced.


proc sql;
    create table suppae_check as 
    select a.*, b.AESEQ
    from suppae as a
    left join ae as b
    on a.USUBJID = b.USUBJID and a.IDVARVAL = b.AESEQ
    where missing(b.AESEQ);
quit;

proc print data=suppae_check;
    title "Unlinked SUPP Qualifiers in AE Domain";
run;

Conclusion

Addressing these advanced SDTM mapping pitfalls requires a combination of regular validation, correct handling of controlled terminology, and careful management of relationships across datasets. By implementing automated SAS routines, you can streamline the process and ensure compliance with regulatory standards, preventing costly delays in submission.

Advanced SDTM Programming Techniques for SAS Programmers

As an experienced SAS programmer working with the Study Data Tabulation Model (SDTM), it's crucial to stay updated with the latest programming techniques. Whether you're tasked with building SDTM domains from scratch or optimizing existing code, there are several advanced concepts that can improve your workflows and output. In this post, we’ll explore some techniques that can help you overcome common SDTM challenges and boost efficiency in handling clinical trial data.

1. Efficient Handling of Large Datasets

When dealing with large clinical datasets, speed and efficiency are key. One method to optimize SDTM domain generation is to reduce the data footprint by eliminating unnecessary variables and duplicative observations. Consider the following approaches:

Removing Duplicate Observations

Duplicate records can slow down the processing of datasets and cause inaccuracies in reporting. To remove duplicates, you can use the PROC SQL, DATA STEP, or PROC SORT methods. Here's a quick example using PROC SORT:

proc sort data=mydata nodupkey;
    by usubjid visitnum;
run;

This example ensures that only unique records for each usubjid and visitnum combination are retained, eliminating potential redundancy in your dataset.

2. Mastering Macro Variables for Flexibility

Utilizing macro variables efficiently can significantly improve your code's flexibility, especially when working across multiple SDTM domains. Macro variables allow you to automate repetitive tasks and reduce the risk of human error. Here’s an example using macro variables to generate domain-specific reports:

%macro create_sdtm_report(domain);
    data &domain._report;
        set sdtm.&domain.;
        /* Apply domain-specific transformations */
    run;
%mend;

%create_sdtm_report(DM);
%create_sdtm_report(LB);

In this case, the macro dynamically generates SDTM reports for any domain by passing the domain name as a parameter, minimizing manual interventions.

3. Managing Demographics with the DC Domain

The Demographics as Collected (DC) domain often presents unique challenges, particularly when distinguishing it from the standard Demographics (DM) domain. While DM represents standardized data, DC focuses on the raw, collected demographic details. Here's an approach to manage these domains efficiently:

data dc_domain;
    set raw_data;
    /* Capture specific collected demographic data */
    where not missing(collected_age) and not missing(collected_gender);
run;

In this case, the code filters out any missing collected data to ensure the DC domain contains only records with complete demographic information.

4. Debugging SDTM Code with PUTLOG

Efficient debugging is crucial, especially when dealing with complex SDTM transformations. The PUTLOG statement in SAS is a simple yet powerful tool for tracking errors and debugging data issues.

data check;
    set sdtm.dm;
    if missing(usubjid) then putlog "ERROR: Missing USUBJID at obs " _n_;
run;

In this example, the PUTLOG statement flags records where the USUBJID variable is missing, making it easier to spot and address data quality issues during SDTM creation.

5. Advanced Array Processing for Repeated Measures

In certain domains like Vital Signs (VS) or Lab (LB), handling repeated measures for patients across multiple visits is common. Using arrays can help streamline this process. Here’s a basic example of using an array to process repeated lab measurements:

data lab_repeated;
    set sdtm.lb;
    array lb_vals{3} lbtestcd1-lbtestcd3;
    do i=1 to dim(lb_vals);
        if lb_vals{i} = . then putlog "WARNING: Missing value for LBTESTCD at " _n_;
    end;
run;

This code uses an array to loop through repeated lab test results, ensuring that missing values are flagged for review. Such array-based techniques are essential when processing large, multidimensional datasets in SDTM programming.

6. Best Practices for CDISC Validation and Compliance

To ensure SDTM datasets are CDISC compliant, it’s vital to validate your datasets using tools like Pinnacle 21 or OpenCDISC. These tools check compliance against SDTM standards, flagging any inconsistencies or issues.

Make sure to incorporate validation steps into your workflows regularly. This can be done by running the validation tool after each major dataset creation and by including clear annotations in your programming code to ensure traceability for audits.

Conclusion

Advanced SDTM programming requires a mix of technical expertise and strategic thinking. Whether you are optimizing large datasets, automating repetitive tasks with macros, or ensuring CDISC compliance, staying updated with these advanced techniques will enhance your efficiency and ensure high-quality deliverables in clinical trials. Remember, SDTM programming is not just about writing code—it's about delivering accurate, compliant, and reliable data for critical decision-making.

For more tips and tutorials, check out other PharmaSUG resources or SAS support!

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.

Discover More Tips and Techniques on This Blog

Advanced SDTM Mapping Pitfalls and How to Avoid Them

Introduction

Pitfall 1: Handling Derived Variables Incorrectly

Example:

Solution:

Pitfall 2: Inconsistent Mapping of Medical History (MH) and Adverse Events (AE)

Example:

Solution:

Pitfall 3: Issues with Handling Visit Windows

Example:

Solution:

Pitfall 4: Incorrect Handling of Controlled Terminology (CT)

Example:

Solution:

Pitfall 5: Incorrect RELREC (Related Records) Relationships

Example:

Solution:

Pitfall 6: Handling Variables in Supplemental Qualifiers (SUPPxx)

Example:

Solution:

Conclusion

Advanced SDTM Programming Techniques for SAS Programmers

1. Efficient Handling of Large Datasets

Removing Duplicate Observations

2. Mastering Macro Variables for Flexibility

3. Managing Demographics with the DC Domain

4. Debugging SDTM Code with PUTLOG

5. Advanced Array Processing for Repeated Measures

6. Best Practices for CDISC Validation and Compliance

Conclusion

Disclosure: