Advanced SDTM Mapping Pitfalls and How to Avoid Them
Introduction
Mapping clinical data to SDTM domains is a complex process involving many technical and logical challenges. For experienced programmers, common issues often revolve around proper handling of controlled terminology, managing derived variables, ensuring consistency between domains, and maintaining relational integrity. This article explores some of the most common SDTM mapping pitfalls, with advanced solutions and SAS code examples, to help avoid regulatory submission errors.
Pitfall 1: Handling Derived Variables Incorrectly
One of the most common issues in SDTM mapping is incorrectly handling derived variables, which can lead to inaccuracies in key datasets such as EX (Exposure) and VS (Vital Signs).
Example:
A derived variable such as EXDUR
(Exposure Duration) in the EX domain may not be properly calculated or may fail to align with the duration of the actual treatment.
Solution:
Use a SAS data step to derive exposure duration based on start and end dates (EXSTDTC
and EXENDTC
), ensuring proper ISO 8601 formatting and alignment with other datasets.
data ex_derived;
set ex;
/* Calculate the difference in days between EXSTDTC and EXENDTC */
if not missing(EXSTDTC) and not missing(EXENDTC) then do;
EXDUR = intck('day', input(EXSTDTC, yymmdd10.), input(EXENDTC, yymmdd10.));
end;
else EXDUR = .;
run;
proc print data=ex_derived;
title "Derived EXDUR Based on Start and End Dates";
run;
Pitfall 2: Inconsistent Mapping of Medical History (MH) and Adverse Events (AE)
A common challenge is ensuring consistency between the AE (Adverse Events) and MH (Medical History) domains when both involve related clinical data. Programmers often struggle to map similar events across these domains, resulting in duplicated or inconsistent information.
Example:
A medical history event such as "Diabetes" may appear in both the MH and AE domains, but without proper mapping, this can create discrepancies.
Solution:
Create validation checks that ensure consistency between MH and AE, flagging records where conditions in MH appear without an associated AE or where the same event is inconsistently coded.
proc sql;
create table check_ae_mh as
select a.USUBJID, a.MHTERM as Event, b.AETERM as AE_Event
from mh as a
left join ae as b
on a.USUBJID = b.USUBJID and a.MHTERM = b.AETERM
where missing(b.AETERM);
quit;
proc print data=check_ae_mh;
title "Medical History Terms Without Corresponding Adverse Events";
run;
Pitfall 3: Issues with Handling Visit Windows
Assigning the correct visit window (such as VISITNUM
, VISIT
, or VISITDY
) is critical in SDTM domains such as VS (Vital Signs), LB (Laboratory Tests), and EG (ECG). A common issue is assigning observations to the wrong visit window when the visit dates don’t match the planned visit windows.
Example:
If a patient visit happens on Day 15, but the expected window is Day 14, misalignment in visit windows can occur, leading to incorrect visit assignment.
Solution:
Create visit windowing rules using advanced date logic, ensuring that observations are correctly assigned to the proper visit window.
data vs_visit_window;
set vs;
if abs(VISITDY - planned_day) <= 2 then VISITWINDOW = 'On Time';
else if VISITDY < planned_day then VISITWINDOW = 'Early';
else VISITWINDOW = 'Late';
run;
proc print data=vs_visit_window;
where VISITWINDOW ne 'On Time';
title "Vital Signs with Early or Late Visits";
run;
Pitfall 4: Incorrect Handling of Controlled Terminology (CT)
Another common issue is failing to properly implement and regularly update controlled terminology for variables such as AEDECOD
(Adverse Event Dictionary Term) or CMDECOD
(Concomitant Medication Dictionary Term). Regulatory submissions often fail due to outdated or incorrect terminology.
Example:
Incorrectly mapped terms in the Concomitant Medications (CM) domain to outdated controlled terminology (e.g., WHO Drug) can result in a failed submission.
Solution:
Use a dynamic lookup to validate the controlled terms against the latest version of the dictionaries, flagging any mismatches or outdated terms.
proc sql;
create table cm_term_check as
select a.*, b.whodrug_term
from cm as a
left join whodrug as b
on a.CMDECOD = b.whodrug_term
where missing(b.whodrug_term);
quit;
proc print data=cm_term_check;
title "Mismatched Terms in Concomitant Medications Domain";
run;
Pitfall 5: Incorrect RELREC (Related Records) Relationships
Establishing correct relationships between records in different domains is essential when creating the RELREC domain. Misaligned or incomplete relationships can cause inconsistencies in your submission and prevent accurate linking between datasets.
Example:
A common issue is incorrectly linking the AE (Adverse Events) domain to the LB (Laboratory Tests) domain, where the related records aren't properly structured.
Solution:
Validate the relational integrity of records between AE and LB using the RDOMAIN
and IDVARVAL
variables in the RELREC domain.
proc sql;
create table relrec_check as
select a.*, b.LBSEQ
from relrec as a
left join lb as b
on a.USUBJID = b.USUBJID and a.IDVARVAL = b.LBSEQ
where missing(b.LBSEQ);
quit;
proc print data=relrec_check;
title "Mismatched RELREC Relationships Between AE and LB Domains";
run;
Pitfall 6: Handling Variables in Supplemental Qualifiers (SUPPxx)
Supplemental qualifiers (SUPPxx) domains are often used for storing non-standard variables, but incorrect or incomplete linking between the main domain and the SUPPxx domain can lead to issues during submission.
Example:
In the AE domain, supplemental qualifiers such as additional severity information may not be properly linked, causing discrepancies.
Solution:
Check for missing or improperly linked records between AE and SUPPAE domains, ensuring that each supplemental qualifier is accurately referenced.
proc sql;
create table suppae_check as
select a.*, b.AESEQ
from suppae as a
left join ae as b
on a.USUBJID = b.USUBJID and a.IDVARVAL = b.AESEQ
where missing(b.AESEQ);
quit;
proc print data=suppae_check;
title "Unlinked SUPP Qualifiers in AE Domain";
run;
Conclusion
Addressing these advanced SDTM mapping pitfalls requires a combination of regular validation, correct handling of controlled terminology, and careful management of relationships across datasets. By implementing automated SAS routines, you can streamline the process and ensure compliance with regulatory standards, preventing costly delays in submission.