Cross-domain SDTM QC Checks You Should Automate, With SAS Snippets
Most SDTM QC still stops at domain‑level review and Pinnacle 21 output.
That leaves a big gap.
A study can be structurally clean and still fail basic cross‑domain logic. AE timing can conflict with EX. Death can exist in DM without a matching DS record. RFSTDTC can disagree with the earliest exposure date. None of that is rare. None of it should be left to manual review.
If you want stronger SDTM, automate the checks that validate how domains work together, not just whether each domain looks correct in isolation.
These are not meant to replace protocol review, medical review, or P21. They are meant to catch the quiet, cross‑domain failures that sit between them.
Why Cross-domain QC Matters
P21 validates conformance.
Cross‑domain QC validates coherence.
That is the difference between:
- a dataset that follows SDTM rules
- and a dataset that actually represents the study correctly
The most expensive issues in submission work usually live in that second category.
A Good Rule Before Writing Any Check
- Chronology: Did events happen in a clinically possible order?
- Consistency: Do anchor variables agree across domains?
- Completeness: Is a required partner record present somewhere else?
- Traceability: Does a claimed relationship actually resolve?
If a check does not answer one of those questions, it is probably better handled elsewhere.
1. AE Start Date Before First Exposure
This is one of the most useful cross‑domain checks. An AE that starts before first exposure is not always wrong — it may reflect pre‑treatment conditions or prior medical history that the protocol is capturing — but it should always be flagged and reviewed.
proc sql;
create table ex_first as
select usubjid,
min(input(exstdtc, ?? is8601da.)) as first_exdt format=date9.
from sdtm.ex
where not missing(exstdtc)
group by usubjid;
quit;
proc sql;
create table qc_ae_before_ex as
select a.usubjid, a.aeseq, a.aestdtc, e.first_exdt
from sdtm.ae as a
left join ex_first as e
on a.usubjid = e.usubjid
where input(a.aestdtc, ?? is8601da.) < e.first_exdt;
quit;
Issues that show up here often include bad date mapping, wrong reference dates, or AE‑related timing logic that clashes with protocol‑defined treatment‑emergent assumptions.
2. DM RFSTDTC vs Earliest EXSTDTC
If RFSTDTC is the date of first study treatment, it should match the earliest exposure date in EX for that subject. When they disagree, every downstream day‑based calculation built from RFSTDTC becomes suspect.
proc sql;
create table ex_first as
select usubjid,
min(input(exstdtc, ?? is8601da.)) as first_exdt format=date9.
from sdtm.ex
where not missing(exstdtc)
group by usubjid;
quit;
data qc_dm_rfstdtc_mismatch;
set sdtm.dm;
rfstdt = input(rfstdtc, ?? is8601da.);
run;
proc sql;
create table qc_dm_rfstdtc_mismatch as
select d.usubjid, d.rfstdtc, e.first_exdt format=date9.
from qc_dm_rfstdtc_mismatch as d
left join ex_first as e
on d.usubjid = e.usubjid
where not missing(d.rfstdt) and not missing(e.first_exdt)
and d.rfstdt ne e.first_exdt;
quit;
3. DM Death Without DS Record
If DTHFL = "Y" or DTHDTC is populated in DM, there should usually be a corresponding DS record that reflects the subject’s disposition tied to death. A mismatch undermines the death-disposition narrative for the reviewer.
proc sql;
create table qc_dm_death_no_ds as
select d.usubjid
from sdtm.dm as d
left join sdtm.ds as s
on d.usubjid = s.usubjid
where upcase(d.dthfl) = "Y"
and missing(s.usubjid);
quit;
4. LB Outside Study Window
Lab records can look fine in isolation, but they become problematic when placed against the subject’s actual study window from DM (RFSTDTC to RFENDTC). This check catches labs that are genuinely out‑of‑window, mis‑mapped, or pulled into SDTM by mistake.
proc sql;
create table dm_window as
select usubjid,
input(rfstdtc, ?? is8601da.) as rfstdt format=date9.,
input(rfendtc, ?? is8601da.) as rfendt format=date9.
from sdtm.dm;
quit;
proc sql;
create table qc_lb_outside_window as
select l.usubjid, l.lbseq, l.lbdtc,
d.rfstdt, d.rfendt
from sdtm.lb as l
left join dm_window as d
on l.usubjid = d.usubjid
where not missing(l.lbdtc)
and (
input(l.lbdtc, ?? is8601da.) < d.rfstdt
or
(not missing(d.rfendt) and input(l.lbdtc, ?? is8601da.) > d.rfendt)
);
quit;
5. RELREC Link Resolution
A RELREC record is only useful if the referenced record actually exists. This is a classic cross‑domain traceability check. Broken RELREC links create false confidence rather than real traceability. This is one of the few checks where structure can be completely correct and still be functionally useless.
%macro check_relrec(domain=, idvar=);
proc sql;
create table qc_relrec_&domain as
select r.usubjid, r.rdomain, r.idvar, r.idvarval
from sdtm.relrec as r
left join sdtm.&domain as d
on r.usubjid = d.usubjid
and input(r.idvarval, best.) = d.&idvar
where upcase(r.rdomain) = "%upcase(&domain)"
and upcase(r.idvar) = "%upcase(&idvar)"
and missing(d.usubjid);
quit;
%mend;
%check_relrec(domain=ae, idvar=aeseq);
%check_relrec(domain=cm, idvar=cmseq);
6. VLM Coverage Gaps
If a value appears in the data but has no matching value-level metadata entry, that is a real documentation gap, even when the define.xml is technically valid.
proc sql;
create table suppae_qnam_data as
select distinct upcase(qnam) as qnam
from sdtm.suppae;
create table suppae_qnam_meta as
select distinct upcase(value) as qnam
from meta.vlm
where upcase(dataset) = "SUPPAE"
and upcase(variable) = "QNAM";
create table qc_suppae_qnam_vlm_gap as
select d.qnam
from suppae_qnam_data as d
left join suppae_qnam_meta as m
on d.qnam = m.qnam
where missing(m.qnam);
quit;
Building These Checks Into a Reusable SAS QC Framework
These checks are most useful when packaged as a shared macro library and QC findings layer used across studies, not as one‑off snippets in individual program folders.
You can build them as:
- standard macros (e.g.,
%check_relrec,%check_ae_ex_timing) - standard QC datasets (one per check)
- a standard reporting layer that appends all findings into a single QC findings dataset
Once that structure is in place, each new study can reuse the same logic with little customization.
What You Should Automate First
If you are starting from scratch, automate these first:
- DM‑to‑EX date anchor checks (RFSTDTC vs earliest EXSTDTC)
- AE‑to‑EX chronology (AE before first exposure)
- DM‑to‑DS death consistency
- LB‑to‑DM timeline (LB outside study window)
- RELREC link resolution
- VLM coverage for key variables
These give you the fastest return on investment and form the backbone of a cross‑domain QC framework.
What Automation Will Not Solve
- mapping correctness
- protocol interpretation
- clinical judgment
Automation cannot replace those decisions. It can, however, force them into the open early by surfacing the inconsistencies they create.
Bottom Line
Clean domains are not enough. The domains have to agree with each other.