Cross-domain SDTM QC Checks You Should Automate, With SAS Snippets

Most SDTM QC still stops at domain‑level review and Pinnacle 21 output.

That leaves a big gap.

A study can be structurally clean and still fail basic cross‑domain logic. AE timing can conflict with EX. Death can exist in DM without a matching DS record. RFSTDTC can disagree with the earliest exposure date. None of that is rare. None of it should be left to manual review.

If you want stronger SDTM, automate the checks that validate how domains work together, not just whether each domain looks correct in isolation.

These are not meant to replace protocol review, medical review, or P21. They are meant to catch the quiet, cross‑domain failures that sit between them.


Why Cross-domain QC Matters

P21 validates conformance.

Cross‑domain QC validates coherence.

That is the difference between:

  • a dataset that follows SDTM rules
  • and a dataset that actually represents the study correctly

The most expensive issues in submission work usually live in that second category.

These are not edge cases. This is where most real submission issues live.

A Good Rule Before Writing Any Check

  1. Chronology: Did events happen in a clinically possible order?
  2. Consistency: Do anchor variables agree across domains?
  3. Completeness: Is a required partner record present somewhere else?
  4. Traceability: Does a claimed relationship actually resolve?

If a check does not answer one of those questions, it is probably better handled elsewhere.


1. AE Start Date Before First Exposure

This is one of the most useful cross‑domain checks. An AE that starts before first exposure is not always wrong — it may reflect pre‑treatment conditions or prior medical history that the protocol is capturing — but it should always be flagged and reviewed.

proc sql;
  create table ex_first as
  select usubjid,
         min(input(exstdtc, ?? is8601da.)) as first_exdt format=date9.
  from sdtm.ex
  where not missing(exstdtc)
  group by usubjid;
quit;

proc sql;
  create table qc_ae_before_ex as
  select a.usubjid, a.aeseq, a.aestdtc, e.first_exdt
  from sdtm.ae as a
  left join ex_first as e
    on a.usubjid = e.usubjid
  where input(a.aestdtc, ?? is8601da.) < e.first_exdt;
quit;

Issues that show up here often include bad date mapping, wrong reference dates, or AE‑related timing logic that clashes with protocol‑defined treatment‑emergent assumptions.


2. DM RFSTDTC vs Earliest EXSTDTC

If RFSTDTC is the date of first study treatment, it should match the earliest exposure date in EX for that subject. When they disagree, every downstream day‑based calculation built from RFSTDTC becomes suspect.

proc sql;
  create table ex_first as
  select usubjid,
         min(input(exstdtc, ?? is8601da.)) as first_exdt format=date9.
  from sdtm.ex
  where not missing(exstdtc)
  group by usubjid;
quit;

data qc_dm_rfstdtc_mismatch;
  set sdtm.dm;
  rfstdt = input(rfstdtc, ?? is8601da.);
run;

proc sql;
  create table qc_dm_rfstdtc_mismatch as
  select d.usubjid, d.rfstdtc, e.first_exdt format=date9.
  from qc_dm_rfstdtc_mismatch as d
  left join ex_first as e
    on d.usubjid = e.usubjid
  where not missing(d.rfstdt) and not missing(e.first_exdt)
    and d.rfstdt ne e.first_exdt;
quit;

3. DM Death Without DS Record

If DTHFL = "Y" or DTHDTC is populated in DM, there should usually be a corresponding DS record that reflects the subject’s disposition tied to death. A mismatch undermines the death-disposition narrative for the reviewer.

proc sql;
  create table qc_dm_death_no_ds as
  select d.usubjid
  from sdtm.dm as d
  left join sdtm.ds as s
    on d.usubjid = s.usubjid
  where upcase(d.dthfl) = "Y"
    and missing(s.usubjid);
quit;

4. LB Outside Study Window

Lab records can look fine in isolation, but they become problematic when placed against the subject’s actual study window from DM (RFSTDTC to RFENDTC). This check catches labs that are genuinely out‑of‑window, mis‑mapped, or pulled into SDTM by mistake.

proc sql;
  create table dm_window as
  select usubjid,
         input(rfstdtc, ?? is8601da.) as rfstdt format=date9.,
         input(rfendtc, ?? is8601da.) as rfendt format=date9.
  from sdtm.dm;
quit;

proc sql;
  create table qc_lb_outside_window as
  select l.usubjid, l.lbseq, l.lbdtc,
         d.rfstdt, d.rfendt
  from sdtm.lb as l
  left join dm_window as d
    on l.usubjid = d.usubjid
  where not missing(l.lbdtc)
    and (
         input(l.lbdtc, ?? is8601da.) < d.rfstdt
         or
         (not missing(d.rfendt) and input(l.lbdtc, ?? is8601da.) > d.rfendt)
        );
quit;

5. RELREC Link Resolution

A RELREC record is only useful if the referenced record actually exists. This is a classic cross‑domain traceability check. Broken RELREC links create false confidence rather than real traceability. This is one of the few checks where structure can be completely correct and still be functionally useless.

%macro check_relrec(domain=, idvar=);
  proc sql;
    create table qc_relrec_&domain as
    select r.usubjid, r.rdomain, r.idvar, r.idvarval
    from sdtm.relrec as r
    left join sdtm.&domain as d
      on r.usubjid = d.usubjid
     and input(r.idvarval, best.) = d.&idvar
    where upcase(r.rdomain) = "%upcase(&domain)"
      and upcase(r.idvar) = "%upcase(&idvar)"
      and missing(d.usubjid);
  quit;
%mend;

%check_relrec(domain=ae, idvar=aeseq);
%check_relrec(domain=cm, idvar=cmseq);

6. VLM Coverage Gaps

If a value appears in the data but has no matching value-level metadata entry, that is a real documentation gap, even when the define.xml is technically valid.

proc sql;
  create table suppae_qnam_data as
  select distinct upcase(qnam) as qnam
  from sdtm.suppae;

  create table suppae_qnam_meta as
  select distinct upcase(value) as qnam
  from meta.vlm
  where upcase(dataset) = "SUPPAE"
    and upcase(variable) = "QNAM";

  create table qc_suppae_qnam_vlm_gap as
  select d.qnam
  from suppae_qnam_data as d
  left join suppae_qnam_meta as m
    on d.qnam = m.qnam
  where missing(m.qnam);
quit;

Building These Checks Into a Reusable SAS QC Framework

These checks are most useful when packaged as a shared macro library and QC findings layer used across studies, not as one‑off snippets in individual program folders.

You can build them as:

  • standard macros (e.g., %check_relrec, %check_ae_ex_timing)
  • standard QC datasets (one per check)
  • a standard reporting layer that appends all findings into a single QC findings dataset

Once that structure is in place, each new study can reuse the same logic with little customization.


What You Should Automate First

If you are starting from scratch, automate these first:

  1. DM‑to‑EX date anchor checks (RFSTDTC vs earliest EXSTDTC)
  2. AE‑to‑EX chronology (AE before first exposure)
  3. DM‑to‑DS death consistency
  4. LB‑to‑DM timeline (LB outside study window)
  5. RELREC link resolution
  6. VLM coverage for key variables

These give you the fastest return on investment and form the backbone of a cross‑domain QC framework.


What Automation Will Not Solve

  • mapping correctness
  • protocol interpretation
  • clinical judgment

Automation cannot replace those decisions. It can, however, force them into the open early by surfacing the inconsistencies they create.


Bottom Line

Clean domains are not enough. The domains have to agree with each other.

Cross‑domain QC is where SDTM moves from compliant to credible.
```