A Define.xml Review Checklist I Actually Use Before Submission
A practical, SDTM-focused checklist for reviewing define.xml before submission, with emphasis on reproducibility, traceability, consistency, and the reviewer-facing problems that weak metadata creates.
If you work on SDTM submissions long enough, you learn that define.xml is never just a metadata file.
It is the reviewer’s map to the datasets, the controlled terminology, the derivations, the value-level rules, and the awkward corners of the study that never fully fit the standard.
Over time, I stopped treating validation as the only sign-off gate. I started using a review checklist that asks a harder question: could a reviewer understand and reproduce this submission from the metadata alone?
A strong define.xml does two jobs at once. It tells the reviewer what is in the submission, and it tells them how to think about it. That is why I review it at three levels: package consistency, metadata accuracy, and reviewer usability. A file can be technically valid and still be weak in one of the other two.
What follows is the checklist I actually use before an SDTM submission goes out the door.
Start with the submission package
Before I even open the XML structure itself, I verify that the dataset package and metadata package agree on the basics.
Checklist
- Confirm every submitted SDTM domain appears in define.xml.
- Confirm define.xml does not list any domain that is not actually in the submission package.
- Verify domain names, labels, classes, and structures match the submitted datasets.
- Check file names, folder placement, and package conventions are consistent.
- Check the SDRG, aCRF, datasets, and define.xml all point to the same final delivery.
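The first two checks above are easy to script. Here is a minimal Python sketch of the domain reconciliation, using a toy define.xml fragment with namespaces omitted for brevity (a real file uses the full ODM and Define-XML namespaces, and the submitted set would come from the XPT file names in the delivery folder):

```python
import xml.etree.ElementTree as ET

# Toy define.xml fragment; namespaces omitted for brevity.
DEFINE_XML = """<ODM>
  <ItemGroupDef Name="DM"/>
  <ItemGroupDef Name="AE"/>
  <ItemGroupDef Name="LB"/>
</ODM>"""

# Domains actually present in the submission package
# (normally derived from the XPT file names in the delivery folder).
submitted = {"DM", "AE", "EX"}

defined = {ig.get("Name") for ig in ET.fromstring(DEFINE_XML).iter("ItemGroupDef")}

missing_from_define = submitted - defined   # in package, not in metadata
orphaned_in_define = defined - submitted    # in metadata, not in package

print("Missing from define.xml:", sorted(missing_from_define))   # ['EX']
print("Listed but not submitted:", sorted(orphaned_in_define))   # ['LB']
```

Either set being non-empty means the package and metadata disagree before a single variable has been reviewed.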
Confirm standards and versions are explicitly identified
This sounds basic, but it is one of the easiest things to leave half-finished when metadata is updated late.
Checklist
- Confirm the SDTMIG version is correctly identified.
- Confirm the Define-XML version is the one intended for the submission.
- Confirm controlled terminology versions are named consistently.
- Confirm external dictionaries such as MedDRA, WHODrug, LOINC, or other study-level standards are versioned consistently across define.xml, SDRG, and study documentation.
- Confirm no old standard version labels remain from template reuse.
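A version cross-check is also scriptable once the version strings have been pulled out of each deliverable. This is a hedged sketch with hypothetical version values; in practice the inputs come from define.xml attributes, the SDRG text, and the study documentation:

```python
# Hypothetical version strings pulled from each deliverable.
versions = {
    "define.xml": {"SDTMIG": "3.4", "CT": "2024-03-29", "MedDRA": "26.1"},
    "SDRG":       {"SDTMIG": "3.4", "CT": "2024-03-29", "MedDRA": "26.0"},
}

def version_mismatches(docs):
    """Return {standard: {document: version}} for standards that disagree."""
    out = {}
    standards = {k for doc in docs.values() for k in doc}
    for std in standards:
        seen = {name: doc.get(std) for name, doc in docs.items()}
        if len(set(seen.values())) > 1:
            out[std] = seen
    return out

print(version_mismatches(versions))  # flags the MedDRA 26.1 vs 26.0 disagreement
```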
I always treat version signaling as a reviewer orientation issue, not just a metadata housekeeping issue.
Check dataset metadata first
The fastest way to spot a weak define.xml is to compare dataset-level metadata against the actual XPT files.
Checklist
- Dataset label matches the dataset’s real purpose and SDTM domain.
- Class is correct and consistent with SDTM usage.
- Structure is correct, including whether the domain is one record per subject, one record per event, or another expected pattern.
- Keys and identifier variables are consistent with the domain content.
- Dataset-level comments explain anything unusual the reviewer needs to know.
I never trust the metadata spec alone here. I compare the XPT header, define.xml dataset metadata, and the mapping spec line by line for high-risk domains like DM, EX, AE, LB, VS, DS, and SUPP--.
Then review variables one by one
This is where most quiet problems live.
Checklist
- Variable name, label, type, length, and format match the actual dataset.
- Variable order is sensible and consistent with the implementation.
- Core SDTM variables are present where expected.
- Required, expected, and permissible usage is justified by the domain.
- Controlled terminology fields actually point to the right codelist, and the codelist reflects what is used in the data.
1. Can the variable be reproduced from define.xml alone?
This is the first true reviewer test I use.
If a reviewer or another programmer had only the SDTM dataset and define.xml, could they recreate the variable safely?
If the answer is no, the metadata is not complete enough.
What I check
- Formula is explicitly stated.
- Anchor variables are named.
- Selection logic is written, not implied.
- Units and conversions are visible.
- The description is specific enough that another programmer could reproduce the result without opening an internal spec.
Weak: “Derived from reference start date.”
Stronger: “Study day is calculated as Event Date minus DM.RFSTDTC plus 1 when Event Date is on or after DM.RFSTDTC; otherwise Event Date minus DM.RFSTDTC. Records with partial dates are not assigned study day.”
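The study-day rule above can be sketched in a few lines of Python, assuming ISO 8601 character dates as carried in SDTM --DTC variables (the function names here are illustrative, not from any standard library):

```python
from datetime import date

def study_day(event_iso, rfstdtc_iso):
    """SDTM study day: there is no day 0. Partial dates return None
    rather than a guessed value."""
    if len(event_iso) < 10 or len(rfstdtc_iso) < 10:
        return None  # partial date: study day not assigned
    delta = (date.fromisoformat(event_iso[:10])
             - date.fromisoformat(rfstdtc_iso[:10])).days
    return delta + 1 if delta >= 0 else delta

print(study_day("2023-05-10", "2023-05-10"))  # 1  (same day is day 1)
print(study_day("2023-05-09", "2023-05-10"))  # -1 (no day 0)
print(study_day("2023-05", "2023-05-10"))     # None (partial date)
```

If the method text in define.xml lets another programmer write this function without opening the internal spec, it has passed the test.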
2. Are boundary conditions clearly defined?
Most ambiguity comes from the edges, not the main rule.
What I check
- Same-day records
- Missing time
- Partial dates
- Multiple qualifying records
- Pre-treatment versus post-treatment boundary
For SDTM Findings flags such as LBLOBXFL, reviewers usually ask the same things.
- Is “prior” based on date or datetime?
- Are same-day records eligible?
- What if time is missing?
- How is “last” selected?
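To make the edge cases concrete, here is one possible selection policy written out in Python. This is an assumption for illustration, not the standard rule: “prior” uses the full datetime when time is collected, same-day records with missing time are treated as ambiguous and excluded, and “last” is the maximum eligible datetime:

```python
from datetime import datetime

def last_pre_treatment(records, rfxstdtc):
    """One possible baseline-selection policy (an assumption, not a
    universal rule). Returns the last eligible record, or None."""
    ref = datetime.fromisoformat(rfxstdtc)
    eligible = []
    for rec in records:
        dtc = rec["LBDTC"]
        if len(dtc) >= 16:                        # date and time present
            when = datetime.fromisoformat(dtc)
            if when < ref:
                eligible.append((when, rec))
        elif len(dtc) == 10 and dtc < rfxstdtc[:10]:
            # date only: eligible only if strictly before the dosing day;
            # same-day records with missing time are excluded as ambiguous
            eligible.append((datetime.fromisoformat(dtc), rec))
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]

records = [
    {"LBSEQ": 1, "LBDTC": "2023-05-09T08:00"},   # prior day
    {"LBSEQ": 2, "LBDTC": "2023-05-10T07:30"},   # same day, pre-dose
    {"LBSEQ": 3, "LBDTC": "2023-05-10"},         # same day, time missing
]
print(last_pre_treatment(records, "2023-05-10T09:00"))  # LBSEQ 2 is selected
```

Whatever policy the study actually uses, the method text should answer each of the four questions above as unambiguously as this code does.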
3. Is partial date handling explicitly documented?
Partial date handling is one of the biggest sources of inconsistency across SDTM.
Many define.xml files simply say: “Partial dates were imputed.”
That does not tell the reviewer enough.
What I check
- Which patterns are imputed
- What values are assigned
- Where the imputation is used
- Whether imputed values are stored in SDTM
- Whether the logic is consistent across domains
Stronger: “AE start dates in YYYY-MM format are imputed to the first day of the month for treatment-emergent classification only. Imputed values are not stored in SDTM and are not used for time-to-event analyses.”
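The imputation rule described above is unambiguous enough to code directly. A minimal sketch, assuming ISO 8601 character dates and using illustrative function names:

```python
def impute_for_te(aestdtc):
    """Impute YYYY-MM to the first of the month, for treatment-emergent
    classification only; the imputed value is never written back to SDTM."""
    if len(aestdtc) == 10:      # complete YYYY-MM-DD: no imputation
        return aestdtc
    if len(aestdtc) == 7:       # YYYY-MM: impute day 01
        return aestdtc + "-01"
    return None                 # year-only or missing: not classified

def treatment_emergent(aestdtc, rfxstdtc):
    imputed = impute_for_te(aestdtc)
    return None if imputed is None else imputed >= rfxstdtc

print(treatment_emergent("2023-06", "2023-05-15"))  # True
print(treatment_emergent("2023", "2023-05-15"))     # None (not classified)
```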
4. Is unit standardization clearly described?
For domains such as LB, VS, and EG, this matters more than many teams expect.
What I check
- Whether results are converted
- What source drives the conversion
- Whether standardization happens before flag derivation
- How character results are handled
Weak: “Standard unit.”
Stronger: “Results for LBTESTCD = ALT are standardized to U/L using approved central lab conversion factors before derivation of LBNRIND. Character results reported as below quantification limit remain in LBSTRESC and do not populate LBSTRESN.”
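The standardization pattern described above looks roughly like this in Python. The conversion factors here are placeholders for illustration; the real values come from the approved central-lab conversion table, not from this sketch:

```python
# Illustrative conversion factors to U/L (placeholders, not study values).
TO_UL = {"U/L": 1.0, "ukat/L": 60.0}

def standardize(lborres, lborresu):
    """Return (LBSTRESC, LBSTRESN). Character results such as '<5'
    stay in LBSTRESC and leave LBSTRESN null."""
    try:
        value = float(lborres)
    except ValueError:
        return lborres, None           # e.g. below-quantification-limit text
    std = value * TO_UL[lborresu]
    return str(std), std

def lbnrind(stresn, low, high):
    """Reference-range flag derived only after standardization."""
    if stresn is None:
        return None
    return "LOW" if stresn < low else "HIGH" if stresn > high else "NORMAL"

print(standardize("0.5", "ukat/L"))  # ('30.0', 30.0)
print(standardize("<5", "U/L"))      # ('<5', None)
```

The ordering matters: the flag derivation consumes the standardized numeric value, never the collected one.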
Controlled terminology needs reviewer logic
Controlled terminology problems are rarely dramatic, but they are exactly the kind of thing reviewers notice.
I do not stop at checking whether a variable points to a codelist. I also check whether the codelist actually explains the values used in the dataset.
Checklist
- Every coded variable points to the correct codelist.
- Every coded value in the dataset is represented in the linked codelist.
- Extensible versus non-extensible behavior is handled correctly.
- “Other” values are used appropriately and not as a catch-all for unresolved mapping.
- Custom terms are clearly identified, justified, and used only when needed.
- External terminology references are consistent with the study implementation.
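The coverage check in the second bullet is the one I most often script. A minimal sketch with hypothetical observed values (AESEV is non-extensible in CDISC controlled terminology, which is what makes a value outside it a real finding):

```python
# Hypothetical observed data values versus the linked codelist.
codelist = {"extensible": False, "terms": {"MILD", "MODERATE", "SEVERE"}}
observed = {"MILD", "SEVERE", "LIFE THREATENING"}

uncoded = observed - codelist["terms"]
if uncoded and not codelist["extensible"]:
    print("Values outside a non-extensible codelist:", sorted(uncoded))
```

For extensible codelists the same check still matters, because every sponsor extension found this way must appear in the define.xml codelist rather than only in the data.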
Traceability must make sense
Define.xml is not only about naming things correctly. It is about helping a reviewer understand where data came from and how it was derived.
Checklist
- Origin is correct for each variable, especially collected, derived, and assigned variables.
- Derivation descriptions are clear, concise, and reproducible.
- External references, comments, and derivation logic are understandable without reading an internal spec.
- If something is nonstandard, define.xml and SDRG tell the same story.
This matters most when a variable is derived from multiple sources, when date imputation is involved, or when the domain includes sponsor-specific nuances.
Review computational methods as reusable objects
In define.xml, a derivation is not just a sentence. It is a metadata object. If the same logic appears in multiple places, the method references should make that obvious.
Checklist
- Each MethodDef is actually referenced where intended.
- Duplicated logic is reused rather than described differently in multiple places.
- Method text is specific enough to reproduce the derivation.
- Sponsor-defined methods are not described so broadly that they hide record-level conditions.
- Method naming is understandable to a reviewer and not only to the study team.
5. Is origin and traceability unambiguous?
This is one of the biggest reviewer confidence checks.
What I check
- Is the value CRF-collected, assigned, or derived?
- Is sponsor mapping logic visible?
- Does define.xml align with SDRG or cSDRG wording?
Weak: “Relationship to study drug.”
Stronger: “Collected on AE CRF as investigator assessment of relationship to study treatment. In studies with multiple investigational products, SDTM value represents relationship to primary study treatment as defined in protocol. Sponsor mapping rules are applied when more than one relationship is recorded.”
Value-level metadata deserves extra attention
Value-level metadata is often where strong define.xml packages become weak. It is especially important when a variable behaves differently by record type, when metadata changes by subset, or when special derivations need precise explanation.
Checklist
- Value-level metadata is used only when needed and not as a workaround for poor dataset design.
- The conditions for the value-level metadata are correctly specified.
- The metadata actually covers all relevant records in the dataset.
- The resulting description is understandable to a reviewer who is not part of the study team.
- Each VLM entry adds something useful beyond the parent variable description.
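The coverage bullet above is easy to verify mechanically. A hedged sketch, assuming VLM for an LB variable is keyed by LBTESTCD (the data and VLM keys here are hypothetical):

```python
# Hypothetical LB records and the LBTESTCD values that have VLM entries.
records = [{"LBTESTCD": "ALT"}, {"LBTESTCD": "AST"}, {"LBTESTCD": "GLUC"}]
vlm_testcds = {"ALT", "AST"}

uncovered = {r["LBTESTCD"] for r in records} - vlm_testcds
print("Records with no VLM entry:", sorted(uncovered))  # ['GLUC']
```

Uncovered records are not automatically wrong, but each one should be a deliberate fall-through to the parent variable metadata, not an oversight.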
Review SUPP-- carefully
SUPP-- is often technically valid and still a sign that something needs a second look. I always check whether supplemental qualifiers are truly the right implementation, or whether the metadata is compensating for a design decision that deserves more scrutiny.
Checklist
- Each supplemental qualifier is appropriate for SUPP-- use.
- QNAM, QLABEL, QVAL, IDVAR, and IDVARVAL align with the parent record.
- Supplemental qualifiers are traceable back to the source collection.
- Reviewer-facing comments explain any heavy reliance on SUPP--.
- The same concept is not represented both in a parent domain and in SUPP-- without explanation.
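The parent-record alignment check is mechanical once the IDVAR/IDVARVAL convention is pinned down. A minimal sketch with synthetic records, assuming IDVAR is AESEQ and IDVARVAL is its character form:

```python
# Synthetic parent AE records and SUPPAE records.
parent_ae = [
    {"USUBJID": "01-001", "AESEQ": 1},
    {"USUBJID": "01-001", "AESEQ": 2},
]
suppae = [
    {"USUBJID": "01-001", "IDVAR": "AESEQ", "IDVARVAL": "2",
     "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "01-001", "IDVAR": "AESEQ", "IDVARVAL": "9",
     "QNAM": "AETRTEM", "QVAL": "N"},   # no matching parent record
]

# IDVARVAL is character, so the parent key must be compared as character.
parent_keys = {(r["USUBJID"], str(r["AESEQ"])) for r in parent_ae}
orphans = [s for s in suppae
           if (s["USUBJID"], s["IDVARVAL"]) not in parent_keys]
print(len(orphans), "orphaned SUPPAE record(s)")  # 1
```

An orphaned supplemental qualifier is invisible to a reviewer merging SUPP-- back onto the parent domain, which is exactly why this check belongs before submission rather than after a query.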
6. Are value-level metadata entries actually useful?
I do not look at VLM just to see whether it exists. I look at whether it adds anything useful.
What I check
- Does each VLM entry add context that the parent variable does not?
- Are conditions clearly defined?
- Are methods aligned across subsets?
- Are units, flags, and derivations consistent with the condition?
If VLM is only repeating variable-level text, it is not doing enough.
7. Is the logic consistent across domains?
This is where quiet inconsistency shows up.
Typical pattern:
- AE uses imputed dates
- LB excludes partial dates
- VS uses visit date
- EG uses datetime boundary
Each rule may be valid. But together they may look inconsistent unless the metadata explains where the differences are intentional.
What I check
- Same concept, same logic where possible
- If not, differences are clearly documented
Hyperlinks and references must work
A broken link in define.xml feels small until it lands in a reviewer’s lap.
Checklist
- All internal references resolve correctly.
- All external links point to the intended file or metadata object.
- Links to codelists, origin documents, and external references render correctly in the stylesheet output.
- The stylesheet displays the metadata in a readable way for human review.
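Internal reference resolution can be checked without a full validator. A minimal sketch for one reference type, ItemRef to ItemDef, again with namespaces omitted for brevity (a real check walks every OID-bearing reference, including methods, comments, and codelists):

```python
import xml.etree.ElementTree as ET

# Toy fragment with one dangling ItemRef; namespaces omitted for brevity.
DEFINE = """<ODM>
  <ItemGroupDef OID="IG.DM">
    <ItemRef ItemOID="IT.DM.USUBJID"/>
    <ItemRef ItemOID="IT.DM.AGEU"/>
  </ItemGroupDef>
  <ItemDef OID="IT.DM.USUBJID"/>
</ODM>"""

root = ET.fromstring(DEFINE)
defined = {d.get("OID") for d in root.iter("ItemDef")}
dangling = [r.get("ItemOID") for r in root.iter("ItemRef")
            if r.get("ItemOID") not in defined]
print("Unresolved ItemRefs:", dangling)  # ['IT.DM.AGEU']
```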
Review define.xml as a reviewer would actually read it
I always open the rendered define.xml in a browser and navigate it as if I were seeing the package for the first time. This catches problems that schema validation does not.
Checklist
- Dataset pages load cleanly.
- Variable pages are readable and not cluttered with broken references.
- Value-level metadata is easy to follow in the rendered view.
- Long method text wraps correctly and is still readable.
- Codelists, comments, and document links open in a way that helps rather than slows review.
Confirm consistency with SDRG and aCRF
In practice, define.xml, SDRG, aCRF, and datasets should all tell the same story about the SDTM implementation.
Checklist
- Dataset descriptions in define.xml match the SDRG narrative.
- Deviations from SDTM IG or controlled terminology are explained the same way across documents.
- aCRF annotations support the variables and origins described in define.xml.
- Custom domains or special handling are described consistently across the package.
Pay extra attention to custom domains and sponsor-defined variables
Reviewers are usually more tolerant of nonstandard implementation than teams expect, as long as it is explained clearly and consistently. What creates friction is not the existence of a custom rule. It is weak explanation.
Checklist
- Custom domains are clearly identified and justified.
- Sponsor-defined variables do not look like standard variables by accident.
- Naming, labels, origins, and methods are aligned across define.xml and SDRG.
- Reviewer-facing explanations describe why the implementation was needed, not only what was done.
8. Does the metadata match the actual SDTM data?
This sounds obvious, but it fails more often than it should.
What I check
- Derivation wording matches observed values.
- Units match actual standardized data.
- Flags behave the way the method says they do.
- No leftover template language remains.
A common example is when metadata says “latest value prior to treatment,” but same-day post-dose records are still flagged. That is not a programming issue anymore. That is a metadata credibility issue.
Run validation, but do not stop there
Validation catches structural problems. It does not catch every reviewer-facing problem.
Checklist
- XML validates against the intended Define-XML schema.
- No broken references or unresolved metadata objects remain.
- No obvious conformance errors remain after tool-based validation.
- Manual review confirms the metadata still makes reviewer sense after the final dataset freeze.
- The stylesheet output is readable and matches the intended submission package.
9. Are ambiguous phrases eliminated?
These phrases are common, but they usually create more uncertainty than they remove:
- Derived from reference start date
- Last non-missing value
- Standard unit
- Partial dates were imputed
- Relationship to study drug
Each one hides decisions. The fix is not more words for the sake of more words. The fix is writing the actual rule.
Recheck after the final metadata refresh
One of the most common late-stage problems is not a wrong derivation. It is a right derivation described by the wrong final metadata because the datasets, SDRG, aCRF, and define.xml did not freeze in the same rhythm.
Checklist
- Final XPT files match the last define.xml build.
- No late updates were made to one document but not the others.
- Reviewer comments, methods, and links still point to the final objects.
- The rendered define.xml reflects the actual submission package, not the pre-freeze draft.
10. Final test: will this create a reviewer question?
This is the last question I ask.
If the answer is yes, the metadata is still too thin.
Final thought
The define.xml packages that cause the fewest review problems are usually not the ones with the fanciest tooling.
They are the ones where the metadata, datasets, SDRG, and annotated CRF tell the same story without forcing the reviewer to fill in the gaps.
That is the standard I use before submission.
Which define.xml checkpoint catches the most problems in your SDTM submissions: reproducibility, traceability, standards and version control, or boundary handling?