A Define.xml Review Checklist I Actually Use Before Submission
A practical, SDTM-focused checklist for reviewing define.xml before submission, with emphasis on reproducibility, traceability, consistency, and the reviewer-facing problems that weak metadata creates.
If you work on SDTM submissions long enough, you learn that define.xml is never just a metadata file.
It is the reviewer’s map to the datasets, the controlled terminology, the derivations, the value-level rules, and the awkward corners of the study that never fully fit the standard.
Over time, I stopped treating validation as the only sign-off gate. I started using a review checklist that asks a harder question: could a reviewer understand and reproduce this submission from the metadata alone?
A strong define.xml does two jobs at once. It tells the reviewer what is in the submission, and it tells them how to think about it. That is why I review it at three levels: package consistency, metadata accuracy, and reviewer usability. A file can be technically valid and still be weak in one of the other two.
What follows is the checklist I actually use before an SDTM submission goes out the door.
Start with the submission package
Before I even open the XML structure itself, I verify that the dataset package and metadata package agree on the basics.
Checklist
- Confirm every submitted SDTM domain appears in define.xml.
- Confirm define.xml does not list any domain that is not actually in the submission package.
- Verify domain names, labels, classes, and structures match the submitted datasets.
- Check file names, folder placement, and package conventions are consistent.
- Check the SDRG, aCRF, datasets, and define.xml all point to the same final delivery.
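The first two checks above are easy to script. Here is a minimal Python sketch of the domain reconciliation, using a toy define.xml fragment with namespaces omitted for brevity (a real file uses the full ODM and Define-XML namespaces, and the submitted set would come from the XPT file names in the delivery folder):

```python
import xml.etree.ElementTree as ET

# Toy define.xml fragment; namespaces omitted for brevity.
DEFINE_XML = """<ODM>
  <ItemGroupDef Name="DM"/>
  <ItemGroupDef Name="AE"/>
  <ItemGroupDef Name="LB"/>
</ODM>"""

# Domains actually present in the submission package
# (normally derived from the XPT file names in the delivery folder).
submitted = {"DM", "AE", "EX"}

defined = {ig.get("Name") for ig in ET.fromstring(DEFINE_XML).iter("ItemGroupDef")}

missing_from_define = submitted - defined   # in package, not in metadata
orphaned_in_define = defined - submitted    # in metadata, not in package

print("Missing from define.xml:", sorted(missing_from_define))   # ['EX']
print("Listed but not submitted:", sorted(orphaned_in_define))   # ['LB']
```

Either set being non-empty means the package and metadata disagree before a single variable has been reviewed.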
Confirm standards and versions are explicitly identified
This sounds basic, but it is one of the easiest things to leave half-finished when metadata is updated late.
Checklist
- Confirm the SDTMIG version is correctly identified.
- Confirm the Define-XML version is the one intended for the submission.
- Confirm controlled terminology versions are named consistently.
- Confirm external dictionaries such as MedDRA, WHODrug, LOINC, or other study-level standards are versioned consistently across define.xml, SDRG, and study documentation.
- Confirm no old standard version labels remain from template reuse.
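A version cross-check is also scriptable once the version strings have been pulled out of each deliverable. This is a hedged sketch with hypothetical version values; in practice the inputs come from define.xml attributes, the SDRG text, and the study documentation:

```python
# Hypothetical version strings pulled from each deliverable.
versions = {
    "define.xml": {"SDTMIG": "3.4", "CT": "2024-03-29", "MedDRA": "26.1"},
    "SDRG":       {"SDTMIG": "3.4", "CT": "2024-03-29", "MedDRA": "26.0"},
}

def version_mismatches(docs):
    """Return {standard: {document: version}} for standards that disagree."""
    out = {}
    standards = {k for doc in docs.values() for k in doc}
    for std in standards:
        seen = {name: doc.get(std) for name, doc in docs.items()}
        if len(set(seen.values())) > 1:
            out[std] = seen
    return out

print(version_mismatches(versions))  # flags the MedDRA 26.1 vs 26.0 disagreement
```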
I always treat version signaling as a reviewer orientation issue, not just a metadata housekeeping issue.
Check dataset metadata first
The fastest way to spot a weak define.xml is to compare dataset-level metadata against the actual XPT files.
Checklist
- Dataset label matches the dataset’s real purpose and SDTM domain.
- Class is correct and consistent with SDTM usage.
- Structure is correct, including whether the domain is one record per subject, one record per event, or another expected pattern.
- Keys and identifier variables are consistent with the domain content.
- Dataset-level comments explain anything unusual the reviewer needs to know.
I never trust the metadata spec alone here. I compare the XPT header, define.xml dataset metadata, and the mapping spec line by line for high-risk domains like DM, EX, AE, LB, VS, DS, and SUPP--.
Then review variables one by one
This is where most quiet problems live.
Checklist
- Variable name, label, type, length, and format match the actual dataset.
- Variable order is sensible and consistent with the implementation.
- Core SDTM variables are present where expected.
- Required, expected, and permissible usage is justified by the domain.
- Controlled terminology fields actually point to the right codelist, and the codelist reflects what is used in the data.
1. Can the variable be reproduced from define.xml alone?
This is the first true reviewer test I use.
If a reviewer or another programmer had only the SDTM dataset and define.xml, could they recreate the variable safely?
If the answer is no, the metadata is not complete enough.
What I check
- Formula is explicitly stated.
- Anchor variables are named.
- Selection logic is written, not implied.
- Units and conversions are visible.
- The description is specific enough that another programmer could reproduce the result without opening an internal spec.
Weak: “Derived from reference start date.”
Stronger: “Study day is calculated as Event Date minus DM.RFSTDTC plus 1 when Event Date is on or after DM.RFSTDTC; otherwise Event Date minus DM.RFSTDTC. Records with partial dates are not assigned study day.”
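The study-day rule above can be sketched in a few lines of Python, assuming ISO 8601 character dates as carried in SDTM --DTC variables (the function names here are illustrative, not from any standard library):

```python
from datetime import date

def study_day(event_iso, rfstdtc_iso):
    """SDTM study day: there is no day 0. Partial dates return None
    rather than a guessed value."""
    if len(event_iso) < 10 or len(rfstdtc_iso) < 10:
        return None  # partial date: study day not assigned
    delta = (date.fromisoformat(event_iso[:10])
             - date.fromisoformat(rfstdtc_iso[:10])).days
    return delta + 1 if delta >= 0 else delta

print(study_day("2023-05-10", "2023-05-10"))  # 1  (same day is day 1)
print(study_day("2023-05-09", "2023-05-10"))  # -1 (no day 0)
print(study_day("2023-05", "2023-05-10"))     # None (partial date)
```

If the method text in define.xml lets another programmer write this function without opening the internal spec, it has passed the test.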
2. Are boundary conditions clearly defined?
Most ambiguity comes from the edges, not the main rule.
What I check
- Same-day records
- Missing time
- Partial dates
- Multiple qualifying records
- Pre-treatment versus post-treatment boundary
For SDTM Findings flags such as LBLOBXFL, reviewers usually ask the same things.
- Is “prior” based on date or datetime?
- Are same-day records eligible?
- What if time is missing?
- How is “last” selected?
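To make the edge cases concrete, here is one possible selection policy written out in Python. This is an assumption for illustration, not the standard rule: “prior” uses the full datetime when time is collected, same-day records with missing time are treated as ambiguous and excluded, and “last” is the maximum eligible datetime:

```python
from datetime import datetime

def last_pre_treatment(records, rfxstdtc):
    """One possible baseline-selection policy (an assumption, not a
    universal rule). Returns the last eligible record, or None."""
    ref = datetime.fromisoformat(rfxstdtc)
    eligible = []
    for rec in records:
        dtc = rec["LBDTC"]
        if len(dtc) >= 16:                        # date and time present
            when = datetime.fromisoformat(dtc)
            if when < ref:
                eligible.append((when, rec))
        elif len(dtc) == 10 and dtc < rfxstdtc[:10]:
            # date only: eligible only if strictly before the dosing day;
            # same-day records with missing time are excluded as ambiguous
            eligible.append((datetime.fromisoformat(dtc), rec))
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]

records = [
    {"LBSEQ": 1, "LBDTC": "2023-05-09T08:00"},   # prior day
    {"LBSEQ": 2, "LBDTC": "2023-05-10T07:30"},   # same day, pre-dose
    {"LBSEQ": 3, "LBDTC": "2023-05-10"},         # same day, time missing
]
print(last_pre_treatment(records, "2023-05-10T09:00"))  # LBSEQ 2 is selected
```

Whatever policy the study actually uses, the method text should answer each of the four questions above as unambiguously as this code does.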
3. Is partial date handling explicitly documented?
Partial date handling is one of the biggest sources of inconsistency across SDTM.
Many define.xml files simply say: “Partial dates were imputed.”
That does not tell the reviewer enough.
What I check
- Which patterns are imputed
- What values are assigned
- Where the imputation is used
- Whether imputed values are stored in SDTM
- Whether the logic is consistent across domains
Stronger: “AE start dates in YYYY-MM format are imputed to the first day of the month for treatment-emergent classification only. Imputed values are not stored in SDTM and are not used for time-to-event analyses.”
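The imputation rule described above is unambiguous enough to code directly. A minimal sketch, assuming ISO 8601 character dates and using illustrative function names:

```python
def impute_for_te(aestdtc):
    """Impute YYYY-MM to the first of the month, for treatment-emergent
    classification only; the imputed value is never written back to SDTM."""
    if len(aestdtc) == 10:      # complete YYYY-MM-DD: no imputation
        return aestdtc
    if len(aestdtc) == 7:       # YYYY-MM: impute day 01
        return aestdtc + "-01"
    return None                 # year-only or missing: not classified

def treatment_emergent(aestdtc, rfxstdtc):
    imputed = impute_for_te(aestdtc)
    return None if imputed is None else imputed >= rfxstdtc

print(treatment_emergent("2023-06", "2023-05-15"))  # True
print(treatment_emergent("2023", "2023-05-15"))     # None (not classified)
```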
4. Is unit standardization clearly described?
For domains such as LB, VS, and EG, this matters more than many teams expect.
What I check
- Whether results are converted
- What source drives the conversion
- Whether standardization happens before flag derivation
- How character results are handled
Weak: “Standard unit.”
Stronger: “Results for LBTESTCD = ALT are standardized to U/L using approved central lab conversion factors before derivation of LBNRIND. Character results reported as below quantification limit remain in LBSTRESC and do not populate LBSTRESN.”
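The standardization pattern described above looks roughly like this in Python. The conversion factors here are placeholders for illustration; the real values come from the approved central-lab conversion table, not from this sketch:

```python
# Illustrative conversion factors to U/L (placeholders, not study values).
TO_UL = {"U/L": 1.0, "ukat/L": 60.0}

def standardize(lborres, lborresu):
    """Return (LBSTRESC, LBSTRESN). Character results such as '<5'
    stay in LBSTRESC and leave LBSTRESN null."""
    try:
        value = float(lborres)
    except ValueError:
        return lborres, None           # e.g. below-quantification-limit text
    std = value * TO_UL[lborresu]
    return str(std), std

def lbnrind(stresn, low, high):
    """Reference-range flag derived only after standardization."""
    if stresn is None:
        return None
    return "LOW" if stresn < low else "HIGH" if stresn > high else "NORMAL"

print(standardize("0.5", "ukat/L"))  # ('30.0', 30.0)
print(standardize("<5", "U/L"))      # ('<5', None)
```

The ordering matters: the flag derivation consumes the standardized numeric value, never the collected one.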
Controlled terminology needs reviewer logic
Controlled terminology problems are rarely dramatic, but they are exactly the kind of thing reviewers notice.
I do not stop at checking whether a variable points to a codelist. I also check whether the codelist actually explains the values used in the dataset.
Checklist
- Every coded variable points to the correct codelist.
- Every coded value in the dataset is represented in the linked codelist.
- Extensible versus non-extensible behavior is handled correctly.
- “Other” values are used appropriately and not as a catch-all for unresolved mapping.
- Custom terms are clearly identified, justified, and used only when needed.
- External terminology references are consistent with the study implementation.
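The coverage check in the second bullet is the one I most often script. A minimal sketch with hypothetical observed values (AESEV is non-extensible in CDISC controlled terminology, which is what makes a value outside it a real finding):

```python
# Hypothetical observed data values versus the linked codelist.
codelist = {"extensible": False, "terms": {"MILD", "MODERATE", "SEVERE"}}
observed = {"MILD", "SEVERE", "LIFE THREATENING"}

uncoded = observed - codelist["terms"]
if uncoded and not codelist["extensible"]:
    print("Values outside a non-extensible codelist:", sorted(uncoded))
```

For extensible codelists the same check still matters, because every sponsor extension found this way must appear in the define.xml codelist rather than only in the data.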
Traceability must make sense
Define.xml is not only about naming things correctly. It is about helping a reviewer understand where data came from and how it was derived.
Checklist
- Origin is correct for each variable, especially collected, derived, and assigned variables.
- Derivation descriptions are clear, concise, and reproducible.
- External references, comments, and derivation logic are understandable without reading an internal spec.
- If something is nonstandard, define.xml and SDRG tell the same story.
This matters most when a variable is derived from multiple sources, when date imputation is involved, or when the domain includes sponsor-specific nuances.
Review computational methods as reusable objects
In define.xml, a derivation is not just a sentence. It is a metadata object. If the same logic appears in multiple places, the method references should make that obvious.
Checklist
- Each MethodDef is actually referenced where intended.
- Duplicated logic is reused rather than described differently in multiple places.
- Method text is specific enough to reproduce the derivation.
- Sponsor-defined methods are not described so broadly that they hide record-level conditions.
- Method naming is understandable to a reviewer and not only to the study team.
5. Is origin and traceability unambiguous?
This is one of the biggest reviewer confidence checks.
What I check
- Is the value CRF-collected, assigned, or derived?
- Is sponsor mapping logic visible?
- Does define.xml align with SDRG or cSDRG wording?
Weak: “Relationship to study drug.”
Stronger: “Collected on AE CRF as investigator assessment of relationship to study treatment. In studies with multiple investigational products, SDTM value represents relationship to primary study treatment as defined in protocol. Sponsor mapping rules are applied when more than one relationship is recorded.”
Value-level metadata deserves extra attention
Value-level metadata is often where strong define.xml packages become weak. It is especially important when a variable behaves differently by record type, when metadata changes by subset, or when special derivations need precise explanation.
Checklist
- Value-level metadata is used only when needed and not as a workaround for poor dataset design.
- The conditions for the value-level metadata are correctly specified.
- The metadata actually covers all relevant records in the dataset.
- The resulting description is understandable to a reviewer who is not part of the study team.
- Each VLM entry adds something useful beyond the parent variable description.
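The coverage bullet above is easy to verify mechanically. A hedged sketch, assuming VLM for an LB variable is keyed by LBTESTCD (the data and VLM keys here are hypothetical):

```python
# Hypothetical LB records and the LBTESTCD values that have VLM entries.
records = [{"LBTESTCD": "ALT"}, {"LBTESTCD": "AST"}, {"LBTESTCD": "GLUC"}]
vlm_testcds = {"ALT", "AST"}

uncovered = {r["LBTESTCD"] for r in records} - vlm_testcds
print("Records with no VLM entry:", sorted(uncovered))  # ['GLUC']
```

Uncovered records are not automatically wrong, but each one should be a deliberate fall-through to the parent variable metadata, not an oversight.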
Review SUPP-- carefully
SUPP-- is often technically valid and still a sign that something needs a second look. I always check whether supplemental qualifiers are truly the right implementation, or whether the metadata is compensating for a design decision that deserves more scrutiny.
Checklist
- Each supplemental qualifier is appropriate for SUPP-- use.
- QNAM, QLABEL, QVAL, IDVAR, and IDVARVAL align with the parent record.
- Supplemental qualifiers are traceable back to the source collection.
- Reviewer-facing comments explain any heavy reliance on SUPP--.
- The same concept is not represented both in a parent domain and in SUPP-- without explanation.
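The parent-record alignment check is mechanical once the IDVAR/IDVARVAL convention is pinned down. A minimal sketch with synthetic records, assuming IDVAR is AESEQ and IDVARVAL is its character form:

```python
# Synthetic parent AE records and SUPPAE records.
parent_ae = [
    {"USUBJID": "01-001", "AESEQ": 1},
    {"USUBJID": "01-001", "AESEQ": 2},
]
suppae = [
    {"USUBJID": "01-001", "IDVAR": "AESEQ", "IDVARVAL": "2",
     "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "01-001", "IDVAR": "AESEQ", "IDVARVAL": "9",
     "QNAM": "AETRTEM", "QVAL": "N"},   # no matching parent record
]

# IDVARVAL is character, so the parent key must be compared as character.
parent_keys = {(r["USUBJID"], str(r["AESEQ"])) for r in parent_ae}
orphans = [s for s in suppae
           if (s["USUBJID"], s["IDVARVAL"]) not in parent_keys]
print(len(orphans), "orphaned SUPPAE record(s)")  # 1
```

An orphaned supplemental qualifier is invisible to a reviewer merging SUPP-- back onto the parent domain, which is exactly why this check belongs before submission rather than after a query.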
6. Are value-level metadata entries actually useful?
I do not look at VLM just to see whether it exists. I look at whether it adds anything useful.
What I check
- Does each VLM entry add context that the parent variable does not?
- Are conditions clearly defined?
- Are methods aligned across subsets?
- Are units, flags, and derivations consistent with the condition?
If VLM is only repeating variable-level text, it is not doing enough.
7. Is the logic consistent across domains?
This is where quiet inconsistency shows up.
Typical pattern:
- AE uses imputed dates
- LB excludes partial dates
- VS uses visit date
- EG uses datetime boundary
Each rule may be valid. But together they may look inconsistent unless the metadata explains where the differences are intentional.
What I check
- Same concept, same logic where possible
- If not, differences are clearly documented
Hyperlinks and references must work
A broken link in define.xml feels small until it lands in a reviewer’s lap.
Checklist
- All internal references resolve correctly.
- All external links point to the intended file or metadata object.
- Links to codelists, origin documents, and external references render correctly in the stylesheet output.
- The stylesheet displays the metadata in a readable way for human review.
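Internal reference resolution can be checked without a full validator. A minimal sketch for one reference type, ItemRef to ItemDef, again with namespaces omitted for brevity (a real check walks every OID-bearing reference, including methods, comments, and codelists):

```python
import xml.etree.ElementTree as ET

# Toy fragment with one dangling ItemRef; namespaces omitted for brevity.
DEFINE = """<ODM>
  <ItemGroupDef OID="IG.DM">
    <ItemRef ItemOID="IT.DM.USUBJID"/>
    <ItemRef ItemOID="IT.DM.AGEU"/>
  </ItemGroupDef>
  <ItemDef OID="IT.DM.USUBJID"/>
</ODM>"""

root = ET.fromstring(DEFINE)
defined = {d.get("OID") for d in root.iter("ItemDef")}
dangling = [r.get("ItemOID") for r in root.iter("ItemRef")
            if r.get("ItemOID") not in defined]
print("Unresolved ItemRefs:", dangling)  # ['IT.DM.AGEU']
```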
Review define.xml as a reviewer would actually read it
I always open the rendered define.xml in a browser and navigate it as if I were seeing the package for the first time. This catches problems that schema validation does not.
Checklist
- Dataset pages load cleanly.
- Variable pages are readable and not cluttered with broken references.
- Value-level metadata is easy to follow in the rendered view.
- Long method text wraps correctly and is still readable.
- Codelists, comments, and document links open in a way that helps rather than slows review.
Confirm consistency with SDRG and aCRF
In practice, define.xml, SDRG, aCRF, and datasets should all tell the same story about the SDTM implementation.
Checklist
- Dataset descriptions in define.xml match the SDRG narrative.
- Deviations from SDTM IG or controlled terminology are explained the same way across documents.
- aCRF annotations support the variables and origins described in define.xml.
- Custom domains or special handling are described consistently across the package.
Pay extra attention to custom domains and sponsor-defined variables
Reviewers are usually more tolerant of nonstandard implementation than teams expect, as long as it is explained clearly and consistently. What creates friction is not the existence of a custom rule. It is weak explanation.
Checklist
- Custom domains are clearly identified and justified.
- Sponsor-defined variables do not look like standard variables by accident.
- Naming, labels, origins, and methods are aligned across define.xml and SDRG.
- Reviewer-facing explanations describe why the implementation was needed, not only what was done.
8. Does the metadata match the actual SDTM data?
This sounds obvious, but it fails more often than it should.
What I check
- Derivation wording matches observed values.
- Units match actual standardized data.
- Flags behave the way the method says they do.
- No leftover template language remains.
A common example is when metadata says “latest value prior to treatment,” but same-day post-dose records are still flagged. That is not a programming issue anymore. That is a metadata credibility issue.
Run validation, but do not stop there
Validation catches structural problems. It does not catch every reviewer-facing problem.
Checklist
- XML validates against the intended Define-XML schema.
- No broken references or unresolved metadata objects remain.
- No obvious conformance errors remain after tool-based validation.
- Manual review confirms the metadata still makes reviewer sense after the final dataset freeze.
- The stylesheet output is readable and matches the intended submission package.
9. Are ambiguous phrases eliminated?
These phrases are common, but they usually create more uncertainty than they remove:
- Derived from reference start date
- Last non-missing value
- Standard unit
- Partial dates were imputed
- Relationship to study drug
Each one hides decisions. The fix is not more words for the sake of more words. The fix is writing the actual rule.
Recheck after the final metadata refresh
One of the most common late-stage problems is not a wrong derivation. It is a right derivation described by the wrong final metadata because the datasets, SDRG, aCRF, and define.xml did not freeze in the same rhythm.
Checklist
- Final XPT files match the last define.xml build.
- No late updates were made to one document but not the others.
- Reviewer comments, methods, and links still point to the final objects.
- The rendered define.xml reflects the actual submission package, not the pre-freeze draft.
10. Final test: will this create a reviewer question?
This is the last question I ask.
If the answer is yes, the metadata is still too thin.
Final thought
The define.xml packages that cause the fewest review problems are usually not the ones with the fanciest tooling.
They are the ones where the metadata, datasets, SDRG, and annotated CRF tell the same story without forcing the reviewer to fill in the gaps.
That is the standard I use before submission.
Which define.xml checkpoint catches the most problems in your SDTM submissions: reproducibility, traceability, standards and version control, or boundary handling?