Five Define.xml Phrases That Sound Fine, But Trigger Review Questions
A practical look at the wording patterns that pass internal review, validate cleanly, and still create trouble when a reviewer tries to understand your SDTM logic from metadata alone.
Some define.xml wording looks perfectly acceptable during internal review.
Then the same wording creates questions during submission review.
Not because the data is wrong. Not because the programming is broken. But because the description leaves too much room for interpretation.
That gap matters more than many teams realize. Define.xml is the reviewer’s first structured view of your SDTM package. If the metadata is thin, the reviewer starts guessing. And once guessing starts, questions follow.
A good define.xml description should let another programmer, or a reviewer, understand the rule without relying on team memory.
1. “Derived from reference start date”
This is common for --DY variables.
It sounds reasonable. But it does not tell the reviewer enough to recreate the rule safely.
What is missing:
- The actual formula
- The day 1 boundary rule
- How pre-treatment values are handled
- What happens with partial dates
- Whether the logic is date-based or datetime-based
Weak version: Derived from reference start date.
Stronger version: Study day is calculated as Event Date minus DM.RFSTDTC plus 1 when Event Date is on or after DM.RFSTDTC; otherwise Event Date minus DM.RFSTDTC. No date imputation is applied for this derivation. Records with partial event dates are not assigned study day.
The second version does real work. It tells the reviewer what the formula is, where the boundary sits, and what is excluded.
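The stronger wording is specific enough to code from. A minimal Python sketch of that rule follows. The function name `study_day` is illustrative, and the choice to use only the date part of a datetime value is an assumption, since the description does not say whether time is considered.

```python
from datetime import date
from typing import Optional


def study_day(event_dtc: str, rfstdtc: str) -> Optional[int]:
    """Return --DY for ISO 8601 values, or None when a date is partial or missing."""

    def full_date(dtc: str) -> Optional[date]:
        # Assumption: only the date part is used; a complete date needs YYYY-MM-DD.
        if not dtc or len(dtc) < 10:
            return None
        try:
            return date.fromisoformat(dtc[:10])
        except ValueError:
            return None

    event, ref = full_date(event_dtc), full_date(rfstdtc)
    if event is None or ref is None:
        return None  # no imputation: partial dates get no study day

    diff = (event - ref).days
    # On or after the reference start date: add 1, so there is no day 0.
    return diff + 1 if diff >= 0 else diff
```

With this sketch, an event on the reference start date gets study day 1, the day before gets -1, and a value such as 2024-03 gets no study day at all.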
2. “Last non-missing value prior to treatment”
This is one of the most common metadata phrases in findings logic, especially when teams derive an SDTM flag such as LBLOBXFL.
The wording sounds precise. It is not.
It leaves open several questions:
- What defines “prior”, reference start date or actual exposure datetime?
- What happens for records on the first-dose date?
- What if collection time is missing?
- Are unscheduled visits included?
- If more than one record qualifies, how is “last” decided?
A stronger description gives the reviewer something operational. It defines the anchor, the same-day rule, the missing-time rule, and the tie-break rule.
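To show how many decisions sit behind that one phrase, here is a hedged Python sketch of one possible rule set. Every choice in it is an assumption made for illustration, not a recommended convention: the anchor is the first exposure datetime, records on the first-dose date count as pre-dose when either time is missing, unscheduled visits are included, and ties are broken by the highest LBSEQ. The function names and the dictionary record layout are hypothetical.

```python
from datetime import datetime
from typing import Optional


def _parse(dtc: str):
    """Return (datetime, has_time) for a complete ISO 8601 date, else None."""
    if not dtc or len(dtc) < 10:
        return None
    has_time = len(dtc) >= 16  # YYYY-MM-DDThh:mm
    try:
        parsed = datetime.fromisoformat(dtc[:16] if has_time else dtc[:10])
    except ValueError:
        return None
    return parsed, has_time


def last_before_exposure(records: list, first_dose_dtc: str) -> Optional[int]:
    """Return the LBSEQ of the record to flag, or None if nothing qualifies."""
    dose = _parse(first_dose_dtc)
    if dose is None:
        return None
    dose_dt, dose_has_time = dose

    candidates = []
    for rec in records:
        if not rec.get("LBORRES"):
            continue  # "non-missing": skip records without a result
        parsed = _parse(rec.get("LBDTC", ""))
        if parsed is None:
            continue  # partial or missing collection date never qualifies
        rec_dt, rec_has_time = parsed
        if rec_has_time and dose_has_time:
            qualifies = rec_dt < dose_dt  # strict datetime comparison
        else:
            # Assumed same-day rule: with a missing time, first-dose-date records are pre-dose.
            qualifies = rec_dt.date() <= dose_dt.date()
        if qualifies:
            candidates.append((rec_dt, rec["LBSEQ"]))

    # "Last" = latest datetime (date-only values sort as midnight), then highest LBSEQ.
    return max(candidates)[1] if candidates else None
```

Each commented line in the sketch corresponds to a sentence the metadata would need. If the description cannot say which branch applies, the reviewer cannot either.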
3. “Partial dates were imputed”
This is a classic phrase that carries almost no review value by itself.
It does not answer:
- Which date patterns were imputed
- What values were assigned
- Where the imputation was used
- Whether SDTM values were changed or left as collected
- Whether the rule applies across domains or only in one place
Weak version: Partial dates were imputed.
Stronger version: Partial AE start dates are imputed for treatment-emergent classification only. Dates in YYYY-MM format are imputed to the first day of the month. Dates in YYYY format are imputed to January 1. Original collected values remain in AESTDTC. Imputed dates are not stored in SDTM and are not used for survival or time-to-event analyses.
The key point is not just the method. It is the scope.
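A description at that level of detail maps almost line for line onto code. The sketch below is a minimal Python illustration of the stated patterns. The function name `impute_ae_start` is hypothetical, and returning None for any other pattern is an assumption, since the description only covers YYYY-MM and YYYY.

```python
import re
from typing import Optional


def impute_ae_start(aestdtc: str) -> Optional[str]:
    """Return a complete YYYY-MM-DD date for treatment-emergent classification.

    The collected AESTDTC value is never modified; the result is meant for a
    separate, analysis-side variable.
    """
    value = (aestdtc or "").strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}(T.*)?", value):
        return value[:10]        # complete date: used as collected
    if re.fullmatch(r"\d{4}-\d{2}", value):
        return value + "-01"     # YYYY-MM -> first day of the month
    if re.fullmatch(r"\d{4}", value):
        return value + "-01-01"  # YYYY -> January 1
    return None                  # assumption: any other pattern is not imputed
```

The function returns a new value instead of overwriting its input, which mirrors the scope statement: the imputed date exists for one purpose, and AESTDTC stays as collected.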
4. “Standard unit”
This shows up in lab metadata all the time, especially when teams rely on short value-level notes.
The phrase does not tell the reviewer:
- Whether results were converted
- How vendor-specific factors were handled
- Whether standardization happened before or after flag derivation
- What happened to character results such as below quantification limit
Weak version: Standard unit.
Stronger version: Results for LBTESTCD = ALT are standardized to U/L in LBSTRESU. When LBORRESU differs from U/L, conversion uses approved central lab conversion factors before derivation of LBNRIND. Character results reported as below quantification limit remain in LBSTRESC and do not populate LBSTRESN.
This kind of wording is much more useful to experienced programmers because it states order of operations and data-type behavior.
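The order of operations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a study's actual logic: the factor table stands in for the approved central lab factors, the reference range passed to `normal_range_indicator` is supplied by the caller in U/L, and both function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical factor table: multiply the original result to obtain U/L.
# A real study would use the approved central lab conversion factors.
ALT_TO_U_PER_L = {"U/L": 1.0, "ukat/L": 60.0}


def standardize_alt(lborres: str, lborresu: str) -> Tuple[str, Optional[float]]:
    """Return (LBSTRESC, LBSTRESN) for one ALT record, standardized to U/L."""
    try:
        numeric = float(lborres)
    except (TypeError, ValueError):
        # Character result such as "<5": stays in LBSTRESC, LBSTRESN is not populated.
        return lborres, None
    if lborresu not in ALT_TO_U_PER_L:
        raise ValueError(f"No conversion factor for unit {lborresu!r}")
    converted = numeric * ALT_TO_U_PER_L[lborresu]
    return format(converted, "g"), converted


def normal_range_indicator(
    lbstresn: Optional[float], low: float, high: float
) -> Optional[str]:
    """Derive LBNRIND from the standardized numeric result, after conversion."""
    if lbstresn is None:
        return None  # no numeric result, no range comparison
    if lbstresn < low:
        return "LOW"
    if lbstresn > high:
        return "HIGH"
    return "NORMAL"
```

Because `normal_range_indicator` takes LBSTRESN as its input, the indicator can only be derived after conversion. That dependency is exactly what the stronger description states in words.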
5. “Relationship to study drug”
This looks harmless, but it often hides one of the hardest traceability questions in SDTM: was the value collected, assigned, or sponsor-derived?
That question gets sharper in studies with more than one treatment or more than one possible relationship target.
Weak version: Relationship to study drug.
Stronger version: Collected on the AE CRF as investigator assessment of relationship to study treatment. In studies with multiple investigational products, the SDTM value represents relationship to the primary investigational product defined in the protocol. When multiple products are recorded, sponsor mapping follows the hierarchy specified in the study data handling conventions.
The stronger version makes the origin and sponsor logic visible. That is what makes the metadata useful.
What these phrases have in common
None of these phrases are always wrong.
The problem is that they are too short for the job they are trying to do.
They work as internal reminders for a team that already knows the logic. They do not work well as reviewer-facing metadata.
That is the shift worth making in define.xml work. Stop writing labels that point to logic. Start writing metadata that explains the logic.
A practical test before sign-off
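Before a define.xml package goes out, ask this:
Could another programmer, or a reviewer, recreate this rule from the description alone, without relying on team memory?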
If the answer is no, the description is probably too thin.
Final thought
Most review friction in define.xml does not come from dramatic mistakes.
It comes from ordinary wording that feels good enough until someone outside the team tries to rely on it.
That is why these five phrases matter. They look small. But they are often the exact place where trust in the metadata starts to weaken.
Which define.xml phrase do you see most often in submissions that sounds acceptable internally, but creates questions in review?