Five Define.xml Phrases That Sound Fine, But Trigger Review Questions

StudySAS Blog


A practical look at the wording patterns that pass internal review, validate cleanly, and still create trouble when a reviewer tries to understand your SDTM logic from metadata alone.

Some define.xml wording looks perfectly acceptable during internal review.

Then the same wording creates questions during submission review.

Not because the data is wrong. Not because the programming is broken. But because the description leaves too much room for interpretation.

That gap matters more than many teams realize. Define.xml is the reviewer’s first structured view of your SDTM package. If the metadata is thin, the reviewer starts guessing. And once guessing starts, questions follow.

One useful standard

A good define.xml description should let another programmer, or a reviewer, understand the rule without relying on team memory.

Figure 1. Why these phrases fail in review

The problem is rarely the variable itself. The problem is the space between what the team knows and what the metadata actually says.

1. “Derived from reference start date”

This is common for --DY variables.

It sounds reasonable. But it does not tell the reviewer enough to recreate the rule safely.

What is missing:

The actual formula
The day 1 boundary rule
How pre-treatment values are handled
What happens with partial dates
Whether the logic is date-based or datetime-based

Weak

Derived from reference start date.

Better

Study day is calculated as Event Date minus DM.RFSTDTC plus 1 when Event Date is on or after DM.RFSTDTC; otherwise Event Date minus DM.RFSTDTC. No date imputation is applied for this derivation. Records with partial event dates are not assigned study day.

The second version does real work. It tells the reviewer what the formula is, where the boundary sits, and what is excluded.
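In the define.xml file itself, that rule would sit in the MethodDef referenced by the --DY variable. The fragment below is a minimal sketch for LBDY, assuming Define-XML 2.x conventions with the def: namespace declared on the root element. The OIDs, the method name, and the optional def:FormalExpression are hypothetical placeholders. That includes the SAS variables LBDT and RFSTDT, which are assumed to be numeric dates that stay missing when the source date is partial.

```xml
<!-- Hypothetical OIDs; in the LB ItemGroupDef the method is attached through the ItemRef -->
<ItemRef ItemOID="IT.LB.LBDY" Mandatory="No" MethodOID="MT.LB.LBDY"/>

<MethodDef OID="MT.LB.LBDY" Name="Study Day of Specimen Collection" Type="Computation">
  <Description>
    <TranslatedText xml:lang="en">
      LBDY is calculated as the LBDTC date minus DM.RFSTDTC plus 1 when the
      LBDTC date is on or after DM.RFSTDTC; otherwise the LBDTC date minus
      DM.RFSTDTC. No date imputation is applied for this derivation. Records
      with a partial LBDTC date are not assigned LBDY.
    </TranslatedText>
  </Description>
  <!-- Optional and illustrative only; the prose Description above is the reviewer-facing rule -->
  <def:FormalExpression Context="SAS">
    /* Hypothetical: LBDT and RFSTDT are numeric SAS dates, missing when partial */
    if nmiss(LBDT, RFSTDT) = 0 then do;
      if LBDT >= RFSTDT then LBDY = LBDT - RFSTDT + 1;
      else LBDY = LBDT - RFSTDT;
    end;
  </def:FormalExpression>
</MethodDef>
```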

2. “Last non-missing value prior to treatment”

This is one of the most common metadata phrases in findings logic, especially when teams derive an SDTM flag such as LBLOBXFL.

The wording sounds precise. It is not.

It leaves open several questions:

What defines “prior”: the reference start date or the actual exposure datetime?
What happens for records on the first-dose date?
What if collection time is missing?
Are unscheduled visits included?
If more than one record qualifies, how is “last” decided?

Define-XML example, weak vs. better

Listing 1. Minimal method description for LBLOBXFL

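<!-- In the LB ItemGroupDef: MethodOID on the ItemRef links the variable to its method -->
<ItemRef ItemOID="IT.LB.LBLOBXFL" Mandatory="No" MethodOID="MT.LB.LBLOBXFL"/>

<ItemDef OID="IT.LB.LBLOBXFL" Name="LBLOBXFL" DataType="text" Length="1">
  <Description>
    <TranslatedText xml:lang="en">Last Observation Before Exposure Flag</TranslatedText>
  </Description>
  <def:Origin Type="Derived"/>
</ItemDef>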

<MethodDef OID="MT.LB.LBLOBXFL" Name="Last Observation Before Exposure Flag" Type="Computation">
  <Description>
    <TranslatedText xml:lang="en">
      Last non-missing result prior to treatment.
    </TranslatedText>
  </Description>
</MethodDef>

Listing 2. Reviewer-friendly method description for LBLOBXFL

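<!-- In the LB ItemGroupDef: MethodOID on the ItemRef links the variable to its method -->
<ItemRef ItemOID="IT.LB.LBLOBXFL" Mandatory="No" MethodOID="MT.LB.LBLOBXFL"/>

<ItemDef OID="IT.LB.LBLOBXFL" Name="LBLOBXFL" DataType="text" Length="1">
  <Description>
    <TranslatedText xml:lang="en">Last Observation Before Exposure Flag</TranslatedText>
  </Description>
  <def:Origin Type="Derived"/>
</ItemDef>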

<MethodDef OID="MT.LB.LBLOBXFL" Name="Last Observation Before Exposure Flag Derivation" Type="Computation">
  <Description>
    <TranslatedText xml:lang="en">
      LBLOBXFL is assigned as 'Y' to the chronologically latest non-missing
      result collected before first exposure. If only dates are available,
      collection date must be strictly earlier than DM.RFSTDTC. Records on
      the first-dose date are eligible only when both collection time and
      dosing time are available and the collection occurs before dosing.
      Records with missing time on the first-dose date are not eligible.
      If multiple qualifying records exist, the latest chronological record
      is selected.
    </TranslatedText>
  </Description>
</MethodDef>

The second version gives the reviewer something operational. It defines the anchor, the same-day rule, the missing-time rule, and the tie-break rule.

3. “Partial dates were imputed”

This is a classic phrase that carries almost no review value by itself.

It does not answer:

Which date patterns were imputed
What values were assigned
Where the imputation was used
Whether SDTM values were changed or left as collected
Whether the rule applies across domains or only in one place

Weak

Partial dates were imputed.

Better

Partial AE start dates are imputed for treatment-emergent classification only. Dates in YYYY-MM format are imputed to the first day of the month. Dates in YYYY format are imputed to January 1. Original collected values remain in AESTDTC. Imputed dates are not stored in SDTM and are not used for survival or time-to-event analyses.

The key point is not just the method. It is the scope.
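Because the imputed dates are not stored in SDTM, there is no SDTM variable whose MethodDef could carry this rule. One way to keep the scope statement next to the collected data, sketched here under Define-XML 2.1 conventions, is a def:CommentDef referenced from the AESTDTC ItemDef. The OIDs are hypothetical.

```xml
<!-- Hypothetical OIDs; def:CommentOID links the collected variable to the scope statement -->
<ItemDef OID="IT.AE.AESTDTC" Name="AESTDTC" DataType="datetime" def:CommentOID="COM.AE.AESTDTC">
  <Description>
    <TranslatedText xml:lang="en">Start Date/Time of Adverse Event</TranslatedText>
  </Description>
  <!-- Define-XML 2.1 origin: the SDTM value is stored as collected, not imputed -->
  <def:Origin Type="Collected" Source="Investigator"/>
</ItemDef>

<def:CommentDef OID="COM.AE.AESTDTC">
  <Description>
    <TranslatedText xml:lang="en">
      Partial AE start dates are imputed for treatment-emergent classification
      only. Dates in YYYY-MM format are imputed to the first day of the month.
      Dates in YYYY format are imputed to January 1. Original collected values
      remain in AESTDTC. Imputed dates are not stored in SDTM and are not used
      for survival or time-to-event analyses.
    </TranslatedText>
  </Description>
</def:CommentDef>
```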

4. “Standard unit”

This shows up in lab metadata all the time, especially when teams rely on short value-level notes.

The phrase does not tell the reviewer:

Whether results were converted
How vendor-specific factors were handled
Whether standardization happened before or after flag derivation
What happened to character results, such as values reported as below the quantification limit

Weak

Standard unit.

Better

Results for LBTESTCD = ALT are standardized to U/L in LBSTRESU. When LBORRESU differs from U/L, conversion uses approved central lab conversion factors before derivation of LBNRIND. Character results reported as below the quantification limit remain in LBSTRESC and do not populate LBSTRESN.

This kind of wording is much more useful to experienced programmers because it states the order of operations and the data-type behavior.
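Because the rule applies to a single test, it fits naturally in value-level metadata. The sketch below assumes Define-XML 2.x conventions: a where clause selects LBTESTCD = ALT, and the value-level LBSTRESN item points to a method that states the unit and the order of operations. All OIDs, the method name, and the Length value are hypothetical, and the parent LBSTRESN ItemDef would reference the value list through def:ValueListRef.

```xml
<!-- Hypothetical OIDs; the parent LBSTRESN ItemDef would carry
     <def:ValueListRef ValueListOID="VL.LB.LBSTRESN"/> -->
<def:ValueListDef OID="VL.LB.LBSTRESN">
  <ItemRef ItemOID="IT.LB.LBSTRESN.ALT" Mandatory="No" MethodOID="MT.LB.LBSTRESN.ALT">
    <def:WhereClauseRef WhereClauseOID="WC.LB.LBTESTCD.ALT"/>
  </ItemRef>
</def:ValueListDef>

<!-- Selects the ALT records only -->
<def:WhereClauseDef OID="WC.LB.LBTESTCD.ALT">
  <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.LB.LBTESTCD">
    <CheckValue>ALT</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

<!-- Length is illustrative -->
<ItemDef OID="IT.LB.LBSTRESN.ALT" Name="LBSTRESN" DataType="float" Length="8">
  <Description>
    <TranslatedText xml:lang="en">Numeric Result/Finding in Standard Units (ALT, U/L)</TranslatedText>
  </Description>
  <def:Origin Type="Derived"/>
</ItemDef>

<MethodDef OID="MT.LB.LBSTRESN.ALT" Name="ALT Standardization to U/L" Type="Computation">
  <Description>
    <TranslatedText xml:lang="en">
      Results for LBTESTCD = ALT are standardized to U/L in LBSTRESU. When
      LBORRESU differs from U/L, conversion uses approved central lab
      conversion factors before derivation of LBNRIND. Character results
      reported as below the quantification limit remain in LBSTRESC and do
      not populate LBSTRESN.
    </TranslatedText>
  </Description>
</MethodDef>
```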

5. “Relationship to study drug”

This looks harmless, but it often hides one of the hardest traceability questions in SDTM: was the value collected, assigned, or sponsor-derived?

That question gets sharper in studies with more than one treatment or more than one possible relationship target.

Weak

Relationship to study drug.

Better

Collected on the AE CRF as the investigator's assessment of relationship to study treatment. In studies with multiple investigational products, the SDTM value represents relationship to the primary investigational product defined in the protocol. When multiple products are recorded, sponsor mapping follows the hierarchy specified in the study data handling conventions.

The stronger version makes the origin and sponsor logic visible. That is what makes the metadata useful.
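Define-XML 2.1 lets the origin itself state who supplied the value, which speaks directly to the collected, assigned, or sponsor-derived question. A minimal sketch, assuming Define-XML 2.1 with hypothetical OIDs and an illustrative Length, pairs the def:Origin Source attribute with a def:CommentDef that carries the multi-product rule.

```xml
<!-- Hypothetical OIDs; Length is illustrative -->
<ItemDef OID="IT.AE.AEREL" Name="AEREL" DataType="text" Length="20" def:CommentOID="COM.AE.AEREL">
  <Description>
    <TranslatedText xml:lang="en">Causality</TranslatedText>
  </Description>
  <!-- Define-XML 2.1: Type and Source together say the investigator supplied the value -->
  <def:Origin Type="Collected" Source="Investigator"/>
</ItemDef>

<!-- The sponsor mapping rule for multiple products lives here, next to the origin -->
<def:CommentDef OID="COM.AE.AEREL">
  <Description>
    <TranslatedText xml:lang="en">
      Collected on the AE CRF as the investigator's assessment of relationship
      to study treatment. In studies with multiple investigational products,
      the SDTM value represents relationship to the primary investigational
      product defined in the protocol. When multiple products are recorded,
      sponsor mapping follows the hierarchy specified in the study data
      handling conventions.
    </TranslatedText>
  </Description>
</def:CommentDef>
```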

Figure 2. The hidden pattern behind all five phrases

The wording changes from short labels to actual rules.

What these phrases have in common

None of these phrases are always wrong.

The problem is that they are too short for the job they are trying to do.

They work as internal reminders for a team that already knows the logic. They do not work well as reviewer-facing metadata.

That is the shift worth making in define.xml work. Stop writing labels that point to logic. Start writing metadata that explains the logic.

A practical test before sign-off

Before a define.xml package goes out, ask this:

Can another programmer, or a reviewer, understand this rule without asking what we meant?

If the answer is no, the description is probably too thin.

Final thought

Most review friction in define.xml does not come from dramatic mistakes.

It comes from ordinary wording that feels good enough until someone outside the team tries to rely on it.

That is why these five phrases matter. They look small. But they are often the exact place where trust in the metadata starts to weaken.

Suggested closing question for comments

Which define.xml phrase do you see most often in submissions that sounds acceptable internally, but creates questions in review?
