Define.xml for SUPPQUAL — Getting QNAM-Level Metadata Right

If you have worked on SDTM submissions long enough, you know SUPPQUAL define.xml is where packages start to break down. Not because the data is wrong, but because the metadata does not fully explain what the data represents.

This is not a recap of SUPPQUAL structure. This is about how define.xml actually fails in submission and how to fix it before a reviewer points it out.

SUPPQUAL is not difficult because of structure. It is difficult because the meaning of the data exists only in define.xml.

Table of Contents

The SUPPQUAL ItemGroupDef — What the Spec Actually Requires
Value-Level Metadata — Why SUPPQUAL Demands It
Building QNAM-Level VLM Entries Correctly
WhereClauseDef Construction — Mechanics and Traps
Origin Tracing for SUPPQUAL Variables
Controlled Terminology in SUPPQUAL QVAL — Who Owns the Codelist?
Common Submission Rejection Patterns
PMDA-Specific Considerations
SAS Utility: Generating VLM Entries Programmatically
Pre-Submission Checklist
IDVAR / IDVARVAL — The Hidden Failure Point
SUPPQUAL vs Custom Domain — Design Decision
What Pinnacle 21 Will NOT Catch
Scaling Problems in Large SUPPQUAL Domains
Define.xml v2.0 vs v2.1 — What Changes for SUPPQUAL
Edge Cases You Will Hit
How Reviewers Actually Read SUPPQUAL
Cross-Domain Consistency — The Silent Check
Levels of Automation — Maturity Model
Bad vs Good — Full Picture

1. The SUPPQUAL ItemGroupDef — What the Spec Actually Requires

Start with the foundation. A SUPPQUAL dataset in CDISC SDTM is a special-purpose structure with a fixed set of variables: STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, QNAM, QLABEL, QVAL, QORIG, and QEVAL. Every SUPPQUAL dataset carries these same column names regardless of what domain it hangs off. That fixed structure is what makes define.xml hard: the column names tell you nothing about what any given row contains.

In define.xml (Define-XML v2.0 and v2.1), a SUPPQUAL domain is represented as an ItemGroupDef whose OID typically follows the IG.SUPPXX convention. Inside that ItemGroupDef, you declare one ItemRef per structural column. That part is routine. The complexity begins with QNAM and QVAL.

The FDA Technical Conformance Guide (TCG), the CDISC Define-XML 2.0 specification, and the CDISC SDTM IG all converge on the same expectation: every distinct QNAM value that appears in the dataset must have a corresponding Value-Level Metadata entry in define.xml. Not a collective entry. Not a reference to the QNAM column generally. Each QNAM individually, with its own label, data type, origin, and — where applicable — codelist or controlled terminology reference.

This is the single most common gap in SUPPQUAL define packages. Programmers correctly document the column-level metadata for QNAM (type=text, origin=Predecessor, etc.) but never create the value-level entries that tell reviewers what QNAM="AESLIFE" or QNAM="LBMETHOD" actually means in context. FDA reviewers are specifically checking for VLM completeness in SUPP-- domains.

Here is the minimal correct ItemGroupDef structure for a SUPPAE domain:

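<!-- ItemGroupDef for SUPPAE -->
<ItemGroupDef OID="IG.SUPPAE"
              Name="SUPPAE"
              Repeating="Yes"
              IsReferenceData="No"
              SASDatasetName="SUPPAE"
              Purpose="Tabulation"
              def:Structure="One record per IDVAR, IDVARVAL and QNAM value per subject"
              def:StandardOID="STD.SDTMIG.3.3"
              def:ArchiveLocationID="LF.SUPPAE">
  <Description>
    <TranslatedText xml:lang="en">
      Supplemental Qualifiers for Adverse Events
    </TranslatedText>
  </Description>
  <ItemRef ItemOID="IT.SUPPAE.STUDYID"  OrderNumber="1"  Mandatory="Yes" KeySequence="1"/>
  <ItemRef ItemOID="IT.SUPPAE.RDOMAIN"  OrderNumber="2"  Mandatory="Yes" KeySequence="2"/>
  <ItemRef ItemOID="IT.SUPPAE.USUBJID"  OrderNumber="3"  Mandatory="Yes" KeySequence="3"/>
  <ItemRef ItemOID="IT.SUPPAE.IDVAR"    OrderNumber="4"  Mandatory="No"  KeySequence="4"/>
  <ItemRef ItemOID="IT.SUPPAE.IDVARVAL" OrderNumber="5"  Mandatory="No"  KeySequence="5"/>
  <ItemRef ItemOID="IT.SUPPAE.QNAM"     OrderNumber="6"  Mandatory="Yes" KeySequence="6"/>
  <ItemRef ItemOID="IT.SUPPAE.QLABEL"   OrderNumber="7"  Mandatory="Yes"/>
  <ItemRef ItemOID="IT.SUPPAE.QVAL"     OrderNumber="8"  Mandatory="Yes"/>
  <ItemRef ItemOID="IT.SUPPAE.QORIG"    OrderNumber="9"  Mandatory="Yes"/>
  <ItemRef ItemOID="IT.SUPPAE.QEVAL"    OrderNumber="10" Mandatory="No"/>
  <def:leaf ID="LF.SUPPAE" xlink:href="suppae.xpt">
    <def:title>suppae.xpt</def:title>
  </def:leaf>
</ItemGroupDef>

Three things to note. First, ItemGroupDef and its Purpose attribute are plain ODM constructs, so they carry no def: prefix; only the Define-XML extensions (def:Structure, def:ArchiveLocationID, def:leaf) do. Second, KeySequence is likewise a plain ODM attribute on ItemRef, and it numbers the variables that make up the dataset's natural key. For a SUPP-- dataset that key is STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, and QNAM, with QNAM last because it is what distinguishes one qualifier row from another for the same parent record. Some older define packages omit the key sequence. Reviewers notice. Third, def:StandardOID exists only in Define-XML v2.1; drop it when building a v2.0 file.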

2. Value-Level Metadata — Why SUPPQUAL Demands It

Value-Level Metadata (VLM) in define.xml exists to document variables whose meaning is row-dependent. In most SDTM datasets, a column has one meaning. AETERM always means adverse event term. You document it once at the column level and you are done.

SUPPQUAL breaks this. QVAL can contain a free-text description in one row, a numeric value in another, a controlled term from a codelist in a third. The column-level metadata for QVAL — because it must accommodate everything — is necessarily generic. It cannot tell the reviewer whether QVAL for QNAM="AESLIFE" should be Y/N, whether QVAL for QNAM="LBMETHOD" maps to a codelist, or whether QVAL for QNAM="AOCCIFL" is a character flag with a specific set of permissible values.

VLM fixes this. It is the mechanism by which you attach row-specific metadata to a column. In define.xml v2.0/v2.1, VLM is implemented using def:ValueListDef elements referenced from a def:ValueListRef child element on the QVAL ItemDef. Each entry inside the ValueListDef is a separate ItemRef, constrained by a def:WhereClauseRef that scopes it to a specific QNAM value.

SUPPQUAL Interpretation Flow:
QNAM → WhereClause → VLM ItemDef → Origin / Codelist → Reviewer Understanding

Every link in this chain must be explicit and correct. A break at any point means the reviewer cannot interpret the variable — and will write a query instead.

Think of it as a lookup table stitched into the define.xml structure itself. The reviewer opens the define viewer, clicks on QVAL in SUPPAE, and instead of seeing a single generic description, they see a structured list of every QNAM with its own label, type, origin, and optionally a codelist link.
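The interpretation chain above can be walked mechanically. The following is a minimal Python sketch, not a validator: it assumes the Define-XML v2.0 namespace URI, uses a tiny inline fragment (SAMPLE) rather than a real define.xml, and the function name resolve_qnam is hypothetical. It returns None as soon as any link in the chain is missing.

```python
import xml.etree.ElementTree as ET

# ODM 1.3 and Define-XML 2.0 namespaces (assumption: swap the def URI for v2.1 files).
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3",
      "def": "http://www.cdisc.org/ns/def/v2.0"}

# Tiny illustrative fragment, not a complete or schema-valid define.xml.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <def:ValueListDef OID="VL.SUPPAE.QVAL">
    <ItemRef ItemOID="IT.SUPPAE.QVAL.AESLIFE" OrderNumber="1" Mandatory="Yes">
      <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AESLIFE"/>
    </ItemRef>
  </def:ValueListDef>
  <def:WhereClauseDef OID="WC.SUPPAE.QNAM.AESLIFE">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
      <CheckValue>AESLIFE</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
  <ItemDef OID="IT.SUPPAE.QVAL.AESLIFE" Name="AESLIFE" DataType="text" Length="1">
    <CodeListRef CodeListOID="CL.NY"/>
    <def:Origin Type="CRF"/>
  </ItemDef>
</ODM>"""


def resolve_qnam(root, value_list_oid, qnam):
    """Walk QNAM -> WhereClauseDef -> VLM ItemRef -> ItemDef; None if a link breaks."""
    # Link 1: WhereClauseDefs whose CheckValue equals the QNAM exactly (case-sensitive).
    wc_oids = {
        wc.get("OID")
        for wc in root.iterfind(".//def:WhereClauseDef", NS)
        for cv in wc.iterfind("odm:RangeCheck/odm:CheckValue", NS)
        if (cv.text or "").strip() == qnam
    }
    items = {i.get("OID"): i for i in root.iterfind(".//odm:ItemDef", NS)}
    for vl in root.iterfind(".//def:ValueListDef", NS):
        if vl.get("OID") != value_list_oid:
            continue
        # Link 2: the ItemRef in the value list that uses one of those WhereClauses.
        for ref in vl.iterfind("odm:ItemRef", NS):
            wc_ref = ref.find("def:WhereClauseRef", NS)
            if wc_ref is None or wc_ref.get("WhereClauseOID") not in wc_oids:
                continue
            # Link 3: the VLM-level ItemDef, then its origin and codelist.
            item = items.get(ref.get("ItemOID"))
            if item is None:
                return None
            origin = item.find("def:Origin", NS)
            codelist = item.find("odm:CodeListRef", NS)
            return {
                "item_oid": item.get("OID"),
                "origin": origin.get("Type") if origin is not None else None,
                "codelist": codelist.get("CodeListOID") if codelist is not None else None,
            }
    return None


root = ET.fromstring(SAMPLE)
print(resolve_qnam(root, "VL.SUPPAE.QVAL", "AESLIFE"))
```

A None result for a QNAM that exists in the dataset is exactly the situation where a reviewer has nothing to read and writes a query.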

3. Building QNAM-Level VLM Entries Correctly

The complete VLM implementation for SUPPQUAL requires four interconnected XML components working together. Get any one wrong and the define viewer renders garbage or the validator throws errors.

3.1 The ValueListDef Block

You declare one def:ValueListDef per SUPPQUAL dataset. Its OID is referenced from the QVAL ItemDef. Inside it, one ItemRef per distinct QNAM value in your dataset.

<def:ValueListDef OID="VL.SUPPAE.QVAL">

  <!-- Entry for QNAM = AESLIFE -->
  <ItemRef ItemOID="IT.SUPPAE.QVAL.AESLIFE"
           OrderNumber="1"
           Mandatory="Yes">
    <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AESLIFE"/>
  </ItemRef>

  <!-- Entry for QNAM = AECONTRT -->
  <ItemRef ItemOID="IT.SUPPAE.QVAL.AECONTRT"
           OrderNumber="2"
           Mandatory="Yes">
    <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AECONTRT"/>
  </ItemRef>

  <!-- Entry for QNAM = AOCCIFL -->
  <ItemRef ItemOID="IT.SUPPAE.QVAL.AOCCIFL"
           OrderNumber="3"
           Mandatory="No">
    <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AOCCIFL"/>
  </ItemRef>

  <!-- Entry for QNAM = AERELNST -->
  <ItemRef ItemOID="IT.SUPPAE.QVAL.AERELNST"
           OrderNumber="4"
           Mandatory="No">
    <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AERELNST"/>
  </ItemRef>

</def:ValueListDef>

The Mandatory attribute here reflects whether every subject/record that has a parent AE record must have this QNAM populated. This is a clinical judgment, not a programming one. Get input from your data manager.

3.2 The QVAL ItemDef with ValueListRef

The QVAL ItemDef at column level must carry a def:ValueListRef pointing to your ValueListDef. This is the hook that connects column metadata to value-level metadata.

<ItemDef OID="IT.SUPPAE.QVAL"
         Name="QVAL"
         DataType="text"
         Length="200"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Result Value for the Supplemental Qualifier
    </TranslatedText>
  </Description>
  <!-- This is the critical link to VLM -->
  <def:ValueListRef ValueListOID="VL.SUPPAE.QVAL"/>
</ItemDef>

The Length on the column-level QVAL ItemDef should match the actual XPT variable length. The VLM-level ItemDefs for each QNAM can specify shorter lengths that reflect the actual maximum length for that specific qualifier. FDA reviewers check for length consistency.
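The length rule described above is easy to check on the define.xml side. This is a rough Python sketch under stated assumptions: the Define-XML v2.0 namespace, an inline illustrative fragment (SAMPLE), and a hypothetical helper name vlm_length_violations. It only compares Length attributes inside define.xml and does not look at the XPT file.

```python
import xml.etree.ElementTree as ET

# ODM 1.3 and Define-XML 2.0 namespaces (assumption: adjust the def URI for v2.1).
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3",
      "def": "http://www.cdisc.org/ns/def/v2.0"}

# Illustrative fragment only: one VLM entry deliberately exceeds the column length.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <def:ValueListDef OID="VL.SUPPAE.QVAL">
    <ItemRef ItemOID="IT.SUPPAE.QVAL.AESLIFE" Mandatory="Yes"/>
    <ItemRef ItemOID="IT.SUPPAE.QVAL.AERELNST" Mandatory="No"/>
  </def:ValueListDef>
  <ItemDef OID="IT.SUPPAE.QVAL" Name="QVAL" DataType="text" Length="200">
    <def:ValueListRef ValueListOID="VL.SUPPAE.QVAL"/>
  </ItemDef>
  <ItemDef OID="IT.SUPPAE.QVAL.AESLIFE" Name="AESLIFE" DataType="text" Length="1"/>
  <ItemDef OID="IT.SUPPAE.QVAL.AERELNST" Name="AERELNST" DataType="text" Length="250"/>
</ODM>"""


def vlm_length_violations(root, column_item_oid):
    """Return (vlm_item_oid, vlm_length, column_length) where VLM length > column length."""
    items = {i.get("OID"): i for i in root.iterfind(".//odm:ItemDef", NS)}
    column = items[column_item_oid]
    column_length = int(column.get("Length"))
    vl_ref = column.find("def:ValueListRef", NS)
    if vl_ref is None:
        return []  # the column has no value-level metadata attached
    problems = []
    for vl in root.iterfind(".//def:ValueListDef", NS):
        if vl.get("OID") != vl_ref.get("ValueListOID"):
            continue
        for ref in vl.iterfind("odm:ItemRef", NS):
            item = items.get(ref.get("ItemOID"))
            if item is None or item.get("Length") is None:
                continue  # unresolved or length-less entries are a separate finding
            vlm_length = int(item.get("Length"))
            if vlm_length > column_length:
                problems.append((item.get("OID"), vlm_length, column_length))
    return problems


print(vlm_length_violations(ET.fromstring(SAMPLE), "IT.SUPPAE.QVAL"))
```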

3.3 The VLM-Level ItemDefs

Each QNAM gets its own ItemDef. This is where the clinical meaning, data type, codelist reference, and origin go. This is the piece that most define packages either skip entirely or populate with placeholder text.

<!-- AESLIFE: Life Threatening -->
<ItemDef OID="IT.SUPPAE.QVAL.AESLIFE"
         Name="AESLIFE"
         DataType="text"
         Length="1"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Indicator of whether the adverse event was life-threatening.
      Populated from the Life-Threatening field on the SAE page.
    </TranslatedText>
  </Description>
  <CodeListRef CodeListOID="CL.NY"/>
  <def:Origin Type="CRF">
    <def:DocumentRef leafID="LF.CRF">
      <def:PDFPageRef Type="NamedDestination"
                      PageRefs="AE_SAE_PAGE"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

<!-- AECONTRT: Concomitant Treatment Given -->
<ItemDef OID="IT.SUPPAE.QVAL.AECONTRT"
         Name="AECONTRT"
         DataType="text"
         Length="1"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Indicator of whether concomitant treatment was given for the AE.
    </TranslatedText>
  </Description>
  <CodeListRef CodeListOID="CL.NY"/>
  <def:Origin Type="CRF">
    <def:DocumentRef leafID="LF.CRF">
      <def:PDFPageRef Type="NamedDestination"
                      PageRefs="AE_DETAILS_PAGE"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

<!-- AOCCIFL: Any Occurrence Indicator Flag -->
<ItemDef OID="IT.SUPPAE.QVAL.AOCCIFL"
         Name="AOCCIFL"
         DataType="text"
         Length="1"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Flag indicating the first occurrence of an AE with the same
      preferred term. Derived based on chronological order within subject.
    </TranslatedText>
  </Description>
  <CodeListRef CodeListOID="CL.NY"/>
  <def:Origin Type="Derived">
    <def:DocumentRef leafID="LF.SUPPAE_SPECS">
      <def:PDFPageRef Type="NamedDestination"
                      PageRefs="SUPPAE_DERIVATION"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

<!-- AERELNST: Relationship to Non-Study Treatment -->
<ItemDef OID="IT.SUPPAE.QVAL.AERELNST"
         Name="AERELNST"
         DataType="text"
         Length="50"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Relationship of the adverse event to a non-study treatment.
      Free text captured on the AE CRF.
    </TranslatedText>
  </Description>
  <!-- No CodeListRef — free text field -->
  <def:Origin Type="CRF">
    <def:DocumentRef leafID="LF.CRF">
      <def:PDFPageRef Type="NamedDestination"
                      PageRefs="AE_RELATIONSHIP_PAGE"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

Several things to notice here. First, SASFieldName="QVAL" on every VLM-level ItemDef. This is correct — the actual XPT variable being described is QVAL regardless of which QNAM you are documenting. Second, the Length at VLM level reflects the actual maximum length for that qualifier's values, not the dataset-level QVAL length. Third, the CodeListRef is present only when the QNAM has controlled values. Free-text QNAMs get no CodeListRef. This distinction matters — a reviewer who sees a codelist reference for a free-text field will flag it.

4. WhereClauseDef Construction — Mechanics and Traps

The def:WhereClauseDef is what scopes each VLM entry to its QNAM. It defines the condition "this metadata applies when QNAM equals this value." Getting WhereClauseDef wrong is the second most common source of define validation errors in SUPPQUAL packages.

4.1 Standard WhereClauseDef Structure

<!-- WhereClause for QNAM = AESLIFE -->
<def:WhereClauseDef OID="WC.SUPPAE.QNAM.AESLIFE">
  <RangeCheck Comparator="EQ"
              SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>AESLIFE</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

<!-- WhereClause for QNAM = AECONTRT -->
<def:WhereClauseDef OID="WC.SUPPAE.QNAM.AECONTRT">
  <RangeCheck Comparator="EQ"
              SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>AECONTRT</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

<!-- WhereClause for QNAM = AOCCIFL -->
<def:WhereClauseDef OID="WC.SUPPAE.QNAM.AOCCIFL">
  <RangeCheck Comparator="EQ"
              SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>AOCCIFL</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

The def:ItemOID attribute on the RangeCheck must point to the ItemDef for QNAM within the same SUPPQUAL dataset — specifically IT.SUPPAE.QNAM in this example. Not a generic QNAM OID. Not a cross-domain reference. The OID must resolve to the QNAM column definition within this specific SUPPQUAL domain.

Common mistake: reusing WhereClauseDef OIDs across SUPPQUAL domains. If you build SUPPAE and SUPPLB and give both the same WC OIDs for shared QNAM names (like QNAM=FAST or QNAM=SPEC), validators will throw duplicate OID errors or silently cross-link the wrong ItemOID references. Every domain needs its own WhereClauseDef set with domain-scoped OIDs and domain-specific ItemOID references.
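Cross-linked WhereClauses of the kind just described can be detected by checking that every RangeCheck target belongs to the same dataset as the column that owns the value list. The sketch below is one possible way to do that in Python; it assumes the Define-XML v2.0 namespace and a stripped-down inline fragment, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3",
      "def": "http://www.cdisc.org/ns/def/v2.0"}
DEF_ITEM_OID = "{http://www.cdisc.org/ns/def/v2.0}ItemOID"

# Illustrative fragment: the SUPPLB WhereClause wrongly points at the SUPPAE QNAM item.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <ItemGroupDef OID="IG.SUPPAE" Name="SUPPAE">
    <ItemRef ItemOID="IT.SUPPAE.QNAM"/><ItemRef ItemOID="IT.SUPPAE.QVAL"/>
  </ItemGroupDef>
  <ItemGroupDef OID="IG.SUPPLB" Name="SUPPLB">
    <ItemRef ItemOID="IT.SUPPLB.QNAM"/><ItemRef ItemOID="IT.SUPPLB.QVAL"/>
  </ItemGroupDef>
  <ItemDef OID="IT.SUPPAE.QVAL" Name="QVAL">
    <def:ValueListRef ValueListOID="VL.SUPPAE.QVAL"/>
  </ItemDef>
  <ItemDef OID="IT.SUPPLB.QVAL" Name="QVAL">
    <def:ValueListRef ValueListOID="VL.SUPPLB.QVAL"/>
  </ItemDef>
  <def:ValueListDef OID="VL.SUPPAE.QVAL">
    <ItemRef ItemOID="IT.SUPPAE.QVAL.SPEC">
      <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.SPEC"/>
    </ItemRef>
  </def:ValueListDef>
  <def:ValueListDef OID="VL.SUPPLB.QVAL">
    <ItemRef ItemOID="IT.SUPPLB.QVAL.SPEC">
      <def:WhereClauseRef WhereClauseOID="WC.SUPPLB.QNAM.SPEC"/>
    </ItemRef>
  </def:ValueListDef>
  <def:WhereClauseDef OID="WC.SUPPAE.QNAM.SPEC">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
      <CheckValue>SPEC</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
  <def:WhereClauseDef OID="WC.SUPPLB.QNAM.SPEC">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
      <CheckValue>SPEC</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
</ODM>"""


def cross_linked_where_clauses(root):
    """Return (where_clause_oid, target_item_oid, expected_dataset_oid) for each mismatch."""
    # Map every ItemOID to the ItemGroupDef (dataset) that references it.
    owner = {ref.get("ItemOID"): ig.get("OID")
             for ig in root.iterfind(".//odm:ItemGroupDef", NS)
             for ref in ig.iterfind("odm:ItemRef", NS)}
    where = {wc.get("OID"): wc for wc in root.iterfind(".//def:WhereClauseDef", NS)}
    lists = {vl.get("OID"): vl for vl in root.iterfind(".//def:ValueListDef", NS)}
    problems = []
    for item in root.iterfind(".//odm:ItemDef", NS):
        vl_ref = item.find("def:ValueListRef", NS)
        vl = lists.get(vl_ref.get("ValueListOID")) if vl_ref is not None else None
        if vl is None:
            continue
        dataset = owner.get(item.get("OID"))
        for wc_ref in vl.iterfind("odm:ItemRef/def:WhereClauseRef", NS):
            wc = where.get(wc_ref.get("WhereClauseOID"))
            if wc is None:
                continue  # unresolved reference, a separate finding
            for check in wc.iterfind("odm:RangeCheck", NS):
                target = check.get(DEF_ITEM_OID)
                if owner.get(target) != dataset:
                    problems.append((wc.get("OID"), target, dataset))
    return problems


def duplicate_where_clause_oids(root):
    """Return WhereClauseDef OIDs that are declared more than once."""
    counts = Counter(wc.get("OID") for wc in root.iterfind(".//def:WhereClauseDef", NS))
    return sorted(oid for oid, n in counts.items() if n > 1)


root = ET.fromstring(SAMPLE)
print(cross_linked_where_clauses(root))
print(duplicate_where_clause_oids(root))
```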

4.2 The SoftHard Attribute

Use SoftHard="Soft" for SUPPQUAL WhereClause entries. A Hard constraint implies the data should fail a range check if the condition is violated. In a SUPP context the WhereClause is not a validation rule — it is a scoping filter. Soft is correct. Some define generators default to Hard. Check your output.

4.3 Case Sensitivity in CheckValue

The value inside <CheckValue> must match the actual QNAM values in the XPT exactly, including case. SAS XPT is case-preserving for character values. If your dataset has QNAM="AESlife" in even one row and your WhereClauseDef has <CheckValue>AESLIFE</CheckValue>, the VLM entry will not resolve for those rows and Pinnacle 21 will flag the mismatch. Validate against the actual unique QNAM values in your dataset before finalizing define.xml.
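To make the case-sensitivity point concrete, here is a small Python sketch that compares the distinct QNAM values from a dataset with the CheckValue strings from define.xml. It is illustrative only: how the two lists are obtained is left open, and the function name compare_qnams and the report keys are hypothetical.

```python
def compare_qnams(dataset_qnams, check_values):
    """Compare dataset QNAMs with define.xml CheckValues using exact, case-sensitive matching."""
    data = set(dataset_qnams)
    defined = set(check_values)
    # Index CheckValues by upper-cased form so case-only differences can be reported
    # as a distinct finding rather than as a plain missing entry.
    by_upper = {value.upper(): value for value in defined}

    report = {"case_mismatch": [], "missing_vlm": [], "unused_vlm": []}
    for qnam in sorted(data - defined):
        near_match = by_upper.get(qnam.upper())
        if near_match is not None:
            report["case_mismatch"].append((qnam, near_match))
        else:
            report["missing_vlm"].append(qnam)

    # CheckValues that match no dataset QNAM at all, even ignoring case.
    data_upper = {qnam.upper() for qnam in data}
    report["unused_vlm"] = sorted(v for v in defined - data if v.upper() not in data_upper)
    return report


report = compare_qnams(
    ["AESLIFE", "AESlife", "AECONTRT", "AEXYZ"],   # distinct QNAMs found in the XPT
    ["AESLIFE", "AECONTRT", "AOCCIFL"],            # CheckValues found in define.xml
)
print(report)
```

Reporting case-only differences separately is a design choice: the fix is usually in the dataset (normalize the QNAM), whereas a truly missing entry is fixed in define.xml.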

4.4 Multi-Condition WhereClause (Rare but Real)

Occasionally a qualifier's meaning changes depending on the parent IDVAR. If QNAM="VISIT" behaves differently when IDVAR="AESEQ" versus IDVAR="MHSEQ" — which should not happen in well-designed SDTM but does happen in rescue mapping situations — you can build a multi-condition WhereClause:

<def:WhereClauseDef OID="WC.SUPPAE.QNAM.VISIT.AESEQ">
  <RangeCheck Comparator="EQ"
              SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>VISIT</CheckValue>
  </RangeCheck>
  <RangeCheck Comparator="EQ"
              SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.IDVAR">
    <CheckValue>AESEQ</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

Multiple RangeCheck elements inside one WhereClauseDef are evaluated as AND conditions by the Define-XML specification. Use this sparingly. If you find yourself doing this frequently, it is usually a sign that the SUPPQUAL design itself needs revisiting before worrying about the define.
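The AND semantics can be illustrated with a short Python sketch that evaluates a WhereClauseDef against one dataset row. It assumes the Define-XML v2.0 namespace, handles only the EQ and IN comparators, and relies on a hand-written mapping from ItemOID to column name (COLUMN_FOR_OID); both the mapping and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3",
      "def": "http://www.cdisc.org/ns/def/v2.0"}
DEF_ITEM_OID = "{http://www.cdisc.org/ns/def/v2.0}ItemOID"

# The multi-condition WhereClauseDef from the text, with namespaces declared inline.
WC_XML = """<def:WhereClauseDef OID="WC.SUPPAE.QNAM.VISIT.AESEQ"
    xmlns="http://www.cdisc.org/ns/odm/v1.3"
    xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>VISIT</CheckValue>
  </RangeCheck>
  <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.IDVAR">
    <CheckValue>AESEQ</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>"""

# Assumption: in a real tool this mapping would be built from the ItemDef Name attributes.
COLUMN_FOR_OID = {"IT.SUPPAE.QNAM": "QNAM", "IT.SUPPAE.IDVAR": "IDVAR"}


def where_clause_applies(wc, row, column_for_oid):
    """True only if every RangeCheck holds for the row, i.e. the checks are ANDed."""
    for check in wc.iterfind("odm:RangeCheck", NS):
        comparator = check.get("Comparator")
        if comparator not in ("EQ", "IN"):
            raise ValueError("comparator not handled in this sketch: %s" % comparator)
        column = column_for_oid[check.get(DEF_ITEM_OID)]
        allowed = [(cv.text or "").strip() for cv in check.iterfind("odm:CheckValue", NS)]
        if row.get(column) not in allowed:
            return False  # one failing check is enough to reject the row
    return True


wc = ET.fromstring(WC_XML)
print(where_clause_applies(wc, {"QNAM": "VISIT", "IDVAR": "AESEQ"}, COLUMN_FOR_OID))
print(where_clause_applies(wc, {"QNAM": "VISIT", "IDVAR": "MHSEQ"}, COLUMN_FOR_OID))
```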

5. Origin Tracing for SUPPQUAL Variables

Origin documentation for SUPPQUAL is where the real intellectual work lives. It is also where most packages cut corners. Regulatory reviewers — particularly FDA and PMDA — are increasingly using define.xml as an audit instrument, not just a reference document. The origin chain must be defensible.

5.1 The Define-XML Origin Types and What They Mean in SUPPQUAL Context

Origin Type	When to Use in SUPPQUAL	What Reviewers Expect
`CRF`	QVAL is directly transcribed from a CRF field	PDF page reference pointing to the exact CRF question. NamedDestination preferred over page numbers.
`Derived`	QVAL is computed from other data (flags, first-occurrence logic, duration calculations)	Reference to derivation specs or annotated CRF note explaining the logic. Reviewers want to see the method, not just "Derived."
`Assigned`	QVAL comes from a sponsor-assigned value not captured on a CRF (study day calculations, batch assignments)	Some reference to the assigning entity or protocol specification.
`Predecessor`	QVAL is carried over or transformed from a prior dataset or CDASH mapping — use with caution in SUPPQUAL	The predecessor source should be traceable. Generic "Predecessor" with no document reference is not acceptable in modern submissions.
`Protocol`	QVAL represents a protocol-defined classification not captured explicitly in the CRF	Reference to specific protocol section or amendment.

5.2 CRF Origin — Getting the PDFPageRef Right

The most common origin in SUPPQUAL is CRF. The structure requires a DocumentRef pointing to the annotated CRF leaf, with a PDFPageRef specifying where in the CRF the field appears.

<def:Origin Type="CRF">
  <def:DocumentRef leafID="LF.ACRF">
    <def:PDFPageRef
      Type="NamedDestination"
      PageRefs="AE_PAGE_SAE_CRITERIA"/>
  </def:DocumentRef>
</def:Origin>

The Type on PDFPageRef should be NamedDestination if your annotated CRF has named bookmark anchors, or PhysicalPage if you are using page numbers. Named destinations are more stable across CRF revisions. If your CRF authoring tool supports it, insist on named destinations. Physical page numbers shift whenever a CRF page is added or removed, and a define.xml built against page numbers becomes inaccurate with each CRF version increment.

The leafID must match the ID attribute of a def:leaf element declared elsewhere in your define.xml. That leaf must point to a document that actually exists in the submission package. Broken leaf references fail define validation. Cross-check the leaf IDs against your actual eSub folder structure before finalizing.
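The leaf cross-check lends itself to a script. The following Python sketch assumes the Define-XML v2.0 namespace and that each def:leaf href is a path relative to the folder holding define.xml; the helper names broken_leaf_refs and demo are hypothetical, and the demo builds a throwaway folder instead of reading a real eSub package.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

NS = {"def": "http://www.cdisc.org/ns/def/v2.0"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Illustrative fragment: LF.CRF is referenced but never declared as a def:leaf.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <def:leaf ID="LF.ACRF" xlink:href="acrf.pdf"><def:title>acrf.pdf</def:title></def:leaf>
  <def:leaf ID="LF.SDTM_SPECS" xlink:href="specs.pdf"><def:title>specs.pdf</def:title></def:leaf>
  <def:DocumentRef leafID="LF.ACRF"/>
  <def:DocumentRef leafID="LF.SDTM_SPECS"/>
  <def:DocumentRef leafID="LF.CRF"/>
</ODM>"""


def broken_leaf_refs(root, package_dir):
    """Return (leafID, reason) for DocumentRefs that do not resolve to an existing file."""
    leaves = {leaf.get("ID"): leaf.get(XLINK_HREF)
              for leaf in root.iterfind(".//def:leaf", NS)}
    referenced = {ref.get("leafID") for ref in root.iterfind(".//def:DocumentRef", NS)}
    problems = []
    for leaf_id in sorted(referenced):
        href = leaves.get(leaf_id)
        if href is None:
            problems.append((leaf_id, "no def:leaf declares this ID"))
        elif not os.path.isfile(os.path.join(package_dir, href)):
            problems.append((leaf_id, "file not found: " + href))
    return problems


def demo():
    """Run the check against a temporary folder that contains only acrf.pdf."""
    with tempfile.TemporaryDirectory() as package_dir:
        open(os.path.join(package_dir, "acrf.pdf"), "wb").close()
        return broken_leaf_refs(ET.fromstring(SAMPLE), package_dir)


print(demo())
```

This covers leaf IDs and file presence only. Checking that a NamedDestination actually exists inside the PDF needs a PDF library and is not attempted here.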

5.3 Derived Origin — Giving Reviewers Enough Information

Derived QNAMs are the hardest to document well. The spec says to include a DocumentRef, but many programmers point to a general specifications document rather than the specific derivation. This is a missed opportunity.

The minimum acceptable Derived origin documentation in 2024+ submissions includes: a reference document that describes the derivation logic, a page or named anchor within that document that shows the specific algorithm, and — where the derivation is non-trivial — a Description element on the ItemDef that explains the method in enough plain language that a reviewer can understand it without opening the spec document.

<ItemDef OID="IT.SUPPAE.QVAL.AOCCIFL"
         Name="AOCCIFL"
         DataType="text"
         Length="1"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Any occurrence indicator flag. Set to 'Y' for the first chronological
      occurrence of an AE preferred term within a subject, based on AESTDTC
      ascending, then AESEQ ascending. All subsequent occurrences of the same
      preferred term for the same subject are left blank.
    </TranslatedText>
  </Description>
  <def:Origin Type="Derived">
    <def:DocumentRef leafID="LF.SDTM_SPECS">
      <def:PDFPageRef Type="NamedDestination"
                      PageRefs="SUPPAE_AOCCIFL_DERIVATION"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

Notice that the Description element does the work. The DocumentRef points a reviewer to the formal specs. But the description alone is enough for a reviewer to validate the derivation without opening anything. That is the standard to aim for.

5.4 The Structural Variable Origin Problem

RDOMAIN, IDVAR, IDVARVAL, QLABEL — these structural SUPPQUAL variables trip up many define packages. Their origin is technically Assigned or Derived in most implementations, because they are constructed by the programmer rather than collected on a CRF. But they are not user-facing clinical data in the same way QVAL is. Many programmers leave them as Predecessor or assign them a generic "Assigned" without further documentation.

The FDA TCG expectation is that IDVAR and IDVARVAL have clear origin documentation that explains which variable in the parent domain they reference. A note in the Description element for IDVAR stating "Contains the name of the key variable in the parent AE domain used to link supplemental records. Populated with 'AESEQ'" is significantly better than a bare Assigned origin.

6. Controlled Terminology in SUPPQUAL QVAL — Who Owns the Codelist?

When a QNAM takes values from a controlled terminology, the VLM-level ItemDef for that QNAM should carry a CodeListRef. The question is: which codelist, and where is it defined?

6.1 CDISC Standard Codelists

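Most Y/N flags in SUPPQUAL map to the CDISC NY codelist. Reference it exactly as you would from any other domain. The CodeList element for NY goes in the global CodeLists section of your define.xml, not inside the SUPPQUAL section. CDISC controlled terminology is enumerated in the CodeList itself, with the NCI C-codes carried as Alias elements; ExternalCodeList is meant for external dictionaries such as MedDRA. The def:StandardOID attribute is a Define-XML v2.1 feature.

<!-- In the CodeLists section of define.xml -->
<CodeList OID="CL.NY"
          Name="No Yes Response"
          DataType="text"
          def:StandardOID="STD.CT.2024-09-27">
  <EnumeratedItem CodedValue="N">
    <Alias Context="nci:ExtCodeID" Name="C49487"/>
  </EnumeratedItem>
  <EnumeratedItem CodedValue="Y">
    <Alias Context="nci:ExtCodeID" Name="C49488"/>
  </EnumeratedItem>
  <!-- C-code of the NY codelist itself -->
  <Alias Context="nci:ExtCodeID" Name="C66742"/>
</CodeList>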

<!-- In the VLM ItemDef -->
<ItemDef OID="IT.SUPPAE.QVAL.AESLIFE" ...>
  ...
  <CodeListRef CodeListOID="CL.NY"/>
</ItemDef>

6.2 Sponsor-Defined Codelists for SUPPQUAL

Some QNAMs have permissible values that are sponsor-defined — not from CDISC terminology. A common example is a dose escalation category or a protocol-specific severity classification that appears as a supplemental qualifier. For these, you define a local CodeList with a clear naming convention and mark it appropriately.

<CodeList OID="CL.SUPPAE.AESCATYP"
          Name="AE Categorization Type"
          DataType="text">
  <EnumeratedItem CodedValue="INFUSION RELATED"/>
  <EnumeratedItem CodedValue="HYPERSENSITIVITY"/>
  <EnumeratedItem CodedValue="CRS"/>
</CodeList>

For a fully sponsor-defined codelist, leave def:ExtendedValue off the EnumeratedItem elements. The attribute only accepts the value "Yes", so def:ExtendedValue="No" does not validate against the Define-XML schema. If your codelist extends an extensible CDISC codelist by adding sponsor-specific terms, put def:ExtendedValue="Yes" on the added items only, and identify the parent standard codelist with an Alias element (Context="nci:ExtCodeID") carrying its NCI C-code. ExternalCodeList is reserved for external dictionaries such as MedDRA or WHODrug.

6.3 QNAMs That Should Not Have a Codelist

Free-text QNAMs — verbatim descriptions, reason fields, comment fields — must not carry a CodeListRef. This sounds obvious, but a common error occurs when a programmer builds a template from an existing QNAM that does have a codelist and forgets to strip the CodeListRef when adding a free-text QNAM. The result is a define.xml that claims a free-text field has controlled permissible values, which Pinnacle 21 will flag as a terminology inconsistency and which FDA reviewers will question.

7. Common Submission Rejection Patterns

These are patterns drawn from actual FDA and EMA reviewer feedback letters and study data validation report (SDVR) findings. They cluster into six categories.

7.1 Missing VLM for One or More QNAMs

Rejection Pattern: "Value-level metadata not provided for QNAM values [list]. Each unique QNAM in SUPP-- datasets must have corresponding VLM entries in define.xml." This is the most frequent finding. It usually happens when a QNAM is added late in the study lifecycle — a new flag requested by biostatistics after the define package was already built — and the VLM entry is either forgotten or added only to the dataset without updating define.xml.

Prevention: Your define.xml build process should include a programmatic check that compares the unique QNAM values in each production SUPP XPT against the WhereClauseDef CheckValues in the corresponding ValueListDef. Any QNAM in the dataset with no matching WhereClause is a define gap. Run this check as part of your define QC program, not as a manual step.

/* SAS: Check for QNAMs missing from VLM */
/* Assumes you have parsed define.xml WhereClause values into
   a dataset called vlm_qnams with fields: domain, qnam_value */

proc sql;
  create table missing_vlm as
  select distinct a.rdomain,
                  a.qnam,
                  "No VLM entry in define.xml" as issue
  from suppae a
  left join vlm_qnams b
    on a.qnam = b.qnam_value   /* exact match: CheckValue must equal QNAM as stored, including case */
    and b.domain = 'SUPPAE'
  where b.qnam_value is null;
quit;

proc print data=missing_vlm noobs; run;
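The SAS check above presupposes a vlm_qnams dataset. One way to produce its contents is to extract the QNAM CheckValues from define.xml and write them to a CSV that SAS can import. The Python sketch below assumes the Define-XML v2.0 namespace and an inline illustrative fragment; the function names and the CSV layout (domain, qnam_value) are assumptions chosen to match the column names used in the SAS step.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3",
      "def": "http://www.cdisc.org/ns/def/v2.0"}
DEF_ITEM_OID = "{http://www.cdisc.org/ns/def/v2.0}ItemOID"

# Illustrative fragment, not a complete define.xml.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <ItemGroupDef OID="IG.SUPPAE" Name="SUPPAE">
    <ItemRef ItemOID="IT.SUPPAE.QNAM"/>
  </ItemGroupDef>
  <ItemDef OID="IT.SUPPAE.QNAM" Name="QNAM" DataType="text" Length="8"/>
  <def:WhereClauseDef OID="WC.SUPPAE.QNAM.AESLIFE">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
      <CheckValue>AESLIFE</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
  <def:WhereClauseDef OID="WC.SUPPAE.QNAM.AECONTRT">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
      <CheckValue>AECONTRT</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
</ODM>"""


def extract_vlm_qnams(root):
    """Return sorted (domain, qnam_value) pairs from RangeChecks that target a QNAM item."""
    names = {i.get("OID"): i.get("Name") for i in root.iterfind(".//odm:ItemDef", NS)}
    owner = {ref.get("ItemOID"): ig.get("Name")
             for ig in root.iterfind(".//odm:ItemGroupDef", NS)
             for ref in ig.iterfind("odm:ItemRef", NS)}
    rows = set()
    for check in root.iterfind(".//def:WhereClauseDef/odm:RangeCheck", NS):
        target = check.get(DEF_ITEM_OID)
        if names.get(target) != "QNAM":
            continue  # skip conditions on other columns, such as IDVAR
        for cv in check.iterfind("odm:CheckValue", NS):
            rows.add((owner.get(target, ""), (cv.text or "").strip()))
    return sorted(rows)


def write_csv(rows, handle):
    """Write the pairs with the column names the SAS check expects."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["domain", "qnam_value"])
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(extract_vlm_qnams(ET.fromstring(SAMPLE)), buffer)
print(buffer.getvalue())
```

Values are written exactly as they appear in CheckValue, with no case folding, so the downstream comparison stays case-sensitive.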

7.2 QNAM Values in Dataset Do Not Match CheckValue in WhereClauseDef

Rejection Pattern: "WhereClause condition for [DOMAIN].QNAM EQ [value] does not match observed QNAM values in dataset. Observed: [AEOSPTA], WhereClause: [AEOSTPA]." This is a typo problem, pure and simple. The QNAM name in the CheckValue element is not identical to the QNAM string in the XPT. Usually a transposition error or a case mismatch discovered after the fact.

Prevention: Never hand-type QNAM values into WhereClauseDef CheckValue elements. Generate them programmatically from the dataset itself, or at minimum diff your define.xml CheckValues against a proc freq output of the actual dataset QNAMs.

7.3 Origin Type "CRF" with No PDFPageRef or Broken LeafID

Rejection Pattern: "Origin Type=CRF specified for [QNAM] but no CRF page reference provided. Unable to locate source question in annotated CRF." The origin says CRF but either the DocumentRef is missing, the leafID does not resolve, or the PDFPageRef points to a named destination that does not exist in the annotated CRF PDF.

Prevention: Validate all leafIDs against actual files present in the submission package. Validate all NamedDestination values against the bookmarks/destinations actually present in the CRF PDF. Both are scriptable checks — PDF bookmark extraction via Python or a SAS DDE/shell call is straightforward and should be part of your define QC process.

7.4 Derived QNAMs with No Explanation of Derivation Logic

Rejection Pattern: "Origin Type=Derived specified for [QNAM] but derivation logic not documented in define.xml or referenced specifications. Please provide the algorithm used to derive this variable." The define says Derived, the DocumentRef points to a general specs document, but there is no specific derivation description anywhere accessible to the reviewer.

Prevention: For every Derived QNAM, the Description element on the VLM ItemDef should contain enough information that a reviewer can understand the derivation method without opening a separate document. The DocumentRef is supplementary, not a replacement for the in-line description.

7.5 Inconsistent Length Between Column-Level and VLM-Level ItemDef

Rejection Pattern: "VLM-level Length for QNAM=[value] exceeds column-level Length for QVAL. VLM Length should not exceed parent column length." The column-level QVAL ItemDef declares Length=200, but a specific VLM ItemDef for one QNAM declares Length=250. This is internally inconsistent — a VLM entry cannot describe data that is longer than the column that contains it.

Prevention: VLM-level lengths should always be less than or equal to the column-level length for QVAL. In practice, column-level QVAL length should match the XPT variable length, and VLM lengths should reflect the actual maximum observed length for each QNAM's values. Use proc contents to confirm the stored QVAL length, then compute the maximum observed length of QVAL by QNAM to determine appropriate VLM lengths. QVAL is a character variable, so proc means cannot summarize it; a proc sql query over length(strip(qval)) does the job.

/* Get max QVAL length by QNAM for length documentation */
proc sql;
  create table qval_lengths as
  select rdomain,
         qnam,
         max(length(strip(qval))) as max_qval_length
  from suppae
  group by rdomain, qnam
  order by rdomain, qnam;
quit;

proc print data=qval_lengths noobs label;
  label max_qval_length = "Max QVAL Length";
run;

7.6 QLABEL in Define.xml Does Not Match Actual QLABEL Values in Dataset

Rejection Pattern: "Label documented in define.xml Description element for QNAM=[value] does not match QLABEL observed in dataset. Define: 'Life Threatening Event', Dataset: 'Life-Threatening'." QLABEL is a character variable in the SUPPQUAL dataset. Its value must be consistent across all rows for a given QNAM, and the Description on the corresponding VLM ItemDef must reflect this label accurately.

The QLABEL in your dataset is the authoritative source. Your define.xml Description element for each QNAM VLM entry should use the same phrasing. If QLABEL="Life-Threatening" in the dataset, the VLM Description should say "Life-Threatening" in its label description, not a longer or differently punctuated form. FDA reviewers do exact text comparisons between define.xml and dataset values in SDVR tooling.
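A label comparison of this kind might be scripted as follows. This Python sketch takes (QNAM, QLABEL) pairs from the dataset and a dictionary of labels taken from define.xml; how those labels are extracted from the Description elements is left open, and the function name and finding texts are hypothetical.

```python
from collections import defaultdict


def qlabel_findings(rows, define_labels):
    """Flag QNAMs whose QLABEL varies across rows or differs from the define.xml label."""
    seen = defaultdict(set)
    for qnam, qlabel in rows:
        seen[qnam].add(qlabel)

    findings = []
    for qnam in sorted(seen):
        labels = sorted(seen[qnam])
        if len(labels) > 1:
            findings.append((qnam, "inconsistent QLABEL in dataset", labels))
        expected = define_labels.get(qnam)  # None when define.xml has no entry
        for label in labels:
            if label != expected:  # exact text comparison, punctuation included
                findings.append((qnam, "QLABEL differs from define", [label, expected]))
    return findings


rows = [
    ("AESLIFE", "Life-Threatening"),
    ("AESLIFE", "Life-Threatening"),
    ("AECONTRT", "Concomitant Treatment Given"),
    ("AECONTRT", "Concomitant Trt Given"),
]
define_labels = {
    "AESLIFE": "Life Threatening Event",
    "AECONTRT": "Concomitant Treatment Given",
}
for finding in qlabel_findings(rows, define_labels):
    print(finding)
```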

8. PMDA-Specific Considerations

PMDA submissions add layers beyond FDA requirements. If you are delivering a Japan package, several SUPPQUAL define.xml behaviors require specific attention.

8.1 Bilingual Descriptions

PMDA increasingly expects Japanese-language TranslatedText elements alongside English descriptions, particularly for VLM ItemDefs in SUPPQUAL. This applies to the Description element. Using a single xml:lang="en" element is technically valid per the schema but draws reviewer queries in Japan submissions. The correct pattern:

<Description>
  <TranslatedText xml:lang="en">
    Indicator of whether the adverse event was life-threatening.
  </TranslatedText>
  <TranslatedText xml:lang="ja">
    有害事象が生命を脅かすものであったかどうかを示す指標。
  </TranslatedText>
</Description>

If you do not have translation resources, at minimum ensure the English description is precise enough that a Japanese reviewer using machine translation can derive accurate meaning. Ambiguous English descriptions compounded by imperfect machine translation are a known source of PMDA reviewer queries.

8.2 PMDA Requires QNAM-Level Variable Metadata in the Data Definition Document

PMDA validation checklists specifically call out that SUPPQUAL QNAM values should be documented as variables in the data definition document (essentially define.xml) with labels and derivation rules equivalent to how named variables are documented in non-SUPP domains. Their reviewers check this against the SDTM datasets using their own tooling.

8.3 Encoding and Character Width in QLABEL

PMDA submissions often involve Japanese character data in non-SUPP domains but QLABEL in SUPPQUAL is almost always ASCII English. However, if your submission includes Japanese QLABEL values, verify that the XPT character encoding documentation in define.xml (via the def:CommentDef mechanism or a dedicated annotation) explicitly acknowledges the encoding. PMDA has flagged submissions where the define.xml implied ASCII-only encoding but the dataset contained multi-byte characters.

9. SAS Utility: Generating VLM Entries Programmatically

Manually authoring VLM entries for large SUPPQUAL domains — SUPPCM with 20+ QNAMs, SUPPLB with method and specimen qualifiers — is error-prone and time-consuming. Build a generation utility that takes a metadata specs dataset as input and outputs the XML fragments for ValueListDef, WhereClauseDef, and ItemDef elements.

9.1 Input Metadata Dataset Structure

/* Define the metadata specs dataset for SUPPQUAL VLM generation */
/* One row per unique QNAM per SUPPQUAL domain */

data suppqual_vlm_specs;
  length domain    $8
         qnam      $8
         qlabel    $40
         datatype  $10
         length_    8
         origin    $20
         codelist  $40
         crf_dest  $80
         derivation_text $500;
  infile cards dsd;
  input domain $ qnam $ qlabel $ datatype $ length_
        origin $ codelist $ crf_dest $ derivation_text $;
cards;
SUPPAE,AESLIFE,Life-Threatening,text,1,CRF,CL.NY,AE_SAE_PAGE,.
SUPPAE,AECONTRT,Concomitant Treatment Given,text,1,CRF,CL.NY,AE_DETAILS_PAGE,.
SUPPAE,AOCCIFL,Any Occurrence Indicator Flag,text,1,Derived,CL.NY,.,Flag for first occurrence by PT within subject based on AESTDTC ascending
SUPPAE,AERELNST,Relationship to Non-Study Therapy,text,50,CRF,.,AE_RELATIONSHIP_PAGE,.
;
run;

9.2 XML Generation Macro

%macro gen_supp_vlm(domain=, specs_ds=, outfile=);

  /* Step 1: One row per QNAM for the requested domain, sorted by QNAM */
  proc sort data=&specs_ds.(where=(domain="&domain."))
            out=_specs nodupkey;
    by domain qnam;
  run;

  filename vlm_out "&outfile.";
  data _null_;
    file vlm_out lrecl=32767;
    set _specs end=last;

    /* ValueListDef opening tag: write once, on the first row only */
    if _n_ = 1 then put "<def:ValueListDef OID=""VL.&domain..QVAL"">";

    /* OrderNumber follows the sort order */
    seq + 1;

    /* ItemRef within ValueListDef */
    put '  <ItemRef ItemOID="IT.' domain +(-1) '.QVAL.' qnam +(-1) '"';
    put '           OrderNumber="' seq +(-1) '"';
    put '           Mandatory="Yes">';
    put '    <def:WhereClauseRef WhereClauseOID="WC.' domain +(-1) '.QNAM.' qnam +(-1) '"/>';
    put '  </ItemRef>';

    if last then put "</def:ValueListDef>";
  run;

  /* Step 2: WhereClauseDefs */
  data _null_;
    file vlm_out lrecl=32767 mod;
    set _specs;

    put '<def:WhereClauseDef OID="WC.' domain +(-1) '.QNAM.' qnam +(-1) '">';
    put '  <RangeCheck Comparator="EQ" SoftHard="Soft"';
    put '              def:ItemOID="IT.' domain +(-1) '.QNAM">';
    put '    <CheckValue>' qnam +(-1) '</CheckValue>';
    put '  </RangeCheck>';
    put '</def:WhereClauseDef>';
  run;

  /* Step 3: VLM-level ItemDefs */
  data _null_;
    file vlm_out lrecl=32767 mod;
    set _specs;

    put '<ItemDef OID="IT.' domain +(-1) '.QVAL.' qnam +(-1) '"';
    put '         Name="' qnam +(-1) '"';
    put '         DataType="' datatype +(-1) '"';
    put '         Length="' length_ +(-1) '"';
    put '         SASFieldName="QVAL">';
    put '  <Description>';
    put '    <TranslatedText xml:lang="en">' qlabel +(-1) '</TranslatedText>';
    put '  </Description>';

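    /* A lone '.' placeholder is read as blank by list input; compress() */
    /* also guards against stray periods such as '..' in the specs.      */
    if not missing(compress(codelist, '.')) then
      put '  <CodeListRef CodeListOID="' codelist +(-1) '"/>';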

    if origin = 'CRF' then do;
      put '  <def:Origin Type="CRF">';
      put '    <def:DocumentRef leafID="LF.ACRF">';
      put '      <def:PDFPageRef Type="NamedDestination" PageRefs="'
          crf_dest +(-1) '"/>';
      put '    </def:DocumentRef>';
      put '  </def:Origin>';
    end;
    else if origin = 'Derived' then do;
      put '  <def:Origin Type="Derived">';
      put '    <def:DocumentRef leafID="LF.SDTM_SPECS"/>';
      put '  </def:Origin>';
    end;

    put '</ItemDef>';
  run;

  filename vlm_out clear;
%mend gen_supp_vlm;

/* Usage */
%gen_supp_vlm(
  domain   = SUPPAE,
  specs_ds = suppqual_vlm_specs,
  outfile  = /path/to/suppae_vlm_fragments.xml
);

This is a skeleton macro. Note that it reads derivation_text from the specs but never writes it. In production, extend it to handle multi-part derivation text, a structured DocumentRef with a PDFPageRef per QNAM rather than a generic leaf, and escaping of XML special characters (&, <, >) in description and derivation text. Use tranwrd() chains or a dedicated XML-escape function before writing to file.
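
The escaping step can be sketched as a small pre-processing pass over the specs dataset from Section 9.1. This is a minimal illustration, not part of the macro above: the dataset name specs_escaped and the variable qlabel_xml are hypothetical, and only the three characters named above are handled.

```sas
/* Sketch: escape XML special characters in QLABEL before writing it out. */
/* specs_escaped and qlabel_xml are hypothetical names for illustration.  */
data specs_escaped;
  set suppqual_vlm_specs;
  length qlabel_xml $200;
  /* Escape the ampersand first; otherwise the entities added below */
  /* would themselves be escaped a second time.                     */
  qlabel_xml = tranwrd(qlabel,     '&', '&amp;');
  qlabel_xml = tranwrd(qlabel_xml, '<', '&lt;');
  qlabel_xml = tranwrd(qlabel_xml, '>', '&gt;');
run;
```

To use it, point specs_ds at the escaped dataset and write qlabel_xml instead of qlabel inside the TranslatedText put statement. The same pattern would apply to derivation_text once the macro writes it.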

9.3 Validating the Output

After generating XML fragments and integrating them into your define.xml, validate using at minimum two tools: Pinnacle 21 Community Edition (or Enterprise if your organization has it) and the CDISC Define-XML validator at define.cdisc.org. These tools catch different classes of errors. Pinnacle 21 focuses on clinical data consistency; the CDISC validator focuses on schema conformance. Run both before any submission.

10. Pre-Submission Checklist for SUPPQUAL Define.xml

Before any define package leaves your desk for submission, verify each of the following. This is not a generic checklist — every item here maps directly to a rejection pattern seen in actual submissions.

#	Check	How to Verify
1	Every unique QNAM in the XPT has a corresponding WhereClauseDef with matching CheckValue (exact case, exact string)	Programmatic diff of proc freq(QNAM) vs CheckValue elements in define.xml
2	Every QNAM WhereClauseRef resolves to a defined WhereClauseDef OID	XML validation; Pinnacle 21 will flag unresolved OIDs
3	VLM QVAL ItemDef length ≤ column-level QVAL ItemDef length ≤ XPT QVAL variable length	Compare proc contents length output against define lengths
4	All CRF-origin QNAM entries have PDFPageRef with a NamedDestination that exists in the annotated CRF PDF	Extract PDF named destinations programmatically and cross-check
5	All Derived QNAM entries have a Description that explains the derivation in plain language	Manual review of each Derived VLM ItemDef Description element
6	No CodeListRef on free-text QNAM entries	Review all VLM ItemDefs; confirm CodeListRef absent for free-text QNAMs
7	Description text for each QNAM matches the QLABEL value used in the dataset	Compare proc freq QLABEL output against VLM Description TranslatedText values
8	All leaf IDs referenced in DocumentRef elements resolve to actual files in the submission package	Cross-reference all leafID attributes against submission folder contents
9	SoftHard="Soft" on all SUPPQUAL WhereClause RangeCheck elements	grep or XPath search for SoftHard in define.xml
10	WhereClauseDef OIDs are domain-scoped (no shared OIDs across SUPPQUAL domains)	Confirm OID naming convention includes domain prefix; check for duplicate OIDs in full define.xml
11	QVAL column-level ItemDef contains a def:ValueListRef child element pointing to the correct ValueListDef OID	Direct inspection of QVAL ItemDef element in define.xml XML source
12	KeySequence is set on the key-variable ItemRefs within the SUPPQUAL ItemGroupDef (typically STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, QNAM), with QNAM included in the key	Direct inspection of ItemGroupDef structure in define.xml XML source
13	Cross-domain consistency checks implemented (SUPP vs parent domain) — e.g., AESLIFE=Y where AESER=N	Custom SAS QC program — not covered by Pinnacle 21

Run Pinnacle 21 after completing every item on this list, not before. Pinnacle 21 is a final gate, not a substitute for structured pre-review. It catches some things this list misses and misses some things this list catches. Both layers are necessary.
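
Checklist items 1 and 3 lend themselves to a programmatic check. The sketch below compares the data against the specs dataset from Section 9.1 rather than against define.xml itself, which is only equivalent if your define.xml is generated from those specs. The output table names are illustrative, and SUPPAE is used as the example domain.

```sas
/* Sketch: reconcile QNAMs in the data against the specs (item 1) and */
/* compare observed QVAL lengths against spec lengths (item 3).       */
proc sql;
  /* QNAMs actually present in SUPPAE, with the longest observed QVAL */
  create table qnam_data as
  select qnam,
         max(lengthn(qval)) as max_len
  from suppae
  group by qnam;

  /* Full join so that gaps on either side are visible */
  create table qnam_recon as
  select coalesce(d.qnam, s.qnam) as qnam,
         case
           when s.qnam is missing then 'In data, not in specs'
           when d.qnam is missing then 'In specs, not in data'
           when d.max_len > s.length_ then 'Observed length exceeds spec length'
           else 'OK'
         end as status length=40
  from qnam_data as d
  full join
       (select qnam, length_
        from suppqual_vlm_specs
        where domain = 'SUPPAE') as s
    on d.qnam = s.qnam
  where calculated status ne 'OK';
quit;
```

An empty qnam_recon is the expected result. The join is case-sensitive, which is what item 1 requires.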

11. IDVAR / IDVARVAL — The Hidden Failure Point

Most define.xml discussions focus on QNAM and QVAL. Reviewers do not. They struggle just as much with linkage. SUPPQUAL is only interpretable if the reviewer can answer one question: which parent record does this qualifier belong to? That answer depends entirely on RDOMAIN, IDVAR, and IDVARVAL working together — and all three need explicit metadata support in define.xml.

11.1 What IDVAR Actually Represents

IDVAR is not just a variable name string. It defines the linking key into the parent domain. When RDOMAIN=AE, IDVAR=AESEQ, and IDVARVAL=12, the SUPPAE record links to the AE record where AESEQ=12 for the same USUBJID. That linkage must be unambiguous. If a reviewer cannot confirm uniqueness of the parent record — because AESEQ is not unique, or because IDVAR points to a variable that admits duplicates — the entire SUPPQUAL interpretation collapses.

11.2 Why Ambiguity Here Breaks Review

Consider a package where IDVAR=AESEQ and IDVARVAL=12 appears in a SUPPAE record. On the surface this is correct. But if the reviewer looks at the AE domain and finds three records with AESEQ=12 for the same subject, a well-known error pattern in datasets with improper sequence numbering, they cannot resolve which parent record the qualifier belongs to. A validator that checks SUPPAE on its own will not flag this: the SUPPQUAL structure is internally consistent. The linkage is semantically broken.

If a reviewer cannot trace a SUPPQUAL row to exactly one parent record in under ten seconds, your define.xml is incomplete. The metadata should pre-empt the question, not leave the reviewer reverse-engineering your data.

Real Review Failure: "Multiple AE records share AESEQ=12 for subject 101-001. Unable to determine which AE record the SUPPAE qualifier applies to. Please confirm whether AESEQ is unique within USUBJID and provide corrected linkage documentation."

11.3 What Define.xml Should Make Clear

The Description elements for IDVAR and IDVARVAL are almost always boilerplate or blank in real submissions. They should not be. At minimum:

<!-- IDVAR ItemDef description -->
<Description>
  <TranslatedText xml:lang="en">
    Identifies the key variable in the parent AE domain used to link
    supplemental qualifier records. Populated with AESEQ. The combination
    of USUBJID and IDVARVAL uniquely identifies one AE record.
  </TranslatedText>
</Description>

<!-- IDVARVAL ItemDef description -->
<Description>
  <TranslatedText xml:lang="en">
    Value of AESEQ identifying the parent AE record for this qualifier.
    Each value matches exactly one AE.AESEQ within USUBJID; no parent
    record has more than one SUPPAE record per QNAM.
  </TranslatedText>
</Description>

The phrase "uniquely identifies one AE record" is doing real work here. It tells the reviewer the mapping is deterministic and they do not need to investigate further. That is what good metadata does — it removes the reviewer's doubt before the doubt forms.

11.4 When Linkage Is Non-Standard

Non-standard linkage arises in rescue mapping, merged domain scenarios, and certain oncology or endpoint-heavy programs. If IDVARVAL is derived from a concatenation, a hash, or a composite of multiple parent variables, you must say so explicitly, both in the IDVARVAL Description element and, if the derivation is complex, in a referenced specifications document. Because IDVAR has to name a variable that exists in the parent dataset, the parent domain must also carry a variable holding the same composite value.

<!-- Non-standard linkage: composite key -->
<Description>
  <TranslatedText xml:lang="en">
    IDVARVAL derived from concatenation of AESEQ (zero-padded to 4 digits)
    and VISITNUM to ensure uniqueness across repeated events at the same
    visit. Format: AESEQ_VISITNUM (e.g., "0012_3"). This composite key
    resolves to exactly one AE record per USUBJID.
  </TranslatedText>
</Description>

If you cannot write a clear derivation description for IDVARVAL, that is a signal the linkage design itself needs to be revisited before the define.xml is written.

Quick Validation Check

Before documenting IDVARVAL linkage as 1:1, verify it programmatically. This check should run as part of your standard define QC program — not as a one-off manual step.

/* Check uniqueness of parent linkage key within USUBJID */
/* Run against the parent domain BEFORE writing IDVAR metadata */
proc sql;
  select usubjid,
         aeseq,
         count(*) as n
  from ae
  group by usubjid, aeseq
  having calculated n > 1;
quit;

/* If any rows return: your IDVARVAL linkage is non-unique.
   Do NOT document as 1:1 in define.xml until this is resolved.
   Investigate whether AESEQ was reset across visits or periods. */
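
Uniqueness is one half of the linkage check. The other half is that every SUPPAE row resolves to a parent record at all. A minimal sketch, assuming IDVAR=AESEQ and IDVARVAL values that convert cleanly to numbers (the table name suppae_orphans is illustrative):

```sas
/* Sketch: SUPPAE rows whose IDVARVAL matches no AE record for the subject */
proc sql;
  create table suppae_orphans as
  select s.usubjid, s.idvar, s.idvarval, s.qnam
  from suppae as s
  where s.idvar = 'AESEQ'
    and not exists
        (select *
         from ae as a
         where a.usubjid = s.usubjid
           and a.aeseq   = input(s.idvarval, best12.));
quit;
```

Any row in suppae_orphans is a qualifier with no parent. Non-numeric IDVARVAL values will raise invalid-argument notes in the log and also land here, which is usually what you want to see.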

12. SUPPQUAL vs Custom Domain — Design Decision

Every section so far assumes the decision to use SUPPQUAL has already been made. That assumption is dangerous. Using SUPPQUAL for data that belongs in a named domain — or in a custom domain — creates bloated define.xml, unreadable VLM, and reviewer confusion that define.xml cannot fix after the fact. The metadata problem is downstream of a design problem.

12.1 The Decision Framework

Situation	Better Choice	Why
Qualifier repeats across records in a structured way (multiple specimens, multiple methods)	New SDTM domain or RELREC-linked structure	Repeated structure in SUPPQUAL creates parallel rows with no obvious grouping anchor
Variable is critical to analysis or is referenced in TFLs	Named variable in the parent domain	Analysis variables buried in SUPPQUAL require merging before use; reviewers expect key variables accessible directly
Genuinely one-off qualifier, not repeated, not critical to analysis	SUPPQUAL	This is the use case SUPPQUAL was designed for
Qualifier is sponsor-defined and used only for internal tracking	SUPPQUAL with clear Assigned origin	Acceptable if clearly documented; do not conflate with analysis variables

12.2 Warning Signs You Chose Wrong

If any of the following are true for your SUPPQUAL domain, the design decision deserves a second look before you invest time in VLM construction:

QNAM count above 25 to 30 in a single SUPPQUAL dataset.
The same concept split across multiple QNAMs with a numeric suffix (FLAG1, FLAG2, FLAG3).
Heavy derivation logic inside SUPPQUAL that would be simpler to express as a named derived variable.
Reviewers routinely needing to merge SUPPQUAL back to the parent domain to understand the parent domain records.

When SUPPQUAL is acting like a domain, it should be a domain. The define.xml burden of a 40-QNAM SUPPLB is an order of magnitude higher than a clean 10-variable LB extension domain, and the reviewer experience is worse in every dimension.

13. What Pinnacle 21 Will NOT Catch

Pinnacle 21 validates structure. Reviewers validate meaning. These are not the same activity, and conflating them is one of the most expensive mistakes a define.xml team can make.

13.1 The Gap Between Structural Validity and Review Readiness

A define.xml can pass every Pinnacle 21 rule and still be useless to a reviewer. Here is the category of failures P21 cannot detect:

Failure Type	P21 Response	Reviewer Response
Vague Description element ("Flag" as the entire description of AOCCIFL)	✅ Pass — Description element is present	❌ Query — "Please provide the derivation algorithm for this flag"
CRF origin with a valid leafID that points to the wrong CRF page	✅ Pass — leafID resolves, PDFPageRef is syntactically valid	❌ Query — "CRF page referenced does not contain the field described"
QNAM name that is cryptic or non-intuitive (e.g., QNAM=XCFL3)	✅ Pass — QNAM ≤ 8 characters, conforms to naming rules	❌ Query — "Please clarify the meaning of XCFL3 and confirm CDISC naming convention compliance"
Derived origin with no derivation description	✅ Pass — Origin Type=Derived is valid	❌ Query — "Derivation method not documented in define.xml or referenced specifications"
QVAL values inconsistent with parent domain variables (AESLIFE=Y where AESER=N)	✅ Pass — No cross-domain logic checks in P21 at this level	❌ Query — "Logical inconsistency between SUPPAE.AESLIFE and AE.AESER for subject [ID]"

13.2 Practical Cross-Domain Checks

These are checks Pinnacle 21 does not perform but reviewers routinely validate manually. Automating them before submission eliminates a major class of queries.

SUPPQUAL Signal	Parent Domain Check	Failure Pattern
AESLIFE = 'Y'	AE.AESER should be 'Y'	Life-threatening event marked non-serious
AECONTRT = 'Y'	AE.AEREL = 'NOT RELATED'	Treatment given but event marked unrelated
AOCCIFL = 'Y'	Check earlier AESTDTC for same PT	Incorrect first-occurrence flag
SUPPLB.LBMETHOD present	LB.LBSPEC consistent	Method/specimen mismatch
SUPPAE.VISIT	SV / TV alignment	Visit naming inconsistencies

Example SAS Check: SUPPAE vs AE Consistency

/* Cross-domain QC: flag AESLIFE=Y where AE.AESER ne Y */
/* Run this before define.xml finalization; P21 will not catch it */
/* SUPPAE rows with no matching AE record also land here (AESER missing) */
proc sql;
  create table ae_mismatch as
  select a.usubjid,
         a.idvarval    as aeseq,
         a.qval        as aeslife,
         b.aeser
  from suppae a
  left join ae b
    on  a.usubjid = b.usubjid
    and input(a.idvarval, best.) = b.aeseq
  where a.qnam   = 'AESLIFE'
    and a.idvar  = 'AESEQ'
    and a.qval   = 'Y'
    and b.aeser ne 'Y';
quit;

/* If ae_mismatch has rows: data error or documentation gap.
   Resolve before submission. Document exceptions in the
   Data Reviewer's Guide if clinically justified. */
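
The first-occurrence row in the table above can be checked in a similar way. This sketch assumes AEDECOD holds the preferred term and that AESTDTC values are complete ISO 8601 dates, so that character comparison matches chronological order; partial dates would need separate handling. The table names are illustrative.

```sas
/* Sketch: AOCCIFL='Y' set on a record that is not the earliest for its PT */
proc sql;
  /* Earliest start date per subject and preferred term */
  create table first_occ as
  select usubjid, aedecod, min(aestdtc) as first_dtc
  from ae
  where not missing(aestdtc)
  group by usubjid, aedecod;

  /* Flagged records whose parent AE starts later than the earliest date */
  create table aoccifl_mismatch as
  select s.usubjid, a.aeseq, a.aedecod, a.aestdtc, f.first_dtc
  from suppae as s
  inner join ae as a
    on  s.usubjid = a.usubjid
    and input(s.idvarval, best12.) = a.aeseq
  inner join first_occ as f
    on  a.usubjid = f.usubjid
    and a.aedecod = f.aedecod
  where s.qnam = 'AOCCIFL'
    and s.qval = 'Y'
    and a.aestdtc ne f.first_dtc;
quit;
```

This catches flags placed on a later record. It does not catch a first occurrence that is missing its flag; that needs the reverse comparison.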

13.3 The Practical Rule

Passing Pinnacle 21 means your data is structurally acceptable for submission intake. It does not mean your metadata is review-ready. The two gates are sequential, not equivalent. Run P21 to clear the first gate. Then review your define.xml as if you are an FDA data reviewer who has never seen this study and has 20 minutes to understand what SUPPAE contains. If you would have questions, so will the reviewer.

14. Scaling Problems in Large SUPPQUAL Domains

In real studies, SUPPLB with 40-plus QNAMs and SUPPCM with 60-plus QNAMs are not unusual. At that scale, define.xml stops being a reference document and starts being a navigation problem. This is a design failure that manifests as a metadata failure, and VLM construction alone cannot solve it.

14.1 What Goes Wrong at Scale

A ValueListDef block with 50 ItemRef entries renders slowly in define viewers and is cognitively unnavigable. QNAM names become abbreviated to the point of opacity. VLM entries that repeat the same boilerplate description across dozens of Y/N flags become indistinguishable to a reviewer scanning for meaning. The define.xml becomes accurate but useless.

The specific failure pattern in SUPPLB is worth naming. Laboratory method, specimen type, and result units are often implemented as separate QNAMs per test code (SPECBLD, SPECURN, METHCHEM, METHHEMA) rather than as a single QNAM with controlled terminology. This explodes the QNAM count for no semantic gain and makes the VLM block look like noise.

14.2 What Experienced Teams Do

Before building VLM for a large SUPPQUAL domain, conduct a QNAM audit. Group all QNAMs by semantic category. If multiple QNAMs represent the same concept with different scope (specimen type by test, method by test), evaluate whether a single QNAM with controlled values covers all cases. The goal is the smallest QNAM count that captures all necessary information without ambiguity.
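
A QNAM audit can start from a simple inventory of what the dataset actually contains. A minimal sketch, using SUPPLB as the example and qnam_audit as an illustrative table name:

```sas
/* Sketch: one row per QNAM with its label, row count, and distinct values */
proc sql;
  create table qnam_audit as
  select qnam,
         max(qlabel)          as qlabel,
         count(*)             as n_rows,
         count(distinct qval) as n_values
  from supplb
  group by qnam
  order by qnam;
quit;
```

QNAMs that share a prefix and have a small n_values are the usual candidates for consolidation into one QNAM with controlled terminology. The grouping into semantic categories is still a manual judgment.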

Keep QNAM names human-readable within the 8-character constraint. SPECTYP is better than SPCTX3. METHCD is better than MTHCD2. Reviewers read QNAM values directly in the dataset — they should not need to reference define.xml to understand what category of information a QNAM represents.

Hard truth: if your SUPPQUAL domain requires scrolling to navigate in a define viewer, it is already too complex. The define.xml is a symptom. The design is the problem.

14.3 Re-Evaluating Domain Design When Scale Grows

A SUPPCM with 60 QNAMs representing concomitant medication classification attributes is not a well-designed SUPPQUAL. It is an unstated custom domain. If the study team cannot justify why these qualifiers could not be represented as named variables in CM or a CM extension domain, that is the conversation to have before the submission package is built — not after define.xml review comes back with 30 metadata queries.

If your SUPPQUAL cannot be understood without scrolling, filtering, or cross-referencing multiple sections of define.xml, it is already too complex for efficient review. At that point you are not solving a metadata problem. You are managing the consequences of a design decision that should have been made differently.

15. Define.xml v2.0 vs v2.1 — What Changes for SUPPQUAL

Most teams treat v2.0 and v2.1 as interchangeable for SUPPQUAL work. They are not, and the differences cluster precisely around the VLM and WhereClause features that are central to SUPPQUAL metadata.

15.1 WhereClause Handling

In Define-XML v2.1, WhereClause handling is more formally specified, particularly for multi-condition expressions. The def:WhereClauseDef element in v2.1 supports cleaner namespacing and has better-defined behavior for AND semantics across multiple RangeCheck elements. If you are building complex multi-condition WhereClause entries — scoping VLM by both QNAM and IDVAR simultaneously — v2.1 is more predictable in how validators and viewers interpret the expression.

15.2 ExternalCodeList and Controlled Terminology Linkage

v2.1 introduces cleaner linkage mechanisms for external controlled terminology dictionaries, including support for NCI Thesaurus version pinning in the StandardOID attribute chain. For SUPPQUAL QNAMs that reference MedDRA, SNOMED, or LOINC values, which appear in specialized domains like SUPPDS or SUPPFA, v2.1 provides more precise and validator-checkable external codelist references.

15.3 Reviewer Tooling Compatibility

Modern FDA review tooling (JMP Clinical, the Agency's internal viewers, and the CDISC Define-XML viewer) handles v2.1 correctly. Legacy sponsor define viewers may not render v2.1 features correctly, particularly the improved WhereClause rendering. Validate your output in both the CDISC Define-XML viewer and Pinnacle 21 regardless of which version you target. If your submission standards allow v2.1, use it, but confirm with your regulatory affairs team that the target agency accepts v2.1 for the specific submission type.

Feature	v2.0 Behavior	v2.1 Behavior
Multi-condition WhereClause	Supported but ambiguously specified	Formally specified AND semantics
ExternalCodeList	Basic dictionary reference	Version-pinned, StandardOID-linked
ValueListDef scoping	Functional but verbose	Same structure, better validator support
FDA tooling acceptance	Fully accepted	Accepted; preferred for new submissions
PMDA tooling acceptance	Fully accepted	Accepted; confirm per submission type

16. Edge Cases You Will Hit

These are not hypotheticals. Every experienced SDTM programmer encounters all of them eventually. Knowing the right define.xml response in advance saves a revision cycle.

Case 1: Same QNAM Name, Different Meaning Across Studies

Not a problem within one dataset — it is a problem when you reuse a define.xml template across studies without auditing QNAM semantics. QNAM=FAST might mean "fasting status confirmed" in one study and "hours of fasting prior to sample" in another. The former is Y/N with a NY codelist. The latter is numeric stored as text with no codelist. If you port the VLM entry without reviewing the clinical meaning in the new study context, you will have a define.xml that says the wrong thing about your data.

Case 2: Numeric QVAL Stored as Text

This is extremely common. QVAL is always character in the XPT — SAS character variable, always. But some QNAMs store what is functionally a numeric value: a score, a count, a duration in hours. In define.xml, the DataType for these VLM entries should still be text to match the XPT variable type, but the Description must explicitly state that the content is numeric and document the units and expected range. Without that note, a reviewer seeing QVAL="72" has no frame of reference.

<ItemDef OID="IT.SUPPAE.QVAL.AEDURH"
         Name="AEDURH"
         DataType="text"
         Length="5"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Duration of adverse event in hours. Numeric value stored as character.
      Derived from (AEENDTC - AESTDTC) in hours, rounded to nearest integer.
      Range: 0 to 8760 (one year). Units: hours.
    </TranslatedText>
  </Description>
  <def:Origin Type="Derived"/>
</ItemDef>
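
If the Description promises numeric content and a range, the data should be checked against that promise. A minimal sketch for the AEDURH example above; the dataset name aedurh_check and the issue variable are illustrative:

```sas
/* Sketch: verify AEDURH values are integers within the documented range */
data aedurh_check;
  set suppae;
  where qnam = 'AEDURH';
  length issue $40;
  /* ?? suppresses log notes and _ERROR_ for values that are not numeric */
  qval_num = input(qval, ?? best32.);
  if missing(qval_num) then issue = 'Not numeric';
  else if qval_num < 0 or qval_num > 8760 then issue = 'Outside 0-8760';
  else if qval_num ne int(qval_num) then issue = 'Not an integer';
  /* Keep only the rows that violate the documented content */
  if not missing(issue);
run;
```

An empty aedurh_check means the data matches what the Description says.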

Case 3: Blank vs Missing QVAL

A row with a blank QVAL and a qualifier with no row at all mean different things clinically. In the XPT, a character variable cannot distinguish blank from missing, so the only real distinction is whether the row exists. A row with a blank QVAL can suggest the question was asked and the answer was empty or not applicable; an absent row means there was nothing to report. In SUPPQUAL, a row with a blank QVAL should almost never exist: if there is nothing to report for a qualifier, the row should not be created. If your dataset contains rows with blank QVAL, your VLM Description should explicitly state whether blank is a permissible value and what it signifies. Otherwise reviewers will query every blank QVAL as a potential data quality issue.
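
A quick way to find out whether any such rows exist, using SUPPAE as the example domain (blank_qval is an illustrative table name):

```sas
/* Sketch: count rows with blank QVAL per QNAM */
proc sql;
  create table blank_qval as
  select qnam, count(*) as n_blank
  from suppae
  where missing(qval)
  group by qnam;
quit;
```

Every QNAM that appears in blank_qval needs either a data fix or an explicit statement in its VLM Description.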

Case 4: Multiple QNAMs Representing One Concept

This arises from CRF mapping where multiple checkboxes each become their own QNAM. A concomitant medication reason-for-use form with 10 checkboxes should not produce 10 QNAMs (REASCARD, REASDIAB, REASHYP…). It should produce one QNAM (REAS) with controlled terminology values, or at most two QNAMs if the reason structure is genuinely multi-level. Each additional QNAM multiplies your VLM burden and dilutes the reviewer's ability to understand the data conceptually. Consolidation before SDTM mapping is far easier than VLM cleanup after the fact.

17. How Reviewers Actually Read SUPPQUAL

Reviewers do not read SUPPQUAL sequentially. They follow a chain: QNAM → WhereClause → VLM entry → Origin → parent domain. If any step in that chain is unclear, the result is a query.

Understanding that path is the single most useful frame for designing SUPPQUAL define.xml well. Design for it and the metadata almost writes itself.

17.1 The Reviewer Flow

A reviewer opens the SUPPQUAL dataset. They see a QNAM value, say AESLIFE. They have one of two reactions: they know what it means immediately, or they go to define.xml. If they go to define.xml, they look at the VLM entry for AESLIFE in sequence: the Description, the Origin, the CodeListRef if present, and the PDFPageRef if the origin is CRF. They reconstruct the meaning of the variable from those four elements.

If any element is missing or vague, they stop reconstructing and start writing a query. The threshold is roughly five to ten seconds. If the meaning is not clear in that window, it becomes a formal question.

17.2 What This Means for Your Metadata

Description first. It is the primary artifact. Everything else is supporting documentation. A Description that fully explains the variable — its clinical meaning, its derivation if Derived, its permissible values if not codelist-controlled — means the reviewer may never need to open the CRF or the specs document. That is the standard to target.

Origin second. A clear, specific origin with a resolvable document reference tells the reviewer they could verify the source if they chose to. The fact that they could verify it is often enough that they do not need to.

Codelist third. If a codelist is present, the reviewer expects the data to conform to it exactly. Do not reference a codelist for a variable with values that are not in the codelist. That is worse than having no codelist reference.
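
Codelist conformance is easy to verify before a reviewer does. The sketch below uses the specs dataset from Section 9.1 to find which QNAMs reference CL.NY, and assumes that codelist is restricted to Y and N in this study; adjust the value list if your codelist carries more terms. The table name ny_violations is illustrative.

```sas
/* Sketch: QVAL values outside the referenced codelist (CL.NY assumed Y/N) */
proc sql;
  create table ny_violations as
  select s.usubjid, s.qnam, s.qval
  from suppae as s
  inner join suppqual_vlm_specs as m
    on  m.domain = 'SUPPAE'
    and m.qnam   = s.qnam
  where m.codelist = 'CL.NY'
    and s.qval not in ('Y', 'N');
quit;
```

Any row returned means either the data or the CodeListRef is wrong, and one of them has to change.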

Design for the reviewer who is reading your SUPPQUAL at 4 PM on a Friday after reviewing six other domains. They are not going to ask clarifying questions mentally. They are going to write queries. Every piece of define.xml that removes a potential query is time saved on both sides of the submission.

18. Cross-Domain Consistency — The Silent Check

Reviewers do not look at SUPPQUAL in isolation. They compare it against the parent domain systematically, and they compare SUPPQUAL-derived flags against analysis datasets. Logical inconsistencies between SUPPQUAL and parent domain variables are one of the most common sources of late-stage submission queries — because they require investigation to determine whether the inconsistency reflects a data error, a derivation error, or a documentation error.

18.1 Reviewer Query Patterns

SUPPQUAL QNAM / Value	Parent Domain Variable / Expectation	Query Pattern
SUPPAE.AESLIFE = Y	AE.AESER should = Y (life-threatening implies serious)	"Subject [ID] has AESLIFE=Y in SUPPAE but AESER=N in AE. Please reconcile."
SUPPAE.AECONTRT = Y	A CM record for the AE treatment period should exist	"Concomitant treatment flagged in SUPPAE but no corresponding CM record found for subject [ID]."
SUPPLB.FAST = Y	LB.LBTPT or LB.LBTPTNUM should reflect fasting timepoint	"Fasting flag present in SUPPLB but LBTPT does not indicate fasting condition."
SUPPDS.DSSREAS (discontinuation reason)	DS.DSDECOD controlled term should align	"SUPPDS freetext reason inconsistent with DS.DSDECOD for subject [ID]."

18.2 What Define.xml Can and Cannot Do Here

Define.xml cannot prevent cross-domain logical inconsistencies — those are data quality issues. But define.xml can make the reviewer's investigation faster and less adversarial. If your SUPPAE VLM entry for AESLIFE includes a Description note stating "Expected to be consistent with AE.AESER; exceptions documented in the data reviewer's guide," you have pre-answered the question. The reviewer knows you thought about the relationship. That changes the tone of the review interaction significantly.

Add cross-domain consistency notes to VLM Descriptions for any qualifier that has a logical dependency on a parent domain variable. It adds two sentences to your metadata and removes a potential two-week query-response cycle.

19. Levels of Automation — Maturity Model

The SAS generation utility in Section 9 represents one point on a spectrum. Where a team sits on this spectrum determines how many define.xml errors they make per submission, how long define QC takes, and how much rework they absorb when datasets change late in the submission timeline.

Level	Description	Error Rate	Rework Cost When Data Changes
1 — Manual XML editing	Define.xml authored or edited by hand in a text editor or define tool UI	High — typos, OID mismatches, missed QNAMs	Very high — every QNAM change requires manual XML edits
2 — Metadata-driven generation	Define.xml generated from a metadata specs dataset; programmers edit the specs, not the XML	Medium — errors in specs propagate consistently; easier to catch and fix	Medium — update specs dataset, regenerate
3 — Dataset-derived VLM auto-build	VLM entries generated directly from the production SUPPQUAL XPT; QNAM list and lengths derived programmatically	Low — structural metadata always reflects actual data	Low — rerun the generation program against updated XPT
4 — Full validation and reconciliation framework	Automated comparison of define.xml against datasets plus cross-domain consistency checks; discrepancies flagged in a QC report	Very low — errors caught before submission regardless of late data changes	Very low — reconciliation runs catch drift automatically

Level 3 is the practical target for any team running more than two to three submissions per year. Level 4 is achievable with investment in the reconciliation tooling and is worth building if your team operates an FSP model across multiple sponsors. The metadata drift problem — where define.xml and datasets diverge after a late protocol amendment — is the most common cause of last-minute submission delays, and it is entirely preventable with Level 3 or 4 automation.
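
The structural part of Level 3 can be sketched in a few lines: derive the QNAM list, labels, and observed lengths from the dataset and use them to seed the specs structure from Section 9.1. This is an illustration under the assumption that SUPPAE is the production dataset; vlm_from_data is a hypothetical name.

```sas
/* Sketch: seed VLM specs from the data instead of typing them by hand */
proc sql;
  create table vlm_from_data as
  select 'SUPPAE'           as domain   length=8,
         qnam,
         max(qlabel)        as qlabel,
         'text'             as datatype length=10,
         max(lengthn(qval)) as length_
  from suppae
  group by qnam
  order by qnam;
quit;
```

Origin, codelist, CRF destination, and derivation text cannot be inferred from the data and still have to come from a maintained specification.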

20. Bad vs Good — Full Picture

Everything in the preceding sections collapses into this single comparison. This is the difference between a define.xml that passes intake and one that survives review.

❌ The Incomplete Package

<!-- Column-level QVAL — no ValueListRef -->
<ItemDef OID="IT.SUPPAE.QVAL"
         Name="QVAL"
         DataType="text"
         Length="200"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">Result Value</TranslatedText>
  </Description>
  <!-- Missing: def:ValueListRef -->
  <def:Origin Type="Predecessor"/>
</ItemDef>

<!-- No ValueListDef block -->
<!-- No WhereClauseDef entries -->
<!-- No VLM-level ItemDefs -->

What a reviewer sees: QVAL with no VLM. Every QNAM in the dataset is undocumented at the value level. The reviewer must interpret AESLIFE, AECONTRT, AOCCIFL, and every other qualifier without metadata support. Origin=Predecessor on a structural column with no predecessor documentation. Queries incoming.

✅ The Complete Package

<!-- Column-level QVAL with ValueListRef -->
<ItemDef OID="IT.SUPPAE.QVAL"
         Name="QVAL"
         DataType="text"
         Length="200"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Result value for the supplemental qualifier identified by QNAM.
      See value-level metadata for QNAM-specific types, codelists, and origins.
    </TranslatedText>
  </Description>
  <def:ValueListRef ValueListOID="VL.SUPPAE.QVAL"/>
</ItemDef>

<!-- ValueListDef: one entry per QNAM -->
<def:ValueListDef OID="VL.SUPPAE.QVAL">
  <ItemRef ItemOID="IT.SUPPAE.QVAL.AESLIFE"
           OrderNumber="1" Mandatory="Yes">
    <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AESLIFE"/>
  </ItemRef>
  <ItemRef ItemOID="IT.SUPPAE.QVAL.AOCCIFL"
           OrderNumber="2" Mandatory="No">
    <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AOCCIFL"/>
  </ItemRef>
</def:ValueListDef>

<!-- WhereClauseDefs -->
<def:WhereClauseDef OID="WC.SUPPAE.QNAM.AESLIFE">
  <RangeCheck Comparator="EQ" SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>AESLIFE</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

<def:WhereClauseDef OID="WC.SUPPAE.QNAM.AOCCIFL">
  <RangeCheck Comparator="EQ" SoftHard="Soft"
              def:ItemOID="IT.SUPPAE.QNAM">
    <CheckValue>AOCCIFL</CheckValue>
  </RangeCheck>
</def:WhereClauseDef>

<!-- VLM-level ItemDefs: AESLIFE -->
<ItemDef OID="IT.SUPPAE.QVAL.AESLIFE"
         Name="AESLIFE"
         DataType="text"
         Length="1"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Indicator of whether the adverse event was life-threatening at the
      time of occurrence. Expected to be consistent with AE.AESER=Y;
      exceptions documented in the Data Reviewer's Guide Section 4.
    </TranslatedText>
  </Description>
  <CodeListRef CodeListOID="CL.NY"/>
  <def:Origin Type="CRF">
    <def:DocumentRef leafID="LF.ACRF">
      <def:PDFPageRef Type="NamedDestination" PageRefs="AE_SAE_PAGE"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

<!-- VLM-level ItemDefs: AOCCIFL -->
<ItemDef OID="IT.SUPPAE.QVAL.AOCCIFL"
         Name="AOCCIFL"
         DataType="text"
         Length="1"
         SASFieldName="QVAL">
  <Description>
    <TranslatedText xml:lang="en">
      Flag indicating the first chronological occurrence of an AE preferred
      term within a subject. Set to 'Y' for the first record by AESTDTC
      ascending, then AESEQ ascending. No SUPPAE record is created for
      subsequent occurrences, because QVAL cannot be null.
      Derived; no CRF source.
    </TranslatedText>
  </Description>
  <CodeListRef CodeListOID="CL.NY"/>
  <def:Origin Type="Derived">
    <def:DocumentRef leafID="LF.SDTM_SPECS">
      <def:PDFPageRef Type="NamedDestination"
                      PageRefs="SUPPAE_AOCCIFL_ALGO"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>

The difference is not volume. The complete version has more XML, but every element is doing specific work. The incomplete version has almost no XML and none of it is useful. A reviewer reading the complete version can understand AESLIFE and AOCCIFL in under ten seconds each. A reviewer reading the incomplete version has to write queries to get that same information. That is the operational definition of a well-built SUPPQUAL define package versus a broken one.

Quick Reference Mental Model

Every SUPPQUAL VLM entry follows this chain. If any link is broken or missing, the reviewer cannot interpret the variable.

  QNAM value in dataset
       │
       ▼
  WhereClauseDef  ──── scopes the VLM entry to this QNAM
       │
       ▼
  VLM ItemDef (IT.SUPPXX.QVAL.QNAMNAME)
       ├── Description     ◄── clinical meaning + derivation note
       ├── DataType/Length ◄── actual data characteristics
       ├── CodeListRef     ◄── controlled values (if applicable)
       └── Origin          ◄── CRF page / Derived method / Assigned source

Build this chain for every QNAM in every SUPPQUAL domain, make each link explicit, and the define.xml review writes itself.
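The chain above can also be walked programmatically. The Python below is a minimal sketch under stated assumptions: Define-XML v2.0 namespaces, VLM ItemRefs nested directly under def:ValueListDef, and def:Origin placed inside the VLM ItemDef. The function name, the problem messages, and the sample fragment are hypothetical. It checks only that each link exists, not that its content is clinically meaningful.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"  # assumption: Define-XML v2.0

# Hypothetical fragment: AESLIFE is complete, AOCCIFL has three broken links.
SAMPLE_DEFINE = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <def:ValueListDef OID="VL.SUPPAE.QVAL">
    <ItemRef ItemOID="IT.SUPPAE.QVAL.AESLIFE" OrderNumber="1" Mandatory="Yes">
      <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AESLIFE"/>
    </ItemRef>
    <ItemRef ItemOID="IT.SUPPAE.QVAL.AOCCIFL" OrderNumber="2" Mandatory="No">
      <def:WhereClauseRef WhereClauseOID="WC.SUPPAE.QNAM.AOCCIFL"/>
    </ItemRef>
  </def:ValueListDef>
  <def:WhereClauseDef OID="WC.SUPPAE.QNAM.AESLIFE">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.SUPPAE.QNAM">
      <CheckValue>AESLIFE</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
  <ItemDef OID="IT.SUPPAE.QVAL.AESLIFE" Name="AESLIFE" DataType="text" Length="1">
    <Description>
      <TranslatedText xml:lang="en">Life-threatening indicator.</TranslatedText>
    </Description>
    <def:Origin Type="CRF"/>
  </ItemDef>
  <ItemDef OID="IT.SUPPAE.QVAL.AOCCIFL" Name="AOCCIFL" DataType="text" Length="1"/>
</ODM>
"""


def check_vlm_chain(define_xml):
    """Return (ItemOID, problem) tuples for every broken link in the VLM chain."""
    root = ET.fromstring(define_xml.strip())
    where_clause_oids = {wc.get("OID") for wc in root.iter("{%s}WhereClauseDef" % DEF)}
    item_defs = {it.get("OID"): it for it in root.iter("{%s}ItemDef" % ODM)}
    description_path = "{%s}Description/{%s}TranslatedText" % (ODM, ODM)
    problems = []
    for value_list in root.iter("{%s}ValueListDef" % DEF):
        for item_ref in value_list.findall("{%s}ItemRef" % ODM):
            oid = item_ref.get("ItemOID")
            # Link 1: the WhereClauseRef must exist and resolve to a WhereClauseDef.
            wc_ref = item_ref.find("{%s}WhereClauseRef" % DEF)
            if wc_ref is None:
                problems.append((oid, "no WhereClauseRef"))
            elif wc_ref.get("WhereClauseOID") not in where_clause_oids:
                problems.append((oid, "WhereClauseRef does not resolve"))
            # Link 2: the VLM ItemDef itself must exist.
            item_def = item_defs.get(oid)
            if item_def is None:
                problems.append((oid, "no VLM ItemDef"))
                continue
            # Links 3 to 5: Description, DataType, and Origin on the ItemDef.
            if not item_def.findtext(description_path, default="").strip():
                problems.append((oid, "missing or empty Description"))
            if not item_def.get("DataType"):
                problems.append((oid, "no DataType"))
            if item_def.find("{%s}Origin" % DEF) is None:
                problems.append((oid, "no def:Origin"))
    return problems
```

CodeListRef is deliberately left out of the check because it is conditional: whether a QNAM needs one depends on the qualifier, and that decision cannot be inferred from the XML alone.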

Final Thought

SUPPQUAL define.xml complexity is a direct reflection of the architectural tradeoff CDISC made when it designed the supplemental qualifier model. A fixed eight-column structure that can carry arbitrary qualifiers is elegant for dataset design. For metadata, it is a nightmare: the burden you would normally spread across named variables collapses into one column (QVAL) and one discriminator (QNAM). Define.xml then has to reconstruct that variable-level meaning through VLM machinery that most programmers touch too infrequently to get right from memory.

The way to get good at this is to build it wrong once, get the rejection letter, understand exactly which element was missing or malformed, and then build the QC scaffolding that prevents that specific error from ever occurring again. The pre-submission checklist in Section 10 is the aggregate of those failures across multiple programs. It saves you from learning each one the hard way.

SUPPQUAL is not documentation of data. It is reconstruction of variables. If define.xml does not make those variables obvious, the reviewer will stop and ask you to explain them.
