Dynamic SUPPQUAL Generation Using Metadata-Driven SAS Macros

If you have managed SDTM deliverables across multiple studies at the same time, you already know what happens to SUPPQUAL programs. You start with one clean macro per domain. Then a protocol amendment adds three new supplemental variables to AE. A PMDA query forces a second QNAM into EX. The DMC needs something non-standard from LB. Six months later you have domain-specific programs with no shared logic, no central control, and every change requires manual updates across multiple files.

The metadata-driven approach fixes that. One control dataset. One macro family. All SUPP domains generated in a single call. This post walks through the full design, control dataset structure, macro architecture, the edge cases that cause real production issues, and the validation checks you should run before submission.

The Problem with Hard-Coded SUPPQUAL Programs

Hard-coded programs break in predictable ways. A QNAM gets added mid-study, someone has to find the right program, understand its structure, add the variable, and QC the whole thing again. An amendment renames a CRF field, IDVARVAL logic breaks silently because the source variable changed. A programmer unfamiliar with the original design adds a duplicate QNAM to a different SUPP domain by mistake. None of these are dramatic failures. They are the slow buildup of technical debt across a submission package.

The deeper issue is that SUPPQUAL generation is structurally the same across parent domains. For every QNAM, you are doing the same thing, pulling a variable from a source dataset, converting it to character, attaching the right RDOMAIN, linking it back to the parent record through IDVAR and IDVARVAL, and writing it in long form with the correct QNAM, QLABEL, QORIG, and QEVAL assignments. The logic does not change. Only the metadata does.

SUPPQUAL Structure, The Part That Trips People

Most experienced programmers know the basic structure. But a few details still cause submission defects and are worth stating clearly before getting into the macro design.

IDVARVAL is always character. Sequence variables in parent domains such as AESEQ, CMSEQ, LBSEQ, and EXSEQ are numeric in the parent dataset. IDVARVAL in the SUPP dataset must be the character version of that value. This conversion is straightforward, put(AESEQ, best32.) gives you "1", "2", "3", but it must be handled explicitly. If the source dataset uses a custom format on the sequence variable, you want the raw numeric-to-character conversion, not the formatted value. The macro should detect IDVAR type at runtime and apply best32. for numeric IDVARs.

IDVAR and IDVARVAL are blank for one-record-per-subject domains. The DM domain has one record per subject. SUPPDM therefore has no meaningful IDVAR, so both IDVAR and IDVARVAL are left blank. Per SDTMIG Section 8.4, this is valid when there is only one record per subject in the related domain.

Records with missing QVAL must be excluded. Supplemental qualifier records with no data value should not be carried into the final SUPP dataset. Missing QVAL is a common validation issue and should be filtered out during generation.

Variable Type Length Key Notes
STUDYIDChar≤200Copied from parent domain
RDOMAINChar2Parent domain abbreviation
USUBJIDChar≤200Copied from parent domain
IDVARChar8Sequence variable name, blank for one-record-per-subject domains
IDVARVALChar≤200Always character, blank when IDVAR is blank
QNAMChar8Must start with a letter, uppercase, alphanumeric, 8 characters or less
QLABELChar≤40Descriptive label for QNAM
QVALChar≤200Always character, records with missing values excluded
QORIGChar≤200CRF, Derived, Assigned, and related values per study standard
QEVALChar8Usually blank unless evaluator is required
QNAM naming: QNAM must be 8 characters or less, start with a letter, and contain only letters and digits. Do not rely on output truncation to save you. Validate QNAM values in the control dataset before generation.

The Control Dataset Design

The control dataset is the single source of truth for SUPPQUAL generation. Every QNAM you want to produce corresponds to one row. The control dataset should be maintained like any other programming spec and version-controlled with the study codebase.

Below is a practical variable structure for SUPPQUAL_META.

Variable Type Purpose
RDOMAINChar(2)Parent domain such as AE, CM, LB, EX, DM
QNAMChar(8)Supplemental qualifier name
QLABELChar(40)Label for QNAM
SRC_DSChar(41)Source dataset, libname.memname or memname
SRC_VARChar(32)Source variable in SRC_DS
IDVARChar(8)Linking variable in SRC_DS, blank for DM-style domains
QORIGChar(200)Origin value
QEVALChar(8)Evaluator, usually blank
SRC_ISNUMChar(1)Y if source variable is numeric, N if character
SRC_FMTChar(32)Optional numeric format for QVAL conversion
ACTIVATEChar(1)Y to include, N to skip

A simple example:

/* Example: SUPPQUAL_META contents */
data SUPPQUAL_META;
  length rdomain $2 qnam $8 qlabel $40 src_ds $41 src_var $32
         idvar $8 qorig $200 qeval $8 src_isnum $1 src_fmt $32
         activate $1;
  infile datalines dlm='|' truncover;
  input rdomain $ qnam $ qlabel $ src_ds $ src_var $
        idvar $ qorig $ qeval $ src_isnum $ src_fmt $ activate $;
datalines;
AE|AECNTRY|Country Where Event Occurred|WORK.AE_WORK|AECNTRY|AESEQ|CRF||N||Y
AE|AEREL2|Causality Second Study Drug|WORK.AE_WORK|CAUSDG2|AESEQ|CRF||N||Y
AE|AESLIFE|Resulted in Life Threatening|WORK.AE_WORK|SLIFE|AESEQ|CRF||N||Y
CM|CMDOSU2|Secondary Dose Unit|WORK.CM_WORK|DOSU2|CMSEQ|CRF||N||Y
CM|CMROUTE2|Secondary Route|WORK.CM_WORK|ROUTE2|CMSEQ|CRF||N||Y
DM|DMHISP|Hispanic or Latino|WORK.DM_WORK|HISPANIC||CRF||N||Y
EX|EXLOC|Location of Administration|WORK.EX_WORK|LOCATION|EXSEQ|CRF||N||Y
LB|LBANMETH|Analysis Method|WORK.LB_WORK|ANMETH|LBSEQ|CRF||N||Y
LB|LBDSTRESC|Derived Result Char|WORK.LB_WORK|DSTR|LBSEQ|Derived||N||Y
VS|VSMETHOD|Method of Measurement|WORK.VS_WORK|METHOD|VSSEQ|CRF||N||N
;
run;
DM handling: Because DM has one record per subject, SUPPDM should have blank IDVAR and IDVARVAL. That is valid and should not be treated as missing metadata.

The Macro Architecture

This design uses two macros. A helper macro, %_supp_one, handles one QNAM row at a time. A driver macro, %gen_suppqual, reads the control dataset, calls the helper for each active row, stacks the results, and writes one SUPP dataset per parent domain.

That split matters. The helper is easier to test in isolation. The driver handles orchestration without mixing in domain-level data logic.

Assumption: Every source dataset listed in SRC_DS must already contain STUDYID and USUBJID. Do not point this macro directly at raw collection datasets that have not been standardized.
Truncation risk: QLABEL is defined as $40 and QVAL as $200. SAS truncates silently on assignment when the incoming value is longer. If your source variable can hold long free text, validate lengths before generation.

Helper Macro: %_supp_one

/* ============================================================
   %_SUPP_ONE
   Internal helper. Processes one row from SUPPQUAL_META.
   Creates WORK._SUPP_&i. as output.
   ============================================================ */
%macro _supp_one(
  i  = ,
  rd = ,
  qn = ,
  ql = ,
  sd = ,
  sv = ,
  iv = ,
  qo = CRF,
  qe = ,
  sn = N,
  sf =
);

  %local _dsid_ _varid_ _ivid_ _ivtype_ _qn_len_;

  /* 1. Validate source dataset exists */
  %let _dsid_ = %sysfunc(open(&sd.));
  %if &_dsid_. = 0 %then %do;
    %put WARNING: [_supp_one] Dataset &sd. not found. Skipping QNAM=&qn. (RDOMAIN=&rd.);
    %return;
  %end;

  /* 2. Validate source variable exists */
  %let _varid_ = %sysfunc(varnum(&_dsid_., &sv.));
  %if &_varid_. = 0 %then %do;
    %let _dsid_ = %sysfunc(close(&_dsid_.));
    %put WARNING: [_supp_one] Variable &sv. not found in &sd.. Skipping QNAM=&qn. (RDOMAIN=&rd.);
    %return;
  %end;

  /* 3. Validate IDVAR and detect its type */
  %let _ivtype_ = C;
  %if %sysfunc(lengthn(%sysfunc(strip(&iv.)))) > 0 %then %do;
    %let _ivid_ = %sysfunc(varnum(&_dsid_., &iv.));
    %if &_ivid_. = 0 %then %do;
      %let _dsid_ = %sysfunc(close(&_dsid_.));
      %put WARNING: [_supp_one] IDVAR=&iv. not found in &sd.. Skipping QNAM=&qn. (RDOMAIN=&rd.);
      %return;
    %end;
    %let _ivtype_ = %sysfunc(vartype(&_dsid_., &_ivid_.));
  %end;

  %let _dsid_ = %sysfunc(close(&_dsid_.));

  /* 4. Guard QNAM length in macro context before DATA step compile */
  %let _qn_len_ = %length(%sysfunc(strip(%upcase(&qn.))));
  %if &_qn_len_. > 8 %then %do;
    %put WARNING: [_supp_one] QNAM=&qn. is &_qn_len_. characters and exceeds the 8-character limit.;
    %put WARNING: [_supp_one] Output will be truncated. Fix SUPPQUAL_META before submission.;
  %end;

  /* 5. Build one QNAM contribution dataset */
  data _supp_&i_.
    (keep=studyid rdomain usubjid idvar idvarval qnam qlabel qval qorig qeval);

    length
      studyid  $200
      rdomain  $2
      usubjid  $200
      idvar    $8
      idvarval $200
      qnam     $8
      qlabel   $40
      qval     $200
      qorig    $200
      qeval    $8;

    set &sd. (keep=studyid usubjid
                   %if %sysfunc(lengthn(%sysfunc(strip(&iv.)))) > 0 %then &iv.;
                   &sv.);

    rdomain = "&rd.";
    idvar   = "%sysfunc(strip(&iv.))";
    qnam    = "%upcase(%sysfunc(strip(&qn.)))";
    qlabel  = "&ql.";
    qorig   = "&qo.";
    qeval   = "&qe.";

    %if %sysfunc(lengthn(%sysfunc(strip(&iv.)))) > 0 %then %do;
      %if &_ivtype_. = N %then %do;
        idvarval = strip(put(&iv., best32.));
      %end;
      %else %do;
        idvarval = strip(&iv.);
      %end;
    %end;
    %else %do;
      idvarval = '';
    %end;

    %if %upcase(&sn.) = Y %then %do;
      %if %sysfunc(lengthn(%sysfunc(strip(&sf.)))) > 0 %then %do;
        qval = strip(put(&sv., &sf..));
      %end;
      %else %do;
        qval = strip(put(&sv., best32.));
      %end;
    %end;
    %else %do;
      qval = strip(&sv.);
    %end;

    if missing(qval) then delete;
  run;

%mend _supp_one;

A few details here matter. %sysfunc(vartype()) detects whether IDVAR is numeric or character before the DATA step runs. That matters because most sequence variables are numeric in the parent dataset but must be written as character in IDVARVAL. Using put(AESEQ, best32.) gives a clean unformatted value. Using vvalue() would return the formatted value and can create the wrong result when custom formats exist.

The QNAM length guard lives in macro context for a reason. Once QNAM is assigned to a DATA step variable declared as $8, SAS truncates it immediately and silently. Any later length(qnam) check in the DATA step is too late, the value has already been shortened. The macro check catches the original full control-dataset value before compilation.

The missing(qval) delete is not optional. If the qualifier has no data value, it should not be written out as a SUPP record.

Driver Macro: %gen_suppqual

/* ============================================================
   %GEN_SUPPQUAL
   Driver macro. Reads SUPPQUAL_META, calls %_supp_one for
   each row, stacks results, and writes one SUPP dataset per
   parent domain to OUTLIB.
   ============================================================ */
%macro gen_suppqual(
  meta   = WORK.SUPPQUAL_META,
  outlib = WORK
);

  %local _nrows_ _i_ _any_created_ _domains_ _nd_ _d_ _dom_ _dsid_ _nobs_;

  /* 1. Validate metadata dataset exists */
  %if not %sysfunc(exist(&meta.)) %then %do;
    %put ERROR: [gen_suppqual] Control dataset &meta. not found. Macro aborted.;
    %return;
  %end;

  /* 2. Count active rows */
  proc sql noprint;
    select count(*) into :_nrows_ trimmed
    from &meta.
    where not missing(rdomain)
      and not missing(qnam)
      and not missing(src_ds)
      and not missing(src_var)
      and upcase(coalescec(activate,'Y')) = 'Y';
  quit;

  %if &_nrows_. = 0 %then %do;
    %put WARNING: [gen_suppqual] No valid rows found in &meta.. Macro aborted.;
    %return;
  %end;

  %put NOTE: [gen_suppqual] Found &_nrows_. active QNAM entries to process.;

  /* 3. Load metadata into macro variable arrays */
  %do _i_ = 1 %to &_nrows_.;
    %local _rd&_i_. _qn&_i_. _ql&_i_. _sd&_i_. _sv&_i_.
           _iv&_i_. _qo&_i_. _qe&_i_. _sn&_i_. _sf&_i_.;
  %end;

  proc sql noprint;
    select
      strip(rdomain),
      strip(qnam),
      strip(qlabel),
      strip(src_ds),
      strip(src_var),
      strip(coalescec(idvar,'')),
      strip(coalescec(qorig,'CRF')),
      strip(coalescec(qeval,'')),
      strip(coalescec(src_isnum,'N')),
      strip(coalescec(src_fmt,''))
    into
      :_rd1 - :_rd&_nrows_.,
      :_qn1 - :_qn&_nrows_.,
      :_ql1 - :_ql&_nrows_.,
      :_sd1 - :_sd&_nrows_.,
      :_sv1 - :_sv&_nrows_.,
      :_iv1 - :_iv&_nrows_.,
      :_qo1 - :_qo&_nrows_.,
      :_qe1 - :_qe&_nrows_.,
      :_sn1 - :_sn&_nrows_.,
      :_sf1 - :_sf&_nrows_.
    from &meta.
    where not missing(rdomain)
      and not missing(qnam)
      and not missing(src_ds)
      and not missing(src_var)
      and upcase(coalescec(activate,'Y')) = 'Y'
    order by rdomain, qnam;
  quit;

  /* 4. Call helper macro */
  %do _i_ = 1 %to &_nrows_.;
    %_supp_one(
      i  = &_i_.,
      rd = &&_rd&_i_.,
      qn = &&_qn&_i_.,
      ql = &&_ql&_i_.,
      sd = &&_sd&_i_.,
      sv = &&_sv&_i_.,
      iv = &&_iv&_i_.,
      qo = &&_qo&_i_.,
      qe = &&_qe&_i_.,
      sn = &&_sn&_i_.,
      sf = &&_sf&_i_.
    );
  %end;

  /* 5. Check output existence */
  %let _any_created_ = 0;
  %do _i_ = 1 %to &_nrows_.;
    %if %sysfunc(exist(work._supp_&_i_.)) %then %let _any_created_ = 1;
  %end;

  %if &_any_created_. = 0 %then %do;
    %put WARNING: [gen_suppqual] No contribution datasets created. Check source datasets and variable names.;
    %return;
  %end;

  /* 6. Stack all contributions */
  data _supp_all_;
    set
    %do _i_ = 1 %to &_nrows_.;
      %if %sysfunc(exist(work._supp_&_i_.)) %then work._supp_&_i_.;
    %end;
    ;
  run;

  /* 7. Get list of represented domains */
  proc sql noprint;
    select distinct strip(rdomain) into :_domains_ separated by '|'
    from _supp_all_
    where not missing(rdomain);

    select count(distinct rdomain) into :_nd_ trimmed
    from _supp_all_
    where not missing(rdomain);
  quit;

  /* 8. Write one SUPP dataset per domain */
  %do _d_ = 1 %to &_nd_.;
    %let _dom_ = %scan(&_domains_., &_d_., |);

    proc sort
      data=_supp_all_(where=(rdomain="&_dom_."))
      out=&outlib..supp&_dom_.(label="Supplemental Qualifiers for %upcase(&_dom_.)");
      by studyid rdomain usubjid idvar idvarval qnam;
    run;

    %let _dsid_ = %sysfunc(open(&outlib..supp&_dom_.));
    %let _nobs_ = %sysfunc(attrn(&_dsid_., nobs));
    %let _dsid_ = %sysfunc(close(&_dsid_.));

    %put NOTE: [gen_suppqual] &outlib..SUPP&_dom_. created with &_nobs_. observations.;
  %end;

  /* 9. Clean up */
  proc datasets lib=work nolist nowarn;
    delete _supp_:;
  quit;

  %put NOTE: [gen_suppqual] Complete. &_nd_. SUPP dataset(s) written to &outlib..;

%mend gen_suppqual;

Calling the Macro

Once the control dataset is ready and the macros are compiled, generation becomes a single call.

%validate_meta(meta=SDTM.SUPPQUAL_META);
%gen_suppqual(meta=SDTM.SUPPQUAL_META, outlib=SDTM);

The SAS log should show one NOTE per created SUPP dataset along with the observation count. Any source-dataset or source-variable mismatches should produce clear WARNING messages that point back to the relevant QNAM and RDOMAIN.

Handling Numeric QVAL Variables

Most SUPPQUAL values start as character variables. But some studies need numeric source variables carried into QVAL. The combination of SRC_ISNUM and SRC_FMT handles that cleanly.

data SUPPQUAL_META;
  set SUPPQUAL_META;
  if qnam = 'LBCALC' then do;
    src_isnum = 'Y';
    src_fmt   = '8.3';
  end;
run;

If SRC_ISNUM=Y and SRC_FMT is blank, the macro uses best32.. If you need fixed decimal places preserved in QVAL, provide an explicit numeric format in the metadata.

Decimal display: best32. will turn 2.0 into "2", not "2.0". If decimal presentation matters, set SRC_FMT to a value such as 8.2.

Pinnacle 21 and Metadata Validation

Correct macro output is not enough. You should validate the metadata before generation and check the output after generation.

A practical validation macro is shown below. It checks QNAM length, QNAM naming pattern, uppercase consistency, and missing required fields before the driver macro runs.

/* ============================================================
   %VALIDATE_META
   Run before %GEN_SUPPQUAL. Aborts on any failure.
   ============================================================ */
%macro validate_meta(meta=WORK.SUPPQUAL_META, outlib=WORK);

  proc sql noprint;
    create table &outlib.._meta_errors as

    select 'SD0083' as check_id, rdomain, qnam,
           'QNAM exceeds 8 characters' as description
    from &meta.
    where length(strip(qnam)) > 8

    union all

    select 'SD0082' as check_id, rdomain, qnam,
           'QNAM fails naming convention' as description
    from &meta.
    where not prxmatch('/^[A-Za-z][A-Za-z0-9]{0,7}$/', strip(qnam))

    union all

    select 'FORMAT' as check_id, rdomain, qnam,
           'QNAM contains lowercase and should be uppercase in metadata' as description
    from &meta.
    where qnam ne upcase(qnam)

    union all

    select 'MISSING' as check_id, rdomain, qnam,
           cats('Missing required field: ',
                case
                  when missing(src_ds) then 'SRC_DS'
                  when missing(src_var) then 'SRC_VAR'
                  else 'RDOMAIN or QNAM'
                end) as description
    from &meta.
    where missing(src_ds)
       or missing(src_var)
       or missing(rdomain)
       or missing(qnam);
  quit;

  %local _nerr_;
  proc sql noprint;
    select count(*) into :_nerr_ trimmed
    from &outlib.._meta_errors;
  quit;

  %if &_nerr_. > 0 %then %do;
    proc print data=&outlib.._meta_errors noobs; run;
    %put ERROR: [validate_meta] &_nerr_. validation issue(s) found in &meta..;
    %put ERROR: [validate_meta] Fix metadata before calling %nrstr(%gen_suppqual).;
    %abort cancel;
  %end;
  %else %do;
    %put NOTE: [validate_meta] Metadata checks passed.;
  %end;

%mend validate_meta;

You should also run a post-generation duplicate check. Duplicate QNAM values for the same parent record are a common structural issue when the source working dataset has not been deduplicated correctly.

proc sort data=SDTM.SUPPAE nodupkey dupout=_dups_;
  by studyid rdomain usubjid idvar idvarval qnam;
run;

%if %sysfunc(exist(_dups_)) %then %do;
  %local _dsid_;
  %let _dsid_ = %sysfunc(open(_dups_));
  %if %sysfunc(attrn(&_dsid_., nobs)) > 0 %then
    %put WARNING: Duplicate QNAM records found in SUPPAE. Review source data.;
  %let _dsid_ = %sysfunc(close(&_dsid_));
%end;

Extending the Approach

This macro assumes the source variable lives in the same dataset that already contains STUDYID, USUBJID, and the linking IDVAR. That is the cleanest pattern and the one most teams should aim for. If your supplemental variable lives elsewhere and needs to be joined back to the parent domain to obtain the sequence number, do that in a preprocessing step. Keep the SUPPQUAL macro focused on extraction and formatting, not on dataset joins.

The same metadata can also help feed define.xml generation. QNAM, QLABEL, QORIG, and QEVAL already exist in the control dataset, so you are not maintaining the same information in two different places.

Summary

A metadata-driven SUPPQUAL framework does not remove complexity. It moves it into one controlled dataset where it is visible, testable, and easier to maintain. The macros shown here handle the main technical requirements, IDVAR type detection, numeric-to-character conversion, blank IDVAR handling for one-record-per-subject domains, QVAL exclusion when missing, and output sorted into submission-ready SUPP datasets.

The biggest payoff comes when the specification changes. Adding a new QNAM becomes a metadata change, not a program rewrite.

For regulated studies, keep these macros in a controlled and versioned macro library. Do not pull production macros from ad hoc URLs at runtime. The code used for submission work should come from a locked, auditable source.

Source-aligned with SDTMIG v3.3 Section 8.4. Numeric-to-character IDVARVAL conversion using best32. is consistent with SAS 9.4 behavior. Validation outcomes should still be checked against the Pinnacle 21 version defined in the study validation plan.


Tags: SDTM, SUPPQUAL, SAS Macros, Metadata-Driven Programming, Define.xml, Pinnacle 21, Regulatory Submissions