Dynamic SUPPQUAL Generation Using Metadata-Driven SAS Macros
If you have managed SDTM deliverables across multiple studies at the same time, you already know what happens to SUPPQUAL programs. You start with one clean macro per domain. Then a protocol amendment adds three new supplemental variables to AE. A PMDA query forces a second QNAM into EX. The DMC needs something non-standard from LB. Six months later you have domain-specific programs with no shared logic, no central control, and every change requires manual updates across multiple files.
The metadata-driven approach fixes that. One control dataset. One macro family. All SUPP domains generated in a single call. This post walks through the full design, control dataset structure, macro architecture, the edge cases that cause real production issues, and the validation checks you should run before submission.
The Problem with Hard-Coded SUPPQUAL Programs
Hard-coded programs break in predictable ways. A QNAM gets added mid-study, someone has to find the right program, understand its structure, add the variable, and QC the whole thing again. An amendment renames a CRF field, IDVARVAL logic breaks silently because the source variable changed. A programmer unfamiliar with the original design adds a duplicate QNAM to a different SUPP domain by mistake. None of these are dramatic failures. They are the slow buildup of technical debt across a submission package.
The deeper issue is that SUPPQUAL generation is structurally the same across parent domains. For every QNAM, you are doing the same thing, pulling a variable from a source dataset, converting it to character, attaching the right RDOMAIN, linking it back to the parent record through IDVAR and IDVARVAL, and writing it in long form with the correct QNAM, QLABEL, QORIG, and QEVAL assignments. The logic does not change. Only the metadata does.
SUPPQUAL Structure, The Part That Trips People
Most experienced programmers know the basic structure. But a few details still cause submission defects and are worth stating clearly before getting into the macro design.
IDVARVAL is always character. Sequence variables in parent domains such as AESEQ, CMSEQ, LBSEQ, and EXSEQ are numeric in the parent dataset. IDVARVAL in the SUPP dataset must be the character version of that value. This conversion is straightforward, put(AESEQ, best32.) gives you "1", "2", "3", but it must be handled explicitly. If the source dataset uses a custom format on the sequence variable, you want the raw numeric-to-character conversion, not the formatted value. The macro should detect IDVAR type at runtime and apply best32. for numeric IDVARs.
IDVAR and IDVARVAL are blank for one-record-per-subject domains. The DM domain has one record per subject. SUPPDM therefore has no meaningful IDVAR, so both IDVAR and IDVARVAL are left blank. Per SDTMIG Section 8.4, this is valid when there is only one record per subject in the related domain.
Records with missing QVAL must be excluded. Supplemental qualifier records with no data value should not be carried into the final SUPP dataset. Missing QVAL is a common validation issue and should be filtered out during generation.
| Variable | Type | Length | Key Notes |
|---|---|---|---|
| STUDYID | Char | ≤200 | Copied from parent domain |
| RDOMAIN | Char | 2 | Parent domain abbreviation |
| USUBJID | Char | ≤200 | Copied from parent domain |
| IDVAR | Char | 8 | Sequence variable name, blank for one-record-per-subject domains |
| IDVARVAL | Char | ≤200 | Always character, blank when IDVAR is blank |
| QNAM | Char | 8 | Must start with a letter, uppercase, alphanumeric, 8 characters or less |
| QLABEL | Char | ≤40 | Descriptive label for QNAM |
| QVAL | Char | ≤200 | Always character, records with missing values excluded |
| QORIG | Char | ≤200 | CRF, Derived, Assigned, and related values per study standard |
| QEVAL | Char | 8 | Usually blank unless evaluator is required |
The Control Dataset Design
The control dataset is the single source of truth for SUPPQUAL generation. Every QNAM you want to produce corresponds to one row. The control dataset should be maintained like any other programming spec and version-controlled with the study codebase.
Below is a practical variable structure for SUPPQUAL_META.
| Variable | Type | Purpose |
|---|---|---|
| RDOMAIN | Char(2) | Parent domain such as AE, CM, LB, EX, DM |
| QNAM | Char(8) | Supplemental qualifier name |
| QLABEL | Char(40) | Label for QNAM |
| SRC_DS | Char(41) | Source dataset, libname.memname or memname |
| SRC_VAR | Char(32) | Source variable in SRC_DS |
| IDVAR | Char(8) | Linking variable in SRC_DS, blank for DM-style domains |
| QORIG | Char(200) | Origin value |
| QEVAL | Char(8) | Evaluator, usually blank |
| SRC_ISNUM | Char(1) | Y if source variable is numeric, N if character |
| SRC_FMT | Char(32) | Optional numeric format for QVAL conversion |
| ACTIVATE | Char(1) | Y to include, N to skip |
A simple example:
/* Example: SUPPQUAL_META contents */
data SUPPQUAL_META;
length rdomain $2 qnam $8 qlabel $40 src_ds $41 src_var $32
idvar $8 qorig $200 qeval $8 src_isnum $1 src_fmt $32
activate $1;
infile datalines dlm='|' truncover;
input rdomain $ qnam $ qlabel $ src_ds $ src_var $
idvar $ qorig $ qeval $ src_isnum $ src_fmt $ activate $;
datalines;
AE|AECNTRY|Country Where Event Occurred|WORK.AE_WORK|AECNTRY|AESEQ|CRF||N||Y
AE|AEREL2|Causality Second Study Drug|WORK.AE_WORK|CAUSDG2|AESEQ|CRF||N||Y
AE|AESLIFE|Resulted in Life Threatening|WORK.AE_WORK|SLIFE|AESEQ|CRF||N||Y
CM|CMDOSU2|Secondary Dose Unit|WORK.CM_WORK|DOSU2|CMSEQ|CRF||N||Y
CM|CMROUTE2|Secondary Route|WORK.CM_WORK|ROUTE2|CMSEQ|CRF||N||Y
DM|DMHISP|Hispanic or Latino|WORK.DM_WORK|HISPANIC||CRF||N||Y
EX|EXLOC|Location of Administration|WORK.EX_WORK|LOCATION|EXSEQ|CRF||N||Y
LB|LBANMETH|Analysis Method|WORK.LB_WORK|ANMETH|LBSEQ|CRF||N||Y
LB|LBDSTRESC|Derived Result Char|WORK.LB_WORK|DSTR|LBSEQ|Derived||N||Y
VS|VSMETHOD|Method of Measurement|WORK.VS_WORK|METHOD|VSSEQ|CRF||N||N
;
run;
The Macro Architecture
This design uses two macros. A helper macro, %_supp_one, handles one QNAM row at a time. A driver macro, %gen_suppqual, reads the control dataset, calls the helper for each active row, stacks the results, and writes one SUPP dataset per parent domain.
That split matters. The helper is easier to test in isolation. The driver handles orchestration without mixing in domain-level data logic.
SRC_DS must already contain STUDYID and USUBJID. Do not point this macro directly at raw collection datasets that have not been standardized.
QLABEL is defined as $40 and QVAL as $200. SAS truncates silently on assignment when the incoming value is longer. If your source variable can hold long free text, validate lengths before generation.
Helper Macro: %_supp_one
/* ============================================================
%_SUPP_ONE
Internal helper. Processes one row from SUPPQUAL_META.
Creates WORK._SUPP_&i. as output.
============================================================ */
%macro _supp_one(
i = ,
rd = ,
qn = ,
ql = ,
sd = ,
sv = ,
iv = ,
qo = CRF,
qe = ,
sn = N,
sf =
);
%local _dsid_ _varid_ _ivid_ _ivtype_ _qn_len_;
/* 1. Validate source dataset exists */
%let _dsid_ = %sysfunc(open(&sd.));
%if &_dsid_. = 0 %then %do;
%put WARNING: [_supp_one] Dataset &sd. not found. Skipping QNAM=&qn. (RDOMAIN=&rd.);
%return;
%end;
/* 2. Validate source variable exists */
%let _varid_ = %sysfunc(varnum(&_dsid_., &sv.));
%if &_varid_. = 0 %then %do;
%let _dsid_ = %sysfunc(close(&_dsid_.));
%put WARNING: [_supp_one] Variable &sv. not found in &sd.. Skipping QNAM=&qn. (RDOMAIN=&rd.);
%return;
%end;
/* 3. Validate IDVAR and detect its type */
%let _ivtype_ = C;
%if %sysfunc(lengthn(%sysfunc(strip(&iv.)))) > 0 %then %do;
%let _ivid_ = %sysfunc(varnum(&_dsid_., &iv.));
%if &_ivid_. = 0 %then %do;
%let _dsid_ = %sysfunc(close(&_dsid_.));
%put WARNING: [_supp_one] IDVAR=&iv. not found in &sd.. Skipping QNAM=&qn. (RDOMAIN=&rd.);
%return;
%end;
%let _ivtype_ = %sysfunc(vartype(&_dsid_., &_ivid_.));
%end;
%let _dsid_ = %sysfunc(close(&_dsid_.));
/* 4. Guard QNAM length in macro context before DATA step compile */
%let _qn_len_ = %length(%sysfunc(strip(%upcase(&qn.))));
%if &_qn_len_. > 8 %then %do;
%put WARNING: [_supp_one] QNAM=&qn. is &_qn_len_. characters and exceeds the 8-character limit.;
%put WARNING: [_supp_one] Output will be truncated. Fix SUPPQUAL_META before submission.;
%end;
/* 5. Build one QNAM contribution dataset */
data _supp_&i_.
(keep=studyid rdomain usubjid idvar idvarval qnam qlabel qval qorig qeval);
length
studyid $200
rdomain $2
usubjid $200
idvar $8
idvarval $200
qnam $8
qlabel $40
qval $200
qorig $200
qeval $8;
set &sd. (keep=studyid usubjid
%if %sysfunc(lengthn(%sysfunc(strip(&iv.)))) > 0 %then &iv.;
&sv.);
rdomain = "&rd.";
idvar = "%sysfunc(strip(&iv.))";
qnam = "%upcase(%sysfunc(strip(&qn.)))";
qlabel = "&ql.";
qorig = "&qo.";
qeval = "&qe.";
%if %sysfunc(lengthn(%sysfunc(strip(&iv.)))) > 0 %then %do;
%if &_ivtype_. = N %then %do;
idvarval = strip(put(&iv., best32.));
%end;
%else %do;
idvarval = strip(&iv.);
%end;
%end;
%else %do;
idvarval = '';
%end;
%if %upcase(&sn.) = Y %then %do;
%if %sysfunc(lengthn(%sysfunc(strip(&sf.)))) > 0 %then %do;
qval = strip(put(&sv., &sf..));
%end;
%else %do;
qval = strip(put(&sv., best32.));
%end;
%end;
%else %do;
qval = strip(&sv.);
%end;
if missing(qval) then delete;
run;
%mend _supp_one;
A few details here matter. %sysfunc(vartype()) detects whether IDVAR is numeric or character before the DATA step runs. That matters because most sequence variables are numeric in the parent dataset but must be written as character in IDVARVAL. Using put(AESEQ, best32.) gives a clean unformatted value. Using vvalue() would return the formatted value and can create the wrong result when custom formats exist.
The QNAM length guard lives in macro context for a reason. Once QNAM is assigned to a DATA step variable declared as $8, SAS truncates it immediately and silently. Any later length(qnam) check in the DATA step is too late, the value has already been shortened. The macro check catches the original full control-dataset value before compilation.
The missing(qval) delete is not optional. If the qualifier has no data value, it should not be written out as a SUPP record.
Driver Macro: %gen_suppqual
/* ============================================================
%GEN_SUPPQUAL
Driver macro. Reads SUPPQUAL_META, calls %_supp_one for
each row, stacks results, and writes one SUPP dataset per
parent domain to OUTLIB.
============================================================ */
%macro gen_suppqual(
meta = WORK.SUPPQUAL_META,
outlib = WORK
);
%local _nrows_ _i_ _any_created_ _domains_ _nd_ _d_ _dom_ _dsid_ _nobs_;
/* 1. Validate metadata dataset exists */
%if not %sysfunc(exist(&meta.)) %then %do;
%put ERROR: [gen_suppqual] Control dataset &meta. not found. Macro aborted.;
%return;
%end;
/* 2. Count active rows */
proc sql noprint;
select count(*) into :_nrows_ trimmed
from &meta.
where not missing(rdomain)
and not missing(qnam)
and not missing(src_ds)
and not missing(src_var)
and upcase(coalescec(activate,'Y')) = 'Y';
quit;
%if &_nrows_. = 0 %then %do;
%put WARNING: [gen_suppqual] No valid rows found in &meta.. Macro aborted.;
%return;
%end;
%put NOTE: [gen_suppqual] Found &_nrows_. active QNAM entries to process.;
/* 3. Load metadata into macro variable arrays */
%do _i_ = 1 %to &_nrows_.;
%local _rd&_i_. _qn&_i_. _ql&_i_. _sd&_i_. _sv&_i_.
_iv&_i_. _qo&_i_. _qe&_i_. _sn&_i_. _sf&_i_.;
%end;
proc sql noprint;
select
strip(rdomain),
strip(qnam),
strip(qlabel),
strip(src_ds),
strip(src_var),
strip(coalescec(idvar,'')),
strip(coalescec(qorig,'CRF')),
strip(coalescec(qeval,'')),
strip(coalescec(src_isnum,'N')),
strip(coalescec(src_fmt,''))
into
:_rd1 - :_rd&_nrows_.,
:_qn1 - :_qn&_nrows_.,
:_ql1 - :_ql&_nrows_.,
:_sd1 - :_sd&_nrows_.,
:_sv1 - :_sv&_nrows_.,
:_iv1 - :_iv&_nrows_.,
:_qo1 - :_qo&_nrows_.,
:_qe1 - :_qe&_nrows_.,
:_sn1 - :_sn&_nrows_.,
:_sf1 - :_sf&_nrows_.
from &meta.
where not missing(rdomain)
and not missing(qnam)
and not missing(src_ds)
and not missing(src_var)
and upcase(coalescec(activate,'Y')) = 'Y'
order by rdomain, qnam;
quit;
/* 4. Call helper macro */
%do _i_ = 1 %to &_nrows_.;
%_supp_one(
i = &_i_.,
rd = &&_rd&_i_.,
qn = &&_qn&_i_.,
ql = &&_ql&_i_.,
sd = &&_sd&_i_.,
sv = &&_sv&_i_.,
iv = &&_iv&_i_.,
qo = &&_qo&_i_.,
qe = &&_qe&_i_.,
sn = &&_sn&_i_.,
sf = &&_sf&_i_.
);
%end;
/* 5. Check output existence */
%let _any_created_ = 0;
%do _i_ = 1 %to &_nrows_.;
%if %sysfunc(exist(work._supp_&_i_.)) %then %let _any_created_ = 1;
%end;
%if &_any_created_. = 0 %then %do;
%put WARNING: [gen_suppqual] No contribution datasets created. Check source datasets and variable names.;
%return;
%end;
/* 6. Stack all contributions */
data _supp_all_;
set
%do _i_ = 1 %to &_nrows_.;
%if %sysfunc(exist(work._supp_&_i_.)) %then work._supp_&_i_.;
%end;
;
run;
/* 7. Get list of represented domains */
proc sql noprint;
select distinct strip(rdomain) into :_domains_ separated by '|'
from _supp_all_
where not missing(rdomain);
select count(distinct rdomain) into :_nd_ trimmed
from _supp_all_
where not missing(rdomain);
quit;
/* 8. Write one SUPP dataset per domain */
%do _d_ = 1 %to &_nd_.;
%let _dom_ = %scan(&_domains_., &_d_., |);
proc sort
data=_supp_all_(where=(rdomain="&_dom_."))
out=&outlib..supp&_dom_.(label="Supplemental Qualifiers for %upcase(&_dom_.)");
by studyid rdomain usubjid idvar idvarval qnam;
run;
%let _dsid_ = %sysfunc(open(&outlib..supp&_dom_.));
%let _nobs_ = %sysfunc(attrn(&_dsid_., nobs));
%let _dsid_ = %sysfunc(close(&_dsid_.));
%put NOTE: [gen_suppqual] &outlib..SUPP&_dom_. created with &_nobs_. observations.;
%end;
/* 9. Clean up */
proc datasets lib=work nolist nowarn;
delete _supp_:;
quit;
%put NOTE: [gen_suppqual] Complete. &_nd_. SUPP dataset(s) written to &outlib..;
%mend gen_suppqual;
Calling the Macro
Once the control dataset is ready and the macros are compiled, generation becomes a single call.
%validate_meta(meta=SDTM.SUPPQUAL_META); %gen_suppqual(meta=SDTM.SUPPQUAL_META, outlib=SDTM);
The SAS log should show one NOTE per created SUPP dataset along with the observation count. Any source-dataset or source-variable mismatches should produce clear WARNING messages that point back to the relevant QNAM and RDOMAIN.
Handling Numeric QVAL Variables
Most SUPPQUAL values start as character variables. But some studies need numeric source variables carried into QVAL. The combination of SRC_ISNUM and SRC_FMT handles that cleanly.
data SUPPQUAL_META;
set SUPPQUAL_META;
if qnam = 'LBCALC' then do;
src_isnum = 'Y';
src_fmt = '8.3';
end;
run;
If SRC_ISNUM=Y and SRC_FMT is blank, the macro uses best32.. If you need fixed decimal places preserved in QVAL, provide an explicit numeric format in the metadata.
best32. will turn 2.0 into "2", not "2.0". If decimal presentation matters, set SRC_FMT to a value such as 8.2.
Pinnacle 21 and Metadata Validation
Correct macro output is not enough. You should validate the metadata before generation and check the output after generation.
A practical validation macro is shown below. It checks QNAM length, QNAM naming pattern, uppercase consistency, and missing required fields before the driver macro runs.
/* ============================================================
%VALIDATE_META
Run before %GEN_SUPPQUAL. Aborts on any failure.
============================================================ */
%macro validate_meta(meta=WORK.SUPPQUAL_META, outlib=WORK);
proc sql noprint;
create table &outlib.._meta_errors as
select 'SD0083' as check_id, rdomain, qnam,
'QNAM exceeds 8 characters' as description
from &meta.
where length(strip(qnam)) > 8
union all
select 'SD0082' as check_id, rdomain, qnam,
'QNAM fails naming convention' as description
from &meta.
where not prxmatch('/^[A-Za-z][A-Za-z0-9]{0,7}$/', strip(qnam))
union all
select 'FORMAT' as check_id, rdomain, qnam,
'QNAM contains lowercase and should be uppercase in metadata' as description
from &meta.
where qnam ne upcase(qnam)
union all
select 'MISSING' as check_id, rdomain, qnam,
cats('Missing required field: ',
case
when missing(src_ds) then 'SRC_DS'
when missing(src_var) then 'SRC_VAR'
else 'RDOMAIN or QNAM'
end) as description
from &meta.
where missing(src_ds)
or missing(src_var)
or missing(rdomain)
or missing(qnam);
quit;
%local _nerr_;
proc sql noprint;
select count(*) into :_nerr_ trimmed
from &outlib.._meta_errors;
quit;
%if &_nerr_. > 0 %then %do;
proc print data=&outlib.._meta_errors noobs; run;
%put ERROR: [validate_meta] &_nerr_. validation issue(s) found in &meta..;
%put ERROR: [validate_meta] Fix metadata before calling %nrstr(%gen_suppqual).;
%abort cancel;
%end;
%else %do;
%put NOTE: [validate_meta] Metadata checks passed.;
%end;
%mend validate_meta;
You should also run a post-generation duplicate check. Duplicate QNAM values for the same parent record are a common structural issue when the source working dataset has not been deduplicated correctly.
proc sort data=SDTM.SUPPAE nodupkey dupout=_dups_;
by studyid rdomain usubjid idvar idvarval qnam;
run;
%if %sysfunc(exist(_dups_)) %then %do;
%local _dsid_;
%let _dsid_ = %sysfunc(open(_dups_));
%if %sysfunc(attrn(&_dsid_., nobs)) > 0 %then
%put WARNING: Duplicate QNAM records found in SUPPAE. Review source data.;
%let _dsid_ = %sysfunc(close(&_dsid_));
%end;
Extending the Approach
This macro assumes the source variable lives in the same dataset that already contains STUDYID, USUBJID, and the linking IDVAR. That is the cleanest pattern and the one most teams should aim for. If your supplemental variable lives elsewhere and needs to be joined back to the parent domain to obtain the sequence number, do that in a preprocessing step. Keep the SUPPQUAL macro focused on extraction and formatting, not on dataset joins.
The same metadata can also help feed define.xml generation. QNAM, QLABEL, QORIG, and QEVAL already exist in the control dataset, so you are not maintaining the same information in two different places.
Summary
A metadata-driven SUPPQUAL framework does not remove complexity. It moves it into one controlled dataset where it is visible, testable, and easier to maintain. The macros shown here handle the main technical requirements, IDVAR type detection, numeric-to-character conversion, blank IDVAR handling for one-record-per-subject domains, QVAL exclusion when missing, and output sorted into submission-ready SUPP datasets.
The biggest payoff comes when the specification changes. Adding a new QNAM becomes a metadata change, not a program rewrite.
For regulated studies, keep these macros in a controlled and versioned macro library. Do not pull production macros from ad hoc URLs at runtime. The code used for submission work should come from a locked, auditable source.
Source-aligned with SDTMIG v3.3 Section 8.4. Numeric-to-character IDVARVAL conversion using best32. is consistent with SAS 9.4 behavior. Validation outcomes should still be checked against the Pinnacle 21 version defined in the study validation plan.