Advanced SDTM Programming Tips

Streamline Your SDTM Development with Expert Techniques

Tip 1: Automating SUPPQUAL Domain Creation

Creation of SUPPQUAL (Supplemental Qualifiers) datasets can be automated with SAS macros that move non-standard variables into SUPP-- records in a systematic way. Refer to the macro example provided earlier to simplify your SUPPQUAL generation process.
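
For readers who do not have that earlier macro at hand, here is a minimal sketch of the idea. It is not the macro referred to above: the macro name make_supp, its parameters, the sample AE data, and the QORIG default are all assumptions made for illustration, and QEVAL is left out.

```sas
/* Hypothetical parent data with two non-standard variables */
data AE;
    length STUDYID $10 USUBJID $20 AETRTEM $1 AEREASON $20;
    input STUDYID $ USUBJID $ AESEQ AETRTEM $ AEREASON $;
    datalines;
STDY01 S01-001 1 Y OVERDOSE
STDY01 S01-001 2 N .
;
run;

/* Hypothetical macro: one SUPP-- record per non-missing value of each listed variable */
%macro make_supp(domain=, vars=);
    %local i var;
    data SUPP&domain;
        set &domain;
        length RDOMAIN $2 IDVAR $8 IDVARVAL $40 QNAM $8 QLABEL $40 QVAL $200 QORIG $20;
        RDOMAIN  = "&domain";
        IDVAR    = "&domain.SEQ";
        IDVARVAL = strip(put(&domain.SEQ, best.));   /* IDVARVAL is character */
        QORIG    = "CRF";                            /* assumption: adjust per variable */
        %do i = 1 %to %sysfunc(countw(&vars, %str( )));
            %let var = %scan(&vars, &i, %str( ));
            if not missing(&var) then do;
                QNAM   = "%upcase(&var)";
                QLABEL = vlabel(&var);           /* falls back to the name if no label is set */
                QVAL   = strip(vvalue(&var));    /* formatted value as text */
                output;
            end;
        %end;
        keep STUDYID RDOMAIN USUBJID IDVAR IDVARVAL QNAM QLABEL QVAL QORIG;
    run;
%mend make_supp;

%make_supp(domain=AE, vars=AETRTEM AEREASON)
```

Keying on the --SEQ variable keeps the link to the parent record simple. Studies that key supplemental qualifiers on another identifier would need an extra parameter for IDVAR.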

Tip 2: Handling Date Imputation

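Derivations such as study day need complete dates, but raw data often contains partial or missing dates. The following snippet imputes the missing components; keep the collected value in the SDTM --DTC variable and use the imputed value only for derivations: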

                
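data imputed_dates;
    set raw_data;
    /* Work on a copy that is long enough for YYYY-MM-DD; the collected value stays untouched */
    length date_imp $10;
    date_imp = strip(date);
    /* Impute missing day to the first day of the month (YYYY-MM) */
    if length(date_imp) = 7 then date_imp = cats(date_imp, '-01');
    /* Impute missing month and day to January 1st (YYYY) */
    else if length(date_imp) = 4 then date_imp = cats(date_imp, '-01-01');
    /* A date format needs a numeric variable; date_num stays missing without a complete date */
    if length(date_imp) = 10 then date_num = input(date_imp, ?? yymmdd10.);
    format date_num yymmdd10.;
run;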
                
            

Tip: Always document the imputation logic and ensure it aligns with the study protocol.

Tip 3: Dynamic Variable Label Assignment

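Avoid hardcoding labels when creating SDTM domains; metadata-driven programming keeps them consistent across programs. The ATTRIB statement below is the hardcoded form that a metadata-driven approach replaces: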

                
data AE;
    set raw_ae;
    attrib
        AESTDTC label="Start Date/Time of Adverse Event"
        AEENDTC label="End Date/Time of Adverse Event";
run;
                
            

Tip: Store labels in a metadata file (e.g., Excel or CSV) and read them dynamically in your program.
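
As a minimal sketch of that metadata-driven approach, the following builds the LABEL statement from a metadata table instead of typing it. The table name var_meta, its columns (dataset, name, label), the macro variable ae_labels, and the sample AE dataset are assumptions for illustration; in practice the metadata would be imported from the Excel or CSV file.

```sas
/* Hypothetical AE dataset that still lacks labels */
data AE;
    length AESTDTC AEENDTC $19;
    AESTDTC = '2024-01-15';
    AEENDTC = '2024-01-20';
run;

/* Hypothetical metadata table: one row per variable */
data var_meta;
    length dataset $8 name $32 label $40;
    infile datalines dsd;
    input dataset $ name $ label $;
    datalines;
AE,AESTDTC,Start Date/Time of Adverse Event
AE,AEENDTC,End Date/Time of Adverse Event
;
run;

/* Build NAME = "label" pairs for the AE domain */
proc sql noprint;
    select catx(' = ', name, quote(strip(label)))
        into :ae_labels separated by ' '
    from var_meta
    where dataset = 'AE';
quit;

/* Apply the labels without rewriting the data */
proc datasets lib=work nolist;
    modify AE;
        label &ae_labels;
    run;
quit;
```

One caveat with this sketch: labels containing an ampersand or a percent sign would be interpreted by the macro processor inside the double quotes, so such labels need macro quoting or a different mechanism.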

Tip 4: Efficient Use of Pinnacle 21 Outputs

Pinnacle 21 validation reports can be overwhelming. Focus on the following key areas:

  • Major Errors: Address structural and required variable issues first.
  • Traceability: Ensure SUPPQUAL variables and parent records are linked correctly.
  • Controlled Terminology: Verify values against the CDISC CT library to avoid deviations.

Tip: Use Excel formulas or conditional formatting to prioritize findings in Pinnacle 21 reports.
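
The traceability point can also be checked in SAS before running the validator. The sketch below lists SUPPAE records whose IDVARVAL does not match any AESEQ for the same subject. The sample data and the output table name supp_orphans are assumptions for illustration, and the check covers only records keyed on AESEQ.

```sas
/* Hypothetical parent records */
data AE;
    length USUBJID $20;
    input USUBJID $ AESEQ;
    datalines;
S01-001 1
S01-001 2
S01-002 1
;
run;

/* Hypothetical supplemental qualifiers; the second row points to a non-existent AESEQ */
data SUPPAE;
    length RDOMAIN $2 USUBJID $20 IDVAR $8 IDVARVAL $200 QNAM $8 QVAL $200;
    input USUBJID $ IDVAR $ IDVARVAL $ QNAM $ QVAL $;
    RDOMAIN = 'AE';
    datalines;
S01-001 AESEQ 1 AETRTEM Y
S01-002 AESEQ 3 AETRTEM Y
;
run;

/* Keep SUPPAE rows that find no parent record */
proc sql;
    create table supp_orphans as
    select s.*
    from SUPPAE as s
    left join AE as a
      on  s.USUBJID  = a.USUBJID
      and s.IDVARVAL = strip(put(a.AESEQ, best.))
    where s.IDVAR = 'AESEQ' and a.USUBJID is null;
quit;
```

With the sample rows, only the S01-002 record with IDVARVAL 3 should land in supp_orphans.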

Tip 5: Debugging Complex Mapping Issues

When debugging mapping logic, use PUTLOG statements strategically:

                
data SDTM_AE;
    set raw_ae;
    if missing(AEDECOD) then putlog "WARNING: Missing AEDECOD for USUBJID=" USUBJID;
run;
                
            

Tip: Use PUTLOG with conditions to reduce unnecessary log clutter.
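
One way to keep the log readable is to cap the number of messages and print a total at the end. This is a sketch under assumptions: the sample raw_ae data, the counter name n_missing, and the cap of 10 are illustrative choices.

```sas
/* Hypothetical raw data; two records have no AEDECOD */
data raw_ae;
    length USUBJID $20 AEDECOD $40;
    infile datalines dsd truncover;
    input USUBJID $ AEDECOD $;
    datalines;
S01-001,HEADACHE
S01-002,
S01-003,
;
run;

data SDTM_AE;
    set raw_ae end=last;
    if missing(AEDECOD) then do;
        n_missing + 1;   /* sum statement: retained and starts at 0 */
        /* Report only the first 10 cases in detail */
        if n_missing <= 10 then putlog "WARNING: Missing AEDECOD for " USUBJID=;
    end;
    /* One summary line instead of one line per record */
    if last and n_missing > 0 then
        putlog "NOTE: " n_missing "record(s) with missing AEDECOD in total.";
    drop n_missing;
run;
```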

Tip 6: Mapping RELREC Domain

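The RELREC dataset describes relationships between records, typically in different domains. Each related record gets its own row, and the rows that belong together share a RELID. Automate its creation using a data-driven approach:

data RELREC;
    /* Assumes parent_data has one row per AE-CM link with STUDYID, USUBJID, AESEQ, and CMSEQ */
    set parent_data;
    length RDOMAIN $2 IDVAR $8 IDVARVAL $40 RELTYPE $4 RELID $12;
    RELTYPE = "";               /* blank for record-to-record relationships */
    RELID = cats("REL", _n_);   /* both rows of one link share the same RELID */
    /* IDVARVAL is character, so the numeric sequence numbers are converted */
    RDOMAIN = "AE"; IDVAR = "AESEQ"; IDVARVAL = strip(put(AESEQ, best.)); output;
    RDOMAIN = "CM"; IDVAR = "CMSEQ"; IDVARVAL = strip(put(CMSEQ, best.)); output;
    keep STUDYID RDOMAIN USUBJID IDVAR IDVARVAL RELTYPE RELID;
run;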
                
            

Tip: Validate RELREC with Pinnacle 21 to ensure all relationships are correctly represented.

Tip 7: Using PROC DATASETS for Efficiency

Leverage PROC DATASETS for efficient dataset management:

                
                
proc datasets lib=work nolist;
    modify AE;
        label AESTDTC = "Start Date/Time of Adverse Event"
              AEENDTC = "End Date/Time of Adverse Event";
    run;
quit;
                
            

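Tip: Use PROC DATASETS to modify attributes like labels, formats, and informats, or to rename variables, without rewriting the dataset. Changing a variable's length still requires rewriting the dataset, for example with a DATA step.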

Tip 8: Deriving Epoch Variables

EPOCH is a critical variable in SDTM domains, representing the study period during which an event occurred. Automate its derivation as follows:

                
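data AE;
    set AE;   /* assumes TRTSDTC and TRTEDTC are already on each record, e.g. merged in from DM */
    /* Character comparison is only safe for complete ISO 8601 values of the same precision */
    if cmiss(AESTDTC, TRTSDTC, TRTEDTC) = 0 then do;
        if AESTDTC < TRTSDTC then EPOCH = "SCREENING";
        else if AESTDTC <= TRTEDTC then EPOCH = "TREATMENT";
        else EPOCH = "FOLLOW-UP";
    end;
    /* EPOCH is not assigned when any of the three dates is missing */
run;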
                
            

Tip: Ensure EPOCH values are consistent with the study design and align with other SDTM domains like EX and SV.

Tip 9: Validating VISITNUM and VISIT Variables

VISITNUM and VISIT are critical for aligning events with planned visits. Use a reference table for consistency:

                
proc sql;
    create table validated_data as
    select a.*, b.VISIT
    from raw_data a
    left join visit_reference b
    on a.VISITNUM = b.VISITNUM;
quit;
                
            

Tip: Cross-check derived VISITNUM and VISIT values against the Trial Design domains (e.g., TV and TA).
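
The join above attaches VISIT but does not flag problems on its own. A small follow-up check, sketched here with hypothetical sample data and a hypothetical output table name unmatched_visits, lists the VISITNUM values that have no entry in the reference table.

```sas
/* Hypothetical raw data; VISITNUM 99 is not a planned visit */
data raw_data;
    input USUBJID $ VISITNUM;
    datalines;
S01-001 1
S01-001 2
S01-001 99
;
run;

/* Hypothetical reference table of planned visits */
data visit_reference;
    length VISIT $40;
    infile datalines dsd;
    input VISITNUM VISIT $;
    datalines;
1,SCREENING
2,WEEK 1
;
run;

/* VISITNUM values in the raw data that the reference table does not know */
proc sql;
    create table unmatched_visits as
    select distinct a.VISITNUM
    from raw_data as a
    left join visit_reference as b
      on a.VISITNUM = b.VISITNUM
    where b.VISITNUM is null;
quit;
```

With the sample rows, unmatched_visits should contain only VISITNUM 99, which would then be reviewed as an unscheduled visit or a data issue.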

Tip 10: Generating Define.XML Annotations

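Define.XML is a crucial deliverable for SDTM datasets. Use metadata to dynamically create annotations. The DATA step below is a simplified illustration of generating ItemDef elements from metadata; in Define-XML 2.x the label is carried in a Description child element rather than a Label attribute, so adapt the template to the version you submit: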

                
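data define_annotations;
    set metadata;
    /* Explicit length so long labels are not truncated */
    length xml_annotation $500;
    /* CAT keeps the blanks inside the literals; CATS would strip them and run the attributes together */
    xml_annotation = cat("<ItemDef OID='IT.", strip(name), "' Name='", strip(name),
                         "' Label='", strip(label), "' DataType='", strip(type), "'/>");
run;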

proc print data=define_annotations noobs; run;
                
            

Tip: Validate the Define.XML file using tools like Pinnacle 21 or XML validators to ensure compliance.

Written by Sarath Annapareddy | For more SDTM tips, stay tuned!

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.