Tuesday, December 3, 2024

Advanced SDTM Programming Tips: Streamline Your SDTM Development with Expert Techniques

Advanced SDTM Programming Tips

Streamline Your SDTM Development with Expert Techniques

Tip 1: Automating SUPPQUAL Domain Creation

The SUPPQUAL (Supplemental Qualifiers) domain can be automated using SAS macros to handle additional variables in a systematic way. Refer to the macro example provided earlier to simplify your SUPPQUAL generation process.

Tip 2: Handling Date Imputation

Many SDTM domains require complete dates, but raw data often contains partial or missing dates. Use the following code snippet for date imputation:

                
data imputed_dates;
    set raw_data;
    /* Impute missing day to the first day of the month */
    if length(strip(date)) = 7 then date = cats(date, '-01');
    /* Impute missing month and day to January 1st */
    else if length(strip(date)) = 4 then date = cats(date, '-01-01');
    format date yymmdd10.;
run;

Tip: Always document the imputation logic and ensure it aligns with the study protocol.

Tip 3: Dynamic Variable Label Assignment

Avoid hardcoding labels when creating SDTM domains. Use metadata-driven programming for consistency:

                
data AE;
    set raw_ae;
    attrib
        AESTDTC label="Start Date/Time of Adverse Event"
        AEENDTC label="End Date/Time of Adverse Event";
run;

Tip: Store labels in a metadata file (e.g., Excel or CSV) and read them dynamically in your program.

Tip 4: Efficient Use of Pinnacle 21 Outputs

Pinnacle 21 validation reports can be overwhelming. Focus on the following key areas:

Major Errors: Address structural and required variable issues first.
Traceability: Ensure SUPPQUAL variables and parent records are linked correctly.
Controlled Terminology: Verify values against the CDISC CT library to avoid deviations.

Tip: Use Excel formulas or conditional formatting to prioritize findings in Pinnacle 21 reports.

Tip 5: Debugging Complex Mapping Issues

When debugging mapping logic, use PUTLOG statements strategically:

                
data SDTM_AE;
    set raw_ae;
    if missing(AEDECOD) then putlog "WARNING: Missing AEDECOD for USUBJID=" USUBJID;
run;

Tip: Use PUTLOG with conditions to reduce unnecessary log clutter.

Tip 6: Mapping RELREC Domain

The RELREC domain is used to define relationships between datasets. Automate its creation using a data-driven approach:

                
data RELREC;
    set parent_data;
    RELID = "REL1";
    RDOMAIN1 = "AE"; USUBJID1 = USUBJID; IDVAR1 = "AESEQ"; IDVARVAL1 = AESEQ;
    RDOMAIN2 = "CM"; USUBJID2 = USUBJID; IDVAR2 = "CMSEQ"; IDVARVAL2 = CMSEQ;
    output;
run;

Tip: Validate RELREC with Pinnacle 21 to ensure all relationships are correctly represented.

Tip 7: Using PROC DATASETS for Efficiency

Leverage PROC DATASETS for efficient dataset management:

                
                
proc datasets lib=work nolist;
    modify AE;
        label AESTDTC = "Start Date/Time of Adverse Event"
              AEENDTC = "End Date/Time of Adverse Event";
    run;
quit;

Tip: Use PROC DATASETS to modify attributes like labels, formats, and lengths without rewriting the dataset.

Tip 8: Deriving Epoch Variables

EPOCH is a critical variable in SDTM domains, representing the study period during which an event occurred. Automate its derivation as follows:

                
data AE;
    set AE;
    if AESTDTC >= TRTSDTC and AESTDTC <= TRTEDTC then EPOCH = "TREATMENT";
    else if AESTDTC < TRTSDTC then EPOCH = "SCREENING";
    else if AESTDTC > TRTEDTC then EPOCH = "FOLLOW-UP";
run;

Tip: Ensure EPOCH values are consistent with the study design and align with other SDTM domains like EX and SV.

Tip 9: Validating VISITNUM and VISIT Variables

VISITNUM and VISIT are critical for aligning events with planned visits. Use a reference table for consistency:

                
proc sql;
    create table validated_data as
    select a.*, b.VISIT
    from raw_data a
    left join visit_reference b
    on a.VISITNUM = b.VISITNUM;
quit;

Tip: Cross-check derived VISITNUM and VISIT values against the Trial Design domains (e.g., TV and TA).

Tip 10: Generating Define.XML Annotations

Define.XML is a crucial deliverable for SDTM datasets. Use metadata to dynamically create annotations:

                
data define_annotations;
    set metadata;
    xml_annotation = cats("<ItemDef OID='IT.", name, "' Name='", name, 
                          "' Label='", label, "' DataType='", type, "'/>");
run;

proc print data=define_annotations noobs; run;

Tip: Validate the Define.XML file using tools like Pinnacle 21 or XML validators to ensure compliance.

Advanced SDTM Programming Tips: Automating SUPPQUAL Domain Creation

Advanced SDTM Programming Tip: Automating SUPPQUAL Domain Creation

Optimize Your SDTM Workflows with Efficient Automation Techniques

Introduction to SUPPQUAL Automation

The SUPPQUAL (Supplemental Qualifiers) domain is used to store additional information that cannot fit within a standard SDTM domain. Manually creating the SUPPQUAL domain can be time-consuming and error-prone, especially for large datasets. In this article, we’ll explore an advanced tip to automate its creation using SAS macros.

Use Case: Adding Supplemental Qualifiers to a Domain

Imagine you have an SDTM AE domain (Adverse Events) and need to capture additional details like the investigator’s comments or assessment methods that are not part of the standard AE domain.

Code Example: Automating SUPPQUAL Domain

                
/* Macro to Create SUPPQUAL Domain */
%macro create_suppqual(domain=, idvar=, qnam_list=);
    %let domain_upper = %upcase(&domain);
    %let suppqual = SUPP&domain_upper;

    data &suppqual;
        set &domain;
        length RDOMAIN $8 IDVAR $8 QNAM $8 QLABEL $40 QVAL $200;
        array qvars{*} &qnam_list;
        do i = 1 to dim(qvars);
            if not missing(qvars{i}) then do;
                RDOMAIN = "&domain_upper";
                USUBJID = USUBJID;
                IDVAR = "&idvar";
                IDVARVAL = &idvar;
                QNAM = vname(qvars{i});
                QLABEL = put(QNAM, $40.);
                QVAL = strip(put(qvars{i}, $200.));
                output;
            end;
        end;
        drop i &qnam_list;
    run;

    /* Sort SUPPQUAL for submission readiness */
    proc sort data=&suppqual;
        by USUBJID RDOMAIN IDVAR IDVARVAL QNAM;
    run;
%mend;

/* Example Usage: Automating SUPPAE */
%create_suppqual(domain=AE, idvar=AETERM, qnam_list=AECOMMENT AEASSESS);

Explanation of the Code

RDOMAIN: Captures the parent domain name (e.g., AE).
array qvars{*}: Iterates through the list of supplemental qualifiers provided as macro parameters.
IDVAR: Represents the key variable in the parent domain (e.g., AETERM).
QLABEL: Automatically assigns a label to the qualifier variable.
QVAL: Stores the actual value of the supplemental qualifier.

Advantages of This Approach

Eliminates manual effort in creating SUPPQUAL domains.
Highly reusable and scalable across different domains.
Ensures consistency in handling supplemental qualifiers.

Pro Tip: Validation and Quality Control

Always validate the output SUPPQUAL dataset against CDISC compliance rules using tools like Pinnacle 21. Ensure that all required columns and relationships are correctly populated.

Hash Objects

Advanced SAS Programming Tip: Using HASH Objects

Unlock the Power of SAS for Efficient Data Manipulation

Introduction to HASH Objects

In SAS, HASH objects provide an efficient way to perform in-memory data lookups and merge operations, especially when dealing with large datasets. Unlike traditional joins using PROC SQL or the MERGE statement, HASH objects can significantly reduce computational overhead.

Use Case: Matching and Merging Large Datasets

Suppose you have two datasets: a master dataset containing millions of records and a lookup dataset with unique key-value pairs. The goal is to merge these datasets without compromising performance.

Code Example: Using HASH Objects

                
/* Define the master and lookup datasets */
data master;
    input ID $ Value1 $ Value2 $;
    datalines;
A001 X1 Y1
A002 X2 Y2
A003 X3 Y3
;
run;

data lookup;
    input ID $ LookupValue $;
    datalines;
A001 L1
A002 L2
A003 L3
;
run;

/* Use HASH object to merge datasets */
data merged;
    if _n_ = 1 then do;
        declare hash h(dataset: "lookup");
        h.defineKey("ID");
        h.defineData("LookupValue");
        h.defineDone();
    end;

    set master;
    if h.find() = 0 then output;
run;

/* Display the merged data */
proc print data=merged;
run;

Explanation of the Code

declare hash h: Creates a HASH object and loads the lookup dataset into memory.
h.defineKey: Specifies the key variable (ID) for the lookup.
h.defineData: Identifies the variable to retrieve from the lookup dataset.
h.find(): Searches for a match in the HASH object and retrieves the data if found.

Advantages of HASH Objects

Faster lookups compared to traditional joins, especially with large datasets.
In-memory operations reduce I/O overhead.
Provides greater flexibility for advanced operations.

Advanced SAS Programming Tip: Mastering Macro Variables

Unleash the power of SAS with this advanced technique.

Introduction

Macro variables are a powerful tool in SAS that allow you to dynamically generate code. By understanding and effectively using macro variables, you can write more efficient and flexible SAS programs.

The Basics of Macro Variables

A macro variable is a placeholder that is replaced with its value during macro processing. You define a macro variable using the %LET statement and reference it using the %SYSFUNC or %SYSEVALF functions.

Advanced Techniques

1. Conditional Logic

You can use the %IF-%THEN-%ELSE statements to create conditional logic within your macro code. This allows you to dynamically generate code based on specific conditions.

2. Iterative Processing

The %DO loop can be used to iterate over a range of values or a list of items. This is useful for repetitive tasks, such as generating multiple datasets or reports.

3. Custom Macro Functions

You can create your own custom macro functions to encapsulate complex logic and reuse it throughout your code. This can help to improve code readability and maintainability.

Example: Dynamically Generating SQL Queries

Here's a simple example of how to use macro variables to dynamically generate SQL queries:

```sas %let table_name = my_data; %let where_clause = age > 30; proc sql; select * from &table_name where &where_clause; quit; ```

Conclusion

By mastering macro variables, you can take your SAS programming skills to the next level. Experiment with these techniques to create more powerful and efficient SAS programs.

Tuesday, November 19, 2024

CMENRTPT vs CMENRF in SDTM

Understanding CMENRTPT vs CMENRF in SDTM

By Sarath

Introduction

When working with the Concomitant Medication (CM) domain in SDTM, it's crucial to understand how timing variables like CMENRTPT (End Relative to Reference Time Point) and CMENRF (End Relative to Reference Period) differ and when to use each.

What is CMENRTPT?

CMENRTPT indicates the relationship between the end of the medication and a specific time point, such as the start of treatment or a significant event (e.g., surgery).

Controlled Terminology: BEFORE, AFTER, ONGOING, CONCURRENT
Example: If a medication was stopped before surgery, CMENRTPT = "BEFORE".

What is CMENRF?

CMENRF describes whether the medication ended before, during, or after a specific study period, such as Screening, Treatment, or Follow-up.

Controlled Terminology: BEFORE, DURING, AFTER
Example: If a medication ended during the treatment phase, CMENRF = "DURING".

Key Differences

Aspect	CMENRTPT	CMENRF
Focus	Relationship to a specific time point	Relationship to a study period
Granularity	More precise	Broader
Typical Use Case	Linking to an event like surgery	Contextualizing within a study phase

Example Scenario

Scenario: A medication was stopped two days after surgery, which occurred during the screening phase.

CMENRTPT: AFTER (relative to surgery)
CMENRF: BEFORE (relative to the start of the treatment phase)

Conclusion

By understanding the differences between CMENRTPT and CMENRF, clinical programmers can ensure accurate and standardized representation of concomitant medication timing in SDTM datasets. Always refer to the study protocol and CDISC standards to select the appropriate variable for your data.

Resolving the SAS EG Transcoding Error

Addressing the "Character Data Lost During Transcoding" Issue in SAS EG

Author: Sarath

Date: November 19, 2024

Introduction

While working in SAS Enterprise Guide (SAS EG), you may encounter the error: "Some character data was lost during transcoding in the dataset." This issue typically arises when character data contains unsupported characters or is truncated due to insufficient column lengths. In this blog post, we'll explore the root causes and provide step-by-step solutions.

Common Causes

Unsupported Characters: The data contains special or non-ASCII characters not representable in the session encoding.
Truncation: Character variables are too short to store the full data, leading to loss of information.
Encoding Mismatch: The dataset's encoding differs from the SAS session's encoding.

Step-by-Step Solutions

1. Check Encoding

Identify the encoding of your SAS session and dataset:

proc options option=encoding; run;
proc contents data=tempdata.cm; run;

2. Identify Problematic Characters

Review a sample of the dataset to locate non-representable or truncated characters:

proc print data=tempdata.cm (obs=50); run;

3. Use a Compatible Encoding

Adjust the encoding for your session or dataset. For example, specify UTF-8 if working with multilingual data:

libname tempdata 'path-to-data' inencoding='utf-8';

4. Increase Column Lengths

Expand the length of character variables to avoid truncation:

data tempdata.cm;
    set tempdata.cm;
    length newvar $200; /* Adjust length */
    newvar = oldvar;
run;

5. Transcode the Dataset

Convert the dataset into a compatible encoding:

libname tempdata_in 'path-to-input-data' inencoding='utf-8';
libname tempdata_out 'path-to-output-data' encoding='utf-8';

data tempdata_out.cm;
    set tempdata_in.cm;
run;

6. Modify Encoding with PROC DATASETS

Repair the dataset’s encoding directly:

proc datasets lib=tempdata;
    modify cm / correctencoding='utf-8';
quit;

7. Clean the Data

Handle non-printable or invalid characters using functions like KCOMPRESS or KSTRIP.

Best Practices

Ensure consistent encoding between your data sources and SAS session.
Use UTF-8 encoding for handling multilingual data.
Allocate sufficient column lengths for character variables during data transformation.

Sunday, November 3, 2024

Advanced SAS programming Techniques for SDTM implementation

Advanced SAS Programming Techniques for SDTM Implementation

Date: November 3, 2024

In the realm of clinical trials data management, SDTM (Study Data Tabulation Model) implementation requires sophisticated programming techniques to ensure data accuracy and compliance. This article explores advanced SAS programming methods that can streamline SDTM dataset creation and validation.

1. Efficient Variable Derivation Using Hash Objects

Hash objects in SAS provide a powerful way to perform quick lookups and merges, especially useful when dealing with large SDTM datasets.

data work.ae;

if _n_ = 1 then do;

declare hash h_dm(dataset: "sdtm.dm");

h_dm.definekey("usubjid");

h_dm.definedata("age", "sex", "race");

h_dm.definedone();

end;

set raw.ae;

rc = h_dm.find();

/* Continue processing */

run;

Pro Tip: Hash objects remain in memory throughout the DATA step, making them more efficient than traditional merge operations for large datasets.

2. Standardizing Controlled Terminology with Format Catalogs

Creating and maintaining CDISC-compliant terminology is crucial for SDTM implementation.

proc format library=library.sdtm_formats;

value $severity

'MILD' = 'MILD'

'MOD' = 'MODERATE'

'MODERATE' = 'MODERATE'

'SEV' = 'SEVERE'

'SEVERE' = 'SEVERE'

other = 'UNKNOWN';

run;

data sdtm.ae;

set work.ae;

aesev = put(raw_severity, $severity.);

run;

3. Macro Systems for Dynamic SDTM Generation

Developing reusable macro systems can significantly improve efficiency and reduce errors in SDTM implementation.

%macro create_supp(domain=, vars=);

proc sql noprint;

select distinct usubjid, &vars

into :subjids separated by ',',

:values separated by ','

from sdtm.&domain;

quit;

data sdtm.supp&domain;

set sdtm.&domain(keep=usubjid &vars);

length qnam $8 qlabel $40 qval $200;

/* Generate supplemental qualifiers */

run;

%mend create_supp;

4. Advanced Error Checking and Validation

Implementing robust error-checking mechanisms ensures data quality and compliance with SDTM standards.

%macro validate_domain(domain=);

proc sql noprint;

/* Check for duplicate records */

create table work.duplicates as

select *, count(*) as count

from sdtm.&domain

group by usubjid, &domain.dtc

having count > 1;

/* Verify required variables */

select name into :reqvars separated by ' '

from sashelp.vcolumn

where libname='SDTM' and memname=upcase("&domain")

and name in ('USUBJID', 'DOMAIN', "&domain.SEQ");

quit;

%mend validate_domain;

5. Handling Custom Domains and Extensions

Sometimes, standard SDTM domains need to be extended to accommodate study-specific requirements.

proc sql;

create table sdtm.custom_domain as

select a.usubjid,

a.visit,

b.startdt,

calculated enddt format=datetime20.

from derived.custom_data as a

left join sdtm.sv as b

on a.usubjid = b.usubjid

and a.visit = b.visit;

quit;

6. Optimizing Performance for Large Studies

When dealing with large studies, performance optimization becomes crucial:

Use WHERE clauses instead of IF statements when possible
Implement parallel processing for independent domains
Optimize sort operations using PROC SORT NODUPKEY

options mprint mlogic symbolgen;

%let parallel_domains = ae cm eg lb mh vs;

%macro process_domains;

%do i = 1 %to %sysfunc(countw(¶llel_domains));

%let domain = %scan(¶llel_domains, &i);

%submit;

%create_domain(domain=&domain)

%endsubmit;

%end;

%mend process_domains;

Best Practice: Always document your code thoroughly and include version control information for traceability.

Conclusion

Mastering these advanced SAS programming techniques can significantly improve the efficiency and quality of SDTM implementation. Remember to always validate your outputs against SDTM Implementation Guide requirements and maintain clear documentation of your programming decisions.