StudySAS Blog: Mastering Clinical Data Management with SAS

Welcome to StudySAS, your ultimate guide to clinical data management using SAS. We cover essential topics like SDTM, CDISC standards, and Define.XML, alongside advanced PROC SQL and SAS Macros techniques. Whether you're enhancing your programming efficiency or ensuring compliance with industry standards, StudySAS offers practical tips and insights to elevate your clinical research expertise. Join us and stay ahead in the evolving world of clinical data.

Study Start Date in SDTM – Why Getting It Right Matters

The Study Start Date (SSTDTC) is a crucial element in the submission of clinical trial data, especially in meeting regulatory requirements. Since December 2014, the FDA has provided explicit guidance on defining and utilizing this data point, but many sponsors and service providers face challenges in its consistent application. Missteps in defining the Study Start Date can lead to technical rejection during submission reviews, delaying the regulatory process. This article explores the definition, importance, and proper implementation of the Study Start Date in SDTM (Study Data Tabulation Model) submissions, based on regulatory guidance and best practices.

FDA’s Definition of Study Start Date

The FDA, in its 2014 guidance, clarified that the Study Start Date for clinical trials is the earliest date of informed consent for any subject enrolled in the study. This approach ensures a data-driven, verifiable measure, using clinical trial data captured in the Demographics (DM) domain of SDTM. By anchoring the Study Start Date to the earliest informed consent, the FDA avoids manipulation of study timelines and enforces adherence to accepted data standards during study execution.^[1]

Why is the Study Start Date Important?

The Study Start Date serves as a cornerstone in the clinical trial submission process. It plays a critical role in:

Compliance: The Study Start Date is used as a benchmark to assess the trial’s adherence to FDA’s data standards catalog. Standards implementation is tracked based on this reference date.
Consistency: A well-defined Study Start Date ensures uniformity across various study data elements, including SDTM domains, analysis datasets, and associated regulatory documentation.
Avoiding Rejections: Incorrect or inconsistent assignment of the Study Start Date can lead to technical rejection by the FDA during submission. Rule 1734, for instance, mandates that the Study Start Date be recorded in the Trial Summary (TS) domain, and failure to meet this criterion can result in a rejection of the submission package.^[1]

Where to Record the Study Start Date

Accurate recording of the Study Start Date is essential for successful regulatory submissions. This date must appear in two key sections of the eCTD (electronic Common Technical Document) that are submitted to the FDA:

SDTM TS Domain: The Study Start Date is stored in the variable TS.TSVAL where TSPARMCD = 'SSTDTC'. It must adhere to the ISO 8601 date format (YYYY-MM-DD).
Study Data Standardization Plan (SDSP): The Study Start Date is also included in the SDSP, which contains metadata about the study, along with other relevant details such as the use of FDA-endorsed data standards.^[1]

Challenges in Defining the Study Start Date

One major challenge in defining the Study Start Date arises from the varied interpretations among stakeholders. For example:

Data Managers may consider the go-live date of the data collection system as the Study Start Date.
Safety Teams might prioritize the first dose date of the investigational product.
Clinical Operations could focus on the date of the first patient visit at the clinic.

However, the correct interpretation, as per FDA guidance, is the earliest informed consent date, ensuring a consistent and regulatory-compliant approach.^[1]

How to Verify the Study Start Date

Verifying the Study Start Date requires careful examination of clinical data. The following steps can help:

Examine the DM Domain: The DM.RFICDTC variable (date of informed consent) is cross-checked against the protocol and Statistical Analysis Plan (SAP) to ensure it aligns with enrolled subjects.
Exclude Screen Failures: Screen failures must be excluded from the analysis as they do not contribute to the Study Start Date. Only enrolled subjects should be included in this determination.

Programmatic Check: The following SAS code can be used to programmatically select the earliest informed consent date for the enrolled subjects:

proc sql noprint;
   select min(rficdtc) from SDTM.DM
   where rficdtc is not null and armcd in ('TRTA', 'TRTB');
quit;

Global Considerations

While the FDA’s definition is clear, other regulatory authorities such as the PMDA in Japan and the NMPA in China have slightly different approaches. For example, the PMDA evaluates standards based on the submission date rather than the Study Start Date. As more global regulators adopt machine-readable clinical data standards, alignment with FDA guidance may serve as a reference point for future harmonization efforts.^[1]

Multiple Ways to Derive Stusy Start Date (SSTDTC) in SDTM

Several SAS-based methods can be used to derive the Study Start Date (SSTDTC) in SDTM. Below are a few approaches:

1. Using PROC SQL

PROC SQL can compute the minimum date directly from the dataset:

proc sql noprint;
    select min(rficdtc) into :mindate
    from SDTM.DM
    where rficdtc is not null and armcd ne 'SCRNFAIL';
quit;

2. Using DATA STEP with RETAIN

This approach retains the minimum date as the dataset is processed:

data earliest_consent;
    set SDTM.DM;
    where rficdtc is not missing and armcd ne 'SCRNFAIL';
    retain min_consent_date;
    if _N_ = 1 then min_consent_date = rficdtc;
    else if rficdtc < min_consent_date then min_consent_date = rficdtc;
run;

proc print data=earliest_consent(obs=1);
    var min_consent_date;
run;

3. Using PROC MEANS

PROC MEANS can be used to calculate the minimum date:

proc means data=SDTM.DM min noprint;
    var rficdtc;
    output out=min_consent_date(drop=_type_ _freq_) min=MinRFICDTC;
run;

proc print data=min_consent_date;
run;

4. Using PROC SORT and DATA STEP

This approach involves sorting the dataset and extracting the first record:

proc sort data=SDTM.DM out=sorted_dm;
    by rficdtc;
    where rficdtc is not missing and armcd ne 'SCRNFAIL';
run;

data min_consent_date;
    set sorted_dm;
    if _N_ = 1 then output;
run;

proc print data=min_consent_date;
    var rficdtc;
run;

Conclusion

Accurately defining and recording the Study Start Date is essential for compliance with FDA regulations and avoiding submission rejections. Ensuring consistency across study data and metadata is key to a successful clinical trial submission process. Various methods in SAS, including SQL-based and procedural approaches, offer flexible options for deriving the study start date (SSTDTC) in SDTM.

References

Best Practices for Joining Additional Columns into an Existing Table Using PROC SQL

When working with large datasets, it's common to add new columns from another table to an existing table using SQL. However, many programmers encounter the challenge of recursive referencing in PROC SQL when attempting to create a new table that references itself. This blog post discusses the best practices for adding columns to an existing table using PROC SQL and provides alternative methods that avoid inefficiencies.

1. The Common Approach and Its Pitfall

Here's a simplified example of a common approach to adding columns via a LEFT JOIN:

PROC SQL;
CREATE TABLE WORK.main_table AS
SELECT main.*, a.newcol1, a.newcol2
FROM WORK.main_table main
LEFT JOIN WORK.addl_data a
ON main.id = a.id;
QUIT;

While this approach might seem straightforward, it leads to a warning: "CREATE TABLE statement recursively references the target table". This happens because you're trying to reference the main_table both as the source and the target table in the same query. Furthermore, if you're dealing with large datasets, creating a new table might take up too much server space.

2. Best Practice 1: Use a Temporary Table

A better approach is to use a temporary table to store the joined result and then replace the original table. Here’s how you can implement this:

PROC SQL;
   CREATE TABLE work.temp_table AS 
   SELECT main.*, a.newcol1, a.newcol2
   FROM WORK.main_table main
   LEFT JOIN WORK.addl_data a
   ON main.id = a.id;
QUIT;

PROC SQL;
   DROP TABLE work.main_table;
   CREATE TABLE work.main_table AS 
   SELECT * FROM work.temp_table;
QUIT;

PROC SQL;
   DROP TABLE work.temp_table;
QUIT;

This ensures that the original table is updated with the new columns without violating the recursive referencing rule. It also minimizes space usage since the temp_table will be dropped after the operation.

3. Best Practice 2: Use `ALTER TABLE` for Adding Columns

If you're simply adding new columns without updating existing ones, you can use the ALTER TABLE statement in combination with UPDATE to populate the new columns:

PROC SQL;
   ALTER TABLE WORK.main_table
   ADD newcol1 NUM, 
       newcol2 NUM;

   UPDATE WORK.main_table main
   SET newcol1 = (SELECT a.newcol1 
                  FROM WORK.addl_data a 
                  WHERE main.id = a.id),
       newcol2 = (SELECT a.newcol2 
                  FROM WORK.addl_data a 
                  WHERE main.id = a.id);
QUIT;

This approach avoids creating a new table altogether, and instead modifies the structure of the existing table.

4. Best Practice 3: Consider the `DATA Step MERGE` for Large Datasets

For very large datasets, a DATA step MERGE can sometimes be more efficient than PROC SQL. The MERGE statement allows you to combine datasets based on a common key variable, as shown below:

PROC SORT DATA=WORK.main_table; BY id; RUN;
PROC SORT DATA=WORK.addl_data; BY id; RUN;

DATA WORK.main_table;
   MERGE WORK.main_table (IN=in1)
         WORK.addl_data (IN=in2);
   BY id;
   IF in1; /* Keep only records from main_table */
RUN;

While some might find the MERGE approach less intuitive, it can be a powerful tool for handling large tables when combined with proper sorting of the datasets.

Conclusion

The best method for joining additional columns into an existing table depends on your specific needs, including dataset size and available server space. Using a temporary table or the ALTER TABLE method can be more efficient in certain situations, while the DATA step MERGE is a reliable fallback for large datasets.

By following these best practices, you can avoid common pitfalls and improve the performance of your SQL queries in SAS.

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.

Discover More Tips and Techniques on This Blog

Study Start Date in SDTM – Why Getting It Right Matters

FDA’s Definition of Study Start Date

Why is the Study Start Date Important?

Where to Record the Study Start Date

Challenges in Defining the Study Start Date

How to Verify the Study Start Date

Global Considerations

Multiple Ways to Derive Stusy Start Date (SSTDTC) in SDTM

1. Using PROC SQL

2. Using DATA STEP with RETAIN

3. Using PROC MEANS

4. Using PROC SORT and DATA STEP

Conclusion

References

Best Practices for Joining Additional Columns into an Existing Table Using PROC SQL

1. The Common Approach and Its Pitfall

2. Best Practice 1: Use a Temporary Table

3. Best Practice 2: Use ALTER TABLE for Adding Columns

4. Best Practice 3: Consider the DATA Step MERGE for Large Datasets

Conclusion

Disclosure:

3. Best Practice 2: Use `ALTER TABLE` for Adding Columns

4. Best Practice 3: Consider the `DATA Step MERGE` for Large Datasets