Discover More Tips and Techniques on This Blog

Excluding Variables in PROC COMPARE Using the DROP Dataset Option

Excluding Variables in PROC COMPARE Using the DROP Dataset Option

When comparing two datasets using PROC COMPARE in SAS, there may be cases where you want to exclude specific variables from the comparison. One efficient way to do this is by using the DROP dataset option. This option allows you to remove certain variables from consideration during the comparison process.

Using the DROP Dataset Option in PROC COMPARE

The DROP dataset option is applied directly to the dataset references in the BASE and COMPARE options of PROC COMPARE. When you use DROP, the specified variables are excluded from the comparison.

Here is the syntax for using the DROP option:

proc compare base=dataset1(drop=var_to_exclude) 
                compare=dataset2(drop=var_to_exclude);
    run;

In this example, var_to_exclude is the name of the variable you want to exclude from both datasets (dataset1 and dataset2) before the comparison is made.

Example: Excluding a Variable from the Comparison

Let's say you have two datasets, sales_2023 and sales_2024, and you want to compare them, but you want to exclude a variable called region from the comparison. Here is how you can do that:

proc compare base=sales_2023(drop=region) 
                compare=sales_2024(drop=region);
    run;

This code ensures that the variable region will not be included in the comparison, while all other variables in the two datasets will be compared.

Excluding Multiple Variables

You can also exclude multiple variables by listing them inside the DROP option, separated by spaces:

proc compare base=sales_2023(drop=region quarter) 
                compare=sales_2024(drop=region quarter);
    run;

In this case, both region and quarter will be excluded from the comparison.

Conclusion

The DROP dataset option is a powerful and simple tool for excluding variables from comparison in PROC COMPARE. It provides flexibility by allowing you to exclude one or more variables without needing to manually specify which variables should be included in the comparison.

By leveraging the DROP option, you can ensure that only relevant variables are considered when comparing datasets, making your SAS programming more efficient and focused.

Mastering PROC SQL: Best Practices for Seamlessly Adding Columns

Mastering PROC SQL: Best Practices for Seamlessly Adding Columns

Mastering PROC SQL: Best Practices for Seamlessly Adding Columns

PROC SQL in SAS is a powerful tool, especially when you need to add new columns to existing tables. Whether you're updating tables with new data or making calculated columns, following best practices ensures your code is efficient, maintainable, and easy to understand.

1. Use ALTER TABLE to Add Columns

The most straightforward way to add a column to a table is by using the ALTER TABLE statement. It allows you to add one or more columns without disrupting the existing data structure.

proc sql;
   alter table your_table_name
   add new_column1 num format=8. label='New Column 1',
       new_column2 char(20) label='New Column 2';
quit;
Tip: Always specify the data type (num for numeric, char(n) for character) and add useful formats and labels to keep your columns clear and organized.

2. Avoid Duplicating Column Names

Before adding columns, check if a column with the same name already exists. SAS will throw an error if you try to add a column that’s already in the table.

proc sql;
   describe table your_table_name;
quit;

Use the DESCRIBE statement to view the structure and ensure no duplicate column names exist.

3. Efficient Data Insertion

After adding a column, you may need to initialize it with default values. You can do this in a single step using the UPDATE statement to ensure all rows are filled efficiently:

proc sql;
   alter table your_table_name
   add new_column num format=8. label='New Numeric Column';

   update your_table_name
   set new_column = 0; /* Default value */
quit;

4. Use CREATE TABLE for Complex Column Addition

If you need to add columns based on existing data or calculations, it's often better to create a new table. This prevents any unintentional changes to the original table and allows for cleaner manipulation of data:

proc sql;
   create table new_table as
   select existing_column,
          calculated(new_column) as new_column format=8. label='Calculated Column'
   from your_table_name;
quit;
Pro Tip: Use calculated to create columns based on other columns in the same query.

5. Consider Indexing for Performance

If the new column will be used in sorting or joins, adding an index can significantly boost performance. Here’s how to create an index on your new column:

proc sql;
   create index idx_new_column on your_table_name(new_column);
quit;

6. Use Comments for Clarity

Always document your SQL code! Adding comments helps future developers (or your future self) understand the purpose of each column addition:

proc sql;
   alter table your_table_name
   add new_column char(10) label='New Character Column'; /* Added for analysis purposes */
quit;

Final Thoughts

By following these best practices for adding columns via PROC SQL, you can keep your SAS programming efficient and clear. Whether you're maintaining legacy systems or building new reports, these tips will help you seamlessly integrate new data into your tables.

How to Address PROC COMPARE Reporting Same Values as Different in SAS

How to Address PROC COMPARE Reporting Same Values as Different in SAS

How to Address PROC COMPARE Reporting Same Values as Different in SAS

Working with large datasets in SAS often requires comparing data between two tables. The PROC COMPARE procedure is an essential tool for this task, but sometimes it reports values as different even when they appear to be identical. This issue can arise from various causes, such as numeric precision differences, rounding issues, or formatting inconsistencies. In this post, we will explore common causes of this issue and how to resolve them.

1. Numeric Precision Issues

SAS stores numeric values using floating-point precision, which can lead to small differences that aren't immediately visible. These differences may cause PROC COMPARE to report discrepancies even though the values seem the same.

Solution: Use the CRITERION or FUZZ option to define an acceptable tolerance for differences.


proc compare base=dataset1 compare=dataset2 criterion=0.00001;
run;
    

2. Rounding Differences

If values have been rounded differently in two datasets, PROC COMPARE may detect them as different. For example, one dataset may round to two decimal places, while the other doesn't.

Solution: Apply consistent rounding to both datasets before comparison.


data dataset1_rounded;
    set dataset1;
    value = round(value, 0.01); /* Round to two decimal places */
run;

data dataset2_rounded;
    set dataset2;
    value = round(value, 0.01); /* Same rounding precision */
run;

proc compare base=dataset1_rounded compare=dataset2_rounded;
run;
    

3. Formatting Differences

Sometimes, two values are the same but have different formats applied, leading to a perceived difference by PROC COMPARE.

Solution: Use the NOFORMAT option to ignore formatting in the comparison.


proc compare base=dataset1 compare=dataset2 noformat;
run;
    

4. Character Value Differences (Case Sensitivity and Whitespace)

SAS is case-sensitive when comparing character variables. Extra whitespace at the end of strings can also cause PROC COMPARE to flag a difference.

Solution: Standardize case and remove any unnecessary spaces using the COMPRESS or UPCASE functions.


data dataset1_clean;
    set dataset1;
    char_var = compress(upcase(char_var));
run;

data dataset2_clean;
    set dataset2;
    char_var = compress(upcase(char_var));
run;

proc compare base=dataset1_clean compare=dataset2_clean;
run;
    

5. Handling Different Variable Lengths

Character variables with different lengths may also trigger discrepancies in the comparison.

Solution: Ensure that corresponding variables have the same length in both datasets using LENGTH statements.

Conclusion

By addressing issues related to numeric precision, rounding, formatting, and character data, you can reduce or eliminate discrepancies reported by PROC COMPARE in SAS. These solutions ensure more accurate and meaningful comparisons between datasets.

Feel free to leave a comment if you have additional tips or if you’ve encountered other challenges with PROC COMPARE in SAS!

5 Approaches to Identify Blank Columns in SAS Datasets

5 Approaches to Identify Blank Columns in SAS Datasets

5 Approaches to Identify Blank Columns in SAS Datasets

Author: Sarath

Date: September 5, 2024

When working with large datasets in SAS, it’s crucial to ensure the integrity of your data. One common task is identifying columns that are completely blank, meaning they contain no non-missing values. In this article, we'll explore five different methods to efficiently find blank columns in SAS datasets, along with examples for each approach.

1. Using PROC MEANS or PROC SUMMARY

The simplest way to identify blank columns is by using PROC MEANS or PROC SUMMARY. These procedures provide a summary of missing values for both numeric and character variables. You can quickly determine which columns are fully blank by checking the output for variables with 100% missing values.


proc means data=your_dataset nmiss;
    var _numeric_;  /* For numeric variables */
run;

proc means data=your_dataset nmiss;
    var _character_;  /* For character variables */
run;
    

This method is effective for both numeric and character variables and gives a concise summary of missing values.

2. Using PROC FREQ

Another method is using PROC FREQ to check the frequency of missing values. This approach is suitable when you want to visually inspect the distribution of missing and non-missing values for each column.


proc freq data=your_dataset;
    tables _all_ / missing;
run;
    

In the output, look for variables with 100% missing values, which indicates that the column is blank.

3. DATA STEP Approach with Array

This approach involves using a DATA STEP with arrays to iterate over variables and check for missing values. It's a bit more hands-on and allows for customization, such as flagging rows based on certain conditions.


data check_blank;
    set your_dataset;
    array num_vars _numeric_;  /* For numeric variables */
    array char_vars _character_;  /* For character variables */
    all_blank = 1;  /* Initialize flag */

    do over num_vars;
        if num_vars ^= . then all_blank = 0;
    end;
    
    do over char_vars;
        if char_vars ^= '' then all_blank = 0;
    end;

    if all_blank = 1 then output;
run;
    

This method is useful if you want to customize the logic based on specific variables or values.

4. Using PROC SQL

With PROC SQL, you can query the metadata to identify columns that are fully missing. This approach gives you control over how you want to process the variables and provides a more SQL-like syntax for database programmers.


proc sql;
    select name
    from dictionary.columns
    where libname='WORK' and memname='YOUR_DATASET'
      and missing(name);
quit;
    

This is a quick and efficient way to find completely blank columns using SQL queries.

5. Using a Macro for Automation

If you need to automate the process of checking blank columns across multiple datasets or variables, writing a macro is the best solution. This macro checks each variable for missing values and flags them accordingly.


%macro find_blank_columns(dataset);
    proc contents data=&dataset out=vars(keep=name type) noprint; run;

    data _null_;
        set vars;
        call symputx('var'||left(_n_), name);
        call symputx('type'||left(_n_), type);
        call symputx('nvars', _n_);
    run;

    data check_blank;
        set &dataset;
        %do i = 1 %to &nvars;
            %if &&type&i = 1 %then %do;
                if &&var&i ne . then blank_flag = 0;
            %end;
            %else %do;
                if &&var&i ne '' then blank_flag = 0;
            %end;
        %end;
    run;
%mend find_blank_columns;

%find_blank_columns(your_dataset);
    

This macro can easily be modified to suit different datasets and reporting requirements, providing flexibility for more advanced users.

Conclusion

In this article, we've covered five different approaches to identifying completely blank columns in SAS datasets. Whether you prefer PROC MEANS, PROC FREQ, PROC SQL, or a customized DATA STEP solution, there’s a method here to suit your needs. For users working with large and complex datasets, automating this process using a macro can save time and ensure data integrity.

Explore these techniques and adapt them to your specific use case to streamline your data validation process in SAS.

Happy coding!

Effortlessly Upcase All Variables in SAS Using PROC DATASETS

Effortlessly Upcase All Variables in SAS Using PROC DATASETS

Effortlessly Upcase All Variables in SAS Using PROC DATASETS

When working with SAS datasets, ensuring consistency across variables, especially character variables, can be crucial. A common requirement is to upcase all character variables, converting their values to uppercase. While several methods exist to achieve this, one of the most efficient and dynamic approaches involves using the PROC DATASETS procedure. In this article, we will delve into how PROC DATASETS works and how you can use it to upcase all character variables in your dataset with minimal effort.

Understanding PROC DATASETS

The PROC DATASETS procedure is primarily used for managing SAS datasets within a library. It allows you to rename, delete, append, modify, and more, without the need to read or copy the data into the Program Data Vector (PDV). This makes it highly efficient, especially when you need to modify dataset attributes without touching the data itself.

For our specific task of upcasing variables, PROC DATASETS is useful because it allows us to apply a format to all character variables at once, without having to manually iterate over each variable.

Step-by-Step: How to Upcase All Character Variables

1. Identify the Dataset

The first step is to identify the dataset that you want to modify. This dataset should already exist in your specified library. For this example, let’s assume our dataset is original_dataset located in the WORK library.

2. Use the MODIFY Statement in PROC DATASETS

To modify a dataset without reading its data, you can use the MODIFY statement inside PROC DATASETS. This step will tell SAS which dataset you want to apply changes to.

3. Apply an Uppercase Format to All Character Variables

Now, the magic of PROC DATASETS lies in its ability to apply formats to variable types globally. By using the FORMAT statement with _character_, you can apply the $upcase. format to every character variable in the dataset.

Complete Code Example

Here is the full SAS code that applies $upcase. format to all character variables:


proc datasets lib=work nolist;
    modify original_dataset;
    format _character_ $upcase.;
run;
quit;
    

Explanation of the Code

  • lib=work: Specifies the library where the dataset is located (in this case, the WORK library).
  • nolist: Suppresses the listing of datasets being modified, keeping the log cleaner.
  • modify original_dataset: Indicates that we want to modify the dataset named original_dataset.
  • format _character_ $upcase.: Applies the $upcase. format to all character variables. This automatically converts the contents of these variables to uppercase.
  • run; and quit;: These statements execute the procedure and exit PROC DATASETS.

Advantages of Using PROC DATASETS for Upcasing

There are several advantages to using PROC DATASETS for upcasing character variables:

  • Efficiency: PROC DATASETS modifies the dataset in place without reading the data, making it faster and more efficient, especially for large datasets.
  • Dynamic Application: By using the _character_ keyword, you don’t need to list out each variable manually. It dynamically selects all character variables and applies the upcase format.
  • Minimal Code: Compared to other methods like loops or arrays, PROC DATASETS requires very little code to achieve the same result.
  • Works on Multiple Datasets: You can easily modify the code to loop through multiple datasets if needed, by adding more modify statements or using a macro.

Conclusion

Upcasing all character variables in a SAS dataset can be achieved in many ways, but PROC DATASETS offers a streamlined, efficient, and elegant solution. Whether you're dealing with large datasets or want to avoid manually specifying each variable, this method will save you time and effort. Next time you need to perform this task, give PROC DATASETS a try and enjoy its simplicity.

If you have any questions or would like further clarification on using PROC DATASETS in SAS, feel free to leave a comment below!

Mastering SAS: 6 Ways to Upcase All Variables in Your Dataset

How to Upcase All Variables in a SAS Dataset

How to Upcase All Variables in a SAS Dataset

When working with character data in SAS, you may often need to ensure that all text variables are in uppercase. Fortunately, SAS provides several methods to upcase variables in a dataset, depending on your specific needs. In this article, we will explore various ways to achieve this task and provide code snippets for each method.

1. Using DATA Step with UPCASE Function

The UPCASE function is the simplest way to convert all character variables to uppercase using a DATA step.


data upcase_dataset;
    set original_dataset;
    array char_vars _character_; 
    do i = 1 to dim(char_vars);
        char_vars[i] = upcase(char_vars[i]);
    end;
    drop i;
run;
    

This approach loops through all character variables and applies the UPCASE function to each of them.

2. Using PROC SQL with CASE Statement

In this method, PROC SQL is used with an explicit UPCASE call for each character variable. While this approach is less dynamic, it works well for small datasets.


proc sql;
    create table upcase_dataset as
    select upcase(var1) as var1,
           upcase(var2) as var2,
           upcase(var3) as var3
    from original_dataset;
quit;
    

3. Using PROC DATASETS and FORMAT

With PROC DATASETS, you can apply an uppercase format to all character variables in the dataset in one go.


proc datasets lib=work nolist;
    modify original_dataset;
    format _character_ $upcase.;
run;
quit;
    

4. Using a Macro with PROC SQL

This dynamic method uses a macro to automatically identify and upcase all character variables in the dataset. This is useful for larger datasets or when the number of character variables is unknown.


%macro upcase_all_vars(ds);
    proc sql noprint;
        select name into :char_vars separated by ' '
        from dictionary.columns
        where libname='WORK' and memname=upcase("&ds") and type='char';
    quit;

    data upcase_dataset;
        set &ds;
        %let count = %sysfunc(countw(&char_vars));
        %do i = 1 %to &count;
            %let var = %scan(&char_vars, &i);
            &var = upcase(&var);
        %end;
    run;
%mend;

%upcase_all_vars(original_dataset);
    

5. Using PROC FCMP

This approach involves creating a custom function using PROC FCMP and then applying it across all character variables.


proc fcmp outlib=work.functions.dataset_utils;
    function upcase_all(var $) $;
        return (upcase(var));
    endsub;
run;

options cmplib=work.functions;

data upcase_dataset;
    set original_dataset;
    array char_vars _character_;
    do i = 1 to dim(char_vars);
        char_vars[i] = upcase_all(char_vars[i]);
    end;
    drop i;
run;
    

6. Using HASH Object

An advanced approach is to use the SAS HASH object, which efficiently handles the upcase operation, especially when dealing with large datasets.


data upcase_dataset;
    if _n_ = 1 then do;
        declare hash h(dataset: 'original_dataset');
        h.defineKey('_N_');
        h.defineData('_ALL_');
        h.defineDone();
    end;

    set original_dataset;
    array char_vars _character_;
    do i = 1 to dim(char_vars);
        char_vars[i] = upcase(char_vars[i]);
    end;
    drop i;
run;
    

Conclusion

Each of the methods discussed above has its own strengths, depending on the size of the dataset and whether the variables are known in advance. Whether you're working with small or large datasets, SAS offers a variety of ways to upcase all variables. By selecting the most appropriate approach, you can simplify your workflow and ensure consistency across your data.

If you have any questions or need further assistance, feel free to leave a comment below!

Dynamic Macro Creation in SAS: Enhancing Automation and Flexibility

Dynamic Macro Creation in SAS: Enhancing Automation and Flexibility

Dynamic macro creation is a powerful technique in SAS that allows you to generate macro variables and macros based on data content or logic at runtime. This not only simplifies repetitive tasks but also provides a way to dynamically control program flow. In this article, we’ll cover various scenarios and provide multiple examples where dynamic macro creation can be beneficial.

Why Use Dynamic Macros?

  • Automation: Automate repetitive processes by generating macro variables based on dataset values.
  • Flexibility: Dynamic macros adjust based on the changing content of your datasets.
  • Efficient Code: Using dynamic macros reduces redundancy and ensures that your code adapts to different data structures without manual intervention.

Scenario 1: Generating Macros Based on Dataset Variables

Imagine a scenario where you have a dataset and need to dynamically create macros to store variable names or their values. This is useful for automating variable processing tasks, such as generating reports, manipulating data, or performing analyses.

Example 1: Creating Macros for Variable Names

/* Create macros for each unique variable in the dataset */
proc sql;
   select distinct name
   into :var1-:varN
   from sashelp.class;
quit;

%macro display_vars;
   %do i=1 %to &sqlobs;
      %put &&var&i;
   %end;
%mend display_vars;

%display_vars;

    

Explanation: This code dynamically selects variable names from the sashelp.class dataset and stores them in macro variables. The macro display_vars prints out each variable name, allowing flexible processing of variables without knowing them in advance.

Scenario 2: Automating Data Processing Based on Unique Values

Let’s say you have multiple categories or groups within your data, and you need to run a set of analyses or create reports for each unique group. You can dynamically generate macros for each category, making the process scalable and automated.

Example 2: Creating Macros for Unique Categories

/* Create macros for each unique 'sex' category in the dataset */
proc sql;
   select distinct sex
   into :sex1-:sexN
   from sashelp.class;
quit;

%macro analyze_sex;
   %do i=1 %to &sqlobs;
      proc print data=sashelp.class;
         where sex = "&&sex&i";
         title "Listing for Sex = &&sex&i";
      run;
   %end;
%mend analyze_sex;

%analyze_sex;

    

Explanation: This example dynamically creates a macro for each unique sex value and applies a filter in PROC PRINT for each group, generating reports for each distinct value in the dataset.

Scenario 3: Dynamically Generating Conditional Code

In some cases, you need to execute different code based on the values of certain variables or the content of a dataset. Dynamic macro creation helps generate conditional logic on the fly.

Example 3: Conditional Code Generation Based on Data Content

/* Identify numeric variables in the dataset and generate macro code */
proc sql;
   select name
   into :numvar1-:numvarN
   from dictionary.columns
   where libname='SASHELP' and memname='CLASS' and type='num';
quit;

%macro analyze_numeric_vars;
   %do i=1 %to &sqlobs;
      proc means data=sashelp.class;
         var &&numvar&i;
         title "Analysis of &&numvar&i";
      run;
   %end;
%mend analyze_numeric_vars;

%analyze_numeric_vars;

    

Explanation: This code identifies numeric variables in the dataset and dynamically creates macro variables for each. The macro analyze_numeric_vars runs PROC MEANS for each numeric variable, adjusting to any changes in the dataset structure.

Scenario 4: Dynamic Report Generation

Dynamic macros are helpful in generating dynamic reports or exporting data where the structure or content changes frequently. You can use dynamic macros to control file names, report titles, or export paths.

Example 4: Dynamic Report Titles

/* Generate macros for each unique 'name' value and create reports */
proc sql;
   select distinct name
   into :name1-:nameN
   from sashelp.class;
quit;

%macro create_reports;
   %do i=1 %to &sqlobs;
      proc print data=sashelp.class;
         where name = "&&name&i";
         title "Report for &&name&i";
      run;
   %end;
%mend create_reports;

%create_reports;

    

Explanation: This code dynamically creates a report for each unique name in the dataset, with the report’s title reflecting the name being processed. The code adapts to changes in the dataset, automating the report generation process.

Best Practices for Dynamic Macro Creation

  • Use PROC SQL with INTO Clauses: This is the most efficient way to generate dynamic macro variables from dataset content.
  • Limit the Number of Macros: Ensure that you don’t exceed the macro variable limit by limiting the number of macros generated.
  • Use &sqlobs: The &sqlobs macro variable is useful for counting the number of records or unique values, ensuring the loop runs the correct number of times.
  • Avoid Hardcoding: Whenever possible, rely on dynamic macros instead of hardcoding variable names or values to make your code flexible and adaptable.
  • Error Handling: Implement error handling and checks to ensure that dynamic macros are generated correctly without issues during execution.

Conclusion

Dynamic macro creation in SAS provides a robust and flexible way to automate repetitive tasks, process data efficiently, and adjust code dynamically based on dataset content. By generating macros based on variables or values within a dataset, you can create dynamic, scalable solutions for various SAS programming challenges.

Comparing VISIT and VISITNUM Values Across SDTM Datasets and the TV Domain

Extracting and Comparing Unique VISIT and VISITNUM Values from SDTM Datasets

Author: [Your Name]

Date: [Creation Date]

In clinical trials, the VISIT and VISITNUM variables are key identifiers for subject visits. Ensuring that all datasets have consistent visit data and that it aligns with the planned visits recorded in the TV (Trial Visits) domain is crucial for accurate data analysis. This post presents a SAS macro that automates the extraction of unique VISIT and VISITNUM values across all SDTM datasets in a library and compares them to those found in the TV domain.

Program Overview

The SAS macro program:

  • Extracts unique VISIT and VISITNUM values from all SDTM datasets in the specified library.
  • Compares these values against those recorded in the TV domain.
  • Highlights any discrepancies between the SDTM datasets and the TV domain.

Macro Code

Here’s the macro that performs the task:

%macro compare_visit(libname=);

    /* Step 1: Get the unique VISIT and VISITNUM values from the TV domain */
    proc sql;
        create table tv_visit as
        select distinct VISIT, VISITNUM 
        from &libname..TV
        where VISIT is not missing and VISITNUM is not missing;
    quit;

    /* Step 2: Get the list of datasets in the library containing both VISIT and VISITNUM */
    proc sql noprint;
        select memname 
        into :dslist separated by ' '
        from sashelp.vcolumn
        where libname = upcase("&libname")
          and name in ('VISIT', 'VISITNUM')
        group by memname
        having count(distinct name) = 2; /* Ensure both VISIT and VISITNUM are present */
    quit;

    /* Step 3: Check if any datasets were found */
    %if &sqlobs = 0 %then %do;
        %put No datasets in &libname contain both VISIT and VISITNUM variables.;
    %end;
    %else %do;
        %put The following datasets contain both VISIT and VISITNUM variables: &dslist;

        /* Initialize an empty dataset for combined VISIT and VISITNUM values */
        data combined_visits;
            length Dataset $32 VISIT $200 VISITNUM 8;
            stop;
        run;

        /* Step 4: Loop through each dataset */
        %let ds_count = %sysfunc(countw(&dslist));
        %do i = 1 %to &ds_count;
            %let dsname = %scan(&dslist, &i);

            /* Extract unique VISIT and VISITNUM values, excluding UNSCHEDULED visits */
            proc sql;
                create table visit_&dsname as
                select distinct "&&dsname" as Dataset, VISIT, VISITNUM
                from &libname..&&dsname
                where VISIT is not missing and VISITNUM is not missing
                  and VISIT not like 'UNSCH%'; /* Exclude UNSCHEDULED visits */
            quit;

            /* Append to the combined dataset */
            proc append base=combined_visits data=visit_&dsname force;
            run;
        %end;

        /* Step 5: Compare combined VISIT/VISITNUM with TV domain */
        proc sql;
            create table visit_comparison as
            select a.*, b.Dataset as In_SDTC_Dataset
            from tv_visit a
            left join combined_visits b
            on a.VISIT = b.VISIT and a.VISITNUM = b.VISITNUM
            order by VISITNUM, VISIT;
        quit;

        /* Step 6: Display the comparison results */
        proc print data=visit_comparison;
        title "Comparison of VISIT/VISITNUM between TV and SDTM Datasets (Excluding Unscheduled Visits)";
        run;
    %end;

%mend compare_visit;

/* Run the macro by specifying your SDTM library name */
%compare_visit(libname=sdtm);

How the Macro Works

This macro performs the following steps:

  1. It first extracts all unique VISIT and VISITNUM values from the TV domain.
  2. It then identifies all datasets in the specified library that contain the VISIT and VISITNUM variables by querying the metadata table SASHELP.VCOLUMN.
  3. For each identified dataset, the macro extracts the distinct VISIT and VISITNUM values and appends them into a consolidated dataset.
  4. Finally, it compares the combined results from the SDTM datasets against the values in the TVff domain and displays any discrepancies.

Use Case

This macro is especially useful when checking that all actual visits recorded in the SDTM datasets align with the planned visits documented in the TV domain. Ensuring consistency between these values is essential for accurate clinical trial reporting and analysis.

Example of Use:

%compare_visit(libname=sdtm);

In this example, the macro will search for VISIT and VISITNUM variables in the SDTM datasets located in the sdtm library and compare them with the values in the TV domain.

Conclusion

By automating the process of extracting and comparing VISIT and VISITNUM values, this macro simplifies what could otherwise be a tedious and error-prone task. It ensures that all visit data is consistent and complete, aligning the planned and actual visits in the SDTM datasets.

Feel free to adapt this macro to meet your specific needs in clinical trials data management!

Finding EPOCH Values in SDTM Datasets using a SAS Macro

Finding EPOCH Values in SDTM Datasets using a SAS Macro

Author: [Sarath]

Date: [05SEP2024]

The EPOCH variable is essential in many SDTM datasets as it helps describe the period during which an event, observation, or assessment occurs. In clinical trials, correctly capturing and analyzing the EPOCH variable across datasets is crucial. This post walks through a SAS macro program that automates the process of finding all EPOCH values from any dataset within an entire library of SDTM datasets.

Program Overview

This macro program loops through all the datasets in a specified library, checks for the presence of the EPOCH variable, and extracts the unique values of EPOCH from each dataset. It then consolidates the results and displays them for review.

Key Features:

  • Automatically identifies SDTM datasets containing the EPOCH variable.
  • Extracts unique values from the EPOCH variable for each dataset.
  • Combines results into a single dataset for ease of review.

Macro Code

Here’s the macro that performs the task:

%macro find_epoch(libname=);

    /* Get a list of all datasets in the library */
    proc sql noprint;
        select memname
        into :dslist separated by ' '
        from sashelp.vcolumn
        where libname = upcase("&libname")
          and name = 'EPOCH';
    quit;

    /* Check if any dataset contains the EPOCH variable */
    %if &sqlobs = 0 %then %do;
        %put No datasets in &libname contain the variable EPOCH.;
    %end;
    %else %do;
        %put The following datasets contain the EPOCH variable: &dslist;

        /* Loop through each dataset and extract unique EPOCH values */
        %let ds_count = %sysfunc(countw(&dslist));
        %do i = 1 %to &ds_count;
            %let dsname = %scan(&dslist, &i);
            
            /* Extract unique values of EPOCH */
            proc sql;
                create table epoch_&dsname as
                select distinct '&&dsname' as Dataset, EPOCH
                from &libname..&&dsname
                where EPOCH is not missing;
            quit;
        %end;

        /* Combine the results from all datasets */
        data all_epochs;
            set epoch_:;
        run;

        /* Display the results */
        proc print data=all_epochs;
        title "Unique EPOCH values across datasets in &libname";
        run;
    %end;

%mend find_epoch;

/* Run the macro by specifying your SDTM library name */
%find_epoch(libname=sdtm);

How the Macro Works

The macro works by querying the SASHELP.VCOLUMN metadata table to check for the presence of the EPOCH variable in any dataset. It loops through the datasets that contain the variable, extracts distinct values, and aggregates the results into a single dataset.

Steps:

  1. Identifies all datasets in the specified library.
  2. Checks each dataset for the EPOCH variable.
  3. For datasets containing EPOCH, it extracts unique values.
  4. Combines the unique values from all datasets into one result dataset.

Use Case

Imagine you have a large collection of SDTM datasets and need to quickly check which datasets contain the EPOCH variable and what unique values it holds. Running this macro allows you to do this across your entire library with minimal effort.

Example of Use:

%find_epoch(libname=sdtm);

In this example, the macro will search for the EPOCH variable in the SDTM datasets stored in the library named SDTM. It will then display the unique values of EPOCH found in those datasets.

Conclusion

This macro simplifies the task of analyzing the EPOCH variable across multiple datasets in a library, saving time and reducing manual effort. By leveraging the power of PROC SQL and macros, you can automate this otherwise tedious process.

Feel free to adapt and expand this macro to suit your specific needs! Happy coding!

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.