Detecting Non-Printable Characters in SDTM Datasets Using SAS

Detecting Non-Printable Characters in SDTM Datasets Using SAS

Detecting Non-Printable Characters in SDTM Datasets Using SAS

Non-printable characters in datasets can lead to errors and inconsistencies, especially in the highly regulated environment of clinical trials. This blog post demonstrates how to create a SAS program that identifies non-printable characters in all SDTM datasets within a library and generates a comprehensive report.

Why Detect Non-Printable Characters?

Non-printable characters, such as ASCII values below 32 or above 126, can cause issues during data validation, regulatory submissions, and downstream processing. Detecting them early ensures the quality and compliance of your SDTM datasets.

The SAS Program

The following SAS program processes all SDTM datasets in a library and generates a combined report of non-printable characters, including:

  • Dataset name: The dataset where the issue exists.
  • Variable name: The variable containing non-printable characters.
  • Row number: The row where the non-printable character is found.
  • Positions: The exact position(s) of non-printable characters with their ASCII values.

Program Code

%macro find_nonprintable_all(libname=, output=);

    /* Get a list of all datasets in the library */
    proc sql noprint;
        select memname into :dsetlist separated by ' '
        from dictionary.tables
        where libname = upcase("&libname");
    quit;

    /* Create a combined report */
    data &output.;
        length _dataset $32 _varname $32 _rownum 8 _position_list $500;
        retain _dataset _varname _rownum _position_list;
        call missing(_dataset, _varname, _rownum, _position_list);
        stop;
    run;

    /* Loop through each dataset and find non-printable characters */
    %let count = %sysfunc(countw(&dsetlist));
    %do i = 1 %to &count;
        %let dset = %scan(&dsetlist, &i);

        data temp_report;
            set &libname..&dset.;
            length _dataset $32 _varname $32 _rownum 8 _position_list $500;
            retain _dataset _varname _rownum _position_list;
            _dataset = "&dset";
            array charvars {*} _character_; /* Select all character variables */

            do i = 1 to dim(charvars);
                _varname = vname(charvars[i]);
                _rownum = _n_;
                _position_list = '';

                /* Check each character in the variable */
                do j = 1 to length(charvars[i]);
                    charval = substr(charvars[i], j, 1);
                    ascii_val = rank(charval);

                    /* Flag non-printable characters */
                    if ascii_val < 32 or ascii_val > 126 then do;
                        _position_list = catx(', ', _position_list, 
                                              "Position=" || put(j, best.) || " (ASCII=" || ascii_val || ")");
                    end;
                end;

                /* Output if any non-printable characters are found */
                if not missing(_position_list) then output;
            end;

            drop i j charval ascii_val;
        run;

        /* Append to the combined report */
        proc append base=&output. data=temp_report force;
        run;

        /* Clean up temporary dataset */
        proc datasets lib=work nolist;
            delete temp_report;
        quit;
    %end;

%mend;

/* Example usage */
%find_nonprintable_all(libname=sdtm, output=nonprintable_combined_report);

/* Review the combined report */
proc print data=nonprintable_combined_report noobs;
    title "Non-Printable Characters Report for All Datasets in the Library";
run;
        

How It Works

The program processes each dataset in the specified library, examines all character variables for non-printable characters, and records their positions in a combined report.

Output

The final report contains the following columns:

  • _dataset: Name of the dataset.
  • _varname: Name of the variable.
  • _rownum: Row number.
  • _position_list: Details of non-printable character positions and ASCII values.

Conclusion

Using this SAS program, you can proactively identify and address non-printable characters in SDTM datasets, ensuring data integrity and compliance. Feel free to adapt this program for your specific needs.

Popular posts from this blog

SAS Interview Questions and Answers: CDISC, SDTM and ADAM etc

Comparing Two Methods for Removing Formats and Informats in SAS: DATA Step vs. PROC DATASETS

Studyday calculation ( --DY Variable in SDTM)