Saturday, August 31, 2024

>Automating Routine Email Reports in SAS: A Step-by-Step Guide

Automating Routine Email Reports in SAS: A Step-by-Step Guide

Introduction

In today’s fast-paced business environment, efficiency and automation are key to maintaining productivity. Routine reports are essential, but manually generating and distributing them can be time-consuming and prone to errors. Fortunately, SAS provides powerful tools to automate these tasks, allowing you to generate reports and automatically send them via email. This ensures stakeholders receive the information they need in a timely and consistent manner.

In this article, we'll walk through a practical example of how to automate the generation of a report and send it via email using SAS. We will cover everything from generating the report to configuring the email, making this a comprehensive guide that you can easily adapt to your own reporting needs.

Step 1: Generate the Report

The first step in our automation process is to generate the report that will be sent via email. In this example, we'll create a PDF report that summarizes car statistics from the built-in SAS dataset sashelp.cars. The Output Delivery System (ODS) in SAS allows us to output the report in a variety of formats; in this case, we'll use PDF.


/* Set the path where the report will be saved */
%let output_path = C:\Reports;

/* Generate the PDF report */
ods pdf file="&output_path./Monthly_Report.pdf" style=journal;
proc means data=sashelp.cars;
   var horsepower mpg_city mpg_highway;
   class type;
   title "Monthly Car Statistics Report";
run;
ods pdf close;

In this code:

  • We specify the output path where the report will be saved using the macro variable output_path.
  • We use the ODS PDF statement to create a PDF file named Monthly_Report.pdf in the specified path.
  • The PROC MEANS procedure generates summary statistics for horsepower, city miles per gallon (mpg_city), and highway miles per gallon (mpg_highway), grouped by the type of car.

Step 2: Send the Report via Email

Once the report is generated, the next step is to automate the process of sending it via email. SAS provides the FILENAME statement to create an email fileref, which we can then use to send the report as an attachment.


/* Configure the email settings */
filename mymail email
   to='recipient@example.com'
   subject="Monthly Car Statistics Report"
   attach="&output_path./Monthly_Report.pdf";

/* Send the email with the attached report */
data _null_;
   file mymail;
   put "Dear Team,";
   put "Please find attached the Monthly Car Statistics Report.";
   put "Best regards,";
   put "SAS Automation Team";
run;

/* Clear the email fileref */
filename mymail clear;

In this code:

  • The filename mymail email statement configures the email settings. You specify the recipient’s email address in the to= option, the subject of the email in the subject= option, and the path to the attached report in the attach= option.
  • The data _null_; step is used to write the body of the email. The file mymail; statement indicates that the content of the put statements should be sent to the email.
  • Finally, the filename mymail clear; statement clears the email fileref, releasing any resources it was using.

Conclusion

By following these steps, you can automate the generation and distribution of routine reports in SAS, saving time and reducing the potential for errors. This example illustrates how simple it can be to set up automated email reports, making it easier to ensure that your team receives the necessary data on time, every time.

This approach is highly adaptable and can be expanded to include more complex reports, multiple attachments, or even scheduled automation using job schedulers like CRON (on Linux systems) or Task Scheduler (on Windows). With SAS, you have the tools to streamline your reporting process, allowing you to focus on more critical tasks.

Additional Tips

  • Dynamic Email Content: You can further enhance this automation by making the email content dynamic, such as including the report date or summary statistics directly in the email body.
  • Multiple Recipients: If you need to send the report to multiple recipients, you can separate the email addresses with a comma in the to= option.
  • Email from a Different Address: If your SAS environment supports it, you can specify a different sender email address using the from= option in the filename statement.

Automating routine tasks like report generation and distribution not only saves time but also ensures consistency and accuracy in your reporting. By leveraging the capabilities of SAS, you can create a seamless workflow that keeps your team informed and up to date with minimal manual intervention.

Friday, August 30, 2024

10 Essential SAS Programming Tips for Boosting Your Efficiency

10 Essential SAS Programming Tips for Boosting Your Efficiency

As a SAS programmer, you're always looking for ways to streamline your code, improve efficiency, and enhance the readability of your programs. Whether you're new to SAS or a seasoned pro, these tips will help you optimize your workflows and make the most out of your programming efforts.

Here are ten essential SAS programming tips to elevate your coding skills:

  1. Harness the Power of PROC SQL for Efficient Data Manipulation
    PROC SQL can be a game-changer when it comes to handling complex data manipulations. It allows you to merge datasets, filter records, and create summary statistics all within a few lines of code, making your data processing more concise and effective.

        proc sql;
           select Name, mean(Salary) as Avg_Salary
           from employees
           group by Department
           having Avg_Salary > 50000;
        quit;
        
  2. Simplify Repetitive Tasks with ARRAY
    Repetitive calculations or transformations across multiple variables can clutter your code. Using an ARRAY simplifies these tasks, allowing you to apply changes to multiple variables in a structured and clean manner.

        data new_data;
           set original_data;
           array scores[5] score1-score5;
           do i = 1 to 5;
              scores[i] = scores[i] * 1.1;  /* Applying a 10% increase to all scores */
           end;
        run;
        
  3. Create Dynamic Macro Variables with CALL SYMPUT and CALL SYMPUTX
    Macro variables can make your SAS programs more flexible and reusable. CALL SYMPUT and CALL SYMPUTX allow you to create these variables dynamically during data steps, with CALL SYMPUTX offering the added benefit of trimming spaces.

        data _null_;
           set employees;
           call symputx('emp_count', _n_);
        run;
    
        %put &emp_count;
        
  4. Optimize Subsetting with WHERE Statements
    When subsetting data, WHERE statements are generally more efficient than IF statements. WHERE conditions filter data at the point of reading, which reduces the amount of data loaded into memory, speeding up processing times.

        data subset;
           set employees(where=(Salary > 50000));
        run;
        
  5. Streamline Data Recoding with PROC FORMAT
    PROC FORMAT is an incredibly versatile tool for recoding and grouping values. It enhances your data processing capabilities and improves code readability by allowing you to define and reuse custom formats.

        proc format;
           value salary_fmt
              low - 50000 = 'Low'
              50001 - 100000 = 'Medium'
              100001 - high = 'High';
        run;
    
        proc freq data=employees;
           tables Salary / format=salary_fmt.;
        run;
        
  6. Profile Your Data with PROC CONTENTS and PROC FREQ
    Before diving into analysis, it's crucial to understand the structure and distribution of your data. PROC CONTENTS gives you a detailed overview, while PROC FREQ provides insights into the distribution of categorical variables, helping you identify any data anomalies early on.

        proc contents data=employees; run;
    
        proc freq data=employees;
           tables Department / missing;
        run;
        
  7. Efficiently Manage Variables with KEEP and DROP Statements
    To enhance performance and reduce dataset sizes, selectively keep or drop variables during your data steps. This practice is especially useful when working with large datasets where memory efficiency is crucial.

        data smaller_set;
           set large_set(keep=Name Department Salary);
        run;
        
  8. Concatenate Datasets Seamlessly with PROC APPEND
    When you need to combine datasets, PROC APPEND is often more efficient than using multiple data steps. It appends one dataset to another without re-reading the original data, making it ideal for large datasets.

        proc append base=master_data data=new_data;
        run;
        
  9. Automate Repetitive Tasks with Macro Programming
    Macro programming can dramatically reduce the amount of repetitive code in your SAS programs. By creating macros for commonly used processes, you can maintain consistency and save time, especially when working with similar tasks across multiple datasets.

        %macro process_data(year);
           data processed_&year;
              set raw_data_&year;
              /* Processing steps */
           run;
        %mend process_data;
    
        %process_data(2023);
        %process_data(2024);
        
  10. Debug Efficiently Using SAS OPTIONS
    Debugging is an essential part of the development process. SAS provides several system options like OPTIONS MPRINT;, OPTIONS SYMBOLGEN;, and OPTIONS MLOGIC; that allow you to trace the execution of your code, resolve errors, and understand the values of macro variables.

        options mprint symbolgen mlogic;
        

Tuesday, April 28, 2015

Separating Unique and Duplicate Observations Using PROC SORT in SAS 9.3 and Newer Versions

Today, I stumbled upon a post where the author talks about a new options that are available in SAS 9.3 and later versions. These options (NOUNIQUEKEYS and UNIQUEOUT)  that allows sorting and then finding the duplicate records to be done in one step using PROC SORT.

Direct Link: 

Separating Unique and Duplicate Observations Using PROC SORT in SAS 9.3 and Newer Versions

Christopher J. Bost published a paper in SAS Global Forum 2013 regarding the same option.


Thursday, November 20, 2014

FDA's Official List of Validation Rules for SDTM compliance

Yesterday, FDA published its first official list of validation rules for CDISC SDTM. These long awaited rules cover both conformance and quality requirements, as described in the FDA Study Data Technical Conformance Guide. Conformance validation rules help ensure that the data conform to the standards, while quality checks help ensure the data support meaningful analysis.

For Official list of rules, here is the direct link for the FDA website: http://www.fda.gov/forindustry/datastandards/studydatastandards/default.htm

The FDA is asking sponsors to validate their study data before submission using these published validation rules and either correct any validation issues or explain, why they could not be corrected, in the Study Data Reviewer's Guide. This recommended pre-submission validation step is intended to minimize the presence of validation issues at the time of submission.

Open CDISC is offering a webinar on the official list of validation rules. They are offering 2 sessions, 1 is on Tuesday the December 2 and second one is on Wed, Dec 3.


FDA Official Validation Rules for Submission Data

Wednesday, December 3 at 2:00 PM EST


Click on the above links to register for the webinar.

Best,

Sarath

Thursday, November 14, 2013

How to avoid data set merging problems when common BY variable has different lengths?

When merging 2 datasets with a common by-variable and when that common variable has different variable length, the merge process produce unexpected results.  If you use SAS 9.2 version like me, then SAS Data step will inform you with the following warning:

WARNING: Multiple lengths were specified for the BY variable ****** by input data sets. This may cause unexpected results.

It is good that at least starting SAS 9.2 version, data step issues a Warning message to inform the programmer. But if you use before versions, it is difficult to notice this potential disaster.  

When you see this WARNING message in the SAS log, we might be inclined to ignore this warning because we think this is just a WARNING never realizing the potential danger. When you see this message in the LOG we should be thinking about this instead of avoiding because SAS will do exactly what it states: it may cause unexpected results. In some cases merge won’t even happen between datasets and sometimes the partial merge between the datasets.

Let’s look at the following example.
data table1;
          length visit $13; * LENGTH IS 13;
          visit = "CYCLE 1 DAY 1";
          visitnum = 1;
run;

data table2;
          length visit $14; * LENGTH IS 14;
          visit = "CYCLE 1 DAY 10";
          visitnum = 3;
run;

proc sort data=table1;      by visit;run;
proc sort data=table2;      by visit;run;

TABLE 1;
VISIT
VISITNUM
CYCLE 1 DAY 1
1

TABLE 2;
VISIT
VISITNUM
CYCLE 1 DAY 10
3

*Merge 2 datasets together with common BY variable with different lengths;
data table12;
          merge table1 table2;
          by visit;
run;

*Here is the LOG FILE;

2714
2715  data table1_2;
2716            merge table1 table2;
2717            by visit;
2718  run;

WARNING: Multiple lengths were specified for the BY variable VISIT by input data sets. This may cause unexpected results.
NOTE: There were 1 observations read from the data set WORK.TABLE1.
NOTE: There were 1 observations read from the data set WORK.TABLE2.
NOTE: The data set WORK.TABLE1_2 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

As a result of different lengths, SAS adds only one record to the output dataset rather than 2.

*WRONG OUTPUT CREATED;
VISIT
VISITNUM
CYCLE 1 DAY 1
3

*CORRECT OUTPUT SHOULD BE;
VISIT
VISITNUM
CYCLE 1 DAY 1
1
CYCLE 1 DAY 10
3


*To avoid this potential disaster, I’ve used Proc SQL and then created a macro variable with the maximum length of Visit variable in all the datasets in the WORK directory.;

proc sql;
          select max(length) into :maxlength
          from sashelp.vcolumn
          where libname='WORK'
          and name="VISIT";
quit;

*Visit length form TABLE1 is 13 and from TABLE2 is 14, so since I know the maximum length I will used that in here;

data table1_2;
          length visit $ &maxlength;
          merge table1 table2;
          by visit;
run;

*THIS RESULTS IN CORRECT OUTPUT;
VISIT
VISITNUM
CYCLE 1 DAY 1
1
CYCLE 1 DAY 10
3


Thursday, July 25, 2013

Basic Differences Between Proc MEANS and Proc SUMMARY

Though Proc Means and Proc Summary are 2 different procedures essentially used to compute descriptive statistics of numeric variables, but there are differences between these two. ( 
1)  By default, Proc MEANS  produces printed output in the LISTING window or other open destination whereas Proc SUMMARY does not.  
2) Proc Summary only produces the descriptive statistics for the variables that are specified in the VAR statement, where as Proc Means by default, computes the descriptive statistics of the numeric variables even without the VAR statement.
Here is a post which details the differences:
Excerpt
Proc SUMMARY and Proc MEANS are essentially the same procedure.  Both procedures compute descriptive statistics.  The main difference concerns the default type of output they produce.  Proc MEANS by default produces printed output in the LISTING window or other open destination whereas Proc SUMMARY does not.  Inclusion of the print option on the Proc SUMMARY statement will output results to the output window.
The second difference between the two procedures is reflected in the omission of the VAR statement.  When all variables in the data set are character the same output: a simple count of observations, is produced for each procedure.  However, when some variables in the dataset are numeric, Proc MEANS analyses all numeric variables not listed in any of the other statements and produces default statistics for these variables (N, Mean, Standard Deviation, Minimum and Maximum). 


Thursday, March 14, 2013

Exploring the Analysis Data Model – ADaM Datasets

Today, I stumbled upon a blog which is interesting and resourceful.  I liked the article so much so want to share with all my friends here.

Here is the direct link for the post to download or to review:
Actual Article:


The Analysis Data Model (ADaM) is a standard released by the Clinical Data Interchange Standards Consortium (CDISC) and has quickly become widely used in the submission of clinical trial information. ADaM has very close ties to another of CDISCs released standards, Study Data Tabulation Model (SDTM).
The main difference between these two CDISC standards is the way in which the data is displayed. SDTM provides a standard for the creation and mapping of collected data from Raw sources, where as ADAM provides a standard for the creation of analysis-ready data, often using SDTM data as the source.
The purpose of the analysis-ready ADaM data is to provide the programmer with a means to create tables, listings and figures with minimal time and effort whilst ensuring a clear level of traceability in the derived values. This is a key factor of ADaM data as there is a need for a clear and unambiguous flow from the study tabulation data to the analysis data which supports the statistical analyses performed in a clinical study.
CDISC state the following key principles for Analysis Datasets:
  • facilitate clear and unambiguous communication and provide a level of traceability 
  • be useable by currently available tools 
  • be linked to machine-readable metadata 
  • be analysis-ready
To perform statistical analysis on a study, data maybe required from many domains, such as labs, adverse events, demographics and subject characteristics. Bringing this data into ADaM datasets and performing any complex derivations required for display endpoints means that no further data manipulation is required to produce statistical outputs.

When creating the ADaM datasets the requirements of the analyses must be taken into consideration. This will ensure the desired numbers of datasets are produced – at the very least; a subject level dataset is required. Some of the data will be duplicated between domains, for example Age and Gender data. This is acceptable as this will aid the output creation or data review.

The naming convention for the datasets will follow “ADxxxx”, where the “xxxx” part will be sponsor-defined - AE for adverse events, LB for Laboratory results for example. The subject-level dataset, which will be discussed later, will be named “ADSL”. For the ADaM variables, the naming conventions should follow the standardized variable names defined in the ADaM Implementation Guide (ADaMIG). Any variables from the SDTM which are used directly in the ADaM dataset should keep the same variable name to avoid confusion. Sponsor-defined variable names will be given to any other analysis variables. Following these conventions will provide clarity for the reviewer.

As previously mentioned, a key requirement for ADaM data is a subject-level analysis dataset. This dataset and its documentation are always required – even if no other data is submitted. The subject-level dataset, or “ADSL” as it is named within ADaM conventions, contains a record for each subject with variables which display key information for subject disposition, demographic, and baseline characteristics. 

Other variables within ADSL will contain planned or actual treatment group information as well key dates and times of the subjects study participation on the study. Not all variables within ADSL may be used directly for analysis but could be used in conjunction with other datasets for display or grouping purposes or possibly included simply as variables of interest for review.

To conclude, the CDISC summary of ADSL is as follows: “The critical variables in ADSL will include those that are either descriptive, known to affect the subject’s response to drug (in terms of either efficacy or safety), used as strata for randomization, or identify the subject or event as belonging to specific subgroups (e.g. population flags). For example, subjects may be randomized after being stratified by age group because it is believed that younger subjects respond differently to the study drug. In this situation, a subject’s age category would be considered a critical variable for a study and included in ADSL.

I hope you guys liked it.

Sarath