Saturday, August 31, 2024

Enhancing SAS Code Readability and Debugging with PUTLOG

Enhancing SAS Code Readability and Debugging with PUTLOG

Introduction

Writing clean and efficient code is crucial in SAS programming, especially when dealing with large datasets and complex data manipulations. However, even the most seasoned SAS programmers encounter issues that require debugging. While SAS offers various tools for identifying and resolving errors, one of the most effective yet often underutilized techniques is the use of the PUTLOG statement.

The PUTLOG statement provides a simple but powerful way to track the flow of your program and monitor the values of variables during execution. This article will explore how to use PUTLOG to enhance code readability, facilitate debugging, and ensure that your SAS programs run smoothly and correctly.

Understanding PUTLOG

The PUTLOG statement is similar to the PUT statement but is specifically designed to write messages to the SAS log. It is especially useful for debugging because it allows you to insert custom messages into the log that can include variable values, execution flow indicators, and error messages.


data example;
   set sashelp.class;
   if age > 14 then do;
      putlog "Age is greater than 14: " name= age=;
   end;
run;

In this example, the PUTLOG statement writes a custom message to the log whenever the condition (age > 14) is met. The log will show the name of the student and their age, making it easy to verify that the condition is being correctly identified.

Benefits of Using PUTLOG

1. Improving Code Readability

By inserting PUTLOG statements strategically throughout your code, you can create a more readable and maintainable program. For example, you can mark the start and end of significant processing steps or highlight key variable values at critical points in the execution.


data summary;
   set sashelp.class;
   putlog "Processing record: " _n_= name= age=;
   if age > 14 then group = 'Teen';
   else group = 'Child';
   putlog "Group assigned: " group=;
run;

This approach not only helps during debugging but also makes it easier for others (or yourself) to understand the logic when revisiting the code later.

2. Monitoring Execution Flow

In complex programs, it can be challenging to track the flow of execution, especially when there are multiple conditional statements or loops. PUTLOG can be used to monitor which parts of your code are being executed.


data check_flow;
   set sashelp.class;
   if age > 14 then do;
      putlog "Executing teen group assignment for " name= age=;
      group = 'Teen';
   end;
   else do;
      putlog "Executing child group assignment for " name= age=;
      group = 'Child';
   end;
run;

By including PUTLOG statements within each branch of your conditional logic, you can verify that the correct paths are being followed based on your data.

3. Identifying and Resolving Errors

PUTLOG can be particularly useful for identifying and diagnosing errors in your SAS programs. For example, you can insert PUTLOG statements to check the values of key variables before they are used in calculations or to confirm that data is being processed as expected.


data error_check;
   set sashelp.class;
   putlog "Checking age before calculation: " name= age=;
   if age <= 0 then do;
      putlog "ERROR: Invalid age value detected for " name= age=;
      error_flag = 1;
   end;
   else do;
      bmi = weight / (height * height);
      putlog "BMI calculated: " bmi=;
   end;
run;

In this example, PUTLOG is used to check for invalid age values and to confirm that the BMI calculation is performed correctly. If an error is detected, an appropriate message is written to the log, making it easier to trace the issue back to its source.

Advanced PUTLOG Techniques

1. Customizing Log Messages

You can enhance your log messages by including custom text, variable values, and even conditional formatting to highlight specific issues.


data custom_log;
   set sashelp.class;
   if age > 14 then do;
      putlog "NOTE: Teenager detected - " name= age=;
   end;
   else do;
      putlog "INFO: Child detected - " name= age=;
   end;
run;

2. Using Conditional PUTLOG Statements

Sometimes, you may want to conditionally execute PUTLOG statements based on the value of a variable or a specific condition. This can be achieved by wrapping PUTLOG within an IF statement.


data conditional_log;
   set sashelp.class;
   if age > 14 then putlog "Teenager: " name= age=;
   else putlog "Child: " name= age=;
run;

3. Combining PUTLOG with Other Debugging Techniques

PUTLOG can be combined with other SAS debugging techniques, such as using the DEBUG option in PROC SQL or employing OPTIONS like MLOGIC, MPRINT, and SYMBOLGEN for macro debugging.

Conclusion

The PUTLOG statement is a simple yet powerful tool for improving code readability and facilitating debugging in SAS. By strategically placing PUTLOG statements in your code, you can gain better insight into your program’s execution flow, monitor variable values, and quickly identify and resolve errors. Whether you're dealing with simple data steps or complex data manipulations, PUTLOG can help you write more robust and maintainable SAS programs.

Incorporating PUTLOG into your programming practice can save you time and frustration, making it an essential technique for any SAS programmer looking to enhance their coding efficiency and effectiveness.

>Automating Routine Email Reports in SAS: A Step-by-Step Guide

Automating Routine Email Reports in SAS: A Step-by-Step Guide

Introduction

In today’s fast-paced business environment, efficiency and automation are key to maintaining productivity. Routine reports are essential, but manually generating and distributing them can be time-consuming and prone to errors. Fortunately, SAS provides powerful tools to automate these tasks, allowing you to generate reports and automatically send them via email. This ensures stakeholders receive the information they need in a timely and consistent manner.

In this article, we'll walk through a practical example of how to automate the generation of a report and send it via email using SAS. We will cover everything from generating the report to configuring the email, making this a comprehensive guide that you can easily adapt to your own reporting needs.

Step 1: Generate the Report

The first step in our automation process is to generate the report that will be sent via email. In this example, we'll create a PDF report that summarizes car statistics from the built-in SAS dataset sashelp.cars. The Output Delivery System (ODS) in SAS allows us to output the report in a variety of formats; in this case, we'll use PDF.


/* Set the path where the report will be saved */
%let output_path = C:\Reports;

/* Generate the PDF report */
ods pdf file="&output_path./Monthly_Report.pdf" style=journal;
proc means data=sashelp.cars;
   var horsepower mpg_city mpg_highway;
   class type;
   title "Monthly Car Statistics Report";
run;
ods pdf close;

In this code:

  • We specify the output path where the report will be saved using the macro variable output_path.
  • We use the ODS PDF statement to create a PDF file named Monthly_Report.pdf in the specified path.
  • The PROC MEANS procedure generates summary statistics for horsepower, city miles per gallon (mpg_city), and highway miles per gallon (mpg_highway), grouped by the type of car.

Step 2: Send the Report via Email

Once the report is generated, the next step is to automate the process of sending it via email. SAS provides the FILENAME statement to create an email fileref, which we can then use to send the report as an attachment.


/* Configure the email settings */
filename mymail email
   to='recipient@example.com'
   subject="Monthly Car Statistics Report"
   attach="&output_path./Monthly_Report.pdf";

/* Send the email with the attached report */
data _null_;
   file mymail;
   put "Dear Team,";
   put "Please find attached the Monthly Car Statistics Report.";
   put "Best regards,";
   put "SAS Automation Team";
run;

/* Clear the email fileref */
filename mymail clear;

In this code:

  • The filename mymail email statement configures the email settings. You specify the recipient’s email address in the to= option, the subject of the email in the subject= option, and the path to the attached report in the attach= option.
  • The data _null_; step is used to write the body of the email. The file mymail; statement indicates that the content of the put statements should be sent to the email.
  • Finally, the filename mymail clear; statement clears the email fileref, releasing any resources it was using.

Conclusion

By following these steps, you can automate the generation and distribution of routine reports in SAS, saving time and reducing the potential for errors. This example illustrates how simple it can be to set up automated email reports, making it easier to ensure that your team receives the necessary data on time, every time.

This approach is highly adaptable and can be expanded to include more complex reports, multiple attachments, or even scheduled automation using job schedulers like CRON (on Linux systems) or Task Scheduler (on Windows). With SAS, you have the tools to streamline your reporting process, allowing you to focus on more critical tasks.

Additional Tips

  • Dynamic Email Content: You can further enhance this automation by making the email content dynamic, such as including the report date or summary statistics directly in the email body.
  • Multiple Recipients: If you need to send the report to multiple recipients, you can separate the email addresses with a comma in the to= option.
  • Email from a Different Address: If your SAS environment supports it, you can specify a different sender email address using the from= option in the filename statement.

Automating routine tasks like report generation and distribution not only saves time but also ensures consistency and accuracy in your reporting. By leveraging the capabilities of SAS, you can create a seamless workflow that keeps your team informed and up to date with minimal manual intervention.

Friday, August 30, 2024

10 Essential SAS Programming Tips for Boosting Your Efficiency

10 Essential SAS Programming Tips for Boosting Your Efficiency

As a SAS programmer, you're always looking for ways to streamline your code, improve efficiency, and enhance the readability of your programs. Whether you're new to SAS or a seasoned pro, these tips will help you optimize your workflows and make the most out of your programming efforts.

Here are ten essential SAS programming tips to elevate your coding skills:

  1. Harness the Power of PROC SQL for Efficient Data Manipulation
    PROC SQL can be a game-changer when it comes to handling complex data manipulations. It allows you to merge datasets, filter records, and create summary statistics all within a few lines of code, making your data processing more concise and effective.

        proc sql;
           select Name, mean(Salary) as Avg_Salary
           from employees
           group by Department
           having Avg_Salary > 50000;
        quit;
        
  2. Simplify Repetitive Tasks with ARRAY
    Repetitive calculations or transformations across multiple variables can clutter your code. Using an ARRAY simplifies these tasks, allowing you to apply changes to multiple variables in a structured and clean manner.

        data new_data;
           set original_data;
           array scores[5] score1-score5;
           do i = 1 to 5;
              scores[i] = scores[i] * 1.1;  /* Applying a 10% increase to all scores */
           end;
        run;
        
  3. Create Dynamic Macro Variables with CALL SYMPUT and CALL SYMPUTX
    Macro variables can make your SAS programs more flexible and reusable. CALL SYMPUT and CALL SYMPUTX allow you to create these variables dynamically during data steps, with CALL SYMPUTX offering the added benefit of trimming spaces.

        data _null_;
           set employees;
           call symputx('emp_count', _n_);
        run;
    
        %put &emp_count;
        
  4. Optimize Subsetting with WHERE Statements
    When subsetting data, WHERE statements are generally more efficient than IF statements. WHERE conditions filter data at the point of reading, which reduces the amount of data loaded into memory, speeding up processing times.

        data subset;
           set employees(where=(Salary > 50000));
        run;
        
  5. Streamline Data Recoding with PROC FORMAT
    PROC FORMAT is an incredibly versatile tool for recoding and grouping values. It enhances your data processing capabilities and improves code readability by allowing you to define and reuse custom formats.

        proc format;
           value salary_fmt
              low - 50000 = 'Low'
              50001 - 100000 = 'Medium'
              100001 - high = 'High';
        run;
    
        proc freq data=employees;
           tables Salary / format=salary_fmt.;
        run;
        
  6. Profile Your Data with PROC CONTENTS and PROC FREQ
    Before diving into analysis, it's crucial to understand the structure and distribution of your data. PROC CONTENTS gives you a detailed overview, while PROC FREQ provides insights into the distribution of categorical variables, helping you identify any data anomalies early on.

        proc contents data=employees; run;
    
        proc freq data=employees;
           tables Department / missing;
        run;
        
  7. Efficiently Manage Variables with KEEP and DROP Statements
    To enhance performance and reduce dataset sizes, selectively keep or drop variables during your data steps. This practice is especially useful when working with large datasets where memory efficiency is crucial.

        data smaller_set;
           set large_set(keep=Name Department Salary);
        run;
        
  8. Concatenate Datasets Seamlessly with PROC APPEND
    When you need to combine datasets, PROC APPEND is often more efficient than using multiple data steps. It appends one dataset to another without re-reading the original data, making it ideal for large datasets.

        proc append base=master_data data=new_data;
        run;
        
  9. Automate Repetitive Tasks with Macro Programming
    Macro programming can dramatically reduce the amount of repetitive code in your SAS programs. By creating macros for commonly used processes, you can maintain consistency and save time, especially when working with similar tasks across multiple datasets.

        %macro process_data(year);
           data processed_&year;
              set raw_data_&year;
              /* Processing steps */
           run;
        %mend process_data;
    
        %process_data(2023);
        %process_data(2024);
        
  10. Debug Efficiently Using SAS OPTIONS
    Debugging is an essential part of the development process. SAS provides several system options like OPTIONS MPRINT;, OPTIONS SYMBOLGEN;, and OPTIONS MLOGIC; that allow you to trace the execution of your code, resolve errors, and understand the values of macro variables.

        options mprint symbolgen mlogic;
        

Tuesday, April 28, 2015

Separating Unique and Duplicate Observations Using PROC SORT in SAS 9.3 and Newer Versions

Today, I stumbled upon a post where the author talks about a new options that are available in SAS 9.3 and later versions. These options (NOUNIQUEKEYS and UNIQUEOUT)  that allows sorting and then finding the duplicate records to be done in one step using PROC SORT.

Direct Link: 

Separating Unique and Duplicate Observations Using PROC SORT in SAS 9.3 and Newer Versions

Christopher J. Bost published a paper in SAS Global Forum 2013 regarding the same option.


Thursday, November 20, 2014

FDA's Official List of Validation Rules for SDTM compliance

Yesterday, FDA published its first official list of validation rules for CDISC SDTM. These long awaited rules cover both conformance and quality requirements, as described in the FDA Study Data Technical Conformance Guide. Conformance validation rules help ensure that the data conform to the standards, while quality checks help ensure the data support meaningful analysis.

For Official list of rules, here is the direct link for the FDA website: http://www.fda.gov/forindustry/datastandards/studydatastandards/default.htm

The FDA is asking sponsors to validate their study data before submission using these published validation rules and either correct any validation issues or explain, why they could not be corrected, in the Study Data Reviewer's Guide. This recommended pre-submission validation step is intended to minimize the presence of validation issues at the time of submission.

Open CDISC is offering a webinar on the official list of validation rules. They are offering 2 sessions, 1 is on Tuesday the December 2 and second one is on Wed, Dec 3.


FDA Official Validation Rules for Submission Data

Wednesday, December 3 at 2:00 PM EST


Click on the above links to register for the webinar.

Best,

Sarath

Thursday, November 14, 2013

How to avoid data set merging problems when common BY variable has different lengths?

When merging 2 datasets with a common by-variable and when that common variable has different variable length, the merge process produce unexpected results.  If you use SAS 9.2 version like me, then SAS Data step will inform you with the following warning:

WARNING: Multiple lengths were specified for the BY variable ****** by input data sets. This may cause unexpected results.

It is good that at least starting SAS 9.2 version, data step issues a Warning message to inform the programmer. But if you use before versions, it is difficult to notice this potential disaster.  

When you see this WARNING message in the SAS log, we might be inclined to ignore this warning because we think this is just a WARNING never realizing the potential danger. When you see this message in the LOG we should be thinking about this instead of avoiding because SAS will do exactly what it states: it may cause unexpected results. In some cases merge won’t even happen between datasets and sometimes the partial merge between the datasets.

Let’s look at the following example.
data table1;
          length visit $13; * LENGTH IS 13;
          visit = "CYCLE 1 DAY 1";
          visitnum = 1;
run;

data table2;
          length visit $14; * LENGTH IS 14;
          visit = "CYCLE 1 DAY 10";
          visitnum = 3;
run;

proc sort data=table1;      by visit;run;
proc sort data=table2;      by visit;run;

TABLE 1;
VISIT
VISITNUM
CYCLE 1 DAY 1
1

TABLE 2;
VISIT
VISITNUM
CYCLE 1 DAY 10
3

*Merge 2 datasets together with common BY variable with different lengths;
data table12;
          merge table1 table2;
          by visit;
run;

*Here is the LOG FILE;

2714
2715  data table1_2;
2716            merge table1 table2;
2717            by visit;
2718  run;

WARNING: Multiple lengths were specified for the BY variable VISIT by input data sets. This may cause unexpected results.
NOTE: There were 1 observations read from the data set WORK.TABLE1.
NOTE: There were 1 observations read from the data set WORK.TABLE2.
NOTE: The data set WORK.TABLE1_2 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

As a result of different lengths, SAS adds only one record to the output dataset rather than 2.

*WRONG OUTPUT CREATED;
VISIT
VISITNUM
CYCLE 1 DAY 1
3

*CORRECT OUTPUT SHOULD BE;
VISIT
VISITNUM
CYCLE 1 DAY 1
1
CYCLE 1 DAY 10
3


*To avoid this potential disaster, I’ve used Proc SQL and then created a macro variable with the maximum length of Visit variable in all the datasets in the WORK directory.;

proc sql;
          select max(length) into :maxlength
          from sashelp.vcolumn
          where libname='WORK'
          and name="VISIT";
quit;

*Visit length form TABLE1 is 13 and from TABLE2 is 14, so since I know the maximum length I will used that in here;

data table1_2;
          length visit $ &maxlength;
          merge table1 table2;
          by visit;
run;

*THIS RESULTS IN CORRECT OUTPUT;
VISIT
VISITNUM
CYCLE 1 DAY 1
1
CYCLE 1 DAY 10
3


Thursday, July 25, 2013

Basic Differences Between Proc MEANS and Proc SUMMARY

Though Proc Means and Proc Summary are 2 different procedures essentially used to compute descriptive statistics of numeric variables, but there are differences between these two. ( 
1)  By default, Proc MEANS  produces printed output in the LISTING window or other open destination whereas Proc SUMMARY does not.  
2) Proc Summary only produces the descriptive statistics for the variables that are specified in the VAR statement, where as Proc Means by default, computes the descriptive statistics of the numeric variables even without the VAR statement.
Here is a post which details the differences:
Excerpt
Proc SUMMARY and Proc MEANS are essentially the same procedure.  Both procedures compute descriptive statistics.  The main difference concerns the default type of output they produce.  Proc MEANS by default produces printed output in the LISTING window or other open destination whereas Proc SUMMARY does not.  Inclusion of the print option on the Proc SUMMARY statement will output results to the output window.
The second difference between the two procedures is reflected in the omission of the VAR statement.  When all variables in the data set are character the same output: a simple count of observations, is produced for each procedure.  However, when some variables in the dataset are numeric, Proc MEANS analyses all numeric variables not listed in any of the other statements and produces default statistics for these variables (N, Mean, Standard Deviation, Minimum and Maximum).