Posts

Showing posts with the label Proc SQL

Best Practices for Joining Additional Columns into an Existing Table Using PROC SQL

Best Practices for Joining Additional Columns into an Existing Table Using PROC SQL Best Practices for Joining Additional Columns into an Existing Table Using PROC SQL When working with large datasets, it's common to add new columns from another table to an existing table using SQL. However, many programmers encounter the challenge of recursive referencing in PROC SQL when attempting to create a new table that references itself. This blog post discusses the best practices for adding columns to an existing table using PROC SQL and provides alternative methods that avoid inefficiencies. 1. The Common Approach and Its Pitfall Here's a simplified example of a common approach to adding columns via a LEFT JOIN : PROC SQL; CREATE TABLE WORK.main_table AS SELECT main.*, a.newcol1, a.newcol2 FROM WORK.main_table main LEFT JOIN WORK.addl_data a ON main.id = a.id; QUIT; While this approach might seem straightforward, it leads to a warning: "CREATE T...

Mastering PROC SQL: Best Practices for Seamlessly Adding Columns

Mastering PROC SQL: Best Practices for Seamlessly Adding Columns Mastering PROC SQL: Best Practices for Seamlessly Adding Columns PROC SQL in SAS is a powerful tool, especially when you need to add new columns to existing tables. Whether you're updating tables with new data or making calculated columns, following best practices ensures your code is efficient, maintainable, and easy to understand. 1. Use ALTER TABLE to Add Columns The most straightforward way to add a column to a table is by using the ALTER TABLE statement. It allows you to add one or more columns without disrupting the existing data structure. proc sql; alter table your_table_name add new_column1 num format=8. label='New Column 1', new_column2 char(20) label='New Column 2'; quit; Tip: Always specify the data type ( num for numeric, char(n) for character) and add useful formats and labels to keep your columns clear and organized. 2. Av...

IFC and IFN functions: New IF functions:

Objective: To Reduce the amount of typing required achieving an objective Syntax: IFN (condition, true, false, missing): ‘N’ stands for Numeric IFN returns a numeric value. It returns the true, false or missing value depending on whether the condition is true, false or missing. Syntax: IFC (condition, true, false, missing): ‘C’ stands for character IFC function has four parameters: 1) a logical expression 2) character value returned when true 3) value returned when false 4) value retuned when missing, which is optional. IFC (logical-expression, Character-value-returned-when-true, Character-value-returned-when-false, Character-value-returned-when-missing); IFC returns a character value. It returns the true, false or missing value depending on whether the condition is true, false or missing. Example: Assign a value to the VISIT variable (new) as per the VTYPE variable value. We can certainly achieve this task in diff. ways.. here are they... data old; input sitesub $ vtype vdate $; card...

Proc Compare/Dictionary.Coulmns/Dictionary.Tables.: Program used to compare the SAS datasets in two directories

Image
Here is the new Proc compare.sas program, I have developed ....to compare all the datasets in 2 directories(testing and production) at once and to quick check any mismatches. Proc compare only check if there is any mismatches between the datasets in 2 directories. If any, it reports otherwise it will give us a note saying that: Note: No unequal Values were found. All values compared are exactly equal. See the proc compare snap shot: What if any dataset has the length more than 8, and what if any variable length more than 40 and what if the dataset name has more than 8 characters etc... Proc Compare doesn't address this issue. I have developed the following program to address this issue. It’s a mandatory that we need to follow certain requirements when we are preparing for an electronic submission to the FDA. The following are some of the QC checks FDA requirements: 1) The length of a dataset name & variable name shouldn’t be more than 8 characters. 2) The length data set labe...

How to determine the executing program name and path programatically

Sometimes, we need to keep the name and path of the executing program in the FOOTNOTE of the generated table or listings. I have always created a macro variable using the %let statement and, then I called the Macro variable in the footnote statement to get the name of the program. Eventhough it is simple.. it may not be suitable when we write application which need to self document... Here is another technique which can be used to retrieve the pathname and filename (last executed) ..... To get the last opened filename: proc sql noprint; select scan (xpath,-1, ' \ ') into :progname from sashelp.vextfl where upcase( xpath ) like '%.SAS' ; quit ; %put &progname; result: filename.sas or data _null_; set sashelp.vextfl( where = ( upcase ( xpath ) like '%.SAS' )); call symput ( 'program' , scan ( xpath , - 1 , '\' )); run ; %put &program; result: filename.sas To get path name ... then .. use the following code; ...

PROC SQL basics, tips and techniques and sample code programs

Proc SQL: Power of SAS SQL: • SQL looks at datasets differently from SAS – SAS looks at a dataset one record at a time, using an implied loop that moves from the first record to the last – SQL looks at all the records, as a single object • Because of this difference SQL can easily do few things that are more difficult to do in SAS • There are a number of built in functions in SQL that can be used in a select statement • Because of how SQL handles a dataset, these functions work over the entire dataset • Functions: – Count: Counts Values – Sum: Sums Values – Max: Identifies the largest value – Min: Identifies the smallest value – Mean: Averages the values Read more at www.cognigencorp.com/perspective/tipsNtricks.pub/1/PROC%20SQL%20Talk_12_.ppt – SAS SQL Introduction to Proc SQL AN INTRODUCTION TO PROC SQL® PROC SQL: When and How to Use It? Proc SQL – A Primer for SAS Programmers Understanding PROC SQL Creating Macro Variables with Proc SQL DATA Step vs. PROC SQL: What’s a neophyte to d...

Mastering Duplicates Removal in SAS: A Comprehensive Guide to Using PROC SQL, DATA STEP, and PROC SORT

Removing Duplicate Observations in SAS: A Comprehensive Guide Removing Duplicate Observations in SAS: A Comprehensive Guide In data analysis, it's common to encounter datasets with duplicate records that need to be cleaned up. SAS offers several methods to remove these duplicates, each with its strengths and suitable scenarios. This article explores three primary methods for removing duplicate observations: using PROC SQL , the DATA STEP , and PROC SORT . We will provide detailed examples and discuss when to use each method. Understanding Duplicate Observations Before diving into the methods, let's clarify what we mean by duplicate observations. Duplicates can occur in different forms: Exact Duplicates: All variables across two or more observations have identical values. Key-Based Duplicates: Observations are considered duplicates based on the values of specific key variables (e.g., ID, Date). The ...

PROC SQL vs. DATA STEP in SAS: A Comprehensive Comparison of Syntax, Strengths, and Use Cases

PROC SQL vs. DATA STEP in SAS: A Comprehensive Comparison of Syntax and Use Cases PROC SQL vs. DATA STEP in SAS: A Comprehensive Comparison of Syntax and Use Cases SAS provides powerful tools for data manipulation, with `PROC SQL` and `DATA STEP` being two of the most commonly used approaches. Each has its own strengths, syntax, and use cases, making them suitable for different types of tasks. This report provides a detailed comparison of `PROC SQL` and `DATA STEP` to help you understand when and how to use each approach effectively. Understanding PROC SQL and DATA STEP Before diving into the comparison, let's briefly introduce `PROC SQL` and `DATA STEP`: PROC SQL: A procedure that enables you to use SQL (Structured Query Language) within SAS to query, manipulate, and manage data. It is particularly powerful for operations that involve multiple tables or require complex querying. DATA STEP: A foundational part o...