SAS Macros


Introduction to SAS Macros
SUGI Paper Posters Comparing Macros to Arrays to Simple
SAS Macro Programming Tips and Techniques
040-31 Tight Looping With Macro Arrays

PROC SQL vs. DATA STEP in SAS: A Comprehensive Comparison of Syntax, Strengths, and Use Cases

SAS provides powerful tools for data manipulation, with `PROC SQL` and `DATA STEP` being two of the most commonly used approaches. Each has its own strengths, syntax, and use cases, making them suitable for different types of tasks. This report provides a detailed comparison of `PROC SQL` and `DATA STEP` to help you understand when and how to use each approach effectively.

Understanding PROC SQL and DATA STEP

Before diving into the comparison, let's briefly introduce `PROC SQL` and `DATA STEP`:

  • PROC SQL: A procedure that enables you to use SQL (Structured Query Language) within SAS to query, manipulate, and manage data. It is particularly powerful for operations that involve multiple tables or require complex querying.
  • DATA STEP: A foundational part of SAS programming, allowing you to create, transform, and analyze datasets. `DATA STEP` is ideal for row-wise processing, data transformation, and straightforward data manipulation tasks.

Comparing Syntax: PROC SQL vs. DATA STEP

Below are several common data manipulation tasks, with examples of how they are performed using both `PROC SQL` and `DATA STEP`.

1. Creating a New Dataset

Creating a new dataset is a fundamental task in SAS. Both `PROC SQL` and `DATA STEP` can accomplish this, but their approaches differ.

PROC SQL:

proc sql;
    create table new_data as
    select *
    from old_data;
quit;

DATA STEP:

data new_data;
    set old_data;
run;

Comparison: The `DATA STEP` uses the `SET` statement to reference the source dataset, which is more direct. `PROC SQL`, on the other hand, uses the `CREATE TABLE` statement combined with a `SELECT` statement, which can be more powerful when selecting specific columns or applying complex logic.

2. Filtering Data

Filtering allows you to create a subset of data based on specific criteria.

PROC SQL:

proc sql;
    create table filtered_data as
    select *
    from old_data
    where age > 30;
quit;

DATA STEP:

data filtered_data;
    set old_data;
    if age > 30;
run;

Comparison: Both approaches filter in a single statement. `PROC SQL` uses the `WHERE` clause of the `SELECT` statement, while this `DATA STEP` uses a subsetting `IF`, which drops the observation after it has been read. The `DATA STEP` also accepts a `WHERE` statement (or the `WHERE=` data set option), which filters observations as they are read and is usually the cheaper choice when the condition involves only variables that already exist in the input dataset.
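The `DATA STEP` can filter with either a `WHERE` statement or a subsetting `IF`, and the two are not interchangeable. The sketch below reuses `old_data` and `age` from the example above; `age_group` and the output dataset names are hypothetical, added only for illustration.

```sas
/* WHERE is applied as observations are read from old_data, so it can   */
/* only test variables that already exist in the input dataset.         */
data filtered_where;
    set old_data;
    where age > 30;
run;

/* A subsetting IF runs after the observation is in the program data    */
/* vector, so it can test a variable computed in the same step.         */
data filtered_if;
    set old_data;
    length age_group $6;                 /* hypothetical derived variable */
    if age > 60 then age_group = 'senior';
    else age_group = 'other';
    if age_group = 'senior';
run;
```

A reasonable rule of thumb is to use `WHERE` for conditions on input variables and keep the subsetting `IF` for conditions on values created during the step.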

3. Joining Datasets

Joining datasets is essential when you need to combine information from two or more tables. This is a common scenario in relational databases.

PROC SQL:

proc sql;
    create table joined_data as
    select a.*, b.*
    from dataset_a as a
    inner join dataset_b as b
    on a.id = b.id;
quit;

DATA STEP:

proc sort data=dataset_a;
    by id;
run;

proc sort data=dataset_b;
    by id;
run;

data joined_data;
    merge dataset_a(in=a) dataset_b(in=b);
    by id;
    if a and b;
run;

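Comparison: `PROC SQL` is more intuitive for performing joins, offering the standard join types (INNER, LEFT, RIGHT, FULL) and requiring no pre-sorting. Because `id` exists in both tables, `select a.*, b.*` makes SAS log a warning that the variable already exists and keeps the copy from `dataset_a`; listing the columns explicitly avoids this. The `DATA STEP` `MERGE` with a `BY` statement requires both datasets to be sorted (or indexed) by `id`, and it matches observations sequentially within each BY group, so it does not reproduce a SQL join when `id` is repeated in both datasets. `PROC SQL` is generally preferred for complex or many-to-many joins.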
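The same pattern extends to other join types. As a minimal sketch, a left join can be written both ways as follows. It assumes `dataset_a` and `dataset_b` are sorted by `id` for the `MERGE`, that `id` is unique in `dataset_b`, and that `dataset_b` carries a hypothetical column `score` that does not exist in `dataset_a`.

```sas
/* PROC SQL: no sorting required. */
proc sql;
    create table left_sql as
    select a.*, b.score          /* score is a hypothetical column */
    from dataset_a as a
    left join dataset_b as b
    on a.id = b.id;
quit;

/* DATA step: both inputs must already be sorted (or indexed) by id. */
data left_merge;
    merge dataset_a(in=a) dataset_b(keep=id score);
    by id;
    if a;                        /* keep every observation from dataset_a */
run;
```

Changing the subsetting condition to `if b;` gives the equivalent of a right join, and dropping it altogether gives a full join.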

4. Aggregating Data

Aggregation, such as calculating sums, averages, or counts, is another common task in data analysis.

PROC SQL:

proc sql;
    create table summary_data as
    select id, mean(value) as avg_value
    from old_data
    group by id;
quit;

DATA STEP:

proc sort data=old_data;
    by id;
run;

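data summary_data(keep=id avg_value);
    set old_data;
    by id;
    /* Sum statements retain their targets automatically, so no RETAIN is needed. */
    if first.id then do;
        sum_value = 0;
        count_value = 0;
    end;
    sum_value + value;                    /* a missing value adds nothing */
    count_value + (not missing(value));   /* count nonmissing values only, as MEAN() does in PROC SQL */
    if last.id then do;
        if count_value > 0 then avg_value = sum_value / count_value;
        else avg_value = .;
        output;
    end;
run;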

Comparison: `PROC SQL` simplifies aggregation with the `GROUP BY` clause, making it easier to compute summary statistics, and it does not require the input to be sorted first. The `DATA STEP` approach requires more steps, including sorting and manually calculating the aggregates using retained accumulator variables and `FIRST.`/`LAST.` logic. `PROC SQL` is generally the more concise choice for this type of task.

5. Subsetting Data

Subsetting involves creating a smaller dataset that meets specific criteria from a larger dataset.

PROC SQL:

proc sql;
    create table subset_data as
    select *
    from old_data
    where sex = 'M' and age > 25;
quit;

DATA STEP:

data subset_data;
    set old_data;
    if sex = 'M' and age > 25;
run;

Comparison: The syntax for subsetting data is quite similar between `PROC SQL` and `DATA STEP`. Both approaches are efficient, with `PROC SQL` using the `WHERE` clause and `DATA STEP` using the `IF` statement to achieve the same result.

Strengths and Weaknesses of Each Approach

Both `PROC SQL` and `DATA STEP` have distinct strengths and weaknesses, depending on the task at hand:

PROC SQL

  • Strengths:
    • Ideal for complex queries and manipulations involving multiple datasets.
    • Powerful for joins, aggregations, and subqueries.
    • SQL syntax is familiar to those with a background in database management.
    • Concise code for tasks like aggregation and filtering.
  • Weaknesses:
    • May be less efficient for row-wise operations compared to `DATA STEP`.
    • Requires a learning curve for those unfamiliar with SQL.
    • Less intuitive for tasks involving iterative processing or complex data transformations.

DATA STEP

  • Strengths:
    • Excellent for row-wise data processing and transformations.
    • Native to SAS, making it familiar to SAS users.
    • Flexible for custom data manipulation, including loops and conditional logic.
    • Efficient for large datasets requiring simple, straightforward processing.
  • Weaknesses:
    • Requires more code for tasks like joins and aggregation, which `PROC SQL` handles more succinctly.
    • Sorting is often required before merging datasets.
    • Less concise for complex querying and multiple dataset manipulations.

Choosing Between PROC SQL and DATA STEP

Deciding between `PROC SQL` and `DATA STEP` depends on the specific requirements of your task:

  • Choose `PROC SQL` when:
    • You need to perform complex joins, aggregations, or subqueries.
    • Your task involves querying and manipulating multiple datasets simultaneously.
    • You are familiar with SQL or prefer SQL syntax.
    • Efficiency is important for tasks like summarization and filtering.
  • Choose `DATA STEP` when:
    • You need to perform row-wise operations, data transformations, or complex conditional logic.
    • Your task involves data cleaning, sorting, or simple merges.
    • You are more comfortable with SAS-specific programming and need flexibility in data manipulation.
    • Efficiency is needed for handling large datasets with straightforward processing requirements.

Conclusion

Both `PROC SQL` and `DATA STEP` are powerful tools in SAS, each with its own advantages and ideal use cases. Understanding the differences in their syntax and capabilities allows you to choose the most appropriate tool for your specific data manipulation tasks. Whether you prefer the flexibility of SQL or the procedural control of `DATA STEP`, mastering both will enhance your ability to handle complex data processing in SAS efficiently.

Oracle Clinical

Oracle Clinical for SAS Programmer


NESUG Posters Oracle Clinical for SAS Programmers Kevin Lee -

SAS Basics

Intro SAS basics

INTRODUCTION TO SAS

SAS Procedures

SAS Clinical Trials/ SAS in Pharmaceuticals

Clinical Trials Terminology for SAS Programmers

Pharmaceutical Programming: From CRFs to Tables, Listings and Graphs

SAS Programming in the Pharmaceutical Industry

SAS® Programming Career Choices In The Health Care Industry

The Changing Nature of SAS Programming in the Pharmaceuticals Industry

Clinical Trials

CDISC: Why SAS® Programmers Need to Know

CDISC Implementation Step by Step: A Real World Example CDISC standards

Design of Case Report Forms

Download Sample CRFs and Protocol
CRFs (PDF - 773.4 KB)
Protocol (PDF - 2.0 MB) direct link: https://biolincc.nhlbi.nih.gov/studies/

SDTM-annotated CRFs

Electronic Clinical Data Capture

How to do Clinical Data entry

Clinical Trial Terminology

E9 - Statistical Principles for Clinical Trials

Clinical Trials Glossary

Clinical Data Management and E-clinical Trials (IPS)

SAS Tutorials (Video): Free



SAS Video Tutorials:

Class Notes SAS 9.2
Entering Data with movies
Exploring Data with movies
Modifying Data with movies
Managing Data with movies
Analyzing Data with movies
General Information with movies

Class Notes: Before Versions
Entering Data, view movie
Exploring Data, view movie
Modifying Data, view movie
Managing Data, view movie
Analyzing Data, view movie (part 1) and movie (part 2)
Fancy Graphics and other cool SAS code
Data step:
getting started 1: windows SAS code
getting started 2: data step SAS code
automatic _N_ variable SAS code
drop & delete SAS code
formatting: dates and numbers SAS code date sal.txt (also see the format procedure below to create your own formats)
functions SAS code
import: Bringing in data from Excel SAS code Excel import file Excel export file text file
input: length statement SAS code infile options.txt
long SAS code long.txt
missing data SAS code
output option SAS code
pointers SAS code ex7.txt ex8.txt ex9.txt
more about pointers SAS code pointers.SAS ex10.txt
missover & delimiter SAS code delimiter.txt
more on the delimiter SAS code
retain SAS code
set SAS code
simulations: random numbers SAS code
sum SAS code
statistical functions SAS code

Logic:
do loops SAS code
more about do loops SAS code
nested do loops SAS code
if then statements SAS code score.txt

Combining Data sets:
concatenating and interleaving SAS code
one-to-one merging SAS code
match merging SAS code
updating SAS code

Character functions:
substring function SAS code
trim and left functions SAS code
compress and index functions SAS code record.txt
indexc and indexw functions SAS code
implicit character-to-numeric conversion SAS code
explicit character-to-numeric conversion SAS code
implicit and explicit numeric-to-character conversion SAS code


Arrays:
introduction to arrays SAS code
using arrays to count SAS code
using arrays to order observations SAS code
using arrays to transpose data SAS code ratsdose.txt
two dimensional arrays SAS code temp.txt fin.txt

Permanent SAS Data sets: (great for large data sets)
introduction: using libname SAS code
put and file statements SAS code survey.dat data1.dat data2.dat data3.txt fruit.dat data4.dat data5.dat income.dat

Procedures:
ANOVA SAS code incommed.dat
analysis of equal vars: B-P for anova SAS code
contents: Great for large data sets SAS code sheep.dat
correlation SAS code
import: Bringing in data from Excel SAS code
Excel import file Excel export file text file
format SAS code incommed.dat (also see formatting above for SAS' date and number formats)
frequency tables SAS code incommed.dat freq.xls
means SAS code incommed.dat
more about means SAS code incommed.dat
gcharts: Bar and Pie charts SAS code incommed.dat
gplot: a prettier plot SAS code
more about gplot SAS code
plot SAS code
print SAS code account.txt
sort SAS code account.txt
more about sorting SAS code compt.txt
t-test SAS code incommed.dat
transpose SAS code
univariate SAS code test.dat

Programming outside the Data step or Procedures:
Getting started 3: options SAS code
more options SAS code

Macros:
introduction: macro variables (%let statement) SAS code contest.dat
%put statement SAS code score.dat
basic macros SAS code
macros with parameters SAS code ranks.dat
macro do loops SAS code
macro if/then/else statements SAS code makeup.dat
nested macros SAS code
simulations example SAS code reg.dat

SAS Video Tutorials from Virginia Commonwealth University:
Intro_to_SAS.wmv
Introduction_to_the_SAS_Data_Step.wmv
Importing_Data_into_SAS.wmv
Introduction_to_PROC_UNIVARIATE.wmv
Introduction_to_the_Output_Statement_in_SAS.wmv
Introduction_to_Boxplots_in_SAS.wmv
Intro_to_Scatterplots.wmv
Introduction_to_Proc_Corr_in_SAS.wmv
Introduction_to_Proc_Reg_in_SAS.wmv
Assessing_Regression_Models_in_SAS.wmv
Testing_for_Normality_in_Regression_Models.wmv
Testing_for_Constant_Variance_in_Regression_using_SAS.wmv
Introduction_to_Proc_GLM_in_SAS.wmv
Testing_for_Normality_in_ANOVA_models_using_SAS.wmv
Testing_for_Constant_Variance_in_ANOVA_models_using_SAS.wmv
Multiple_Comparisons_in_SAS.wmv


SAS Interview Questions: General(Part-2)

Under what circumstances would you code a SELECT construct instead of IF statements?

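A: A SELECT group is clearer than a long chain of IF-THEN/ELSE statements when one expression is compared against several values, or when several mutually exclusive conditions are tested in turn. For example:

data exam_result;
    set exam;
    length result $4;
    /* Conditions are tested in order; the first true WHEN wins. */
    select;
        when (physics gt 60) result = 'Pass';
        when (math gt 100) result = 'Pass';
        when (english eq 50) result = 'Pass';
        otherwise result = 'Fail';
    end;
run;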

What is the one statement to set the criteria of data that can be coded in any step?
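A) The WHERE statement. It sets the criteria for selecting observations and can be coded in both DATA steps and PROC steps.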

What is the effect of the OPTIONS statement ERRORS=1?

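A) The ERRORS= system option sets the maximum number of observations for which SAS prints complete data error messages. With ERRORS=1, the full message is written to the log for the first observation with a data error only. It should not be confused with the automatic variable _ERROR_, which is 1 if there is an error in the data for the current observation and 0 if there is not.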

What's the difference between VAR A1 - A4 and VAR A1 -- A4?

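A) VAR A1 - A4 (single dash) is a numbered range list: it refers to A1, A2, A3 and A4. VAR A1 -- A4 (double dash) is a name range list: it refers to every variable positioned from A1 through A4 in the data set, whatever the names in between. For more, refer to the following link: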
http://studysas.blogspot.com/2009/07/even-you-can-use-hash-and-double-dash.html


What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?

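A) It means that a numeric value was used where SAS expected a character value (for example, as the argument of a character function or in a concatenation), so SAS converted it automatically. The conversion uses the BEST12. format, which right-aligns the result and can leave leading blanks, so comparisons and concatenations may not behave as expected. It is safer to convert explicitly with the PUT function.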

Why is a STOP statement needed for the POINT= option on a SET statement?
A) Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially.
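
As a minimal sketch of this, the step below reads two observations by number. It assumes the sample data set sashelp.class that ships with SAS is available; the output name `picked` and the pointer variable `obsnum` are illustrative choices.

```sas
data picked;
    do obsnum = 2, 5;
        set sashelp.class point=obsnum;   /* direct access: no end-of-file is ever reached */
        output;
    end;
    stop;                                 /* ends the step; without it the step would loop forever */
run;
```

The variable named in POINT= is temporary, so `obsnum` is not written to the output data set.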

How do you control the number of observations and/or variables read or written?

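A) Use the FIRSTOBS= and OBS= options to control which observations are processed, and the KEEP= and DROP= data set options (or the KEEP and DROP statements) to control which variables are read or written.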

Approximately what date is represented by the SAS date value of 730?
A) 31st December 1961
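
SAS date values count days from 1 January 1960, which is day 0, so the answer can be checked by formatting the number. A small sketch (the variable names are arbitrary):

```sas
data _null_;
    d = 730;
    put d= date9.;            /* writes d=31DEC1961 */
    zero = '01jan1960'd;
    put zero=;                /* writes zero=0 */
run;
```

Because 1960 was a leap year, day 730 is the last day of 1961 rather than the first day of 1962.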

Identify statements whose placement in the DATA step is critical.
A) INPUT, DATA and RUN…

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
A) Compile

What does the RUN statement do?
A) The RUN statement marks the end of a DATA or PROC step and tells SAS to execute it. When another DATA or PROC statement follows, SAS treats it as the step boundary, so RUN can be left out everywhere except after the last step, although coding it explicitly is good practice.

Why is SAS considered self-documenting?
A) SAS is considered self-documenting because, when it creates a data set, it stores descriptive information along with the data: the creation date and time, the number of observations and variables, and each variable's name, type, length, label and format. You can view this information with PROC CONTENTS.

What are some good SAS programming practices for processing very large data sets?
A) Sort them only once, and use the FIRSTOBS= and OBS= options to test programs on a small subset before running them on the full data.

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) Functions are used inside the DATA step and compute a statistic across the variables of a single observation. PROCs compute the statistic across the observations of a variable and can write the results to a new data set.

If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
A) It depends: I would use PROC TRANSPOSE when there are only a few variables and arrays when there are many.

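What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
A) FIRST. and LAST. variables require a BY statement. If the data are not sorted but observations with the same BY value are stored together, add the NOTSORTED option to the BY statement; otherwise sort the data first.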

How do you debug and test your SAS programs?
A) First, check the log for errors, warnings and, in some cases, notes; the DATA step debugger can also be used.

What other SAS features do you use for error trapping and data validation?
A) Check the log, and for data validation use PROC FREQ, PROC MEANS or sometimes PROC PRINT to see what the data look like.

How would you combine 3 or more tables with different structures?
A) I would sort them by their common variables and use a MERGE statement, although the right approach depends on what "different structures" means.


Other questions:

What areas of SAS are you most interested in?
A) BASE, STAT, GRAPH, ETS

Briefly describe 5 ways to do a "table lookup" in SAS.
A) Match Merging, Direct Access, Format Tables, Arrays, PROC SQL

What versions of SAS have you used (on which platforms)?
A) SAS 9.1.3, 9.0 and 8.2 on Windows and UNIX; SAS 7 and 6.12

What are some good SAS programming practices for processing very large data sets?
A) Develop against a sample by using the OBS= option or subsetting, comment the code, and use DATA _NULL_ when no output data set is needed.

What are some problems you might encounter in processing missing values? In Data steps? Arithmetic? Comparisons? Functions? Classifying data?
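A) Arithmetic operators propagate missing values: if any operand is missing, the result is missing. Functions such as SUM and MEAN ignore missing arguments instead. In comparisons, a numeric missing value is smaller than any number, so a condition such as x < 10 is true when x is missing. Most SAS statistical procedures exclude observations that have a missing value in any analysis variable.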

How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
A) Using PROC TRANSPOSE
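
A minimal sketch of that transposition follows. The input data set `long`, its variable `x` and the output name `wide` are hypothetical, created here only so that the example is self-contained.

```sas
/* Build a hypothetical input: 30 observations, one variable x. */
data long;
    do i = 1 to 30;
        x = i * 10;
        output;
    end;
    drop i;
run;

/* Result: one observation with 30 variables named x1-x30. */
proc transpose data=long out=wide(drop=_name_) prefix=x;
    var x;
run;
```

The PREFIX= option supplies the stem of the new variable names, and dropping `_NAME_` removes the column that records which input variable was transposed.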

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) A PROC has wider scope: it works across observations and can send its results to a different data set. A function works on the values within the current observation of the DATA step.

If you were told to create many records from one record, show how you would do this using array and with PROC TRANSPOSE?

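A) With arrays: declare an array over the variables in the record, then loop over it with a DO loop and write one observation per element with OUTPUT. With PROC TRANSPOSE: list the variables in the VAR statement.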

What are _numeric_ and _character_ and what do they do?
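A) They are special variable lists: _NUMERIC_ refers to all numeric variables and _CHARACTER_ to all character variables currently defined in the data set or DATA step. They can be used wherever a variable list is allowed, for example to read or write all variables of one type.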

How would you create multiple observations from a single observation?
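A) When reading raw data, the double trailing @@ holds the input line so that several observations can be read from one record. Within a DATA step, several OUTPUT statements (or OUTPUT inside a DO loop) write multiple observations from one input observation.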

For what purpose would you use the RETAIN statement?
A) The RETAIN statement holds the values of variables across iterations of the DATA step. Normally, variables created by INPUT or assignment statements are set to missing at the start of each iteration.

What is the order of evaluation of the comparison operators: + - * / ** ()?
A) Expressions in parentheses are evaluated first, then ** (exponentiation), then * and / (equal priority), then + and - (equal priority).

How could you generate test data with no input data?
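A) Use a DATA step with no input source: a DO loop with an OUTPUT statement creates the observations. DATA _NULL_ with PUT statements can be used in the same way to write test records to a file.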
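
One way to do this, offered as a sketch rather than the only approach, is a DATA step with a DO loop, an OUTPUT statement and the RAND function. The data set name `test_data`, the variables and the distributions are all hypothetical choices.

```sas
data test_data;
    call streaminit(12345);                       /* fixed seed so that runs are repeatable */
    do id = 1 to 100;
        age = 20 + floor(50 * rand('uniform'));   /* whole numbers from 20 to 69 */
        value = rand('normal', 100, 15);          /* mean 100, standard deviation 15 */
        output;                                   /* one observation per loop iteration */
    end;
run;
```

No SET, MERGE or INPUT statement is needed: the loop itself drives the creation of the 100 observations.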

How do you debug and test your SAS programs?
A) Run with OPTIONS OBS=0 to check the syntax without processing data, and use system options to trace program execution in the log.

What can you learn from the SAS log when debugging?
A) It shows the execution of the whole program and its logic. It also shows each error with its line number so that you can find it and edit the program.

What is the purpose of _error_?
A) _ERROR_ is an automatic variable with only two values: 1 if an error occurred while processing the current observation and 0 if not.

How can you put a "trace" in your program?
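A) ODS TRACE ON writes a record of each output object to the log. To trace DATA step logic, write variable values to the log with PUT statements.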

How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs?
A) In an assignment statement, a missing value is assigned as missing. In sort order, numeric missing values come before all numbers: ._ (underscore) is lowest, followed by . and then .A through .Z.

How do you test for missing values?
A) Test for them in IF-THEN/ELSE, WHERE or SELECT logic, for example with the MISSING function or by comparing with . (numeric) or ' ' (character).

How are numeric and character missing values represented internally?
A) Character missing values are represented as a blank (' ') and numeric missing values as a period (.).

Which date functions advances a date time or date/time value by a given interval?
A) INTNX.

In the flow of DATA step processing, what is the first action in a typical DATA Step?
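A) When you submit a DATA step, SAS first compiles it and then executes it to create the new SAS data set.
Compilation Phase: SAS checks the syntax and creates the input buffer, the program data vector (PDV) and the descriptor information.
Execution Phase: SAS runs the statements once for each observation and writes to the output data set.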

What are SAS/ACCESS and SAS/CONNECT?
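A) SAS/ACCESS provides access to data held in external databases such as Oracle, SQL Server and MS Access. SAS/CONNECT links SAS sessions running on different machines, so that a client session can submit work to, and transfer data with, a server session.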

What is the one statement to set the criteria of data that can be coded in any step?
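A) The WHERE statement sets the criteria for selecting observations and can be coded in a DATA step or a PROC step. Global statements such as OPTIONS can appear anywhere, but they do not set criteria on the data.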

What is the purpose of using the N=PS option?
A) The N=PS option creates a buffer in memory which is large enough to store PAGESIZE (PS) lines and enables a page to be formatted randomly prior to it being printed.

What are the scrubbing procedures in SAS?
A) Proc Sort with nodupkey option, because it will eliminate the duplicate values.

What are the new features included in the new version of SAS i.e., SAS9.1.3?
A) The main advantage of version 9 is faster execution of applications and centralized access of data and support.

Many changes were made in version 9 compared with version 8. The following are a few:
· SAS version 9 supports format and informat names longer than 8 characters, which was not possible in version 8.
· Numeric format names can be up to 32 characters long in version 9, compared with 8 in version 8.
· Character format names can be up to 31 characters long in version 9, compared with 8 in version 8.
· Numeric informat names can be up to 31 characters long in version 9, compared with 8 in version 8.
· Character informat names can be up to 30 characters long in version 9, compared with 8 in version 8.
· Three new informats are available in version 9 to convert various date, time and datetime forms of data into a SAS date, time or datetime value:
  · ANYDTDTEw. - Converts to a SAS date value.
  · ANYDTTMEw. - Converts to a SAS time value.
  · ANYDTDTMw. - Converts to a SAS datetime value.
· The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step, trimming trailing blanks and automatically converting numeric values to character.
· A new ODS option (COLUMNS=) is included to create multiple columns in the output.

What differences did you find among versions 6, 8 and 9 of SAS?

A) The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the Metadata Server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.

What has been your most common programming mistake?
A) Missing semicolons, not checking the log after submitting a program, not using debugging techniques, and not making full use of FSVIEW.

Name several ways to achieve efficiency in your program.
A) Efficiency and performance strategies can be classified into 5 different areas:
· CPU time
· Data storage
· Elapsed time
· Input/output
· Memory
CPU time and elapsed time are the baseline measurements.

A few examples of efficiency violations:
· Retaining unwanted data sets.
· Not subsetting early to eliminate unwanted records.

Efficiency-improving techniques:
· Use KEEP and DROP statements to retain only the necessary variables.
· Use macros to reduce the amount of code.
· Use IF-THEN/ELSE statements rather than a series of independent IF statements.
· Use the SQL procedure to reduce the number of programming steps.
· Use LENGTH statements to reduce variable sizes and so data storage.
· Use DATA _NULL_ steps when no output data set is needed, to save data storage.

What other SAS products have you used and consider yourself proficient in using?
A) DATA _NULL_ steps, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT, PROC UNIVARIATE etc.

What is the significance of the 'OF' in X=SUM (OF a1-a4, a6, a9);
A) OF tells SAS that what follows is a variable list. Without it the expression may not be interpreted as expected: SUM(a1-a4, a6, a9) computes a1 minus a4, plus a6 and a9, rather than the sum of a1 through a4, a6 and a9. The same is true for the MEAN function.
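
The difference is easy to see with concrete numbers. In this sketch the values assigned to a1 through a9 are arbitrary, and `with_of` and `without_of` are illustrative variable names.

```sas
data _null_;
    a1 = 10; a2 = 20; a3 = 30; a4 = 4; a6 = 6; a9 = 9;
    with_of    = sum(of a1-a4, a6, a9);   /* 10+20+30+4+6+9 = 79 */
    without_of = sum(a1-a4, a6, a9);      /* (10-4)+6+9 = 21     */
    put with_of= without_of=;
run;
```

Both statements run without any error or warning, which is why a forgotten OF is easy to miss.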

What do the PUT and INPUT functions do?
A) The INPUT function converts character values to numeric values by reading them with an informat.
The PUT function converts numeric values to character values by writing them with a format.
Ex: for INPUT: INPUT (source, informat)
For PUT: PUT (source, format)
Note that the INPUT function requires an informat and the PUT function requires a format.
If we omit the INPUT or PUT function, SAS detects the mismatched types and attempts an automatic character-to-numeric or numeric-to-character conversion, writing a note to the log. The automatic conversion does not always work: for example, a character value containing a dollar sign or commas cannot be converted automatically and becomes missing. Therefore it is always advisable to code INPUT and PUT explicitly when conversions occur.
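
A short sketch of both conversions, using made-up values and variable names. The COMMA10. informat and Z5. format are just two convenient choices; any informat or format suited to the data would do.

```sas
data _null_;
    char_amt = '1,234.50';
    num_amt  = input(char_amt, comma10.);   /* character to numeric: 1234.5  */
    num_id   = 42;
    char_id  = put(num_id, z5.);            /* numeric to character: '00042' */
    put num_amt= char_id=;
run;
```

Here the embedded comma would defeat an automatic conversion, while the explicit informat reads the value correctly.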

Introduction to SAS Informats and Formats




Which date function advances a date, time or datetime value by a given interval?
INTNX: advances a date, time, or datetime value by a given interval and returns a date, time, or datetime value. Ex: INTNX(interval,start-from,number-of-increments,alignment)

INTCK: INTCK(interval,start-of-period,end-of-period) is an interval function that counts the number of interval boundaries between two given SAS date, time or datetime values.

DATETIME () returns the current date and time of day.

DATDIF (sdate,edate,basis): returns the number of days between two dates.
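
The sketch below exercises these functions on a few arbitrary dates; the variable names are illustrative only.

```sas
data _null_;
    start = '15jan2024'd;
    next_begin = intnx('month', start, 1);            /* default alignment is the beginning: 01FEB2024 */
    next_same  = intnx('month', start, 1, 'same');    /* same day of the next month: 15FEB2024 */
    months     = intck('month', '31jan2024'd, '01feb2024'd);     /* 1: one month boundary is crossed */
    days       = datdif('01jan2024'd, '31jan2024'd, 'act/act');  /* 30 actual days */
    put next_begin= date9. next_same= date9. months= days=;
run;
```

Note that INTCK counts boundaries by default, so two dates only one day apart can still be one month apart.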

What do the MOD and INT functions do? What do the PAD and DIM functions do?
A) MOD: MOD(argument, modulo) returns the remainder after the numeric argument is divided by the modulo, which is a nonzero constant or numeric variable.

INT: It returns the integer portion of a numeric value truncating the decimal portion.

PAD: it pads each record with blanks so that all data lines have the same length. It is used in the INFILE statement. It is useful only when missing data occurs at the end of the record.

CATX: concatenate character strings, removes leading and trailing blanks and inserts separators.

SCAN: it returns a specified word from a character value. Scan function assigns a length of 200 to each target variable.

SUBSTR: extracts a substring or replaces character values.
Extraction of a substring: Middleinitial=substr(middlename,1,1);
Replacing character values: substr (phone,1,3)='433';
If the SUBSTR function is on the left side of an assignment statement, it replaces the contents of the character variable.

TRIM: trims the trailing blanks from the character values.

SCAN vs. SUBSTR: SCAN extracts words within a value that is marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location. It is best used when we know the exact position of the sub string to extract from a character value.

How might you use MOD and INT on numeric to mimic SUBSTR on character Strings?
A) The first argument to the MOD function is a numeric, the second is a non-zero numeric; the result is the remainder when argument-1 is divided by argument-2. The INT function takes only one argument and returns the integer portion of an argument, truncating the decimal portion. Note that the argument can be an expression. Dividing by a power of 10 and applying INT keeps the leading digits, while MOD keeps the trailing digits, much as SUBSTR picks out characters by position.

DATA NEW ;
A = 123456 ;
X = INT( A/1000 ) ;
Y = MOD( A, 1000 ) ;
Z = MOD( INT( A/100 ), 100 ) ;
PUT A= X= Y= Z= ;
RUN ;

Result:

A=123456
X=123
Y=456
Z=34

In ARRAY processing, what does the DIM function do?
A) DIM returns the number of elements in an array. Using DIM as the stop value of an iterative DO statement means you do not have to re-specify the stop value if you change the dimension of the array.
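
A minimal sketch of DIM in an iterative DO loop. The data set name, the variables q1-q5 and the use of 9 as a "not answered" code are hypothetical.

```sas
data scores_fixed;
    input q1-q5;
    array q{*} q1-q5;
    do i = 1 to dim(q);              /* dim(q) is 5 here */
        if q{i} = 9 then q{i} = .;   /* 9 is a hypothetical "not answered" code */
    end;
    drop i;
    datalines;
1 9 3 4 9
2 2 9 5 1
;
run;
```

If the array were later widened to q1-q10, the loop would need no edit, because DIM picks up the new size.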

How would you determine the number of missing or nonmissing values in computations?
A) To determine the number of missing values that are excluded in a computation, use the NMISS function; the N function counts the nonmissing values.

data _null_;
m = . ;
y = 4 ;
z = 0 ;
N = N(m , y, z);
NMISS = NMISS (m , y, z);
put N= NMISS= ;
run;

The above program results in N = 2 (number of nonmissing values) and NMISS = 1 (number of missing values).

Do you need to know if there are any missing values?
A) The MISSING function checks one value at a time: MISSING(field1) returns 1 if field1 is missing and 0 if it is not. To check several numeric fields at once, use: missing_values=(NMISS(field1,field2,field3) > 0);
This returns 0 if there aren't any missing values or 1 if there are.
If you need to know how many missing values you have then use num_missing=NMISS(field1,field2,field3);

You can also find the number of non-missing values with non_missing=N (field1,field2,field3);

What is the difference between: x=a+b+c+d; and x=SUM (of a, b, c ,d);?
A) Why wouldn't you just use total=field1+field2+field3;?

First, how do you want missing values handled?
The SUM function returns the sum of the non-missing values. If you choose addition, you will get a missing value for the result if any of the fields are missing. Which one is appropriate depends upon your needs.

However, there is an advantage to using the SUM function even if you want the result to be missing. If you have more than a couple of fields, you can often use shortcuts in writing the field names. If your fields are not numbered sequentially but are stored together in the program data vector, then you can use: total=SUM(of fielda--zfield); Just make sure you remember the "of" and the double dashes, or your code will run but you won't get your intended results. MEAN is another function that calculates differently from the written-out formula if you have missing values.

There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985. How would you accomplish this in data step code?
A) Using only PROC FORMAT.

data new ;
input date ddmmyy10.;
cards;
01/05/1955
01/09/1970
01/12/1975
19/10/1979
25/10/1982
10/10/1988
27/12/1991
;
run;

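proc format ;
/* Existing formats used as labels must be enclosed in square brackets. */
/* date9. writes ddmonccyy, without embedded blanks.                    */
value dat
    low -< '01jan1975'd = [date7.]
    '01jan1975'd - '31dec1985'd = 'Disco Years'
    '31dec1985'd <- high = [date9.] ;
run;

proc print data=new;
format date dat. ;
run;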

In the following DATA step, what is needed for 'fraction' to print to the log?
data _null_;
x=1/3;
if x=.3333 then put 'fraction';
run;
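
The source leaves this question unanswered. One common approach, offered here as a suggestion, is to round x before comparing it, because 1/3 is stored with many more decimal places than .3333:

```sas
data _null_;
    x = 1/3;
    /* Round to four decimal places so the comparison with .3333 can succeed. */
    if round(x, .0001) = .3333 then put 'fraction';
run;
```

The same idea applies to any equality test on a computed fractional value: compare rounded values, or test that the difference is smaller than a tolerance.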

What is the difference between calculating the 'mean' using the mean function and PROC MEANS?
A) By default, PROC MEANS calculates summary statistics such as N, mean, standard deviation, minimum and maximum across the observations of each analysis variable, whereas the MEAN function computes only the mean of its arguments within one observation.

What are some differences between PROC SUMMARY and PROC MEANS?
A) PROC MEANS by default prints its results in the output window; you can suppress this with the NOPRINT option and send the results to a data set with the OUTPUT OUT= statement. PROC SUMMARY does not print by default: you have to code an OUTPUT statement, or specify the PRINT option to see the results.

What is a problem with merging two data sets that have variables with the same name but different data?
A) When two data sets being merged share a variable that is not a BY variable, only one value can be kept: the value read from the data set listed later on the MERGE statement generally overwrites the earlier one. Understanding the basic algorithm of MERGE will help you understand how the step processes. There are still a few common scenarios whose results sometimes catch users off guard. Here are a few of the most frequent 'gotchas':

1- BY variables have different lengths
It is possible to perform a MERGE when the lengths of the BY variables are different, but if the data set with the shorter version is listed first on the MERGE statement, the shorter length will be used for the length of the BY variable during the merge. Due to this shorter length, truncation occurs and unintended combinations could result. In Version 8, a warning is issued to point out this data integrity risk. The warning will be issued regardless of which data set is listed first:
WARNING: Multiple lengths were specified for the BY variable name by input data sets. This may cause unexpected results.
Truncation can be avoided by naming the data set with the longest length for the BY variable first on the MERGE statement, but the warning message is still issued. To prevent the warning, check with PROC CONTENTS that the BY variables have the same length before combining them in the MERGE step. You can change the variable length with either a LENGTH statement in the merge DATA step prior to the MERGE statement, or by recreating the data sets to have identical lengths for the BY variables.

Note: When doing a MERGE, avoid having the MERGE and an IF-THEN statement in one DATA step if the IF-THEN statement involves two variables that come from two different merging data sets. If it is not completely clear when MERGE and IF-THEN can be used in one DATA step and when they should not be, it is best to always separate them into different DATA steps. Following this recommendation helps avoid unexpected merge results.

Which data set is the controlling data set in the MERGE statement?
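A) MERGE itself has no controlling data set: a match-merge outputs every BY group found in any of the input data sets, and when a non-BY variable appears in more than one data set, the value from the data set listed last is the one kept. To make one data set control which observations are output, use the IN= data set option with a subsetting IF statement.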

How do the IN= variables improve the capability of a MERGE?
A) The IN= variables. What if you want to keep in the output data set of a merge only the matches (only those observations to which both input data sets contribute)? SAS sets up special temporary variables, called the "IN=" variables, so that you can do this and more. Here is what you have to do:

  • signal to SAS on the MERGE statement that you need the IN= variables for the input data set(s)
  • use the IN= variables in the data step appropriately

So to keep only the matches in a match-merge, ask for the IN= variables and use them:

data three;
merge one(in=x) two(in=y); /* x & y are your choices of names */
by id;                     /* for the IN= variables for data */
if x=1 and y=1;            /* sets one and two respectively */
run;

What techniques and/or PROCs do you use for tables?
A) PROC FREQ, PROC UNIVARIATE, PROC TABULATE, and PROC REPORT.

Do you prefer PROC REPORT or PROC TABULATE? Why?
A) I prefer PROC REPORT unless I have to create cross-tabulation tables, because it gives me many options for modifying the look of the table (for example, the WIDTH= option changes the width of each column). PROC TABULATE, by contrast, cannot produce some of the things I need in a table; for example, it does not produce n (%) in the desired format.

How experienced are you with customized reporting and use of DATA _NULL_ features?
A) I have very good experience creating customized reports with the DATA _NULL_ step. It is a DATA step that generates a report without creating a data set, which saves development time. Another advantage is that any compilation errors in the statements are detected and written to the log when the step is submitted, so they can be found by checking the log. DATA _NULL_ is also used to create macro variables from data set values.
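A minimal sketch of both uses, assuming the SASHELP.CLASS sample table; the macro variable name n_students is made up for this example.

```sas
data _null_;                             /* no output data set is created */
    set sashelp.class end=last;
    file print;                          /* write to the output destination, not the log */
    if _n_ = 1 then put 'Name' @12 'Age';
    put name @12 age;
    /* store the row count in a macro variable on the last observation */
    if last then call symputx('n_students', _n_);
run;

%put NOTE: &n_students students listed.;
```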

What is the difference between nodup and nodupkey options?
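A) NODUP (an alias for NODUPRECS) compares all the variables in the data set and removes an observation only when it is identical to the previous observation, while NODUPKEY compares just the BY variables and keeps the first observation for each BY value.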
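The difference is easiest to see on a small example. The data set work.visits below is hypothetical and exists only for illustration.

```sas
data work.visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

/* NODUPKEY keeps one observation per BY value (ID). */
proc sort data=work.visits out=work.by_key nodupkey;
    by id;
run;

/* NODUP drops an observation only when every variable matches
   the previous observation, so (1, B) is kept here. */
proc sort data=work.visits out=work.by_rec nodup;
    by id;
run;
```

Comparing work.by_key with work.by_rec shows that NODUPKEY removes more rows whenever observations share a key but differ in other variables.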

What is the difference between a compiler and an interpreter?
Give any one example (software product) that acts as an interpreter.
A) Both serve the same purpose, turning source code into something the machine can run, but they differ in how they achieve it. An interpreter translates instructions one at a time and executes each one immediately. A compiler takes the whole source program (for example, a SAS DATA step, which is compiled before it executes) and translates it into object code or machine language. Compiled code generally does the work more efficiently, because the result is a complete machine-language program that can then be executed.

Code the TABLES statement for a single-level frequency.
A) proc freq data=lib.dataset;
table var; * list one variable, or several variables separated by spaces, to get a one-way frequency table for each;
run;

What is the main difference between rename and label?
A) 1. The LABEL statement can be used in either a PROC step or a DATA step, whereas the RENAME statement can be used only in a DATA step. 2. If we rename a variable, the old name is lost; if we label a variable, its short name (the original name) remains alongside its descriptive label.
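A short sketch of both points, assuming the SASHELP.CLASS sample table; the names work.class2 and student_name are chosen only for illustration.

```sas
data work.class2;
    set sashelp.class;
    rename name = student_name;   /* the old name NAME is gone from the output */
    label  age  = 'Age in years'; /* AGE keeps its name and gains a label */
run;

/* LABEL also works inside a PROC step, where it applies to that step only. */
proc print data=work.class2 label;
    label height = 'Height (in)';
    var student_name age height;
run;
```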

What is Enterprise Guide? What is the use of it?
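A) SAS Enterprise Guide is a point-and-click Windows client for SAS. It lets you import data such as text files, run analysis tasks, and write and submit SAS code from a project-based interface. (It comes free with Base SAS version 9.0.)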

What other SAS features do you use for error trapping and data validation?
What are the validation tools in SAS?
A) For data sets: data dataset-name / debug; and data dataset-name / stmtchk;
For macros: options mprint mlogic symbolgen;
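As a rough illustration of the macro debugging options, the sketch below turns them on around a small macro. The macro subset_by_age and the data set work.subset are hypothetical.

```sas
/* MPRINT shows the generated SAS code, MLOGIC traces macro execution,
   and SYMBOLGEN shows how each macro variable resolves. */
options mprint mlogic symbolgen;

%macro subset_by_age(ds, minage);   /* hypothetical macro */
    data work.subset;
        set &ds;
        where age >= &minage;
    run;
%mend subset_by_age;

%subset_by_age(sashelp.class, 13)

/* Turn the options off again to keep later logs readable. */
options nomprint nomlogic nosymbolgen;
```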

How can you put a "trace" in your program?
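A) Submit ODS TRACE ON; before the step and ODS TRACE OFF; after it. While tracing is on, a trace record for each output object the step produces is written to the log.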
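A minimal sketch, assuming the SASHELP.CLASS sample table. The first step logs the name of each output object; the second uses one of those names (Moments, produced by PROC UNIVARIATE) to keep a single table.

```sas
ods trace on;                 /* trace records go to the log */
proc univariate data=sashelp.class;
    var height;
run;
ods trace off;

/* Use a name reported by the trace to select just one output object. */
ods select Moments;
proc univariate data=sashelp.class;
    var height;
run;
```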

How would you code a merge that will keep only the observations that have matches from both data sets?


A) Use the IN= data set option, as in the following example.

data three;
merge one(in=x) two(in=y);
by id;
if x=1 and y=1;
run;

*or;

data three;
merge one(in=x) two(in=y);
by id;
if x and y;
run;
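The same IN= variables can also route non-matches. The sketch below assumes, as in the example above, that data sets one and two exist and are sorted by id; the three output data set names are made up for illustration.

```sas
data work.both work.only_one work.only_two;
    merge one(in=x) two(in=y);
    by id;
    if x and y then output work.both;      /* matched in both inputs */
    else if x  then output work.only_one;  /* present only in ONE */
    else            output work.only_two;  /* present only in TWO */
run;
```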


What are input dataset and output dataset options?
A) Input data set options include OBS=, FIRSTOBS=, WHERE=, and IN=. Output data set options include COMPRESS= and REUSE=. KEEP=, DROP=, and RENAME= can be used on both input and output data sets.
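A small sketch showing where each kind of option goes, assuming the SASHELP.CLASS sample table; the output name work.teens is an assumption.

```sas
data work.teens(keep=name age gender compress=yes);  /* output data set options */
    set sashelp.class(firstobs=2 obs=10               /* input data set options */
                      where=(age >= 13)
                      rename=(sex=gender)
                      drop=height weight);
run;
```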

How can you create a zero-observation data set?
A) Create the data set with the LIKE clause. For example:

proc sql;
create table latha.emp like oracle.emp;
quit;

Here the LIKE clause copies the structure of the existing table to the new table, which results in an empty table.
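The DATA step offers two other common ways to get the same result. This is a sketch assuming the SASHELP.CLASS sample table; work.empty1 and work.empty2 are illustrative names.

```sas
/* OBS=0 reads no observations but still copies the variable attributes. */
data work.empty1;
    set sashelp.class(obs=0);
run;

/* The SET statement is compiled (so the structure is defined) but never
   executed; STOP ends the step before any observation is written. */
data work.empty2;
    if 0 then set sashelp.class;
    stop;
run;
```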

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

A) In the editor window we write:

%include 'path of the sas file';
run;

In a non-windowing environment there is no need to give the RUN statement.

How can you import a .CSV file into SAS? Give the syntax.
A) A CSV file is a plain-text file (it can be created in Notepad) whose first row usually holds the variable names. Import it with PROC IMPORT:

proc import datafile='E:\age.csv'
out=sarath dbms=csv replace;
getnames=yes;
run;

What is the use of PROC SQL?
A) PROC SQL is a powerful tool in SAS that combines the functionality of DATA and PROC steps. PROC SQL can sort, summarize, subset, join (merge), and concatenate data sets, create new variables, and print the results or create a new data set, all in one step. Because of this it often needs less code than the equivalent combination of DATA and PROC steps. A join in PROC SQL does not require the data to be sorted beforehand, which is a must for a DATA step match-merge.
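To illustrate the last point, here is a sketch of a join on unsorted inputs. The tables work.one and work.two and their columns are hypothetical, created here so the example is self-contained.

```sas
data work.one;
    input id x;
    datalines;
3 30
1 10
2 20
;
run;

data work.two;
    input id y;
    datalines;
2 200
3 300
;
run;

/* Neither input is sorted by ID; PROC SQL joins them anyway
   and sorts the result in the same step. */
proc sql;
    create table work.joined as
    select a.id, a.x, b.y
    from work.one as a
         inner join work.two as b
         on a.id = b.id
    order by a.id;
quit;
```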

What is SAS GRAPH?
A) SAS/GRAPH software creates and delivers accurate, high-impact visuals that enable decision makers to gain a quick understanding of critical business issues.

Why is a STOP statement needed for the POINT= option on a SET statement?
A) When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop.
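A minimal sketch of the pattern, assuming the SASHELP.CLASS sample table: it reads every fifth observation directly and relies on STOP to end the step. The output name work.sample is an assumption.

```sas
data work.sample;
    /* NOBS= supplies the total observation count at compile time */
    do i = 1 to n by 5;
        set sashelp.class point=i nobs=n;  /* direct access by observation number */
        output;
    end;
    stop;   /* no end-of-file is ever read, so end the step explicitly */
run;
```

Without the STOP statement, the DATA step would start another iteration after the loop finishes and repeat the same reads indefinitely.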

What is the difference between nodup and nodupkey options?
A) http://studysas.blogspot.com/2009/03/proc-sort-nodup-vs-nodupkey.html
