SAS Programmer Interview Questions and Answers
Preparing for a SAS programming interview can be challenging, as the questions can range from basic syntax and data manipulation to more advanced topics like macro programming, SQL, and optimization techniques. Below are some common SAS programmer interview questions along with suggested answers that can help you get ready for your interview.
1. What is SAS, and why is it used?
Answer: SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, business intelligence, data management, and predictive analytics. It is widely used in industries like pharmaceuticals, finance, healthcare, and marketing for data analysis, reporting, and decision-making.
2. Explain the difference between PROC MEANS
and PROC SUMMARY
.
Answer: Both PROC MEANS
and PROC SUMMARY
are used to compute descriptive statistics in SAS. The primary difference is:
PROC MEANS
produces printed output by default, displaying statistics such as mean, standard deviation, min, and max.PROC SUMMARY
does not produce printed output by default unless thePRINT
option is used. It is often used to create an output dataset containing summary statistics.
3. How do you read a raw data file in SAS?
Answer: You can read a raw data file in SAS using the INFILE
statement within a Data Step. Here's an example:
data mydata;
infile 'path-to-file/data.txt' dlm=',' dsd firstobs=2;
input name $ age height weight;
run;
- INFILE
specifies the location of the raw data file.
- DLM=
specifies the delimiter (e.g., comma).
- DSD
handles consecutive delimiters and removes quotes from character values.
- FIRSTOBS=
specifies the first line of data to read (useful if the file has headers).
4. What are the different types of MERGE in SAS?
Answer: In SAS, you can perform different types of merges depending on your data and requirements:
- One-to-One Merge: Combines datasets by matching observations by their relative position.
- Match-Merge: Combines datasets by matching observations based on a common variable using the
BY
statement. - Many-to-One or One-to-Many Merge: One dataset has multiple observations for the same key, and the other dataset has a single observation per key.
- Many-to-Many Merge: Not recommended because SAS can create duplicate combinations of observations, leading to unexpected results.
5. What is the purpose of the PDV
(Program Data Vector) in SAS?
Answer: The Program Data Vector (PDV) is an area of memory where SAS builds a dataset, one observation at a time. It holds the values of all variables in the dataset while the Data Step is processing. Understanding the PDV is crucial for understanding how data is processed and how variables are retained or reset across iterations of the Data Step.
6. How do you create a macro variable in SAS, and how do you use it?
Answer: Macro variables in SAS can be created using the %LET
statement or within a Data Step using CALL SYMPUT
. Here's an example using %LET
:
%let varname = age;
proc means data=sashelp.class;
var &varname.;
run;
In this example, &varname.
is a macro variable that holds the value age
. It is used in the PROC MEANS
step to specify the variable to be analyzed.
7. Explain the difference between PROC SORT
and ORDER BY
in PROC SQL
.
Answer:
PROC SORT
is a procedure that sorts a dataset in ascending or descending order based on one or more variables. The sorted dataset can then be used in subsequent Data Steps or procedures.ORDER BY
is a clause inPROC SQL
that sorts the output of a query based on specified columns. The sorting happens during the query execution and does not alter the order of the input dataset.
Example of PROC SORT
:
proc sort data=sashelp.class out=sorted_class;
by age;
run;
Example of ORDER BY
in PROC SQL
:
proc sql;
select * from sashelp.class
order by age;
quit;
8. What is a Hash Object in SAS, and when would you use it?
Answer: A Hash Object in SAS is an in-memory data structure that allows for fast data retrieval based on key-value pairs. It is particularly useful when you need to perform lookups, merges, or aggregation on large datasets because it can be faster than using traditional merge techniques.
data _null_;
if _n_ = 1 then do;
declare hash h(dataset:'lookup_table');
h.defineKey('key_variable');
h.defineData('data_variable');
h.defineDone();
end;
set large_dataset;
if h.find() = 0 then output;
run;
9. How can you handle missing values in SAS?
Answer: Missing values in SAS can be handled in several ways, depending on the context:
- Using conditional logic: For example,
if var = . then ...
to check for missing numeric values. - Replacing missing values: You can use functions like
COALESCE
orIFN
to replace missing values with a default value. - Excluding missing values from analysis: Many procedures have options to exclude missing values, like
NMISS
or theWHERE
clause.
Example:
data filled;
set original;
if age = . then age = 18;
run;
10. What is the difference between INPUT
and INFORMAT
in SAS?
Answer:
INPUT
Function: Converts character data to numeric or another character format. It reads the value of a variable using a specified informat.INFORMAT
Statement: Assigns an informat to a variable, dictating how SAS should read the data from a raw data file.
Example of INPUT
:
data convert;
input_str = '20240831';
input_num = input(input_str, yymmdd8.);
run;
Example of INFORMAT
:
data formatted;
infile datalines;
informat dob mmddyy10.;
input name $ dob;
datalines;
John 08/31/2024
Jane 12/15/2023
;
run;
11. How do you debug a SAS program?
Answer: There are several techniques to debug a SAS program:
- Check the SAS Log: Always review the log for error messages, warnings, and notes.
- Use
PUTLOG
Statements: InsertPUTLOG
statements in your Data Steps to monitor the values of variables and the flow of the program. - Use
OPTIONS
for Debugging Macros: UseMPRINT
,MLOGIC
, andSYMBOLGEN
options to trace macro execution. - Use
PROC SQL
with theDEBUG
Option: TheDEBUG
option inPROC SQL
can help trace SQL queries.
Example of PUTLOG
:
data _null_;
set sashelp.class;
putlog "Processing record: " name= age=;
run;
12. What is PROC TRANSPOSE
and how is it used?
Answer: PROC TRANSPOSE
is used to convert data from a long format to a wide format or vice versa. It changes the orientation of data by transposing rows to columns or columns to rows.
proc transpose data=sashelp.class out=transposed_class;
by name;
var age height weight;
run;
This example transposes the variables age
, height
, and weight
for each name
into separate columns in the output dataset.
13. How do you merge datasets in SAS when you have different key variables?
Answer: When merging datasets with different key variables, you can use PROC SQL
or Data Step with IF-THEN-ELSE
logic to align the keys before merging. Alternatively, you can rename variables before merging.
Example using PROC SQL
:
proc sql;
create table merged as
select a.*, b.variable
from dataset1 as a
left join dataset2 as b
on a.key1 = b.key2;
quit;
14. What are SAS formats and informats?
Answer:
- Formats: Define how data should be displayed in reports or output. For example,
DATE9.
displays a date as01JAN2024
. - Informats: Define how SAS reads raw data into a dataset. For example,
MMDDYY10.
reads a date value in the format08/31/2024
.
Example of using a format:
data format_example;
set sashelp.class;
format dob date9.;
dob = '31AUG2024'd;
run;
Example of using an informat:
data informat_example;
input name $ dob mmddyy10.;
format dob date9.;
datalines;
John 08/31/2024
Jane 12/15/2023
;
run;
15. How do you handle large datasets in SAS?
Answer: Handling large datasets in SAS involves optimizing both the code and the environment:
- Use indexing: Create indexes on variables that are frequently used in
WHERE
clauses to speed up data access. - Use the
KEEP
/DROP
options: Reduce the amount of data being processed by keeping only the necessary variables. - Use SQL
PASS-THROUGH
: When working with databases, use SQLPASS-THROUGH
to push processing to the database. - Use
PROC SQL
for joins: For large datasets,PROC SQL
may be more efficient than a Data Step merge.
Example using the KEEP
option:
data small_dataset;
set large_dataset(keep=var1 var2 var3);
run;
More SAS Programmer Interview Questions and Answers
16. What are the differences between PROC FREQ
and PROC TABULATE
?
Answer:
PROC FREQ:
Primarily used to generate frequency counts, cross-tabulations (contingency tables), and chi-square tests. It’s straightforward and mainly used for categorical data.PROC TABULATE:
More flexible and powerful thanPROC FREQ
. It can handle both categorical and continuous data, producing multi-dimensional tables and a variety of summary statistics.
Example of PROC FREQ
:
proc freq data=sashelp.class;
tables sex / chisq;
run;
Example of PROC TABULATE
:
proc tabulate data=sashelp.class;
class sex;
var age height weight;
table sex, (age height weight)*(mean std);
run;
17. Explain the difference between BY
statement and CLASS
statement in SAS procedures.
Answer:
BY Statement:
Used to process data in groups, but requires the data to be pre-sorted by theBY
variable(s). It splits the data into separate blocks for processing.CLASS Statement:
Used to specify categorical variables in procedures likePROC MEANS
,PROC GLM
, andPROC TABULATE
without needing to pre-sort the data. It internally groups the data and computes statistics for each level of theCLASS
variable.
Example using BY
statement:
proc sort data=sashelp.class; by sex; run;
proc means data=sashelp.class;
by sex;
var age;
run;
Example using CLASS
statement:
proc means data=sashelp.class;
class sex;
var age;
run;
18. What is the difference between SET
and MERGE
statements in a Data Step?
Answer:
SET Statement:
Used to read observations from one or more datasets sequentially. It concatenates datasets if multiple datasets are listed.MERGE Statement:
Combines observations from two or more datasets based on commonBY
variables. It performs a horizontal combination, merging data side-by-side.
Example using SET
statement:
data combined;
set dataset1 dataset2;
run;
Example using MERGE
statement:
data merged;
merge dataset1(in=a) dataset2(in=b);
by key_variable;
if a and b;
run;
19. How do you remove duplicate records in SAS?
Answer: You can remove duplicate records using the PROC SORT
procedure with the NODUPKEY
or NODUPRECS
option:
NODUPKEY:
Removes duplicates based on the specifiedBY
variables.NODUPRECS:
Removes records that are entirely duplicates across all variables.
Example using NODUPKEY
:
proc sort data=dataset out=unique nodupkey;
by key_variable;
run;
Example using NODUPRECS
:
proc sort data=dataset out=unique noduprecs;
run;
20. What is PROC REPORT
and how does it differ from PROC PRINT
?
Answer:
PROC PRINT:
A basic procedure used to print the contents of a dataset. It’s straightforward and doesn’t offer much flexibility in formatting.PROC REPORT:
A more advanced procedure that provides greater control over the layout and format of the output. It can summarize data, create complex tables, and apply customized formatting.
Example using PROC PRINT
:
proc print data=sashelp.class;
run;
Example using PROC REPORT
:
proc report data=sashelp.class nowd;
column name age height weight;
define name / display "Student Name";
define age / mean "Average Age";
run;
21. Explain the difference between PROC APPEND
and PROC SQL
for appending data.
Answer:
PROC APPEND:
Efficiently appends the data from one dataset (theDATA=
dataset) to the end of another (theBASE=
dataset) without reading and rewriting the entireBASE
dataset.PROC SQL:
Can also be used to append data via theINSERT INTO
statement, but this can be less efficient thanPROC APPEND
, especially for large datasets.
Example using PROC APPEND
:
proc append base=master data=new_data;
run;
Example using PROC SQL
:
proc sql;
insert into master
select * from new_data;
quit;
22. How would you check for missing values across all variables in a dataset?
Answer: You can check for missing values using the NMISS
and CMISS
functions in combination with PROC MEANS
or PROC FREQ
. Alternatively, a Data Step can be used to check for missing values.
Example using PROC MEANS
:
proc means data=dataset nmiss;
run;
Example using a Data Step:
data check_missing;
set dataset;
array vars{*} _numeric_;
array miss{dim(vars)};
do i = 1 to dim(vars);
if missing(vars{i}) then miss{i} = 1;
else miss{i} = 0;
end;
total_missing = sum(of miss{*});
run;
23. What is the use of the LAG
function in SAS?
Answer: The LAG
function in SAS returns the value of a variable from a previous row within the same Data Step iteration. It’s often used to calculate differences between consecutive observations or to identify patterns in time series data.
Example using LAG
:
data lag_example;
set sashelp.class;
prev_age = lag(age);
age_diff = age - prev_age;
run;
24. Explain how BY-GROUP
processing works in SAS.
Answer: BY-GROUP
processing is used when a dataset is sorted by one or more variables and you want to perform operations on subsets of data defined by those variables. SAS automatically processes each group separately when a BY
statement is used in Data Steps or procedures.
Example:
proc sort data=sashelp.class out=sorted_class;
by sex;
run;
data by_group;
set sorted_class;
by sex;
if first.sex then group_count = 0;
group_count + 1;
if last.sex then output;
run;
- FIRST.variable:
A temporary variable that is set to 1 when SAS processes the first observation in a BY group.
- LAST.variable:
A temporary variable that is set to 1 when SAS processes the last observation in a BY group.
25. What are the different ways to combine datasets in SAS?
Answer: Datasets in SAS can be combined using several methods:
- Concatenation (
SET
statement): Appends datasets vertically. - Merging (
MERGE
statement): Combines datasets horizontally based on key variables. - Interleaving (
SET
statement withBY
): Combines datasets in a sorted order based on BY variables. - Joining (
PROC SQL
): Combines datasets horizontally based on SQL JOIN conditions. - Appending (
PROC APPEND
): Adds new observations to an existing dataset.
Example using concatenation:
data combined;
set dataset1 dataset2;
run;
Example using merging:
data merged;
merge dataset1 dataset2;
by key_variable;
run;
26. How do you create a custom format in SAS?
Answer: You can create custom formats in SAS using PROC FORMAT
. Custom formats allow you to map data values to labels for reporting and display purposes.
Example of creating a custom format:
proc format;
value agegrp
low - 12 = 'Child'
13 - 19 = 'Teenager'
20 - high = 'Adult';
run;
data formatted;
set sashelp.class;
format age agegrp.;
run;
27. What are ARRAYS
in SAS, and how are they used?
Answer: An array in SAS is a temporary grouping of variables that allows you to perform the same operation on each element in the group using a loop. Arrays are useful for simplifying repetitive tasks.
Example of using an array:
data example;
set sashelp.class;
array scores{3} height weight age;
do i = 1 to 3;
scores{i} = scores{i} * 2;
end;
run;
28. What is the difference between COMPRESS=
and COMPRESS
function?
Answer:
COMPRESS=
option: A dataset option used to compress the storage of a dataset by eliminating redundancy, thus reducing its physical size on disk.COMPRESS
function: A function used to remove specified characters from a string.
Example using COMPRESS=
option:
data compressed_dataset(compress=yes);
set original_dataset;
run;
Example using COMPRESS
function:
data example;
input str $20.;
compressed_str = compress(str, 'aeiou');
datalines;
Hello World
SAS Programming
;
run;
29. Explain the PROC CONTENTS
procedure.
Answer: PROC CONTENTS
provides metadata about a SAS dataset, such as the variable names, types, lengths, labels, and the dataset’s creation date and engine. It is useful for understanding the structure of a dataset before analysis.
Example:
proc contents data=sashelp.class;
run;
30. How do you optimize the performance of a SAS program?
Answer: Performance optimization in SAS can be achieved through several techniques:
- Use indexing: Create indexes on frequently used variables to speed up data access.
- Use
KEEP
/DROP
statements: Reduce memory usage by keeping only the necessary variables. - Avoid unnecessary sorting: Sort only when necessary and use the
NODUPKEY
/NODUPRECS
options wisely. - Optimize I/O operations: Use compressed datasets, and avoid reading/writing the same dataset multiple times.
- Use
PROC SQL
efficiently: Optimize SQL queries by using appropriate join types and filtering as early as possible.
31. What is the difference between PROC MEANS
and PROC UNIVARIATE
?
Answer:
PROC MEANS:
Provides basic descriptive statistics like mean, median, standard deviation, min, and max. It’s used for summarizing numeric data.PROC UNIVARIATE:
Offers more detailed descriptive statistics, including percentiles, moments, tests for normality, and more detailed information about the distribution of data.
Example using PROC MEANS
:
proc means data=sashelp.class mean median stddev;
var height weight;
run;
Example using PROC UNIVARIATE
:
proc univariate data=sashelp.class;
var height weight;
histogram height;
qqplot height;
run;
32. How do you perform a left join in SAS?
Answer: You can perform a left join in SAS using either the Data Step with a MERGE
statement or PROC SQL
.
Example using Data Step:
data left_join;
merge dataset1(in=a) dataset2(in=b);
by key_variable;
if a;
run;
Example using PROC SQL
:
proc sql;
create table left_join as
select a.*, b.var2
from dataset1 as a
left join dataset2 as b
on a.key_variable = b.key_variable;
quit;
33. What is the purpose of the FIRST.
and LAST.
variables in SAS?
Answer: FIRST.
and LAST.
are temporary variables created by SAS during BY-GROUP
processing in a Data Step. They indicate whether an observation is the first or last in a BY
group, respectively.
Example:
data by_group;
set sashelp.class;
by sex;
if first.sex then group_count = 0;
group_count + 1;
if last.sex then output;
run;
34. What is the difference between DROP
and KEEP
statements in SAS?
Answer:
DROP Statement:
Excludes the specified variables from the output dataset.KEEP Statement:
Includes only the specified variables in the output dataset.
Example using DROP
:
data new_dataset;
set sashelp.class(drop=age);
run;
Example using KEEP
:
data new_dataset;
set sashelp.class(keep=name sex);
run;
35. How can you create a report with both summary statistics and detailed data in SAS?
Answer: You can use the ODS
(Output Delivery System) along with procedures like PROC MEANS
or PROC TABULATE
to generate summary statistics, and PROC PRINT
to display detailed data.
Example:
ods pdf file="report.pdf";
proc means data=sashelp.class mean median;
var height weight;
run;
proc print data=sashelp.class;
run;
ods pdf close;
36. Explain the difference between the INFILE
and INPUT
statements in SAS.
Answer:
INFILE Statement:
Specifies the location and attributes of an external file to be read by SAS.INPUT Statement:
Specifies how data should be read into SAS variables from the external file specified byINFILE
.
Example:
data mydata;
infile 'path-to-file/data.txt' dlm=',' dsd firstobs=2;
input name $ age height weight;
run;
37. How do you handle categorical variables in SAS?
Answer: Categorical variables can be handled using formats, PROC FREQ
, PROC TABULATE
, and CLASS
statements in various procedures.
Example using formats:
proc format;
value $gender 'M' = 'Male'
'F' = 'Female';
run;
data formatted;
set sashelp.class;
format sex $gender.;
run;
38. What is PROC CORR
, and when would you use it?
Answer: PROC CORR
is used to calculate correlation coefficients between variables. It’s useful for understanding the relationship between two or more numeric variables.
Example:
proc corr data=sashelp.class;
var height weight;
run;
39. Explain how to use the FILENAME
statement in SAS.
Answer: The FILENAME
statement assigns a fileref (file reference) to an external file, which can then be used in INFILE
, FILE
, or other I/O operations.
Example:
filename myfile 'C:\path\to\file.txt';
data _null_;
infile myfile;
input line $100.;
put line;
run;
filename myfile clear;
40. What is the difference between PROC FORMAT
and FORMAT
statement?
Answer:
PROC FORMAT:
Creates custom formats that can be used in theFORMAT
statement across multiple datasets and procedures.FORMAT Statement:
Applies a format to a variable in a specific procedure or Data Step.
Example of PROC FORMAT
:
proc format;
value agefmt low-12 = 'Child'
13-19 = 'Teenager'
20-high = 'Adult';
run;
proc print data=sashelp.class;
format age agefmt.;
run;
41. How do you concatenate character variables in SAS?
Answer: You can concatenate character variables using the concatenation operator (||
), the CATT
, CATX
, or CATS
functions depending on the desired outcome.
Example using ||
:
data concat;
set sashelp.class;
full_name = name || ' ' || sex;
run;
Example using CATX
:
data concat;
set sashelp.class;
full_name = catx(' ', name, sex);
run;
42. What is the use of IN=
, and how do you use it in merging datasets?
Answer: IN=
is used in Data Step merges to create a temporary variable that indicates whether an observation is present in a specific dataset. It is often used for conditional processing after merging datasets.
Example:
data merged;
merge dataset1(in=a) dataset2(in=b);
by key_variable;
if a and not b then flag = 'Only in dataset1';
else if b and not a then flag = 'Only in dataset2';
else if a and b then flag = 'In both datasets';
run;
43. How do you use ARRAY
and DO
loop together in SAS?
Answer: ARRAY
and DO
loops can be used together to apply the same operation across multiple variables in a dataset.
Example:
data example;
set sashelp.class;
array vars[3] height weight age;
do i = 1 to 3;
vars[i] = vars[i] * 2;
end;
run;
44. What is PROC LOGISTIC
, and when would you use it?
Answer: PROC LOGISTIC
is used for binary or multinomial logistic regression in SAS. It’s used when the dependent variable is categorical and you want to model the relationship between it and one or more independent variables.
Example:
proc logistic data=sashelp.class;
model sex(event='F') = height weight age;
run;
45. Explain the use of CALL SYMPUT
and CALL SYMPUTX
.
Answer:
CALL SYMPUT:
Assigns a value to a macro variable from within a Data Step.CALL SYMPUTX:
Similar toCALL SYMPUT
but removes leading and trailing blanks from the value being assigned to the macro variable.
Example of CALL SYMPUT
:
data _null_;
call symput('macro_var', 'value');
run;
%put ¯o_var.;
Example of CALL SYMPUTX
:
data _null_;
call symputx('macro_var', ' value ');
run;
%put ¯o_var.;
46. What is PROC GLM
, and how is it different from PROC REG
?
Answer:
PROC GLM:
Used for general linear models, including ANOVA, ANCOVA, and regression with more flexibility in model specification.PROC REG:
Specifically used for linear regression models, providing more diagnostic tools for regression analysis.
Example using PROC GLM
:
proc glm data=sashelp.class;
class sex;
model weight = height sex;
run;
Example using PROC REG
:
proc reg data=sashelp.class;
model weight = height;
run;
47. How do you remove trailing spaces from a character string in SAS?
Answer: You can remove trailing spaces using the STRIP
function or the TRIM
function.
Example using STRIP
:
data example;
str = "Hello ";
stripped_str = strip(str);
run;
Example using TRIM
:
data example;
str = "Hello ";
trimmed_str = trim(str);
run;
48. What is PROC TRANSREG
, and when would you use it?
Answer: PROC TRANSREG
is used for transformation and regression modeling. It allows for the analysis of data with complex transformations, including Box-Cox transformations, polynomial regression, and categorical data analysis.
Example:
proc transreg data=sashelp.class;
model identity(height) = spline(age);
run;
49. Explain the difference between PROC SORT
with OUT=
option and without it.
Answer:
- With
OUT=
Option: The sorted data is output to a new dataset specified by theOUT=
option, leaving the original dataset unchanged. - Without
OUT=
Option: The original dataset is sorted in place, and no new dataset is created.
Example with OUT=
:
proc sort data=sashelp.class out=sorted_class;
by age;
run;
Example without OUT=
:
proc sort data=sashelp.class;
by age;
run;
50. What is PROC MIXED
, and when would you use it?
Answer: PROC MIXED
is used for mixed-effects models, which are useful when dealing with data that have both fixed and random effects. It is often used in longitudinal data analysis or hierarchical data structures.
Example:
proc mixed data=sashelp.class;
class sex;
model weight = height / solution;
random sex;
run;