SAS Interview Questions: Base SAS
Very Basic:
What SAS statements would you code to read an external raw data file into a DATA step?
You would use the INFILE statement to specify the location of the external raw data file, and the INPUT statement to read in the data into the variables in a SAS dataset.
How do you read in the variables that you need?
You use the INPUT statement with specific column pointers, such as
Are you familiar with special input delimiters? How are they used?
Yes, special input delimiters like DLM and DSD are used to specify how fields are separated in the data. They are included in the INFILE statement. For example, the DLM option can be used to specify a comma as a delimiter for CSV files. The DSD option is used for comma-separated values (CSV) files and treats consecutive delimiters as missing values and ignores delimiters enclosed in quotation marks.
If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
You can use the MISSOVER option in the INFILE statement to prevent SAS from reading the next record when the last variable on the line doesn't have a value. If some data lines are shorter than expected, you can use the TRUNCOVER option to ensure that SAS reads the remaining part of the line without moving to the next record.
What is the difference between an informat and a format? Name three informats or formats.
- Informats are used to read data into SAS variables. They define how raw data is read into SAS variables. - Formats are used to write data or to control how the data appears in output. Examples of informats: -
Name and describe three SAS functions that you have used, if any.
1. LENGTH: Returns the length of a character string, excluding trailing blanks. - Example:
How would you code the criteria to restrict the output to be produced?
Use the WHERE statement or clause to filter the dataset based on specific criteria before output.
What is the purpose of the trailing @ and the @@? How would you use them?
- The single trailing @ holds the input line for continued processing across multiple INPUT statements. - The double trailing @@ holds the input line for continued reading across multiple iterations of the DATA step until the end of the input line is reached. Example of double trailing @@ when reading multiple observations per line:
Under what circumstances would you code a SELECT construct instead of IF statements?
You would use a SELECT statement when you have a long series of mutually exclusive conditions, especially when the conditions involve comparisons between numeric values. The SELECT construct can be more efficient than multiple IF-THEN/ELSE statements because it reduces CPU time. Example:
What statement do you code to tell SAS that it is to write to an external file?
Use the FILE statement to specify the external file, and the PUT statement to write data to that file.
If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
Use the PUT _ALL_; statement or the PUT _INFILE_; statement to write the entire record without coding each variable individually.
If you're not wanting any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a dataset?
Use the DATA _NULL_; statement to execute a DATA step without creating an output dataset.
What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement can be used in any SAS procedure or DATA step to filter data based on specific criteria.
Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
Yes, SAS code can be linked using macros or by including external files with the %INCLUDE statement, allowing for reusable code across multiple programs.
How would you include common or reuse code to be processed along with your statements?
Common or reusable code can be included in SAS programs using SAS Macros or the %INCLUDE statement.
When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX, or INDEXC?
The INDEX function is best for locating a specific substring within a character string.
If you have a dataset that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Use the KEEP statement or the KEEP= dataset option to select only the variables you need.
Code a PROC SORT on a dataset containing State, District, and County as the primary variables, along with several numeric variables.
How would you delete duplicate observations?
Use the NODUPRECS option in the PROC SORT procedure.
How would you delete observations with duplicate keys?
Use the NODUPKEY option in the PROC SORT procedure.
How would you code a merge that will keep only the observations that have matches from both sets?
Use an IF statement in the merge process to check for matches, for example:
How would you code a merge that will write the matches of both to one dataset, the non-matches from the left-most data?
Use IF statements to differentiate between matches and non-matches, outputting them to different datasets:
What is the Program Data Vector (PDV)? What are its functions?
The Program Data Vector (PDV) is an area in memory where SAS builds a dataset, one observation at a time. During the DATA step's execution, SAS reads data into the PDV, processes it, and then writes it to the output dataset. The PDV includes all variables in the DATA step, and two automatic variables: _N_, which counts the number of iterations, and _ERROR_, which flags data errors.
Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
SAS both compiles and interprets code. During the compilation phase, SAS translates the code into machine language and checks for syntax errors. During execution, SAS interprets the compiled code to perform the data processing.
At compile time when a SAS dataset is read, what items are created?
At compile time, SAS creates the Input Buffer, the PDV, and Descriptor Information, including the attributes of variables.
Name statements that are recognized at compile time only.
The PUT statement is recognized at compile time for the purpose of writing data to the log or to external files.
Name statements that are execution only.
The INFILE and INPUT statements are execution-only because they read data during the execution phase.
Identify statements whose placement in the DATA step is critical.
Statements such as DATA, INPUT, and RUN must be correctly placed in the DATA step to control the flow and execution of the program.
Name statements that function at both compile and execution time.
The INPUT statement functions at both compile time and execution time.
In the flow of DATA step processing, what is the first action in a typical DATA Step?
The first action in a DATA step is the creation of the DATA statement, which initializes the process. Each time the DATA statement executes, SAS increments the _N_ variable by 1.
What is _N_?
_N_ is an automatic variable in SAS that counts the number of iterations in the DATA step. It is particularly useful for subsetting data, controlling loops, or performing operations based on iteration counts. Example:
How do I convert a numeric variable to a character variable?
Use the PUT function to convert a numeric variable to a character variable.
How do I convert a character variable to a numeric variable?
Use the INPUT function to convert a character variable to a numeric variable.
How can I compute the age of something?
To compute the age, given two SAS date variables
How can I compute the number of months between two dates?
Given two SAS date variables
How can I determine the position of the nth word within a character string?
Use a combination of the INDEXW and SCAN functions:
I need to reorder characters within a string... should I use SUBSTR?
You can use the TRANSLATE function for simple reordering, which may be more efficient than using multiple SUBSTR calls. For example, to move the first character of a 4-character string to the last:
How can I format a SAS date variable so that December 25, 1995, appears as '19951225' (with no separator)?
You can use a combination of the YEAR and MMDDYY formats to display the value:
How can I format a SAS time variable with a leading zero for hours 1-9?
Use a combination of the Z and MMSS formats:
INFILE OPTIONS
The INFILE statement in SAS has several useful options to handle different types of external files: 1. FLOWOVER: This is the default option. If the INPUT statement reaches the end of a non-blank line without filling all variables, FLOWOVER reads the next line into the Input Buffer and attempts to fill the remaining variables starting from column one. 2. MISSOVER: Prevents SAS from moving to the next line if a line is shorter than expected. Instead, it assigns missing values to the remaining variables. 3. TRUNCOVER: This option is used to handle variable-length records. If the data line ends in the middle of a variable field, TRUNCOVER takes as much as is available, whereas MISSOVER would assign the variable a missing value. Example:
Different Ways to Use the PUT Statement
The PUT statement is versatile in SAS, used for writing data to the log, external files, or controlling the format of output data.
%INCLUDE Statement:
The %INCLUDE statement is used to include the contents of an external file or another SAS program into the current program. This is particularly useful for including common code snippets or reusable macro definitions across multiple programs.
What SAS statements would you code to read an external raw data file into a DATA step?
You would use the INFILE statement to specify the location of the external raw data file, and the INPUT statement to read in the data into the variables in a SAS dataset.
How do you read in the variables that you need?
You use the INPUT statement with specific column pointers, such as
@5
or 12-17
, to define where each variable is located in the raw data file.Are you familiar with special input delimiters? How are they used?
Yes, special input delimiters like DLM and DSD are used to specify how fields are separated in the data. They are included in the INFILE statement. For example, the DLM option can be used to specify a comma as a delimiter for CSV files. The DSD option is used for comma-separated values (CSV) files and treats consecutive delimiters as missing values and ignores delimiters enclosed in quotation marks.
If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
You can use the MISSOVER option in the INFILE statement to prevent SAS from reading the next record when the last variable on the line doesn't have a value. If some data lines are shorter than expected, you can use the TRUNCOVER option to ensure that SAS reads the remaining part of the line without moving to the next record.
What is the difference between an informat and a format? Name three informats or formats.
- Informats are used to read data into SAS variables. They define how raw data is read into SAS variables. - Formats are used to write data or to control how the data appears in output. Examples of informats: -
MMDDYYw.
: Reads dates in MM/DD/YY format.
- DATEw.
: Reads dates in various formats.
- COMMAw.
: Reads numeric values with commas.
Examples of formats:
- MMDDYY10.
: Displays dates as MM/DD/YYYY.
- DOLLAR12.2
: Displays numbers as currency with commas and two decimal places.
- WORDDATE18.
: Displays dates as fully spelled-out words.Name and describe three SAS functions that you have used, if any.
1. LENGTH: Returns the length of a character string, excluding trailing blanks. - Example:
data _null_;
a = 'my cat';
x = length(a);
put x=;
run;
- Result: x = 6
2. SUBSTR: Extracts a substring from a character variable starting at a specified position and of a specified length.
- Example:
data _null_;
a = '(916)734-6241';
x = substr(a, 2, 3);
put x=;
run;
- Result: x = '916'
3. TRIM: Removes trailing blanks from a character string.
- Example:
data _null_;
a = 'my ';
b = 'cat';
x = trim(a) || b;
put x=;
run;
- Result: x = 'mycat'
4. SUM: Returns the sum of non-missing values.
- Example:
data _null_;
x = sum(3, 5, .);
put x=;
run;
- Result: x = 8
5. INT: Returns the integer portion of a numeric value.
- Example:
data _null_;
x = int(5.67);
put x=;
run;
- Result: x = 5How would you code the criteria to restrict the output to be produced?
Use the WHERE statement or clause to filter the dataset based on specific criteria before output.
What is the purpose of the trailing @ and the @@? How would you use them?
- The single trailing @ holds the input line for continued processing across multiple INPUT statements. - The double trailing @@ holds the input line for continued reading across multiple iterations of the DATA step until the end of the input line is reached. Example of double trailing @@ when reading multiple observations per line:
data dsn;
input sex $ days @@;
datalines;
F 53 F 56 F 60 M 46 M 52 M 58
;
run;
Under what circumstances would you code a SELECT construct instead of IF statements?
You would use a SELECT statement when you have a long series of mutually exclusive conditions, especially when the conditions involve comparisons between numeric values. The SELECT construct can be more efficient than multiple IF-THEN/ELSE statements because it reduces CPU time. Example:
select;
when (condition1) do;
/* actions */
end;
when (condition2) do;
/* actions */
end;
otherwise do;
/* actions */
end;
end;
What statement do you code to tell SAS that it is to write to an external file?
Use the FILE statement to specify the external file, and the PUT statement to write data to that file.
If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
Use the PUT _ALL_; statement or the PUT _INFILE_; statement to write the entire record without coding each variable individually.
If you're not wanting any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a dataset?
Use the DATA _NULL_; statement to execute a DATA step without creating an output dataset.
What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement can be used in any SAS procedure or DATA step to filter data based on specific criteria.
Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
Yes, SAS code can be linked using macros or by including external files with the %INCLUDE statement, allowing for reusable code across multiple programs.
How would you include common or reuse code to be processed along with your statements?
Common or reusable code can be included in SAS programs using SAS Macros or the %INCLUDE statement.
When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX, or INDEXC?
The INDEX function is best for locating a specific substring within a character string.
If you have a dataset that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Use the KEEP statement or the KEEP= dataset option to select only the variables you need.
Code a PROC SORT on a dataset containing State, District, and County as the primary variables, along with several numeric variables.
proc sort data=one;
by State District County;
run;
How would you delete duplicate observations?
Use the NODUPRECS option in the PROC SORT procedure.
How would you delete observations with duplicate keys?
Use the NODUPKEY option in the PROC SORT procedure.
How would you code a merge that will keep only the observations that have matches from both sets?
Use an IF statement in the merge process to check for matches, for example:
data matched;
merge dataset1(in=a) dataset2(in=b);
by key;
if a and b;
run;
How would you code a merge that will write the matches of both to one dataset, the non-matches from the left-most data?
Use IF statements to differentiate between matches and non-matches, outputting them to different datasets:
data both_matches left_only;
merge dataset1(in=a) dataset2(in=b);
by key;
if a and b then output both_matches;
else if a and not b then output left_only;
run;
What is the Program Data Vector (PDV)? What are its functions?
The Program Data Vector (PDV) is an area in memory where SAS builds a dataset, one observation at a time. During the DATA step's execution, SAS reads data into the PDV, processes it, and then writes it to the output dataset. The PDV includes all variables in the DATA step, and two automatic variables: _N_, which counts the number of iterations, and _ERROR_, which flags data errors.
Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
SAS both compiles and interprets code. During the compilation phase, SAS translates the code into machine language and checks for syntax errors. During execution, SAS interprets the compiled code to perform the data processing.
At compile time when a SAS dataset is read, what items are created?
At compile time, SAS creates the Input Buffer, the PDV, and Descriptor Information, including the attributes of variables.
Name statements that are recognized at compile time only.
The PUT statement is recognized at compile time for the purpose of writing data to the log or to external files.
Name statements that are execution only.
The INFILE and INPUT statements are execution-only because they read data during the execution phase.
Identify statements whose placement in the DATA step is critical.
Statements such as DATA, INPUT, and RUN must be correctly placed in the DATA step to control the flow and execution of the program.
Name statements that function at both compile and execution time.
The INPUT statement functions at both compile time and execution time.
In the flow of DATA step processing, what is the first action in a typical DATA Step?
The first action in a DATA step is the creation of the DATA statement, which initializes the process. Each time the DATA statement executes, SAS increments the _N_ variable by 1.
What is _N_?
_N_ is an automatic variable in SAS that counts the number of iterations in the DATA step. It is particularly useful for subsetting data, controlling loops, or performing operations based on iteration counts. Example:
data new_dataset;
set old_dataset;
if mod(_n_, 3) = 1;
run;
This example selects every third record from the dataset.How do I convert a numeric variable to a character variable?
Use the PUT function to convert a numeric variable to a character variable.
How do I convert a character variable to a numeric variable?
Use the INPUT function to convert a character variable to a numeric variable.
How can I compute the age of something?
To compute the age, given two SAS date variables
birthdate
and calcdate
, you can use:
age = int(intck('month', birthdate, calcdate) / 12);
if month(birthdate) = month(calcdate) then age = age - (day(birthdate) > day(calcdate));
How can I compute the number of months between two dates?
Given two SAS date variables
startdate
and enddate
, you can compute the number of months between them as:
months = intck('month', startdate, enddate) - (day(enddate) < day(startdate));
How can I determine the position of the nth word within a character string?
Use a combination of the INDEXW and SCAN functions:
pos = indexw(string, scan(string, n));
I need to reorder characters within a string... should I use SUBSTR?
You can use the TRANSLATE function for simple reordering, which may be more efficient than using multiple SUBSTR calls. For example, to move the first character of a 4-character string to the last:
reorder = translate('2341', string, '1234');
How can I format a SAS date variable so that December 25, 1995, appears as '19951225' (with no separator)?
You can use a combination of the YEAR and MMDDYY formats to display the value:
put sasdate year4. sasdate mmddyy4.;
Or use the PUT and COMPRESS functions to store the value:
newvar = compress(put(sasdate, yymmdd10.), '/');
How can I format a SAS time variable with a leading zero for hours 1-9?
Use a combination of the Z and MMSS formats:
hrprint = hour(sastime);
put hrprint z2. ':' sastime mmss5.;
INFILE OPTIONS
The INFILE statement in SAS has several useful options to handle different types of external files: 1. FLOWOVER: This is the default option. If the INPUT statement reaches the end of a non-blank line without filling all variables, FLOWOVER reads the next line into the Input Buffer and attempts to fill the remaining variables starting from column one. 2. MISSOVER: Prevents SAS from moving to the next line if a line is shorter than expected. Instead, it assigns missing values to the remaining variables. 3. TRUNCOVER: This option is used to handle variable-length records. If the data line ends in the middle of a variable field, TRUNCOVER takes as much as is available, whereas MISSOVER would assign the variable a missing value. Example:
data trun;
infile 'external_file.txt' truncover;
input chr $3.;
run;
This would read up to 3 characters into the variable `chr`, even if the line is shorter than expected.Different Ways to Use the PUT Statement
The PUT statement is versatile in SAS, used for writing data to the log, external files, or controlling the format of output data.
%INCLUDE Statement:
The %INCLUDE statement is used to include the contents of an external file or another SAS program into the current program. This is particularly useful for including common code snippets or reusable macro definitions across multiple programs.
thank for your informations.resume format free download.we are eagerly learn the this in dba to the above articles.
ReplyDeleteThanks for posting these blog related to SAS.It will be really useful for my preparation. Keep posting such essential blogs.
ReplyDeleteSAS course