Monday, November 3, 2008
How to determine the last observation in a data set
Use the END= option on a SET statement to determine the last observation of the data set.
/* Create sample data */
data company;
input division :$12. employees;
datalines;
sales 150
support 200
research 250
accounting 50
shipping 35
; run;
/* Calculate the total number of employees in each group. */
/* On the last observation of the data set, write out the */
/* resulting total. */
data _null_;
set company end=last;
file print;
/* Sum statement syntax has an implied RETAIN */
total + employees;
/* For every iteration of the step, write out the values for */
/* DIVISION and EMPLOYEES. */
put @1 division @15 employees;
/* On the last iteration of the step only, write out 4 dashes */
/* starting at column 15, move the internal pointer to the next */
/* line and at column 15 write out the value of TOTAL. */
if last then put @15 '----' / @15 total;
run;
RESULT:
sales         150
support       200
research      250
accounting    50
shipping      35
              ----
              685
source: http://support.sas.com/kb/24/746.html
SAS Clinical Interview QUESTIONS and ANSWERS
Here is a list of common SAS clinical interview questions along with example answers and explanations to help you prepare for your next interview.
1. What is SAS?
SAS stands for Statistical Analysis System. It is a software suite used for advanced analytics, business intelligence, data management, and predictive analytics. It is widely used in clinical trials for analyzing clinical data.
2. What are the phases of clinical trials?
The phases of clinical trials include:
- Phase I: Tests safety and dosage with a small group of healthy volunteers.
- Phase II: Tests efficacy and side effects with a larger group of patients.
- Phase III: Confirms effectiveness, monitors side effects, and compares with other treatments in larger patient groups.
- Phase IV: Conducts post-marketing studies to gather additional information on risks, benefits, and optimal use.
3. How would you import external data into SAS?
You can use PROC IMPORT to import external data files such as CSV, Excel, and delimited text files into SAS datasets.
proc import datafile="path-to-file.csv"
out=dataset_name
dbms=csv
replace;
getnames=yes;
run;
4. What is the difference between informat and format in SAS?
An informat is used to read data into SAS variables, while a format is used to write or display data. Informats tell SAS how to interpret raw data values, and formats tell SAS how to display the data.
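As a minimal sketch of the distinction, the step below reads a raw date string with an informat and then displays the stored numeric value with a format. The dataset name visits and the variable visit_date are hypothetical names chosen for illustration.

```sas
data visits;
  /* Informat: tells SAS how to interpret the raw text 03NOV2008 */
  input visit_date :date9.;
  /* Format: controls how the stored numeric date is displayed */
  format visit_date yymmdd10.;
  datalines;
03NOV2008
;
run;

proc print data=visits;
run;
```

The value is stored as a number of days since January 1, 1960; only its displayed form changes with the format.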
5. Explain the use of PROC SORT in SAS.
PROC SORT sorts a dataset by one or more variables. Sorting is often a prerequisite for steps that use a BY statement, such as a match-merge or BY-group summary statistics.
proc sort data=dataset_name;
by variable_name;
run;
6. How do you merge datasets in SAS?
You can merge datasets using the MERGE statement in a DATA step, usually after sorting each dataset by the key variables.
data merged_data;
merge dataset1 (in=a) dataset2 (in=b);
by key_variable;
if a and b;
run;
7. What is the purpose of the IN= option in a merge?
The IN= data set option creates a temporary variable that is 1 when the corresponding dataset contributed to the current observation and 0 otherwise. It is useful for controlling which observations to keep in the merged dataset.
8. How do you handle missing data in SAS?
In SAS, missing data for numeric variables is represented by a period (.) and for character variables by a blank space. You can use conditional logic to handle missing data.
data new_data;
set old_data;
if variable = . then variable = 0; /* Replace missing numeric values with 0 */
run;
9. How can you create a macro in SAS?
A macro in SAS is defined between a %MACRO statement and a %MEND statement, and it is executed by calling it by name with a leading percent sign.
%macro example_macro;
data new_data;
set old_data;
/* Your code here */
run;
%mend example_macro;
%example_macro;
10. What is the difference between %LET and CALL SYMPUT in SAS?
%LET is a macro statement that assigns a value to a macro variable when the macro processor encounters it, before any DATA step executes. CALL SYMPUT is a DATA step routine that assigns a value while the DATA step is running, so the value can come from a variable or the result of an expression.
%let varname = value; /* open code or inside a macro */
call symput('varname', value); /* valid only inside a DATA step */
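The timing difference can be sketched as follows. This example assumes the SASHELP.CLASS sample dataset is available; the macro variable names cutoff and n_over are hypothetical.

```sas
%let cutoff = 13;   /* resolved by the macro processor before the step runs */

data _null_;
  set sashelp.class end=last;
  /* Sum statement: adds 1 for each student older than the cutoff */
  n_over + (age > &cutoff);
  /* CALL SYMPUT runs during execution, so it can store a computed value */
  if last then call symput('n_over', strip(put(n_over, 8.)));
run;

%put NOTE: &n_over students are older than &cutoff;
```

The macro variable created by CALL SYMPUT is referenced only after the DATA step has finished, because it does not exist until the step executes.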
11. What is the use of the PUTLOG statement in SAS?
The PUTLOG statement writes custom messages to the SAS log. It is particularly useful for debugging.
data _null_;
set dataset_name;
if variable = . then putlog 'Warning: Missing value for variable at observation ' _n_=;
run;
12. Explain the use of PROC SQL in SAS.
PROC SQL allows you to use SQL queries within SAS. It is useful for data manipulation and retrieval, especially when working with relational databases.
proc sql;
select variable1, variable2
from dataset_name
where condition;
quit;
13. How do you create a format in SAS?
You can create custom formats using the VALUE statement in PROC FORMAT.
proc format;
value agefmt
low - 18 = 'Child'
18 <- 65 = 'Adult' /* <- excludes 18, so non-integer ages are covered */
65 <- high = 'Senior';
run;
14. How would you validate datasets and reports in SAS?
Validation in SAS typically involves double programming, PROC COMPARE, and review of logs and outputs to ensure the correctness of datasets and reports.
proc compare base=dataset1 compare=dataset2;
run;
15. Explain the use of PROC MEANS.
PROC MEANS calculates summary statistics for numeric variables. By default it reports N, mean, standard deviation, minimum, and maximum; other statistics such as the median must be requested on the PROC MEANS statement.
proc means data=dataset_name;
var numeric_variable;
run;
16. How do you transpose data in SAS?
Data can be transposed using PROC TRANSPOSE, which converts rows to columns and vice versa.
proc transpose data=dataset_name out=transposed_data;
by id_variable;
var variable_to_transpose;
run;
17. What is the use of ODS in SAS?
ODS (Output Delivery System) is used to generate reports in various formats such as HTML, PDF, and RTF.
ods pdf file='report.pdf';
proc print data=dataset_name;
run;
ods pdf close;
18. How do you generate random numbers in SAS?
Random numbers can be generated using functions like RANUNI or RAND.
data random_numbers;
do i = 1 to 100;
random_value = rand('uniform');
output;
end;
run;
19. What is PROC LIFETEST used for?
PROC LIFETEST is used for survival analysis, including estimating survival curves with methods such as Kaplan-Meier.
proc lifetest data=survival_data method=km;
time time_variable*status_variable(0);
run;
20. Explain the use of the RETAIN statement.
The RETAIN statement keeps the value of a variable across iterations of the DATA step instead of resetting it to missing at the start of each iteration.
data retained_data;
set input_data;
retain count 0;
count + 1;
run;
21. How do you create a report in SAS?
Reports in SAS can be created using PROC REPORT, PROC PRINT, or a DATA _NULL_ step with PUT statements.
proc report data=dataset_name;
column variable1 variable2;
run;
22. What is the use of the ARRAY statement in SAS?
Arrays in SAS are used to process a group of variables with a single statement or operation.
data array_example;
set dataset_name;
array scores(3) score1-score3;
do i = 1 to 3;
scores(i) = scores(i) * 10;
end;
run;
23. How would you subset data in SAS?
Subsetting can be done with the WHERE statement, which works in both DATA steps and procedures, or with the subsetting IF statement, which works only in a DATA step.
data subset_data;
set dataset_name;
where age > 18;
run;
24. How do you use the DO loop in SAS?
The DO loop is used to execute a block of code multiple times.
data loop_example;
do i = 1 to 10;
output;
end;
run;
25. Explain the use of PROC TABULATE.
PROC TABULATE is used to create multi-dimensional tables and summaries.
proc tabulate data=dataset_name;
class group_variable;
var numeric_variable;
table group_variable, numeric_variable*(mean sum);
run;
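26. What is the difference between PROC GLM and PROC REG?
PROC GLM fits general linear models and accepts categorical predictors through a CLASS statement, while PROC REG is specifically for linear regression and has no CLASS statement, so categorical predictors must be coded as dummy variables first.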
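A brief sketch of the two procedures side by side, assuming the SASHELP.CLASS sample dataset is available (the choice of model variables is only illustrative):

```sas
/* PROC REG: continuous predictors only */
proc reg data=sashelp.class;
  model weight = height age;
run;
quit;

/* PROC GLM: CLASS lets a categorical variable enter the model directly */
proc glm data=sashelp.class;
  class sex;
  model weight = height sex;
run;
quit;
```

Both procedures are interactive, so each is ended with QUIT.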
27. How do you create a temporary dataset in SAS?
A temporary dataset is created by default when the DATA statement uses a one-level name, that is, a name without a library reference. Temporary datasets are stored in the WORK library and are deleted when the SAS session ends.
28. How do you remove duplicate records in SAS?
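You can remove duplicate records using the NODUPKEY or NODUP options in PROC SORT. NODUPKEY drops observations with duplicate BY values; NODUP drops adjacent observations that are identical on every variable.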
proc sort data=dataset_name nodupkey;
by key_variable;
run;
29. How do you format dates in SAS?
Dates in SAS can be formatted using the FORMAT statement.
data formatted_dates;
set dataset_name;
format date_variable date9.;
run;
30. How do you calculate summary statistics in SAS?
Summary statistics can be calculated using PROC MEANS, PROC SUMMARY, or PROC TABULATE.
proc means data=dataset_name;
var numeric_variable;
run;
31. How do you transpose datasets using the DATA step?
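Data can be transposed from wide to long using an array and an explicit OUTPUT inside a DO loop in the DATA step.
data transposed_data;
set dataset_name;
array vars(*) var1-var3;
do i = 1 to dim(vars);
value = vars(i); /* one output row per array element */
output;
end;
drop var1-var3;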
run;
32. How do you concatenate datasets in SAS?
Datasets can be concatenated using the SET statement in a DATA step.
data combined_data;
set dataset1 dataset2;
run;
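33. What is BY-group processing in SAS?
BY-group processing allows you to perform operations on subsets of data that are grouped by one or more variables. The input must be sorted or indexed by those variables, and the DATA step then provides FIRST.variable and LAST.variable flags to mark group boundaries.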
data grouped_data;
set dataset_name;
by group_variable;
run;
34. How do you generate descriptive statistics in SAS?
Descriptive statistics can be generated using PROC MEANS, PROC SUMMARY, or PROC UNIVARIATE.
proc univariate data=dataset_name;
var numeric_variable;
run;
35. What is the use of CALL SYMPUTX?
CALL SYMPUTX assigns a value to a macro variable and removes leading and trailing blanks from the value.
data _null_;
call symputx('varname', value);
run;
36. How do you create a histogram in SAS?
A histogram can be created using PROC SGPLOT or PROC UNIVARIATE.
proc sgplot data=dataset_name;
histogram numeric_variable;
run;
37. How do you filter data in SAS?
Data can be filtered using the WHERE or IF statements.
data filtered_data;
set dataset_name;
where age > 18;
run;
38. How do you use the FORMAT statement in SAS?
The FORMAT statement is used to apply formats to variables in a dataset.
data formatted_data;
set dataset_name;
format date_variable date9.;
run;
39. What is PROC CONTENTS used for?
PROC CONTENTS provides information about the contents of a dataset, such as variable names, types, and formats.
proc contents data=dataset_name;
run;
40. How do you use the IF-THEN/ELSE statement in SAS?
The IF-THEN/ELSE statement is used for conditional processing in a DATA step.
data conditional_data;
set dataset_name;
if age > 18 then adult = 1;
else adult = 0;
run;
41. How do you rename variables in SAS?
Variables can be renamed using the RENAME statement in a DATA step or, as in this example, the RENAME= data set option.
data renamed_data;
set dataset_name(rename=(old_name=new_name));
run;
42. How do you calculate the median in SAS?
The median can be calculated using PROC MEANS or PROC UNIVARIATE.
proc means data=dataset_name median;
var numeric_variable;
run;
43. How do you create a macro variable in SAS?
A macro variable can be created using the %LET statement.
%let varname = value;
44. How do you generate a PDF report in SAS?
A PDF report can be generated using the ODS PDF statement.
ods pdf file='report.pdf';
proc print data=dataset_name;
run;
ods pdf close;
45. How do you read data from an Excel file in SAS?
Data from an Excel file can be read using PROC IMPORT.
proc import datafile='file.xlsx'
out=dataset_name
dbms=xlsx
replace;
getnames=yes;
run;
46. How do you join tables in SAS?
Tables can be joined using the MERGE statement in a DATA step or using SQL joins in PROC SQL.
proc sql;
select a.*, b.*
from table1 as a
left join table2 as b
on a.id = b.id;
quit;
47. How do you handle outliers in SAS?
Outliers can be handled by identifying them using PROC UNIVARIATE or PROC MEANS and then applying appropriate techniques such as capping, removing, or transforming them.
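One possible capping approach is sketched below. It assumes the SASHELP.CLASS sample dataset is available and uses the 1st and 99th percentiles as limits; the percentile choice and the names limits, capped, and weight_capped are illustrative assumptions, not a recommended standard.

```sas
/* Step 1: compute the percentile limits */
proc means data=sashelp.class noprint;
  var weight;
  output out=limits p1=p1 p99=p99;
run;

/* Step 2: cap values that fall outside the limits */
data capped;
  /* Read the single row of limits once; its values are retained */
  if _n_ = 1 then set limits(keep=p1 p99);
  set sashelp.class;
  /* Leave missing values missing instead of capping them */
  if not missing(weight) then weight_capped = min(max(weight, p1), p99);
run;
```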
48. How do you calculate the difference between dates in SAS?
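The difference between dates can be calculated using the INTCK function, which counts the interval boundaries crossed between two dates. Because SAS dates are stored as day counts, subtracting one date from the other also gives the difference in days.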
data date_diff;
set dataset_name;
diff = intck('day', start_date, end_date);
run;
49. How do you create a frequency table in SAS?
A frequency table can be created using PROC FREQ.
proc freq data=dataset_name;
tables categorical_variable;
run;
50. How do you save a permanent dataset in SAS?
A permanent dataset is saved by specifying a library other than WORK, typically assigned with a LIBNAME statement.
libname mylib 'C:\SASDatasets';
data mylib.dataset_name;
set work.dataset_name;
run;
51. How do you use the INFILE statement to read raw data files?
The INFILE statement is used in the DATA step to specify the location of an external raw data file.
data raw_data;
infile 'file-path' dlm=',' missover;
input variable1 $ variable2 $ variable3;
run;
52. How do you create a macro with positional parameters?
A macro with positional parameters allows you to pass values into the macro without specifying the parameter names.
%macro example_macro(param1, param2);
%put &param1 &param2;
%mend;
%example_macro(value1, value2);
53. How do you use CALL MISSING in SAS?
CALL MISSING is used to assign missing values to a list of variables.
data missing_data;
set input_data;
call missing(var1, var2, var3);
run;
54. How do you concatenate character strings in SAS?
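Character strings can be concatenated using the CAT, CATT, CATS, or CATX functions. CATX strips each argument and inserts a delimiter between them; CATS also strips each argument, so a blank passed to it as a separator is lost.
data concatenated_data;
set input_data;
full_name = catx(' ', first_name, last_name);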
run;
55. How do you create and use a custom informat in SAS?
A custom informat is created using the INVALUE statement in PROC FORMAT and can be used to convert raw values as they are read.
proc format;
invalue $genderfmt
'M' = 'Male'
'F' = 'Female';
run;
data formatted_data;
infile 'file-path';
input gender $genderfmt.;
run;
56. How do you transpose data using PROC TRANSPOSE?
PROC TRANSPOSE is used to transpose data from wide to long format or vice versa.
proc transpose data=input_data out=transposed_data;
by id_variable;
var variable1 variable2 variable3;
run;
57. How do you use the RETAIN statement in SAS?
The RETAIN statement is used to carry over the value of a variable from one iteration of the DATA step to the next.
data retained_data;
set input_data;
retain count 0;
count + 1;
run;
58. How do you generate descriptive statistics for categorical variables?
Descriptive statistics for categorical variables can be generated using PROC FREQ.
proc freq data=input_data;
tables categorical_variable;
run;
59. How do you use IF-THEN/ELSE logic for conditional processing?
IF-THEN/ELSE logic is used to perform conditional operations in a DATA step.
data conditional_data;
set input_data;
if age >= 18 then adult = 'Yes';
else adult = 'No';
run;
60. How do you generate Kaplan-Meier survival estimates?
Kaplan-Meier survival estimates can be generated using PROC LIFETEST.
proc lifetest data=survival_data;
time survival_time*censor(0);
run;
61. How do you perform a linear regression analysis in SAS?
Linear regression analysis can be performed using PROC REG.
proc reg data=input_data;
model y_variable = x_variable1 x_variable2;
run;
quit; /* PROC REG is interactive and stays active until QUIT */
62. How do you use the INPUT statement to read data?
The INPUT statement is used in the DATA step to specify the variables to be read from an external file.
data input_data;
infile 'file-path';
input var1 var2 var3;
run;
63. How do you export data to an Excel file in SAS?
Data can be exported to an Excel file using PROC EXPORT.
proc export data=input_data
outfile='output-file.xlsx'
dbms=xlsx replace;
run;
64. How do you handle character variables with leading or trailing spaces?
Leading and trailing spaces can be removed with the STRIP function; the TRIM function removes trailing spaces only.
data cleaned_data;
set input_data;
cleaned_var = strip(original_var);
run;
65. How do you check for duplicate records in a dataset?
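Duplicate records can be identified using PROC SORT with the NODUPKEY and DUPOUT= options. DUPOUT= writes the discarded duplicates to a separate dataset for review, and OUT= leaves the input dataset unchanged.
proc sort data=input_data out=deduped nodupkey dupout=duplicates;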
by key_variable;
run;
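66. How do you use the RENAME statement to rename variables?
The RENAME statement is used in a DATA step to change the names of variables in the output dataset. The example uses the equivalent RENAME= data set option on the input dataset.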
data renamed_data;
set input_data(rename=(old_var=new_var));
run;
67. How do you calculate cumulative sums in SAS?
Cumulative sums can be calculated using a sum statement, which implicitly retains the accumulator; the explicit RETAIN in this example only sets its initial value.
data cumulative_sum;
set input_data;
retain cumulative 0;
cumulative + value;
run;
68. How do you use the MERGE statement to combine datasets?
The MERGE statement is used in a DATA step to combine two or more datasets by key variables. Each dataset must already be sorted or indexed by those variables.
data merged_data;
merge dataset1 dataset2;
by key_variable;
run;
69. How do you use the SUM function in SAS?
The SUM function is used to calculate the sum of non-missing values in a list of variables.
data summed_data;
set input_data;
total = sum(var1, var2, var3);
run;
70. How do you create a custom report using PROC REPORT?
A custom report can be created using PROC REPORT.
proc report data=input_data;
columns var1 var2 var3;
define var1 / group;
define var2 / sum;
define var3 / mean;
run;
Thursday, October 30, 2008
How to determine whether a numeric or character value exists within a group of variables
When trying to determine whether a specific value exists within a group of variables, a common approach is to associate the variables with an ARRAY and then use a DO loop to loop through every element or variable in the ARRAY. As an example, here is a segment of code:
array my_array[*] var1 - var10;
do i = 1 to dim(my_array);
if some_value = my_array[i] then found = 'Yes';
end;
A more efficient alternative is to use the IN operator with the name of the ARRAY and avoid using the DO loop. This can be done with both numeric ARRAYS as well as character ARRAYS. Here is a code segment:
array my_array[*] var1 - var10;
if some_value IN my_array then found = 'Yes';
source: http://support.sas.com/kb/33/227.html
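A complete step built around the IN operator might look like the sketch below. The dataset name check, the three-variable layout, and the search value 7 are hypothetical; the technique requires a SAS release that accepts an array name after IN.

```sas
data check;
  input var1-var3;
  array my_array[*] var1-var3;
  /* Initialize with a padded literal so FOUND is long enough for 'Yes' */
  found = 'No ';
  /* True when any element of the array equals 7 */
  if 7 in my_array then found = 'Yes';
  datalines;
1 7 3
4 5 6
;
run;

proc print data=check;
run;
```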
How to convert a SAS date to a character variable
/***************************************************************************/
/* Title: Convert a SAS date to a character variable                       */
/*                                                                         */
/* Goal:  Use the PUT function to create a character variable from         */
/*        a SAS date.                                                      */
/***************************************************************************/
data one;
input sasdate :mmddyy6.;
datalines;
010199
;
run;
data two;
set one;
chardate=put(sasdate,mmddyy6.);
run;
/* RESULTS */
Obs sasdate chardate
1 14245 010199
Source: ftp://ftp.sas.com/techsup/download/sample/datastep/convertchar.html
How to convert a character variable that represents a date into a SAS date
Use the INPUT function to convert a character value that represents a date into a SAS date value.
Data one;
input chardate1 :$6. chardate2 :$9. chardate3 $10. chardate4 :$9.;
datalines;
010199 31dec1999 21/09/2005 5/9/2005
;
run;
/* Use the INPUT function to convert a character value that represents a date */
/* into a SAS date value. Choose the second parameter to the INPUT function   */
/* based upon what the current character value looks like. Use a FORMAT       */
/* statement to apply the date format you want when you are done.             */
/*                                                                            */
/* Note: If you are in SAS 9.0 or above, you may prefer using the ANYDTDTEw.  */
/* informat as the second argument to the INPUT function. ANYDTDTEw.          */
/* can read multiple date layouts. Refer to the SAS Language Reference,       */
/* Dictionary under INFORMATS for more information.                           */
data two;
set one;
sasdate1=input(chardate1,mmddyy6.);
sasdate2=input(chardate2,date9.);
sasdate3=input(chardate3,ddmmyy10.);
sasdate4=input(chardate4,ddmmyy10.);
format sasdate1 mmddyy10. sasdate2 yymmdd10. sasdate3 date9. sasdate4 monyy7. ;
run;
proc print;
run;
RESULTS:
Obs  chardate1  chardate2  chardate3   chardate4  sasdate1    sasdate2    sasdate3   sasdate4
1    010199     31dec1999  21/09/2005  5/9/2005   01/01/1999  1999-12-31  21SEP2005  SEP2005
source: http://support.sas.com/kb/24/591.html
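The ANYDTDTEw. informat mentioned in the note can be sketched as follows. The dataset name mixed, the sample values, and the width of 11 are assumptions for illustration. The values were chosen to be unambiguous; a value such as 5/9/2005 can be read as either May 9 or September 5, depending on the DATESTYLE= system option.

```sas
data mixed;
  input chardate :$11.;
  /* One informat reads several date layouts */
  sasdate = input(chardate, anydtdte11.);
  format sasdate date9.;
  datalines;
31dec1999
1999-12-31
12/31/1999
;
run;

proc print data=mixed;
run;
```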
Wednesday, October 29, 2008
LAG Function: How to obtain information from previous observation(s)
Using the LAG function to obtain information from previous observation(s)
**********************************************************;
/* Sample 1: Create a single lag of one variable */
data one;
input x;
lagonce=lag(x);
datalines;
1
2
3
4
5
;
proc print data=one;
title 'Sample1: Single lag of one variable';
run;
***************************************************************;
/* Sample 2: Create multiple lags of one variable */
data two;
input x;
lag1=lag(x);
lag2=lag2(x);
datalines;
1
2
3
4
5
;
proc print data=two;
title 'Sample 2: Multiple lags of one variable';
run;
***************************************************************;
/* Sample 3: Create a single lag of one variable within a BY-Group */
/* See also: */
/* Sample 140: Obtaining the previous value of a variable within */
/* a BY-Group */
/* Sample 108: Use the LAG function to conditionally carry */
/* information down a data set */
data three;
input group $ x;
datalines;
a 1
a 2
a 3
b 1
b 2
b 3
b 4
;
data final;
set three;
by group;
lagx=lag(x);
/* Note the LAG function is executed outside the IF condition. */
/* On the first member of the BY-Group, the variable created */
/* with the LAG function is reset to missing. */
if first.group then lagx=.;
run;
proc print data=final;
title 'Sample 3: Single lag of one variable within a BY-Group';
run;
RESULTS:
Sample1: Single lag of one variable
Obs x lagonce
1 1 .
2 2 1
3 3 2
4 4 3
5 5 4
Sample 2: Multiple lags of one variable
Obs x lag1 lag2
1 1 . .
2 2 1 .
3 3 2 1
4 4 3 2
5 5 4 3
Sample 3: Single lag of one variable within a BY-Group
Obs group x lagx
1 a 1 .
2 a 2 1
3 a 3 2
4 b 1 .
5 b 2 1
6 b 3 2
7 b 4 3
source: http://support.sas.com/kb/25/938.html
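The note in Sample 3 about executing LAG outside the IF condition matters because LAG returns the value stored by its own previous execution, not the value from the previous observation. The sketch below is a deliberately incorrect variant for comparison; it reuses the dataset three from Sample 3, and the output name pitfall is hypothetical.

```sas
data pitfall;
  set three;
  by group;
  /* LAG updates its queue only when it executes, so skipping  */
  /* the first row of each group puts the queue out of step.   */
  if not first.group then lagx = lag(x);
run;

proc print data=pitfall;
  title 'Conditional LAG: values no longer match the previous row';
run;
```

Here the second row of group a receives a missing value instead of 1, and the second row of group b receives 3, the last value queued from group a.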
Without Using LAG Function:
*****************************************************************************;
Example2:
data lagcheck;
input a b ;
datalines;
1 1
. 2
. 3
. 4
. 5
2 6
. 7
. 8
3 9
. 10
. 11
. 12
. 13
. 14
;
run;
*Method1;
data lagcheck;
set lagcheck;
n=_n_;
if missing(a) then do;
do until (not missing(a));
n=n-1;
set lagcheck(keep=a) point=n;
end;
end;
run; * Note: Remember 2 Set statements;
**********************************************************;
*Method2;
data lagcheck;
set lagcheck;
retain lasta;
if not(missing(a)) then lasta=a;
if missing(a) then a=lasta;
drop lasta;
run;
***************************************************************;
* Here is another example from the SAS-L archives (Re: A Confusion about how to filling out empty cells with duplicates), with an interesting solution using the UPDATE statement;
data have;
infile datalines truncover;
input Subject number1 number2;
datalines;
10001 212
10001 . 10
10002 555
10002
10002
10002 . 11
10003 11
10003
10003 . 12
10003
;;;;
run;
data need;
do _n_ = 1 by 1 until(last.subject);
update have(obs=0) have;
by subject;
end;
do _n_ = 1 to _n_;
output ;
end ;
run;
**********************************************************************;
Friday, October 24, 2008
IMPLEMENTATION OF CDISC STANDARDS
Presented By Sandeep Raj Juneja, ASG Inc....
CDISC accomplishments and Strategy
CDISC and Standards for Clinical Research
by Rebecca D. Kush, Ph.D., Founder & President, CDISC
CDISC SDTM and related initiatives
CDISC submission standard : CDISC SDTM_Basics
Supporting The CDISC Standards
By Mark Lambrecht, PhD, Principal Consultant, Life Sciences, SAS
Case Report Tabulation Data Definition Specification (define.xml)
CDISC Study Data Tabulation Model SDTM Implementation Guide V3.1.1
http://www.cdisc.org/models/sdtm/v1.1/index.html
Clinical Data Integration:
SAS Clinical Data Integration
By Dave Smith, SAS UK
Industry Standards for the electronic submission of Data to the FDA
by Michael A. Walega
CDISC SDTM Basics