Monday, December 22, 2008

12 Ways to save SAS data

12 Ways to save SAS data
source/direct link: http://wiki.binghamton.edu/index.php/12_Ways_to_save_SAS_data

1 Using CARDS; File Save (with editor window active)
1.1 To get it back

2 Cut and Paste (ex. Fixed format data)
2.1 To Get Them Back (space-delimited or fixed-formatted text)
2.2 Links

3 LIBNAME
3.1 To get it (them) back

4 Explorer with one LIBNAME statement?, menu/New Library button, Library Icon
4.1 To get it back

5 DATA ' '; direct access (including Unix ex.)
5.1 To get it back
5.2 Links:

6 In Excel Save As .XLS

7 In Excel Save As CSV (move to Unix, SSH)
7.1 To get it back(in SAS for Windows)
7.2 To Move (Upload) the CSV File from Windows to Unix
7.3 To get the data back (on Unix)
7.4 Links

8 In SPSS Save as
8.1 To get it (our data) back

9 In SAS Save as XPT file

10 XML

11 LIBNAME sasengine

12 DBF (to be written)

12.1 Saving a DBF (from Access?)
12.2 To get it back

13 13 ?PROC DATASOURCE?, ?Stata, ?S-Plus (to be written)

HOW TO CREATE A TRANSPORT FILE

CREATING A TRANSPORT FILE:




In SAS, how do I create a transport data set file?

In SAS, how do I create a transport data set file?

Source/direct link:http://kb.iu.edu/data/aevb.html

A SAS transport data set file is a machine-independent file that allows you to move a SAS data set from one operating system to another. A SAS transport data set file can also be read directly by several statistical software packages (e.g., SPSS, BMDP).

Following is an example of SAS code to copy the SAS data set file job1.sas7bdat to a SAS transport data set file portable.xpt in the outdata directory:

LIBNAME misc '~/work';
LIBNAME sasxpt XPORT '~/outdata/portable.xpt';


PROC COPY IN=misc OUT=sasxpt;

SELECT job1;
RUN;

In the example above:

The first LIBNAME statement aliases the library reference (libref) misc to the work directory.
The second LIBNAME statement aliases the libref sasxpt with the physical name of a SAS transport format file (in this case, portable.xpt in the outdata directory).

The COPY procedure copies one or more SAS data sets in the IN= libref (in this case, misc) to the OUT= libref (in this case, sasxpt).

The SELECT statement specifies that only the file job1.sas7bdat should be included in the transport file portable.xpt .

The file and pathnames in the above example follow Unix conventions. If you are using SAS for Windows, you should follow the appropriate filename and pathname conventions.

For example, in SAS for Windows, the two LIBNAME statements in the above example would instead be:

LIBNAME misc 'c:\work';
LIBNAME sasxpt XPORT 'c:\outdata\portable.xpt';




Getting the Data into SAS

GETTING DATA INTO SAS


1. Importing data from other sources
• Use DBMS Copy, STAT Transfer, or some other software that performs similar functions.
• Use SAS/ACCESS or the SAS Import feature in Windows.
• Use PROC IMPORTS to read certain types of data files.
• Create a raw data (ASCII) file then use INPUT and INFILE statements to read the raw data file in SAS programs.

2. Creating SAS data sets with raw (ASCII) data files
• Data are placed within the SAS program directly after the DATALINES or CARDS statement. Use INPUT statement to read the data.

*** Read data within SAS program as SAS data file ***;

DATA survey;
INPUT Q1Major Q2Degree Q3SASpast Q4ProjData Q5SASexprn Q6SASfutur;
DATALINES;
2 1 1 1 2 8
2 2 0 0 0 7
2 1 0 0 1 4
2 1 0 1 0 8
more raw data lines
;
RUN;

• In the above example,
– Spaces (or other types of delimiters) are required between data values
– Missing values must be specified (blank spaces are treated as separators)
– Character data values cannot have embedded spaces (e.g., names and addresses)
– Cannot skip unwanted data column when reading in data

Getting Data Into SAS
• Data are placed in an external ASCII file. Use INFILE statement to identify the location of the external file and Use INPUT statement to read the data.

*** Read text (ASCII) data file as SAS data file ***;
DATA TXT_Surv;
INFILE "C:\Class\M403B\M403B2005Lab1Surv.TXT" missover;
INPUT @1 Q1Major 3.
@6 Q1Spec $char10.
@17 Q2Degree 3.
@22 Q2Spec $char10.
@33 Q3SASpast 2.
@36 Q4ProjData 2.
@39 Q5SASexprn 3.
@45 Q6SASfutur 3.
;
RUN;
Or,

*** Read text (ASCII) data file as SAS data file ***;
DATA TXT_Surv;
INFILE "C:\Class\M403B\M403B2005Lab1Surv.TXT" missover;
INPUT Q1Major 1-3
Q1Spec $ 6-15
Q2Degree 17-20
Q2Spec $ 22-31
Q3SASpast 33-34
Q4ProjData 36-37
Q5SASexprn 39-41
Q6SASfutur 45-47
;
RUN;

• In the above two examples,
– Spaces (or other types of delimiters) are not required between data values
– Missing values can be left blank
– Character data values can have embedded spaces (e.g., names and addresses)
– Can skip unwanted data columns when reading data

Getting Data Into SAS
• Check the SAS log to make sure the data are read correctly into SAS (e.g., number of observations, length of each record, and number of variables, etc.).

• If the length of records is very long (> 256 characters), use option LRECL=XXX in the INFILE statement, where XXX is the length of the longest record in the ASCII data file.

• If one or more records have unassigned (blank) missing values at the end of the record, use option MISSOVER in the INFILE statement to prevent SAS from going to the next record prematurely. MISSOVER tells SAS to assign missing values to these variables before going to the next record.

• If each record has multiple lines, in the INPUT statement, use a forward slash (/) after variable names in each line, or use #2 before variable names on the second line (and #3 for the third line, etc.). Note: the same number of lines per record is required, and use blank line(s) if some records do not have data in the corresponding data line(s).

*** Read multiple lines per record ***;
DATA twoliner;
INFILEC:\rawdata\twolines.dat’;
INPUT id 1-8
sex 9-16
race 25-32 /
occ1 1-8
occ2 9-16
income 17-24
;
RUN;
Or,

*** Read multiple lines per record ***;
DATA twoliner;
INFILEC:\rawdata\twolines.dat’;
INPUT #1 id 1-8
sex 17-24
race 25-32
#2 occ1 1-8
occ2 9-16
income 17-24
;
RUN;

Getting Data Into SAS
• If ASCII file contains variable names or other information at the beginning of the file, use option FIRSTOBS=X in the INFILE statement to skip the first X line(s).

• The default delimiter in SAS is a blank space (‘ ‘). If other type of delimiter, such as comma (‘,’) is used to separate data values, need to specify option DLM=’,’ (or DELIMITER=’,’) in the INFILE statement. Use option DSD if consecutive delimiters indicate missing values in between. Note: for tab delimited data, use option EXPANDTABS in the INFILE statement.

• If some columns of data values are not need, skip those columns when reading the raw data file. For example, the following program reads variables ID, OCC1, and INCOME, while other variables are not included in the output SAS data.

*** Read multiple lines per record and skip unwanted variables ***;
DATA twoliner;
INFILEC:\rawdata\twolines.dat’;
INPUT #1 id 1-8
#2 occ1 1-8
income 17-24
;
RUN;

Getting Data Into SAS :

*** Define LIBREF MySASlib as a SAS library to store permanent SAS data sets ***;
LIBNAME MySASlib "C:\Class\M403B";

*** Create a permanent SAS data set Survey in LIBREF MySASlib in data step ***;
DATA MySASlib.Survey;
INFILE "C:\Class\M403B\M403B2005Lab1Surv.TXT" missover;
INPUT @1 Q1Major 3.
@6 Q1Spec $char10.
More input variables
;
RUN;

– MySASlib is the LIBREF, which refers to a subdirectory "C:\Class\M403B", and Survey is the name of the SAS data set, which can be found in the above subdirectory. Think of LIBREFs as pointers, where LIBREFs point to the location where permanent SAS data sets are located.

– More than one LIBNAME statements can be specified in one SAS program, if two or more permanent SAS data sets are created or used in different subdirectories.
• Identifying permanent SAS data sets

– Identify the permanent SAS data set by using an appropriate LIBREF, defined with a LIBNAME statement previously.

*** Examine the contents of a permanent SAS data set Survey located in LIBREF MySASlib using PROC CONTENTS ***;

PROC CONTENTS DATA=MySASlib.Survey Position;
RUN;


– The PROC CONTENTS option POSITION produces output with variables listed in the order in which they appear in the data set (by position), as well as alphabetically (the default).

Getting Data Into SAS

4. Exporting data from SAS data sets
• Use DBMS Copy, STAT Transfer, or some other software that performs similar functions.

• Use SAS/ACCESS or the SAS Export feature in Windows.

• Use PROC EXPORTS to write certain types of data files.

• Create raw data (ASCII) files using PUT and FILE statements in SAS program.

*** Define LIBREF M403 as a SAS library to store permanent SAS data sets ***;
LIBNAME M403 "C:\Class\M403B";
*** Write a text (ASCII) data file from a SAS data set ***;
DATA _NULL_;
SET M403.Survey
FILE "C:\Class\M403B\M403B_SurveyData.TXT" ;
PUT @1 Q1Major 3.
@6 Q1Spec $char10.
@17 Q2Degree 3.
@22 Q2Spec $char10.
@33 Q3SASpast 2.
@36 Q4ProjData 2.
@39 Q5SASexprn 3.
@45 Q6SASfutur 3.
;
RUN;

– In the above example, an ASCII file M403B_SurveyData.TXT is created and stored in a subdirectory "C:\Class\M403B" using a permanent SAS data set Survey stored in a SAS library referred by M403, which points to the subdirectory "C:\Class\M403B".

– If column numbers had not been specified, SAS would place a space between each variable value in the ASCII file.

– The SAS special data set _NULL_ is used because a SAS data set is not being created in this data step (_NULL_ tells SAS to not go to the trouble of building a SAS data set).

– It is not necessary to identify character variables with the $ sign because SAS already knows which variables are character variables.

Getting Data Into SAS
5. Transporting SAS data sets from one type of host (e.g., Unix) to another (e.g., Windows)

• In SAS Version 6 and lower, if a SAS data set is created in one type of host (e.g. Unix) and is used in another (e.g. Windows), a SAS transport file needs to be created first. This is not necessary in SAS Version 8 or higher. Using PROC COPY, the following example creates a SAS transport file called SAS_Surv.XPT from the SAS data set Survey located in the C:\Class\M403B subdirectory:

*** Define LIBREF M403 as a SAS library to store permanent SAS data sets ***;
LIBNAME M403 "C:\Class\M403B";

*** Define LIBREF XPT as a SAS transport file to be used in different host systems ***;
LIBNAME XPT XPORT "C:\Class\M403B\SAS_Surv.XPT";

*** Define LIBREF XPT as a SAS transport file to be used in different host systems ***;
PROC COPY IN=M403 OUT=XPT;
SELECT Survey;
RUN;

– XPORT is a SAS keyword that tells SAS to create a transport file.

– M403 is the LIBREF pointing to the location of the SAS data set; and XPT is the LIBREF referring to the SAS transport file.

– IN, OUT, and SELECT are SAS keywords specifying the input data set location (SAS data set), output data file (SAS transport file), and the name of the SAS data set from which the transport data set will be made.

• After the SAS transport file is created, SAS data set(s) contained in the transport file can be accessed directly in the "new" host. The following example creates a SAS dataset Survey from a SAS transport file SAS_Surv.XPT, which is located in C:\Class\M403B.

PROC COPY IN=XPT OUT=M403;
RUN;
DATA M403.Survey;
SET XPT.Survey;
RUN;

Friday, November 21, 2008

How to detect missing values using ARRAYS

Using an array in SAS to detect missing values

Direct link: http://ssc.utexas.edu/consulting/answers/sas/sas65.html

Question:
How do I exclude observations from my PROC FREQ analysis when a value is missing from a list of variables?

Answer:
In the SAS DATA step, you can create a new variable ("miss" in the example below) that is set equal to 1 when a variable has a missing value, 0 otherwise. Use the ARRAY statement and a DO loop to check for missing values across a list of variables; the syntax is:

DATA one ;
INFILE xxx;
INPUT a b c d e;
miss=0;
ARRAY vv(5) a b c d e ;
DO i=1 TO 5 ;
IF vv(i)=. THEN DO;
miss=1 ;
i=5;

END;
END;
RUN;


PROC FREQ;
WHERE miss =0;
TABLES a b c d e ;
RUN ;

Here, the array "vv" has 5 elements (a,b,c,d,e), and the loop "i" is likewise set to 5.

For each observation, the loop iterates 5 times, checking for missing values across the list of 5 variables. When a missing value is encountered, the variable "miss" is set to 1 and the loop stopped for that observation.

"Miss" was initially set to zero, and it is only changed if an observation has missing data on any of the five variables. The PROC FREQ then uses the WHERE statement to restrict processing to observations having "miss" set to zero.
Want to know more about Missing Values...

Wednesday, November 19, 2008

SAS System Options in the UNIX environment

SAS System Options in the UNIX environment

Direct link:http://ssc.utexas.edu/consulting/answers/sas/sas55.html

Question:
How do I specify SAS system options in the UNIX environment?

Answer:
How you specify SAS system options depends on how you use SAS in the UNIX environment.
If you use SAS via an X-terminal or X-terminal emulation software such as Exodus or MacX, the command to launch SAS on ITS UNIX systems is
/usr/local/sas/sas

SAS system options are preceded by a hyphen and immediately follow the SAS command. For example, if you want to have SAS write its work files (including temporary datasets) to a directory called "mysasdir" located one level below your own current working directory, the syntax for invoking this option would be:

/usr/local/sas/sas -work ./mysasdir

If you use the SAS display manager system via a vt100 terminal interface such as telnet, the usual command to launch SAS is:
/usr/local/sas/sas -fsd ascii.vt100

The -fsd portion of this command is a SAS option which means "full screen device". You could add another SAS system option to this command, such as the -work option mentioned above:
/usr/local/sas/sas -fsd ascii.vt100 -work ./mysasdir

If you run SAS noninteractively by supplying the SAS program file "mysasprog" in the current directory, you would change your command from:
/usr/local/sas/sas mysasprog
to:
/usr/local/sas/sas -work ./mysasdir mysasprog

This would direct SAS to write any work files to the "mysasdir" directory as it processes the contents of your SAS program "mysasprog".

Each of these examples assumes you want to use a particular SAS system option once or a few times. If you intend to use a SAS system option repeatedly, it can be a nuisance to specify the same option each time you invoke SAS. For this type of situation, you can copy the config.sas612 file located in the /usr/local/sas/sas612 directory and edit it using a UNIX text editor.

The config.sas file is a SAS configuration file that contains default settings for a number of SAS system options; you may change the default settings of these options as well as delete or add options of your choosing to this file. If you then launch SAS from the same directory as the config.sas612 file, SAS will use those options.

How to Debug the SAS code

Debugging SAS code

Direct link: http://ssc.utexas.edu/consulting/answers/sas/sas54.html

Question:
I have a huge SAS program that isn't working. The results I get are not right but there are no errors or warnings in the SAS log. How can I figure out where I went wrong?

Answer:
To debug a SAS program that produces no syntax errors, follow these six steps:

1. Check to see that your original data input is correct for all variables.

2. If the data is input to SAS correctly, go to the other end of the program. Select a variable or a small set of variables involved in the analyses where you get the wrong results. Use PROC FREQ, PROC MEANS, and/or PROC PRINT to examine these variables. There should be a problem with at least one; identify exactly how these variables are incorrect.

3. Now follow these variables back through each operation you performed, always looking at the characteristics in question. In this way you can narrow down the exact step where an error occurs. Prior to the questionable step, the variable characteristics will be appropriate; after the step they will be inappropriate.

4. Look carefully at the code for that step. Continue using PROC PRINT, PROC FREQ, and PROC MEANS to examine the effect of each statement. In this way, you can identify the exact statement or statement group that is not working as you expect.

5. Next, get a clear understanding of how the statement is working (as opposed to how you think it should work) by consulting the SAS Help function; click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation; then, search for the particular statement or procedure. The results in hand should help you interpret the documentation.

6. Finally, determine the appropriate code for your needs. Remember to check for other statements that involve this mistake.

How to call SAS macros on UNIX

SAS FAQ #64: Calling SAS macros on UNIX systems

source: http://ssc.utexas.edu/consulting/answers/sas/sas64.html

Question:
On UNIX how do I call an external SAS MACRO that is in another file and not physically included in my SAS program?
Answer:
Put the SAS MACRO in a file having a name "macro_name.sas", where "macro_name" is the macro name from the SAS %MACRO statement.
When you want to use this macro in a SAS program, perform the following steps:

1) Use a FILENAME statement of the form:
FILENAME wheremac 'dir';
where "wheremac" is a fileref pointing to the directory where the macro file is stored, and "dir" is the directory path to where the macro file is stored.

2) Use an OPTIONS statement of the form:
OPTIONS SASAUTOS=wheremac;
to point to the directory containing the macro file.

3) Use a call for your macro as you normally would with the "%macroname" specification, where "macroname" is the first name of the macro file (and the name of the macro in the %MACRO statement).
For example if you had a macro to do a PROC PRINT stored in a file called "prt.sas", it could look like this:

%MACRO prt;
PROC PRINT;
RUN;

%MEND;

If you stored this file in your $HOME directory, you could run the SAS job:

FILENAME wheremac '$HOME/';
DATA one;
INPUT a b ;
CARDS;
1 23 2;
OPTIONS SASAUTOS=wheremac;
%prt
The above code will create the data set and print the data to the SAS listing file.

Friday, November 7, 2008

SAS Certification Assignments

SAS Certification Assignments Contents[hide]
1 Session 1
1.1 exercise 1
1.2 exercise 2
1.3 exercise 3
1.4 exercise 4
1.5 exercise 5
1.6 exercise 6
2 Session 2
2.1 exercise 1
2.2 exercise 2
2.3 exercise 3
2.4 exercise 4
2.5 exercise 5
3 Session 3
3.1 Exercise 1
3.2 Exercise 2
3.3 Exercise 3
3.4 Exercise 4
4 Session 4
4.1 Exercise 1
4.2 Exercise 2
4.3 Exercise 3
5 Session 5
5.1 exercise 1
5.2 exercise 2
5.3 exercise 3
5.4 exercise 4
5.5 exercise 5
6 Session 6
7 Session 7
7.1 exercise 1
7.2 exercise 2
7.3 exercise 3
8 Session 8
8.1 exercise 1
9 Session 9
9.1 exercise 1

SAS Certification Examples(part 2)
Session 6
1.1 Example 1
1.2 Example 2
1.3 Example 3
1.4 Example 4
1.5 Example 4a
1.6 Example 5
1.7 Closing examples
2 Session 7
2.1 In class
2.2 Some similar functions,formats and informats
3 Session 8
4 Session 9
4.1 Example 1
4.2 Example 2
4.3 Example 3
4.4 Example 3A
4.5 Example 4
5 Session 10
5.1 Example 1
6 Example 2
7 Examples for Model Procedure
7.1 A second example for parabolic curve


Answers to SAS Certification Assignments
C
Certification Summary---Summary Reports
Class room for the workshop series
Concise Glossary for SAS

L
Lecture Notes 9/12/2006
Lecture Notes 9/5/2006

S
SAS Certification Assignments
SAS Certification Essential Reading SAS Certification Examples

S cont.
SAS Certification Examples and Exercises
SAS Certification Examples(part 2)
SAS Lesson Plan for SAS Certification

U
User:Ywang10/Work

source: http://wiki.binghamton.edu/index.php/SAS_Certification_Assignments

Wednesday, November 5, 2008

INDEX TO SAS TUTORIALS

Index to Statistics Tutorials

PROC MEANS Tutorial (Descriptive statistics)
PROC UNIVARIATE Tutorial (Distribution analysis)
New:
PROC UNIVARIATE - Advanced Tutorial
PROC CORR Tutorial (Correlation)
PROC FREQ Tutorial 1 (Frequency Tables/Goodness of Fit)
PROC FREQ Tutorial 2 (Two-way tables)
PROC TTEST Tutorial (Two sample and paired t-tests)
New:
A comparison of Paired & Independent Sample t-tests
PROC ANOVA & GLM Tutorial (One-Way ANOVA)
PROC GLM Tutorial (Repeated measures ANOVA using PROC GLM)
New:
Survival Analysis & comparison of groups using PROC LIFEREG
Bland-Altman Analysis (Comparing two measures)
Inter-Rater Reliability, Kappa, Weighted Kappa (PROC FREQ)
New:
SAS Functions (2-part tutorial)
Special SAS Topics
New:
Read and Write SAS Data Sets the EASY way
Setting the SAS Initial Folder (default directory)
Using SAS ODS Output, Styles, Graphics, Data Data files and SAS code for tutorials

FIRST. and LAST. variables: Data step processing within by groups using the SET statement

FIRST. and LAST. variables: Data step processing within by groups
If you use a by statement along with a set statement in a data step then SAS creates two automatic variables, FIRST.variable and LAST.variable, where variable is the name of the by variable. FIRST.variable has a value 1 for the first observation in the by group and 0 for all other observations in the by group. LAST.variable has a value 1 for the last observation in the by group and 0 for all other observations in the by group.
The code shown below is available here.

data temp;
input group x;
cards;
1 23
1 34
1 .
1 45
2 78
2 92
2 45
2 89
2 34
2 76
3 31
4 23
4 12
;
run;
/**************************************************
The automatic variables first.group and last.group
are not saved with the data set. Here we write them
to data set variables to show their contents.
**************************************************/
data new;
set temp;
by group;
first=first.group;
last=last.group;
run;
proc print;
title 'Raw data along with first.group and last.group';
run;
/**************************************************
A common task in data cleaning is to identify
observations with a duplicate ID number. If we set
the data set by ID, then the observations which
are not duplicated will be both the first and the
last with that ID number. We can therefore write
any observations which are not both
first.id and
last.id to a separate data set and examine them.
**************************************************/
data single dup;
set temp;
by group;
if first.group and last.group then output single;
else output dup;
run;
/**************************************************
We may also want to do data set processing within
each by group. In this example we construct the
cumulative sum of the variable X within each group.
**************************************************/
data cusum(keep=group sum);
set temp;
by group;
if first.group then sum=0;
sum+x;
if last.group then output;
run;
proc print data=cusum noobs;
title 'Sum of X within each group';
run;
/**************************************************
As an aside, if you simply want the sum of X within
each group, one of the many way of obtaining this
is with PROC PRINT.
**************************************************/
proc print data=temp;
title 'All data with X summed within each group';
by group;
sum x;
sumby group;
run;

source: http://www.pauldickman.com/teaching/sas/set_by.php

Monday, November 3, 2008

How to determine the last observation in a data set

Determine the last observation in a data set

Use the END= option on a SET statement to determine the last observation of the data set.

/* Create sample data */
data company;
input division :$12. employees;
datalines;
sales 150
support 200
research 250
accounting 50
shipping 35
; run;

/* Calculate the total number of employees in each group. */
/* On the last observation of the data set, write out the */
/* resulting total. */

data _null_;
set company end=last;
file print;
/* Sum statement syntax has an implied RETAIN */
total + employees;
/* For every iteration of the step, write out the values for */
/* DIVISION and EMPLOYEES. */
put @1 division @15 employees;
/* On the last iteration of the step only, write out 4 dashes */
/* starting at column 15, move the internal pointer to the next */
/* line and at column 15 write out the value of TOTAL. */

if last then put @15 '----' / @15 total;
run;

RESULT:
sales 150
support 200
research 250
accounting 50
shipping 35
----
source: http://support.sas.com/kb/24/746.html

SAS Clinical Interview QUESTIONS and ANSWERS

SAS Clinical Interview Questions and Answers

SAS Clinical Interview Questions and Answers

Here is a list of common SAS clinical interview questions along with example answers and explanations to help you prepare for your next interview.

1. What is SAS?

SAS stands for Statistical Analysis System. It is a software suite used for advanced analytics, business intelligence, data management, and predictive analytics. It is widely used in clinical trials for analyzing clinical data.

2. What are the phases of clinical trials?

The phases of clinical trials include:

  • Phase I: Tests safety and dosage with a small group of healthy volunteers.
  • Phase II: Tests efficacy and side effects with a larger group of patients.
  • Phase III: Confirms effectiveness, monitors side effects, and compares with other treatments in larger patient groups.
  • Phase IV: Conducts post-marketing studies to gather additional information on risks, benefits, and optimal use.

3. How would you import external data into SAS?

You can use the PROC IMPORT statement to import external data files such as CSV, Excel, and text files into SAS datasets.

proc import datafile="path-to-file.csv" 
    out=dataset_name 
    dbms=csv 
    replace;
    getnames=yes;
run;

4. What is the difference between informat and format in SAS?

An informat is used to read data into SAS variables, while a format is used to write or display data. Informats tell SAS how to interpret raw data values, and formats tell SAS how to display the data.

5. Explain the use of the PROC SORT statement in SAS.

PROC SORT is used to sort a dataset by one or more variables. Sorting data is often a prerequisite for procedures like merging datasets or generating summary statistics.

proc sort data=dataset_name;
    by variable_name;
run;

6. How do you merge datasets in SAS?

You can merge datasets using the MERGE statement in a DATA step, usually after sorting the datasets by the key variables.

data merged_data;
    merge dataset1 (in=a) dataset2 (in=b);
    by key_variable;
    if a and b;
run;

7. What is the purpose of the IN option in a merge?

The IN option is used to create temporary variables that indicate whether a given observation was present in each dataset being merged. It is useful for controlling which observations to keep in the merged dataset.

8. How do you handle missing data in SAS?

In SAS, missing data for numeric variables is represented by a period (.) and for character variables by a blank space. You can use conditional logic to handle missing data.

data new_data;
    set old_data;
    if variable = . then variable = 0;  /* Replace missing numeric values with 0 */
run;

9. How can you create a macro in SAS?

A macro in SAS is created using the %MACRO statement, and it is executed using the %MEND statement.

%macro example_macro;
    data new_data;
        set old_data;
        /* Your code here */
    run;
%mend example_macro;
%example_macro;

10. What is the difference between %LET and CALL SYMPUT in SAS?

%LET is used to assign a value to a macro variable during macro execution, while CALL SYMPUT is used within a DATA step to assign a value to a macro variable based on the result of an expression.

%let varname = value;
call symput('varname', value);

11. What is the use of the PUTLOG statement in SAS?

The PUTLOG statement is used to write custom messages to the SAS log. It is particularly useful for debugging purposes.

data _null_;
    set dataset_name;
    if variable = . then putlog 'Warning: Missing value for variable at observation ' _n_=;
run;

12. Explain the use of PROC SQL in SAS.

PROC SQL allows you to use SQL queries within SAS. It is useful for data manipulation and retrieval, especially when working with relational databases.

proc sql;
    select variable1, variable2
    from dataset_name
    where condition;
quit;

13. How do you create a format in SAS?

You can create custom formats using the PROC FORMAT statement.

proc format;
    value agefmt
        low - 18 = 'Child'
        19 - 65 = 'Adult'
        66 - high = 'Senior';
run;

14. How would you validate datasets and reports in SAS?

Validation in SAS typically involves double programming, PROC COMPARE, and review of logs and outputs to ensure the correctness of datasets and reports.

proc compare base=dataset1 compare=dataset2;
run;

15. Explain the use of PROC MEANS.

PROC MEANS is used to calculate summary statistics like mean, median, minimum, and maximum values for numeric variables.

proc means data=dataset_name;
    var numeric_variable;
run;

16. How do you transpose data in SAS?

Data can be transposed using PROC TRANSPOSE, which converts rows to columns and vice versa.

proc transpose data=dataset_name out=transposed_data;
    by id_variable;
    var variable_to_transpose;
run;

17. What is the use of ODS in SAS?

ODS (Output Delivery System) is used to generate reports in various formats such as HTML, PDF, RTF, etc.

ods pdf file='report.pdf';
proc print data=dataset_name;
run;
ods pdf close;

18. How do you generate random numbers in SAS?

Random numbers can be generated using functions like RANUNI or RAND.

data random_numbers;
    do i = 1 to 100;
        random_value = rand('uniform');
        output;
    end;
run;

19. What is PROC LIFETEST used for?

PROC LIFETEST is used for survival analysis and estimating survival curves using methods like Kaplan-Meier.

proc lifetest data=survival_data method=km;
    time time_variable*status_variable(0);
run;

20. Explain the use of the RETAIN statement.

The RETAIN statement in SAS is used to keep the value of a variable across iterations of the DATA step.

data retained_data;
    set input_data;
    retain count 0;
    count + 1;
run;

21. How do you create a report in SAS?

Reports in SAS can be created using PROC REPORT, PROC PRINT, and DATA _NULL_.

proc report data=dataset_name;
    column variable1 variable2;
run;

22. What is the use of the ARRAY statement in SAS?

Arrays in SAS are used to process a group of variables with a single statement or operation.

data array_example;
    set dataset_name;
    array scores(3) score1-score3;
    do i = 1 to 3;
        scores(i) = scores(i) * 10;
    end;
run;

23. How would you subset data in SAS?

Subsetting in SAS can be done using the WHERE or IF statement within a DATA step or procedure.

data subset_data;
    set dataset_name;
    where age > 18;
run;

24. How do you use the DO loop in SAS?

The DO loop is used to execute a block of code multiple times.

data loop_example;
    do i = 1 to 10;
        output;
    end;
run;

25. Explain the use of PROC TABULATE.

PROC TABULATE is used to create multi-dimensional tables and summaries.

proc tabulate data=dataset_name;
    class group_variable;
    var numeric_variable;
    table group_variable, numeric_variable*(mean sum);
run;

26. What is the difference between PROC GLM and PROC REG?

PROC GLM is used for fitting general linear models, while PROC REG is specifically for linear regression models.

27. How do you create a temporary dataset in SAS?

A temporary dataset is created by default when the DATA statement is used without a library reference. Temporary datasets are stored in the WORK library.

28. How do you remove duplicate records in SAS?

You can remove duplicate records using the NODUPKEY or NODUP options in PROC SORT.

proc sort data=dataset_name nodupkey;
    by key_variable;
run;

29. How do you format dates in SAS?

Dates in SAS can be formatted using the FORMAT statement.

data formatted_dates;
    set dataset_name;
    format date_variable date9.;
run;

30. How do you calculate summary statistics in SAS?

Summary statistics can be calculated using PROC MEANS, PROC SUMMARY, or PROC TABULATE.

proc means data=dataset_name;
    var numeric_variable;
run;

31. How do you transpose datasets using the DATA step?

Data can be transposed using arrays in the DATA step.

data transposed_data;
    set dataset_name;
    array vars(*) var1-var3;
    do i = 1 to dim(vars);
        vars(i) = vars(i) * 10;
    end;
run;

32. How do you concatenate datasets in SAS?

Datasets can be concatenated using the SET statement in a DATA step.

data combined_data;
    set dataset1 dataset2;
run;

33. What is a BY-group processing in SAS?

BY-group processing allows you to perform operations on subsets of data that are grouped by one or more variables.

data grouped_data;
    set dataset_name;
    by group_variable;
run;

34. How do you generate descriptive statistics in SAS?

Descriptive statistics can be generated using PROC MEANS, PROC SUMMARY, or PROC UNIVARIATE.

proc univariate data=dataset_name;
    var numeric_variable;
run;

35. What is the use of CALL SYMPUTX?

CALL SYMPUTX is used to assign a value to a macro variable, removing leading and trailing blanks.

data _null_;
    call symputx('varname', value);
run;

36. How do you create a histogram in SAS?

A histogram can be created using PROC SGPLOT or PROC UNIVARIATE.

proc sgplot data=dataset_name;
    histogram numeric_variable;
run;

37. How do you filter data in SAS?

Data can be filtered using the WHERE or IF statements.

data filtered_data;
    set dataset_name;
    where age > 18;
run;

38. How do you use the FORMAT statement in SAS?

The FORMAT statement is used to apply formats to variables in a dataset.

data formatted_data;
    set dataset_name;
    format date_variable date9.;
run;

39. What is PROC CONTENTS used for?

PROC CONTENTS provides information about the contents of a dataset, such as variable names, types, and formats.

proc contents data=dataset_name;
run;

40. How do you use the IF-THEN/ELSE statement in SAS?

The IF-THEN/ELSE statement is used for conditional processing in a DATA step.

data conditional_data;
    set dataset_name;
    if age > 18 then adult = 1;
    else adult = 0;
run;

41. How do you rename variables in SAS?

Variables can be renamed using the RENAME statement in a DATA step.

data renamed_data;
    set dataset_name(rename=(old_name=new_name));
run;

42. How do you calculate the median in SAS?

The median can be calculated using PROC MEANS or PROC UNIVARIATE.

proc means data=dataset_name median;
    var numeric_variable;
run;

43. How do you create a macro variable in SAS?

A macro variable can be created using the %LET statement.

%let varname = value;

44. How do you generate a PDF report in SAS?

A PDF report can be generated using the ODS PDF statement.

ods pdf file='report.pdf';
proc print data=dataset_name;
run;
ods pdf close;

45. How do you read data from an Excel file in SAS?

Data from an Excel file can be read using PROC IMPORT.

proc import datafile='file.xlsx' 
    out=dataset_name 
    dbms=xlsx 
    replace;
    getnames=yes;
run;

46. How do you join tables in SAS?

Tables can be joined using the MERGE statement in a DATA step or using SQL joins in PROC SQL.

proc sql;
    select a.*, b.*
    from table1 as a
    left join table2 as b
    on a.id = b.id;
quit;

47. How do you handle outliers in SAS?

Outliers can be handled by identifying them using PROC UNIVARIATE or PROC MEANS and then applying appropriate techniques such as capping, removing, or transforming them.

48. How do you calculate the difference between dates in SAS?

The difference between dates can be calculated using the INTCK function.

data date_diff;
    set dataset_name;
    diff = intck('day', start_date, end_date);
run;

49. How do you create a frequency table in SAS?

A frequency table can be created using PROC FREQ.

proc freq data=dataset_name;
    tables categorical_variable;
run;

50. How do you save a permanent dataset in SAS?

A permanent dataset is saved by specifying a library other than WORK, typically assigned with a LIBNAME statement.

libname mylib 'C:\SASDatasets';
data mylib.dataset_name;
    set work.dataset_name;
run;

51. How do you use the INFILE statement to read raw data files?

The INFILE statement is used in the DATA step to specify the location of an external raw data file.

data raw_data;
    infile 'file-path' dlm=',' missover;
    input variable1 $ variable2 $ variable3;
run;

52. How do you create a macro with positional parameters?

A macro with positional parameters allows you to pass values into the macro without specifying the parameter names.

%macro example_macro(param1, param2);
    %put ¶m1 ¶m2;
%mend;
%example_macro(value1, value2);

53. How do you use CALL MISSING in SAS?

CALL MISSING is used to assign missing values to a list of variables.

data missing_data;
    set input_data;
    call missing(var1, var2, var3);
run;

54. How do you concatenate character strings in SAS?

Character strings can be concatenated using the CATS, CATT, CATX, or CAT functions.

data concatenated_data;
    set input_data;
    full_name = cats(first_name, ' ', last_name);
run;

55. How do you create and use a custom informat in SAS?

A custom informat is created using the PROC FORMAT statement and can be used to read in specific data formats.

proc format;
    invalue $genderfmt
        'M' = 'Male'
        'F' = 'Female';
run;

data formatted_data;
    infile 'file-path';
    input gender $genderfmt.;
run;

56. How do you transpose data using PROC TRANSPOSE?

PROC TRANSPOSE is used to transpose data from wide to long format or vice versa.

proc transpose data=input_data out=transposed_data;
    by id_variable;
    var variable1 variable2 variable3;
run;

57. How do you use the RETAIN statement in SAS?

The RETAIN statement is used to carry over the value of a variable from one iteration of the DATA step to the next.

data retained_data;
    set input_data;
    retain count 0;
    count + 1;
run;

58. How do you generate descriptive statistics for categorical variables?

Descriptive statistics for categorical variables can be generated using PROC FREQ.

proc freq data=input_data;
    tables categorical_variable;
run;

59. How do you use IF-THEN/ELSE logic for conditional processing?

IF-THEN/ELSE logic is used to perform conditional operations in a DATA step.

data conditional_data;
    set input_data;
    if age >= 18 then adult = 'Yes';
    else adult = 'No';
run;

60. How do you generate Kaplan-Meier survival estimates?

Kaplan-Meier survival estimates can be generated using PROC LIFETEST.

proc lifetest data=survival_data;
    time survival_time*censor(0);
run;

61. How do you perform a linear regression analysis in SAS?

Linear regression analysis can be performed using PROC REG.

proc reg data=input_data;
    model y_variable = x_variable1 x_variable2;
run;

62. How do you use the INPUT statement to read data?

The INPUT statement is used in the DATA step to specify the variables to be read from an external file.

data input_data;
    infile 'file-path';
    input var1 var2 var3;
run;

63. How do you export data to an Excel file in SAS?

Data can be exported to an Excel file using PROC EXPORT.

proc export data=input_data
    outfile='output-file.xlsx'
    dbms=xlsx replace;
run;

64. How do you handle character variables with leading or trailing spaces?

Character variables with leading or trailing spaces can be handled using the STRIP or TRIM functions.

data cleaned_data;
    set input_data;
    cleaned_var = strip(original_var);
run;

65. How do you check for duplicate records in a dataset?

Duplicate records can be checked using PROC SORT with the NODUPKEY option.

proc sort data=input_data nodupkey;
    by key_variable;
run;

66. How do you use the RENAME statement to rename variables?

The RENAME statement is used in a DATA step to change the names of variables.

data renamed_data;
    set input_data(rename=(old_var=new_var));
run;

67. How do you calculate cumulative sums in SAS?

Cumulative sums can be calculated using the RETAIN statement and a SUM function.

data cumulative_sum;
    set input_data;
    retain cumulative 0;
    cumulative + value;
run;

68. How do you use the MERGE statement to combine datasets?

The MERGE statement is used in a DATA step to combine two or more datasets by key variables.

data merged_data;
    merge dataset1 dataset2;
    by key_variable;
run;

69. How do you use the SUM function in SAS?

The SUM function is used to calculate the sum of non-missing values in a list of variables.

data summed_data;
    set input_data;
    total = sum(var1, var2, var3);
run;

70. How do you create a custom report using PROC REPORT?

A custom report can be created using PROC REPORT.

proc report data=input_data;
    columns var1 var2 var3;
    define var1 / group;
    define var2 / sum;
    define var3 / mean;
run;

Thursday, October 30, 2008

How to determine whether a numeric or character value exists within a group of variables

Using the IN operator to determine whether a numeric or character value exists within a group of variables

When trying to determine whether a specific value exists within a group of variables, a common approach is to associate the variables with an ARRAY and then use a DO loop to loop through every element or variable in the ARRAY. As an example,
here is a segment of code:


array my_array[*] var1 - var10;
do i = 1 to dim(my_array);
if some_value = my_array[i] then found = 'Yes';
end;


A more efficient alternative is to use the IN operator with the name of the ARRAY and avoid using the DO loop. This can be done with both numeric ARRAYS as well as character ARRAYS. Here is a code segment:

array my_array[*] var1 - var10;
if some_value IN my_array then found = 'Yes';


source: http://support.sas.com/kb/33/227.html

How to convert a SAS date to a character variable

/***************************************************************************//*
Title: Convert a SAS date to a character variable *// *
*//* Goal: Use the PUT function to create a character variable from *//*
a SAS date. *//* *//***************************************************************************/

data one;
input sasdate :mmddyy6.;
datalines;
010199;
run;

data two;
set one;
chardate=put(sasdate,mmddyy6.);
run;

/* RESULTS */

Obs sasdate chardate
1 14245 010199


Source: ftp://ftp.sas.com/techsup/download/sample/datastep/convertchar.html

How to convert a character variable that represents a date into a SAS date

Convert a character variable that represents a date into a SAS date

Use the INPUT function to convert a character value that represents a date into a SAS date value.

Data one;
input chardate1 :$6. chardate2 :$9. chardate3 $10. chardate4 :$9.;
datalines;
010199
31dec1999
21/09/2005
5/9/2005; Run;

/* Use the INPUT function to convert a character value that represents a date
*//* into a SAS date value. Choose the second parameter to the INPUT function
*//* based upon what the current character value looks like. Use a FORMAT
*//* statement to apply the date format you want when you are done. *//*


*//* Note: If you are in SAS 9.0 or above, you may prefer using the ANYDTDTEw.
*//* Informat as the second argument to the INPUT function. ANYDTDTEw.
*//* can read multiple date layouts. Refer to the SAS Language Reference,
*//* Dictionary under INFORMATS for more information. */



data two;
set one;
sasdate1=input(chardate1,mmddyy6.);
sasdate2=input(chardate2,date9.);
sasdate3=input(chardate3,ddmmyy10.);
sasdate4=input(chardate4,ddmmyy10.);
format sasdate1 mmddyy10. sasdate2 yymmdd10. sasdate3 date9. sasdate4 monyy7. ;
run;

proc print;
run;


RESULTS:
Obs chardate1 chardate2 chardate3 chardate4 sasdate1 sasdate2 sasdate3 sasdate4 1
01 0199 31dec1999 21/09/2005 5/9/2005 01/01/1999 1999-12-31 21SEP2005 SEP2005
source: http://support.sas.com/kb/24/591.html

Wednesday, October 29, 2008

LAG Function: How to obtain information from previous observation(s)

Often times SAS® programmers need to retain the value of a variable in the current observation to the next observation. The LAG function  can be very helpful here. A LAGn (n=1-100) function returns the value of the nth previous execution of the function. It is easy to assume that the LAGn functions return values of the nth previous observation.


Using the LAG function to obtain information from previous observation(s)

**********************************************************;/* Sample 1: Create a single lag of one variable */


data one;
input x;
lagonce=lag(x);
datalines;
1
2
3
4
5
;
proc print data=one;
title 'Sample1: Single lag of one variable';
run;

***************************************************************;/* Sample 2: Create multiple lags of one variable */


data two;
input x;
lag1=lag(x);
lag2=lag2(x);
datalines;
1
2
3
4
5
;
proc print data=two;
title 'Sample 2: Multiple lags of one variable';
run;
***************************************************************;/* Sample 3: Create a single lag of one variable within a BY-Group */
/* See also: */
/* Sample 140: Obtaining the previous value of a variable within */a BY-Group */
/* Sample 108: Use the LAG function to conditionally carry */
/* information down a data set */



data three;
input group $ x;
datalines;
a 1
a 2
a 3
b 1
b 2
b 3
b 4
;
data final;
set three;
by group;
lagx=lag(x);
/* Note the LAG function is executed outside the IF condition. */
/* On the first member of the BY-Group, the variable created */
/* with the LAG function is reset to missing. */

if first.group then lagx=.;
run;
proc print data=final;
title 'Sample 3: Single lag of one variable within a BY-Group';
run;

RESULTS:
Sample1: Single lag of one variable
Obs x lagonce
1 1 .
2 2 1
3 3 2
4 4 3
5 5 4

Sample 2: Multiple lags of one variable
Obs x lag1 lag2
1 1 . .
2 2 1 .
3 3 2 1
4 4 3 2
5 5 4 3

Sample 3: Single lag of one variable within a BY-Group
Obs group x lagx
1 a 1 .
2 a 2 1
3 a 3 2
4 b 1 .
5 b 2 1
6 b 3 2
7 b 4 3

source: http://support.sas.com/kb/25/938.html


Without Using LAG Function:
*****************************************************************************;


Example2:
data lagcheck;

input a b ;
datalines;
1 1
. 2
. 3
. 4
. 5
2 6
. 7
. 8
3 9
. 10
. 11
. 12
. 13
. 14
;
run;
*Method1;
data lagcheck;
set lagcheck;
n=_n_;
if missing(a) then do;
do until (not missing(a));
n=n-1;
set lagcheck(keep=a) point=n;
end;
end;
run* Note: Remember 2 Set statements;
**********************************************************;
*Method2;
data lagcheck;
set lagcheck;
retain lasta;
if not(missing(a)) then lasta=a;
if missing(a) then a=lasta;
drop lasta;
run;

***************************************************************;
* Here is another example given in SAS-L archives about Re: A Confusion about how to filling out empty cells with duplicates. and interesting solutiion using UPDATE Statement;

data have;
input Subject number1 number2;
infile datalines truncover;
datalines;
10001 212
10001 . 10
10002 555
10002
10002
10002 . 11
10003 11
10003
10003 . 12
10003

;;;;
run;


data need;
do _n_ = 1 by 1 until(last.subject);
update have(obs=0) have;
by subject;
end;
do _n_ = 1 to _n_;
output ;
end ;
run;
**********************************************************************;

Learn how to view SAS dataset labels without opening the dataset directly in a SAS session. Easy methods and examples included!

Quick Tip: See SAS Dataset Labels Without Opening the Data Quick Tip: See SAS Dataset Labels With...