Friday, February 27, 2009

How to create a comma separated file (.csv) of a SAS dataset?

IN SAS programming, we often require outputting the dataset in different formats like EXCEL and CSV etc and here are the five different ways to export the SAS dataset into .csv file.

Example:

data new ;
infile datalines dsd dlm=' ' missover;
input a b c d;

datalines;
3 5 1 1
4 1 . .
5 8 3 2
6 0 4 4

;
run;


By putting MISSOVER in the infile statement we are telling SAS to do not look for the data in the next lane if it runs out of the data, instead keep missing values for any remaining variables.

DSD and DLM options should be included always in the infile statement, if we include the
dlm=’ ‘ in the infile statement then SAS will put one digit for each variable even though we haven’t assigned any length to variable.

DSD option will tell SAS to consider a missing value if 2 delimiters are present side by side in any observation.

When we ran the above program in SAS, we create a SAS dataset name ‘NEW’ in the work directory and if we want to create a .csv file of dataset ‘NEW’ here are the 5 different ways to do it:

1)Here is the simplest method and least known method to create the .CSV file of a dataset; Using the DEXPORT statement.

Here is the syntax:

dm "dexport new 'H:\abccompany\client\Programs\Analysis\project1\class.csv' ";

When we submit the above code, SAS will automatically create a .CSV file in specified location path.


2) .CSV file can also be created using the PROC EXPORT procedure:
Here is the syntax:

proc export data=new
outfile=" H:\abccompany\client\Programs\Analysis\project1\class.csv ";
run;


3) By using the ODS and the Proc print we will be able to create a .CSV file of SAS dataset.

Here is the way to do it;

ods csv file= ‘H:\abccompany\client\Programs\Analysis\project1\class.csv ';

proc print data = new noobs;
run;
ods trace on;
ods csv close;

By keeping ODS trace on; statement we are telling SAS no to print the results in the output window, because, we are only creating the .csv file.

NOOBS option is required here, because PROC PRINT by default will create a new variable called 'OBS' and since we don't require it, we should include the NOOBS option.

4) File statement can also be useful in creating the .csv file of a SAS dataset.
Here is the way to do it.

data _null_;
file " H:\abccompany\client\Programs\Analysis\project1\class.csv ";

set new;
put (_all_) (',');
run;


5)b) Another way using the file statement:

filename csvH:\abccompany\client\Programs\Analysis\project1\class.csv';

data _null_;
set new ;
file csv dlm=',';
put ( _all_ ) (+0);
run;

If we don’t mention the PUT( _all_), SAS will not keep all the variables in the .CSV file and even if we include put (_all_) the log will say:

ERROR 79-322: Expecting a (.

ERROR 76-322: Syntax error, statement will be ignored.

Because SAS is expecting a second parentheses here to follow the first one immediately and if we just only put the closed parenthesis to get rid of the previous error,
put (_all_) ( ) ;

We will get another error, because SAS we haven’t set any specifications for the PUT statement in the 2nd parenthesis.

So we can use a do-nothing pointer control (+0) to be error free.

Import CSV files and create SAS data sets on the fly!

How to Import Excel files into SAS

Reading from Excel Spreadsheets:
Microsoft Excel spreadsheets can be read from SAS in several ways. Two of these will be demonstrated here. First, PROC IMPORT allows direct access to Excel files through SAS/Access to PC File Formats or access to Comma-Separated (CSV) files through Base SAS. The second method uses the Excel LIBNAME engine.

PROC IMPORT
The IMPORT procedure reads from external sources and creates a SAS data set. Two sources are Excel spreadsheets and CSV files. A particular SAS/Access product may be required for certain sources, however. In our example, SAS/Access to PC File Formats is required to read an Excel file, but a CSV file can be accessed with Base SAS.

General Syntax for PROC IMPORT:

PROC IMPORT DATAFILE="c:\sas\ego.csv" OUT=jeeshim.egov DBMS=CSV REPLACE;For Excel you use the DATAFILE=”filename” option to specify the Excel file to be read. (The TABLE=”tablename” option would be applicable if you were reading from a database such as Microsoft Access.)

GETNAMES=YES;
DATAROW=2;
RUN;


The OUT= option specifies the SAS data set to be created. The DBMS= option identifies the type of file to be read. In this case, you will use either EXCEL or CSV to read an Excel spreadsheet or CSV file, respectively. Finally, the REPLACE option determines whether to replace the data set that is created, if it already exists.


read more at Reading from Excel Spreadsheets


Learning SAS

Monday, February 23, 2009

THE SPECIAL “??” FORMAT MODIFIER

The following excerpt is from SAS OnlineDoc documentation:

? or ??

Direct link: http://www.nesug.org/Proceedings/nesug01/at/at1013.pdf

The optional question mark (?) and double question mark (??) format modifiers suppress the printing of both the error messages and the input lines when invalid data values are read. The ? modifier suppresses the invalid data message. The ?? modifier also suppresses the invalid data message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when invalid data are read.

Below is an example of using ?? to determine whether a variable contains non-numeric values or not:

data _null_;
x = 12345678”;
if (input(x, ?? 8.) eq .) then
put non-numeric’;
else put ‘numeric’;
run;


Running SAS would return “Numeric” in the above example. If we used X=”123a5678”, SAS would return “Non-Numeric”. Note that the input format in the above example is “8.” So only the first 8 bytes of the character string are checked.

Thus, X=123456789a would return “Numeric” as it would only be checking the first 8 bytes of the string.
 
 
I found an  interesting tip about suppressing the invalid data note and/or error message when reading in formatted data while surfing on the net.... (Source : Queensland Users Exploring SAS Technology)
 
Tips & Techniques - #3


Q: Is there a way to suppress the invalid data note and/or error message when reading in formatted data?

A:

􀂄Use the Format Modifiers ? or ?? on the input statement (or input function)

τ€‚„? –suppresses printing the invalid data note when SAS encounters invalid data values.

τ€‚„??––suppresses printing the messages and the input lines when SAS encounters invalid data values. The automatic variable _ERROR_ is not set to 1 for the invalid observation.

􀂄For example:

input x ?? 10 10-12;


Is the same as:


input x ? 10 10-12;


_error_=0;


PROC SQL basics, tips and techniques and sample code programs

Proc SQL:


Power of SAS SQL:

• SQL looks at datasets differently from SAS
– SAS looks at a dataset one record at a time, using an implied loop that moves from the first record to the last
– SQL looks at all the records, as a single object
• Because of this difference SQL can easily do few things that are more difficult to do in SAS
• There are a number of built in functions in SQL that can be used in a select statement
• Because of how SQL handles a dataset, these functions work over the entire dataset
• Functions:

– Count: Counts Values
– Sum: Sums Values
– Max: Identifies the largest value
– Min: Identifies the smallest value
– Mean: Averages the values

Read more at www.cognigencorp.com/perspective/tipsNtricks.pub/1/PROC%20SQL%20Talk_12_.ppt

SAS SQL

Introduction to Proc SQL

AN INTRODUCTION TO PROC SQL®

PROC SQL: When and How to Use It?

Proc SQL – A Primer for SAS Programmers

Understanding PROC SQL

Creating Macro Variables with Proc SQL

DATA Step vs. PROC SQL: What’s a neophyte to do?


Effective Use of SQL in SAS Programming

Variable List Short-Cuts in PROC SQL

Ten Best PROC SQL Tips and Techniques

Undocumented and Hard-to-Find PROC SQL C2 AE_Features

Variable List Short-Cuts in PROC SQL

Exploring the World of PROC SQL® Joins

Using PROC SQL to Create Ad Hoc Reports

CREATING MACRO VARIABLES VIA PROC SQL

PROC SQL for DATA Step Die-hards

Calculating Changes and Differences Using PROC SQL —With Clinical Data Examples

Validating Data Via PROC SQL

Wednesday, February 11, 2009

Length of Numeric variables GT 8 in SAS| StudySAS BLOG

Q&A: numeric variables length more than 8? We all know that the default length of the numeric variables in SAS is 8 and if suppose I want to store a number lets say (12345678910, which has a length 11 to numeric variable) to variable total, what should I do?


What if the numeric variable digits are more than 12 digits and i want to store them all without any E values?

ANS)

The default length of numeric variables in SAS is 8 and all the numbers that we see in the sas datasets are called as floating numbers(floating point binary) and not a regular sequence numbers form 1 to 10. When we are using SAS/Windows as our operating system and then the minimum length for any numeric variable should be 3(not 1 as we get confused all the time). So if a variable contas less than 3 digits means it is stored with less space.

The reason is, since a numeric variable will need a power and and the sign(+ or -), if SAS want to store a numberit defenitely needs a minimum of 3 bytes.

Depending upon the operating system we are using for SAS, the range for numeric variables can be 2 to 8 or 3 to 8.

Since the type of floating-point values is upto 16 decimal digits. we can store numbers upto 16(1234567891012234) total 16 digits for a numeric variable, but that can be possible if we use a format statement.

Ex:
data dsn;

a=1234567891012234;
format a best16.;run;


 
If we open the output, we can see all the 16 digits were stored for the variable a exactly in the dataset dsn.

If the the value of numeric variable is upto 12 disgits we don't require to specify any formats, if it is more than 12 digits we have to mention specify the format statement. without it we will see error in the system log.

Monday, February 9, 2009

Options VALIDVARNAME=UPCASE

VALIDVARNAME= V7 UPCASE ANY

VALIDVARNAME= option is generally used in SAS whenever we want to control the SAS variable names in the dataset.

VALIDVARNAME= V7 UPCASE ANY

The default option will be VALIDVARNAME=V7 until we specify as UPPERCASE or ANY.

When we mention options VALIDVARNAME=V7, that means we are telling SAS to change the name of the Database column (etc EXCEL sheet column) to valid SAS name with certain rules keeping in mind.

Here are those rules that SAS needs to follow, when it changes the DBMS column name to valid SAS name.

Only 32 mixed case (lower or uppercase) characters are allowed in each variable.

Names should be starting with an underscore or an alphabet (either uppercase or lower case character).

Invalid characters in the DBMS column (ex. $) should be changed to underscores.


See the SAS Language Reference: Dictionary to get more details about the rules.

VALIDVARNAME=UPCASE
When we mention options VALIDVARNAME=UPPERCASE we are telling SAS to change the column name of the Database column to uppercase variables irrespective of type of variables in the DBMS column.


And whenever we want the same kind of characters in SAS dataset which are in the DBMS column (ex .(=) sign and the Asterisk(*) or the forward slash(\) we have to mention options

VALIDVARNAME=ANY
If we do, this will allows any characters which are in the DBMS column to be kept in the SAS dataset.

To understand the concept better here I am giving the example:

Example
The following example shows how the Pass-Through Facility works with
VALIDVARNAME=UPPERCASE.

options validvarname=uppercase;
proc sql;
connect to oracle as tables(user=USERID orapw=passward path=’INSTANCE’);
create table lab as
select lab_rslt, lab_test
from connection to oracle
(select "laboratory result$", "laboratory test$"
from DBMStable);
quit;

When we check the Output we observe that the variables in the DBMS column is changed to upper case as well as V7 (default option) converts those variables into UPPERCASE variables. Ex: " laboratory result$" becomes LAB_RSLT and " laboratory test$" becomes LAB_TEST.


Friday, February 6, 2009

How to merge data sets with a common variable?

Here is the simple way of merging the data sets with a common variable if the datasets has the same prefix name.

For example: col1-col10, dsn1-dsn 7 , or data1 to data10 with common variable of ID.

Considering we have 10 datsets and all of them having the same prefix data;

%macro mymerge (n);

data merged;
merge %do i = 1 % to &n;
data&i %end; ; /* this additional ';' is necessary, the first ';' is for the "%end", while the second ';' is for "Merge"*/;
by id;
run;
%mend;
%mymerge(10)

Merging the data sets using macro code

Merging the data sets with a common variable if the datasets has the same prefix name?

For example: col1-col10 dsn1-dsn 7 data1 to data6 with common variable of Usubjid.

here is the example, I have 7 datasets i need to merge and each of them having the common variable(usubjid) to merge, and all the datasets having the same prefix dsn(dsn1 to dsn7).

%macro allmerge (n);
data combine;
merge

%do i = 1 % to &n;
dsn&i
%end;

; /* this additional ';' is necessary, the first ';' is for
the "%end", while the second ';' is for "Merge" */

by usubjid;
run;
%mend;
%allmerge (7)



Resolving and Using &&var&i Macro Variables

Here are the 2 important questions always comes up in our minds,(& vs && vs &&& and single dot and double dots) when we are dealing with macros for the first time and here are the answers for them.

I did find a very good regarding the above topics in the one of the SAS forums and IAN WHITLOCK explained it very clear.


or
when to use &,&&,and &&&,how do we distinguish?
&MACVAR references a macro variable. The rule is that the scanner reads from left to right. Two ampersands resolve to one and scanner continues. If the resulting expression has ampersands then that expression is scanned again. So &&x resolves to &x resolves to value of X
&&&x resolves to & followed by the value of X which then resolves to the value of the variable indicated.

If you are familiar with TSO CLISTS (even if you are not), they are very similar to SAS macro. SAS was originally based on PL1, so both SAS syntax and SAS macro are similar in some ways to
PL1 and PL1 macro.

what is the diff between Single dot and double dot(eg. &chech.> and &check..)

&CHECK and &CHECK. are the same thing. If the scanner finds a . that ends the macro variable reference, then the scanner eats the. and ends the reference. If there is no current macro variable then a . is a . So if &X is MYFMT then &X.. is MYFMT.

Test example: what is &&&X..5 when X has the value V and V has the value TEST?

&&&X..5 -> &V.5 -> TEST5

Test: Consider &&&...&X where the dots indicate there are a total of 15 ampersands preceding the X. Write a sequence of %LET statements followed by

%put &&&...&X ;

so that this causes text to be written on the log without any notes, warnings. or errors. One, three, and 15 are interesting sequences of amprsands. Find one number in between 3 and 15 and next oone after 15 that are interesting for the same reason. How many dots would be needed to make the letter S immeadiately follow the value of &&&...&X?

You may also need to check the SIGI paper to understand the multiple ampersands concept in detail:

Example: ....

%let dsn=study;
%let n=05;
%let dsn05=Client;
%put &dsn&n; *Resolves to study05;
%put &dsn.&n; *Resolves to study05;
%put &dsn..&n; *Resolves to study.05;

*Multi Ampersands Concept; %put &&dsn&n;
* First Scan Resolves to &dsn05;
* Second Scan Resolves to Client;

%put &&&dsn&n;
* First Scan Resolves to &client05;
* Second Scan Resolves to &Client05; *with an error message in the log file....;


*Log file; 57 %put &&&dsn&n;

WARNING: Apparent symbolic reference STUDY05 not resolved.
&study05

/*Because.. the most common mistake is that.. */
/*We assume that macro variable resolution process proceeds from right to left...*/
/*Infact it isn't. */

The bottom line is ...
/**For some reason.... the &&&dsn&n is taken as && &dsn&n, which resolves to &client05.*/
/*This macro variable will not get resolved, because &client5. macro variable isn't there in the symbol table.; */
Macrovariables and its resolution:
Example:
%let one=two;
%let two=three;
%let three=Check;

%put &one;
%put &&one;
%put &&&one;
%put &&&&one;
%put &&&&&one;
%put &&&&&&one;
%put &&&&&&&one;
%put &&&&&&&&one;
%put &&&&&&&&&one;
%put &&&&&&&&&&one;

Answer:


%put &one; two
%put &&one; two
%put &&&one; three
%put &&&&one; two
%put &&&&&one; three
%put &&&&&&one; three
%put &&&&&&&one; Check
%put &&&&&&&&one; two

%put &&&&&&&&&one; three
%put &&&&&&&&&&one; three


Resolving and Using &&var&i Macro Variables

Thursday, February 5, 2009

How can I count number of observations per subject in a data set?

We always have this question in mind, while we do the SAS programming and here is the simple answer for that, we just need to use SUM statement and the FIRST.variable in the SET statement and then the RETAIN statement to calculate the observations count per subject.

By doing some minor modification we can calculate observations count per subject per visit also. (Just include visit variable in the BY variable list in PROC sort and First. variable list in datastep with SET statement).


For example:

data dsn;
input patid implants;
datalines;
1 3
1 1
1 2
1 1
2 1
2 2
3 1
4 2
3 1
4 5
2 3
1 6
;
run;

proc sort data=dsn;
by patid;
run;

data dsn1;
set dsn;
by patid;
cnt+1;
if first.patid then cnt=1;
run;

proc sort data=dsn1;
by patid descending cnt;
run;

data dsn2;
set dsn1;
by patid;
retain totcnt;
if first.patid then totcnt=cnt;
output;
run;

proc print data=dsn2;
run;


Output:


Tuesday, February 3, 2009

Mastering Duplicates Removal in SAS: A Comprehensive Guide to Using PROC SQL, DATA STEP, and PROC SORT

Removing Duplicate Observations in SAS: A Comprehensive Guide

Removing Duplicate Observations in SAS: A Comprehensive Guide

In data analysis, it's common to encounter datasets with duplicate records that need to be cleaned up. SAS offers several methods to remove these duplicates, each with its strengths and suitable scenarios. This article explores three primary methods for removing duplicate observations: using PROC SQL, the DATA STEP, and PROC SORT. We will provide detailed examples and discuss when to use each method.

Understanding Duplicate Observations

Before diving into the methods, let's clarify what we mean by duplicate observations. Duplicates can occur in different forms:

  • Exact Duplicates: All variables across two or more observations have identical values.
  • Key-Based Duplicates: Observations are considered duplicates based on the values of specific key variables (e.g., ID, Date).

The method you choose to remove duplicates depends on whether you are dealing with exact duplicates or key-based duplicates.

Approach 1: Removing Duplicates with PROC SQL

PROC SQL is a versatile tool in SAS, allowing you to execute SQL queries to manipulate and analyze data. When removing duplicates, you can use the SELECT DISTINCT statement or apply more complex conditions.

Example 1: Removing Exact Duplicates

proc sql;
    create table no_duplicates as
    select distinct *
    from original_data;
quit;

This code removes all exact duplicates, creating a new dataset no_duplicates that contains only unique records. The SELECT DISTINCT * statement ensures that every unique combination of variable values is retained only once.

Example 2: Removing Duplicates Based on Key Variables

proc sql;
    create table no_duplicates as
    select distinct ID, Name, Age
    from original_data;
quit;

Here, duplicates are removed based on the combination of the ID, Name, and Age variables. This is useful when you want to keep unique records for specific key variables, ignoring other variables in the dataset.

Advantages of PROC SQL:

  • Flexibility: PROC SQL can handle complex queries, allowing you to remove duplicates based on multiple or complex criteria.
  • Powerful Filtering: SQL allows you to apply conditions and filters easily, making it easier to control the exact duplicates you want to remove.

Disadvantages of PROC SQL:

  • Performance: The SELECT DISTINCT statement can be slower with very large datasets, as it requires scanning the entire dataset to identify unique records.
  • Complexity: SQL syntax may be less intuitive for those who are more comfortable with traditional SAS programming.

Approach 2: Removing Duplicates with the DATA STEP

The DATA STEP in SAS provides a programmatic approach to removing duplicates, giving you fine-grained control over the process. This method typically involves sorting the dataset first and then using conditional logic to remove duplicates.

Example 1: Removing Exact Duplicates

To remove exact duplicates, you must first sort the data by all variables and then use the DATA STEP to retain only the first occurrence of each observation.

proc sort data=original_data noduprecs out=sorted_data;
    by _all_;
run;

data no_duplicates;
    set sorted_data;
run;

The noduprecs option in PROC SORT removes exact duplicate records. The sorted and deduplicated dataset is then saved as no_duplicates.

Example 2: Removing Duplicates Based on Key Variables

If you want to remove duplicates based on specific key variables, you can sort the data by those variables and use the first. or last. functions in the DATA STEP to control which duplicates are kept.

proc sort data=original_data;
    by ID;
run;

data no_duplicates;
    set original_data;
    by ID;
    if first.ID;
run;

In this example, the dataset is first sorted by the ID variable. The first.ID statement ensures that only the first occurrence of each ID is kept, removing any subsequent duplicates.

Advantages of the DATA STEP:

  • Fine-Grained Control: The DATA STEP allows you to apply custom logic to the deduplication process, such as retaining the first or last occurrence based on additional criteria.
  • Efficiency: When dealing with large datasets, this method can be more efficient, especially if you need to apply complex logic.

Disadvantages of the DATA STEP:

  • Manual Sorting: You need to sort the data before removing duplicates, adding an extra step to the process.
  • Complexity: The logic required to remove duplicates can be more complex and less intuitive than using PROC SORT.

Approach 3: Removing Duplicates with PROC SORT

PROC SORT is one of the simplest and most commonly used methods for removing duplicates in SAS. This approach sorts the data and can automatically remove duplicates during the sorting process.

Example 1: Removing Exact Duplicates

proc sort data=original_data noduprecs out=no_duplicates;
    by _all_;
run;

Here, PROC SORT with the noduprecs option removes exact duplicates. The by _all_ statement ensures that the sort is applied to all variables, making the deduplication based on the entire record.

Example 2: Removing Duplicates Based on Key Variables

proc sort data=original_data nodupkey out=no_duplicates;
    by ID;
run;

In this case, PROC SORT uses the nodupkey option to remove duplicates based on the ID variable. The out= option specifies that the sorted and deduplicated data should be saved to the no_duplicates dataset.

Advantages of PROC SORT:

  • Simplicity: PROC SORT is straightforward and easy to use, requiring minimal code to remove duplicates.
  • Efficiency: PROC SORT is optimized for sorting and deduplication, making it very fast, especially for large datasets.

Disadvantages of PROC SORT:

  • Limited Flexibility: PROC SORT can only remove duplicates based on sorted keys, which might not be suitable for more complex deduplication needs.
  • No Complex Logic: Unlike the DATA STEP, PROC SORT does not allow you to apply custom logic or conditions during the deduplication process.

Comparison Summary

Each method for removing duplicates in SAS has its strengths and weaknesses:

  • Use PROC SQL when you need flexibility and the ability to apply complex conditions for deduplication, especially when working within a SQL-based framework.
  • Use the DATA STEP if you require precise control over the deduplication process and need to apply custom logic to determine which duplicates to keep.
  • Use PROC SORT for its simplicity and efficiency when dealing with large datasets, particularly when you only need to remove duplicates based on simple keys.

Conclusion

Removing duplicates is a crucial step in data cleaning and preparation, and SAS provides multiple tools to accomplish this task. By understanding the differences between PROC SQL, the DATA STEP, and PROC SORT, you can choose the most appropriate method for your specific data processing needs. Whether you need flexibility, control, or efficiency, SAS offers the right approach to ensure your data is clean and ready for analysis.

Monday, February 2, 2009

How to scan more than 20 records to determine variable attributes

Usage Note 1075: How to scan more than 20 records to determine variable attributes in EFI

In Versions 7 and 8 of the SAS System, by default the Import Wizard, PROC IMPORT and the External File Interface (EFI) scan 20 records to determine variable attributes when reading delimited text files.

Changing the default setting can only be done for EFI in Version 7, Release 8 and Release 8.1. Beginning in Release 8.2 changing the default setting is applicable to the Import Wizard, PROC IMPORT and EFI.

Follow the steps below to change the default behavior:

1) Type regedit on the command line (white box with a check mark)

2) When the Registry Editor window opens, double click on the PRODUCTS icon

3) Double click on the BASE icon

4) Double click on the EFI icon

5) In the window on the right the Contents of EFI will be populated with EFI options

6) Double click on GuessingRows

7) When the new window opens with the old value of 20, delete it, enter the new value, and click on OK.

8) Close the Registry Editor window

9) Invoke the Import Wizard, PROC IMPORT or EFI to use the new GuessingRows value

The new value entered for GuessingRows will remain until you change it.

Beginning in SAS 9.1 there is a new statement, GUESSINGROWS=, that can be specified with PROC IMPORT. By specifying the GUESSINGROWS= statement with PROC IMPORT, you do not have to change the GuessingRows value in the SAS Registry. Also beginning in SAS 9.1 you can specify the Number of Rows to Guess when using the Import Wizard in the SAS

Import: Delimited File Options Window or when using EFI in the Options for Import Window.

source:http://support.sas.com/kb/1/075.html


Learn how to view SAS dataset labels without opening the dataset directly in a SAS session. Easy methods and examples included!

Quick Tip: See SAS Dataset Labels Without Opening the Data Quick Tip: See SAS Dataset Labels With...