StudySAS Blog: Mastering Clinical Data Management with SAS

Welcome to StudySAS, your ultimate guide to clinical data management using SAS. We cover essential topics like SDTM, CDISC standards, and Define.XML, alongside advanced PROC SQL and SAS Macros techniques. Whether you're enhancing your programming efficiency or ensuring compliance with industry standards, StudySAS offers practical tips and insights to elevate your clinical research expertise. Join us and stay ahead in the evolving world of clinical data.

How to Determine if a variable exists in the dataset or not

Programatically determine if a variable exists in a data set

Use the OPEN and VARNUM functions to determine if a variable exists in a SAS data set. The OPEN function opens a SAS data set and returns a data set identifier. The VARNUM function returns the variable's position in the data set. If the variable does not exist then VARNUM returns a 0.

/* Create sample data */
data test;
input fruit $ count;
datalines;
apple 12
banana 4
coconut 5
date 7
eggs 9
fig 5
;
/* Specify the data set to open in the OPEN function. Specify the */
/* variable to be located in the second parameter of the VARNUM */
/* function. If the variable does not exist, the value of check */
/* will be zero. */

data _null_;
dsid=open('test');
check=varnum(dsid,'count');
if check=0 then put 'Variable does not exist';
else put 'Variable is located in column ' check +(-1) '.';
run;

OUTPUT:

/* Partial LOG output */
311 data _null_;
312 dsid=open('test');
313 check=varnum(dsid,'count');
314 if check=0 then put 'Variable does not exist';
315 else put 'Variable is located in column ' check +(-1) '.';
316 run;
Variable is located in column 2.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

source:www.support.sas.com

What to do when we want only the even number observations

Output only even number observations.

Note: The MOD function returns the remainder when values are divided. In this sample, when dividing i by 2, there will be no remainder for the even observations (2,4,6,8 and 10). If there is a remainder, the current observation has to be an odd numbered observation. This program outputs only the even numbered observations to a new data set.

/* Create sample data set */
data one;
do i=1 to 10;
output;
end;
run;
data two;
set one;

/* The MOD function returns the remainder from the division of */
/* argument-1 by argument-2. If the remainder is zero, when */
/* the second argument is 2, then the first argument must be */
/* even...therefore, output the observation. */

if mod(i,2)=0 then output;
run;

proc print;
run;

OUTPUT:
Obs i
1 2
2 4
3 6
4 8
5 10

source:www.support.sas.com

How to verify the existence of the external file:

Verify the existence of an external file

Conditionally execute code to read in a file only when the file exists.
Note: Although your operating environment utilities may recognize partial physical filenames, you must always use fully qualified physical filenames with FILEEXIST.
This example verifies the existence of an external file. If the file exists, read in the file using INFILE and INPUT statements. If the file does not exist, display a message in the SAS log that states the file does not exist.
Note that in a macro statement you do not enclose character strings in quotation marks.

/* If the file passed to the macro does exist, read in the file and create */
/* a character variable called VAR with a default length of 8 bytes. If */
/* file named in the macro call does not exist, write "FILE DOES NOT EXIST..." */
/* to the log. */

%macro in_file(file);
%if %sysfunc(fileexist(&file))=1 %then %do;
data a;
infile "&file";
input var $;
run;
%end;
%else %do;

data _null_;
put "FILE DOES NOT EXIST: &file";
run;
%end;
%mend;

/* Call the macro with file that does not exist */
%in_file(c:\bloom.txt);
/* Call the macro with file that does exist */
%in_file(c:\tracks\x1.txt);

OUTPUT

LOG when file does not exist
FILE DOES NOT EXIST: c:\bloom.txt

LOG when the file does exist ...typical notes seen when a file is read

NOTE: The infile "c:\tracks\x1.txt" is:
File Name=c:\tracks\x1.txt,
RECFM=V,LRECL=256

NOTE: 2 records were read from the infile "c:\tracks\x1.txt".
The minimum record length was 14.
The maximum record length was 20.
NOTE: The data set WORK.A has 2 observations and 1 variables.

source:www.support.sas.com

Accurately Calculating Age with Only One Line of Code

direct link:

This tip was written by William Kreuter, a senior computer specialist at the University of Washington in Seattle. He has used SAS software in public health research since 1981, and now specializes in manipulating large data sets for the School of Public Health and the School of Medicine. He can be reached at billyk@u.washington.edu.

A frequent need of SAS software users is to determine a person's age, based on a given date and the person's birthday. Although no simple arithmetic expression can flawlessly return the age according to common usage, efficient code to do so can be written using SAS software's date functions. This article, by SAS software user William Kreuter, presents a one-line solution for this purpose.

Put SAS Date Functions to Work for You

Many kinds of work require the calculation of elapsed anniversaries. The most obvious application is finding a person's age on a given date. Others might include finding the number of years since any event has occurred, such as an index date for medical treatment or the start of a magazine subscription.

However, because of the way the modern Gregorian calendar is constructed, there is no straightforward arithmetic method that produces a person's age, stated according to common usage -- common usage meaning that a person's age should always be an integer that increases exactly on a birthday. (Persons born on a February 29 are a special case that will be addressed later.) It is often important to compute an age that exactly conforms to this usage, for example so that the data will be consistent with the age written on a medical record.

Exact computation of ages and elapsed anniversaries must take into account all rules governing lengths of months and lengths of years. While the rich set of functions and programming constructions available in the SAS DATA step language allows many possible solutions, this article presents a concise solution that relies on letting the SAS date functions do all the work.

What Doesn't Work

Often, SAS software users attempt to compute age using an expression such as:

age = (somedate - birth) / 365.25;

where somedate and birth are SAS date variables (or constants or expressions). Clearly this usually doesn't return an integer and therefore it is not stating an age according to colloquial usage. That problem can be addressed by:

age = floor( (somedate - birth) / 365.25);

Now we're at least getting integers. In fact, for most dates in a given year this statement does produce the correct result. But in most years, age will increment on the wrong day. To account for the Gregorian calendar's idiosyncrasies, some users make attempts such as:

age = floor( (somedate - birth) / 365.2422);

However, extending the denominator to any number of significant digits doesn't help. Astronomers define several kinds of "years" for various technical uses, but the Gregorian calendar uses a different concept of "year" in which there are always either 365 or 366 days. No algorithm of this kind perfectly models such an interval.

The Julian calendar, which was introduced in 46 BC, gave every fourth year 366 days. Because this slowly causes a discrepancy between the calendar and the seasons, Pope Gregory XIII proclaimed the Gregorian calendar in 1582. The new rule provided that every fourth year will have 366 days, except for years divisible by 100 but not 400. Thus the year 2000 will be a leap year, but 2100 will not.

How SAS Date Functions Can Help

As we've seen, the Gregorian calendar, and hence an integer count incremented on an anniversary date, cannot be modeled with simple arithmetic. A completely accurate approach requires coding all of the rules for which years are leap years and all the rules for the number of days in each month.

This is where SAS software's date functions help. Date functions such as intck and intnx have the needed rules built in. Particularly, intck returns intervals that correctly account for the idiosyncrasies of the Gregorian calendar. However, a little tweaking is necessary to get exactly what we need.

Because intck alone won't produce the number of years between successive anniversaries given an arbitrary birth date or starting date, a tweak is needed to find how old the person is in months. Then, simple arithmetic will turn this number into what we need -- years that always increment on the anniversary date.

Again, consistent with common usage, we want the number of months always to be an integer and we want it to increment exactly on the same day each month (or on the first day following a month that is too short for the same day to occur). Generally, the expression

intck('month',birth,somedate);

returns the number of times the first day of a month is passed between birth and somedate. An enhancement is needed to alter this into the number of times the same day of the starting month is passed. This simply consists of subtracting one month if the day number of somedate is earlier than the day number of birth. Although one could program this concept using a separate if-then statement, it can be calculated more concisely as a logical expression returning a 0 or 1 value. The 0 or 1 is then subtracted from the result of intck, as in the following example.

intck('month',birth,somedate) - (day(somedate) < day(birth));

This now gives exactly the correct number of months for any pair of dates.

A one-line solution

Converting months to years, we get:

age = floor((intck('month',birth,somedate)- (day(somedate) < day(birth))) / 12);

This can be conveniently set up as a macro:

%macro age(date,birth);

floor ((intck('month',&birth,&date)- (day(&date) < day(&birth))) / 12)

%mend age;

This is an example of how the macro is used in a SAS DATA step:

age = %age(somedate,birth);

For example, the following lines:

age = %age('28aug1998'd,'24mar1955'd);

put age=;

will cause the following message to be placed on the log:

AGE=43

When This Won't Work

There are only two instances where this approach might fail to yield the expected result.

1.The birthday is February 29, and during non-leap years the person celebrates the birthday on February 28. The solution described here would treat the birthday during non-leap years as March 1. In a random population this should affect at most one out of 1,461 persons, or less than 0.07 percent of the population. If desired, extra lines of code can accommodate this situation.

2.A person's age is to be calculated at a time in history when, in some particular country, the Gregorian calendar was not in use. Beginning with the earliest date that is valid in SAS software -- January 1, 1582 -- SAS software uses the Gregorian calendar. That is the year that France, Italy, Luxembourg, Portugal, and Spain replaced the Julian calendar with the Gregorian. (The Gregorian calendar was first implemented so that the day after October 4, 1582 was October 15, 1582. Nevertheless, SAS software recognizes 31 days in the month of October, 1582.) While the rest of Roman Catholic Europe switched shortly after1582, the United Kingdom and its colonies did not move to the Gregorian calendar until 1752. Many other countries switched even later, including the Soviet Union in 1918 and Greece in 1923. Some historic dates therefore might be handled in a misleading manner -- a problem which, it should be noted, is true of any use of SAS dates in such instances. Nevertheless, given likely scenarios, age will be computed correctly in every country and era.

Search a character expression for a string, specific character, or word:INDEX/INDEXC Functions

Choose appropriate INDEX function to find target strings, individual letters, or strings on word boundaries.

Note:
Sample 1 uses INDEX to search for the first occurrence of a 'word' anywhere in a string. If the string is not found, the result is zero.

Sample 2 uses INDEXC to locate the first occurence of any character specified in the excerpt. If no target is found, the result is zero.**

Sample 3 uses INDEXW to find the target excerpt in a string on a word boundary. If the word is not found, the result is zero.

RESULTS:

Sample 1: INDEX

Sample 2: INDEXC

Sample 3: INDEXW

source:www.support.sas.com

How to use the MISSING function when you don't know if the variable is characer or numeric

Using the MISSING function from SAS Functions by Example By Ron Cody

Ever need to check for a missing value, but you're not sure if the variable is character or numeric? No problem when you use the MISSING function. This function takes either character or numeric variables and it checks for the .A, .B, ._ numeric missing values as well.

For example:

DATA MISSING;
INPUT CHAR $ X Y;
IF MISSING(CHAR) THEN N_CHAR + 1;
IF MISSING(X) THEN N_X + 1;
IF MISSING(Y) THEN N_Y + 1;
DATALINES;
CODY 5 6
. . .
WHATLEY .A ._
LAST 10 20
;

PROC PRINT DATA=MISSING NOOBS;
TITLE "Listing of MISSING";
RUN;A listing of MISSING, below, shows that the MISSING function works correctly with character and numeric values, including all the alternative numeric missing values:

Listing of MISSING

source: www.support.sas.com

HOW TO USE THE SCAN FUNCTION:

USING THE SCAN FUNCTION:

SCAN(string,n,delimiters): returns the nth word from the character string string, where words are delimited by the characters in delimiters.

It is used to extract words from a character value when the relative order of words is known, but their starting positions are not.

NewVar=SCAN(string,n<,delimiters>); -returns the nth ‘word’ in the string

When the SCAN function is used:
􀂃 the length of the created variable is 200 bytes if it is not previously defined with a LENGTH statement
􀂃 delimiters before the first word have no effect When the SCAN function is used,
􀂃 any character or set of characters can serve as delimiters

Points to remember while using SCAN Function:
􀂃 a missing value is returned if there are fewer than n words in string

􀂃 two or more contiguous delimiters are treated as a single delimiter
􀂃 if n is negative, SCAN selects the word in the character string starting from the end of string.

􀂃 If you omit delimiters , default is blank . < ( + & ! $ * ) ; ^ - / , %
Source: http://www.biostat.jhsph.edu/bstcourse/bio632/SummerInst/Class2/class2.pdf

Example of use:

Name= 'StudySAS, Blog';

Last= scan(Name,2,','); *Note : Comma ',' is Delimiter here;

Results in Last returns to Blog

Using the SCAN function from SAS Functions by Example:

Suppose you want to produce an alphabetical list by last name, but your NAME variable contains FIRST, possibly a middle initial, and LAST name. The SCAN function makes quick work of this. Note that the LAST_NAME variable in PROC REPORT has the attribute of ORDER and NOPRINT, so that the list is in alphabetical order of last name but all that shows up is the original NAME variable in First, Middle, and Last name order.

DATA FIRST_LAST;
INPUT @1 NAME $20.@21 PHONE $13.;

***Extract the last name from NAME;

LAST_NAME = SCAN(NAME,-1,' '); /* Scans from the right */
DATALINES;
Jeff W. Snoker (908)782-4382
Raymond Albert (732)235-4444
Steven J. Foster (201)567-9876
Jose Romerez (516)593-2377
;

PROC REPORT DATA=FIRST_LAST NOWD;
TITLE "Names and Phone Numbers in Alphabetical Order (by Last Name)";
COLUMNS NAME PHONE LAST_NAME;
DEFINE LAST_NAME / ORDER NOPRINT WIDTH=20;
DEFINE NAME / DISPLAY 'Name' LEFT WIDTH=20;
DEFINE PHONE / DISPLAY 'Phone Number' WIDTH=13 FORMAT=$13.;
RUN;

source: www.support.sas.com

How to use the PROPCASE function

Using the PROPCASE function

The "old" way to capitalize the first letter of words was to use LOWCASE, UPCASE, and the SUBSTR function, like this:

DATA CAPITALIZE;
INFORMAT FIRST LAST $30.;
INPUT FIRST LAST;
FIRST = LOWCASE(FIRST);
LAST = LOWCASE(LAST);

SUBSTR(FIRST,1,1) = UPCASE(SUBSTR(FIRST,1,1));
SUBSTR(LAST,1,1) = UPCASE(SUBSTR(LAST,1,1));

DATALINES;
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=CAPITALIZE NOOBS;
TITLE "Listing of Data Set CAPITALIZE";
RUN;

With the PROPCASE function in SAS 9.1, it's much easier.

DATA PROPER;
INPUT NAME $60.;
NAME = PROPCASE(NAME);
DATALINES;
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=PROPER NOOBS;
TITLE "Listing of Data Set PROPER";
RUN;

source:www.support.sas.com

Example:

data test;
x=lowcase('MY NaMe iS SARaTh');
y=propcase(x);
z=propcase(lowcase('I AM A PROGRAMMER'));
a=lowcase(z);
b=upcase(x);
run;

How to capitalize the first letter of every word in a string

Capitalize the first letter of every word in a string

Convert a text string into mixed case.
Note: Beginning in SAS 9.1, this task is easily accomplished with the PROPCASE function. See Sample 2 on the Full Code tab.

/* Sample 1: COMPBL, LOWCASE, SCAN, INDEX, UPCASE, SUBSTR */

data sample;
input name $char50.;
/* Lowercase the entire string, remove consecutive blanks */
newname=compbl(lowcase(name));
length next $ 20;
i=0;
next=scan(newname,1,' ');
do while(next ne ' ');
i+1;

/* Scan off each 'word' based upon a space, locate the position */
/* of the first letter in the original string, UPCASE the first */
/* letter and use SUBSTR to replace the byte. */

pos=indexw(newname,trim(next));
substr(newname,pos,1)=upcase(substr(newname,pos,1));
next=scan(newname,i,' ');
end;
keep name newname;
datalines;
Jane DOE
min ning chou
HENRIK HANSSON
D ETCHEVERRY, Charo B
;
proc print;
run;

/* Sample 2: PROPCASE (available in SAS 9.1) */
data sample;
input name $char50.;
mixed=propcase(name," ',");
datalines;
tom o'shea
min ning chou
LENA HANSSON
murder, she wrote
;
proc print;
run;

OUTPUT:

source:support.sas.com

SOUNDEX function

Combine data sets based upon similar values

Encode character strings using SOUNDEX to aid in combining the data based upon similar but not exact values.

Note: The SOUNDEX algorithm is English-biased. For more details about SOUNDEX, please refer to the SAS Language Reference, Dictionary under Functions.

RESULT:

source: support.sas.com

Options in SAS' INFILE Statement

Options in SAS' INFILE Statement

There are a number of options available for the INFILE statement. Below you will find discussion of the following options: DLM='character', DSD, MISSOVER, and FIRSTOBS=value.

DLM='character'

When I prepare a data file for list input to SAS, I use a blank space as the delimiter. The delimiter is the character which must appear between the score for one variable and that for the next variable. One can, however, choose to use a delimiter other than a blank space. For example, the comma is a commonly used delimiter. If you are going to use a delimiter other than a blank space, you must tell SAS what the delimiter is.

Here is an example of a couple of data lines in a comma delimited file:

4,2,8010,2,4,2,4,4,2,2,2,2,2,2,4,4,2,4,2,2,CDFR,22,900,5,4,1
4,2,8011,1,2,3,1,3,4,4,4,1,2,2,4,2,3,4,3,1,psychology,24,360,4,3,1

Here is the INFILE statement which identified the delimiter as being a comma:
infile 'd:\Research-Misc\Hale\Hale.csv' dlm=',' dsd;

DSD

DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank for missing data. Today, however, pc software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma delimited values) to illustrated:

Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.

MISSOVER
I was reading a data file (from mainland China) which was very messy to read with list input, as it was not only comma delimited, but some subjects were missing data at the end of the data records, without commas marking the fields with missing data. I had to use the MISSOVER option, which prevents SAS from going to a new input line when it does not find values in the current line for some of the variables declared in the input statement. With the MISSOVER option, when SAS reaches the end of the current record, variables without any values assigned are set to missing.

Here is the INFILE statement I used:
data china; infile 'D:\Chia\beij93 data *' missover dlm=',';

FIRSTOBS=value
Sometimes the data file will have nondata on the first n lines of the file. You can use the FIRSTOBS command to tell SAS where to start reading the file. For example, consider the following infile statement:
infile 'D:\StatData\dlm.txt' dlm=',' dsd firstobs=7;

The data started on the seventh line of the file, so FIRSTOBS=7 was used to skip over the first six lines. The delimiter in the original file was a tab character, but I was unable to figure out how to use INFILE to set the delimiter to the tab character, so I used Word to replace every tab with a comma.

source: www.core.ecu.edu

Finding the number of observations in SAS dataset

There are a number of ways of finding out the number of observations in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter a pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
set test end=eof;
count+1;
if eof then call symput("nobs",count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets but you are still reading some data here.

proc sql noprint;
select nobs into :nobs from dictionary.tables where libname="WORK" and memname="TEST";
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
if 0 then set test nobs=nobs;
call symputx("nobs",nobs);
stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

I’ll finish with the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option and you don’t need a SAS/AF licence to do it either.

%LET DSID=%SYSFUNC(OPEN(WORK.TEST,IN));
%LET NOBS=%SYSFUNC(ATTRN(&DSID,NOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));

The first line opens the data set and the last one closes it; this is needed because you are not using data step or SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

source: technologytales.com

Another Simple way of doing this is to take help from PROC SQL automatic macro variable SQLOBS. Proc SQL automatically creates SALOBS macro variable, when it runs a program. SQLOBS macro variable will have the number of observations count of last proc SQL statement.

Here is how...

The following code will find the number of observations in the dataset AE in the Library name called SAS.

Note: Don't use noprint option in the Proc SQL statement.

libname sas 'C:\Users\Sarath Annapareddy\Desktop\*******;

proc sql;
select * from sas.ae;
quit;
%put SQLOBS=&sqlobs;

9 %put SQLOBS=&sqlobs;
SQLOBS=224

You can also get the number of observations value using Proc Contents. Here is how…

proc contents data=work.dataset out=nobs(where=(varnum=1)keep=memname nobs varnum)noprint;
run;

Convert values from character to numeric or from numeric to character.

Convert values from character to numeric or from numeric to character\Convert variable values using either the INPUT or PUT function.
Convert a character value to a numeric value by using the INPUT function. Specify a numeric informat that best describes how to Read the data value into the numeric variable. When changing types a new variable name is required. If you need to keep the original variable name for the new type, use the RENAME= option as illustrated in Sample 2.

data char;
input string :$8. date :$6.;
numeric=input(string,8.);
sasdate=input(date,mmddyy6.);
format sasdate mmddyy10.;
datalines;
1234.56 031704
3920 123104;

proc print;
run;

data now_num;
input num date: mmddyy6.;
datalines;
123456 110204
1000 120504;
run;

data now_char;
set now_num (rename=(num=oldnum date=olddate));
num=put(oldnum,6. -L);
date=put(olddate,date9.);
run;

proc print;
run;

Source: support.sas.com

Here is how to convert Numeric date with yymmdd8. format to mmddyy10. format.

data datec;

input numdate;
date =input(put(numdate,z8.),yymmdd8.);
format date mmddyy10.;
datalines;
20040625
20081219
;
run;

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.

Discover More Tips and Techniques on This Blog

Disclosure: