Discover More Tips and Techniques on This Blog

How to Create a new data set from multiple data sets based upon sorted order

Create a new data set from multiple data sets based upon sorted order

Use the SET and BY statements to interleave data sets.

Note: Interleaving uses a SET statement and a BY statement to create a new, sorted data set from multiple data sets. The number of observations in the new data set is the sum of the number of observations in the original data sets. The observations in the new data set are arranged by the values of the BY variable(s) and, within each BY-Group, the order of the data sets in which they occur, including duplicates.

To interleave, data sets need to be in sorted order or indexed on the BY variables.



Output:
source: support.sas.com

Dynamically generate SET statement to combine multiple data sets

Dynamically generate SET statement to combine multiple data sets

You can manually enter the data set names or use MACRO logic to generate the repetitive data set names when combining many data sets on a SET statement.

Note: Variable name lists are not valid for data sets. In other words, if you have WORK.DS1, WORK.DS2, and WORK.DS3, you can NOT specify WORK.DS1-WORK.DS3 on the SET statement.

/* names use the naming convention of DSn, where n is an incrementing /* number.

data ds1;
x=1;
run;

data ds2;
x=2;
run;

data ds3;
x=3;
run;

/* Build a macro called NAMES with two parameters. The first parameter /
* is the 'prefix' of the naming pattern. The second parameter is the /
* maximum number of data sets you want to generate on the SET statement. */


%macro names(prefix,maxnum);
%do i=1 %to &maxnum;
&prefix&i
%end;
;

%mend names;

/* Call the macro on the SET statement */

data all;
set %names(DS,3);
run;

proc print data=all;
title "Appended results";
run;
run;

Appended results

Obs x
1 1
2 2
3 3

source: www.support.sas.com

How to determine which data set contributed an observation

Determine which data set contributed an observation

Use the IN= option to create a boolean variable that is set to one or 'true' to indicate whether the data set contributed data to the current observation. When the IN= variable's value is 1, assign the data set's name into a new variable.

data one;
input string $;datalines;
apple
banana
coconut; run;

data two;
input string $;
datalines;
anagram
bottle
clown
dog; run;

data combo;
set one(in=o) two(in=t);
if o then origin='one';
else origin='two';
run;

proc print data=combo;run;
run;

RESULTS:

Obs string origin
1 apple one
2 banana one
3 coconut one
4 anagram two
5 bottle two
6 clown two
7 dog two

source: www.support.sas.com

How can I convert a numeric date variable to a character variable using PROC SQL?

Helpful documents for Proc SQL:Inside PROC SQL's Query Optimizer


PROC SQL -- The Long and The Short of It

Performance Enhancements to PROC SQL in Version 7 of the SAS® System


Using the SAS/ACCESS Libname Technology to Get Improvements in Performance and Optimizations in SAS/SQL Queries

How can I convert a numeric date variable to a character variable using PROC SQL?The PUT function can be used within PROC SQL to create a character date string from a numeric SAS date value. data numdate;


******************************************************;
data numdate;
date=TODAY();
run;

proc sql;create table new as select PUT(date,date9.)
as newdate from numdate;
quit;

************************************************************;source:
www.support.sas.com
How can I convert a character date variable to a numeric date variable with PROC SQL?

The INPUT function can be used within PROC SQL to create a numeric date variable from a character date string.
****************************************************************;
data chardate;
date='08/30/2006';
run;


proc sql;
create table new as select INPUT(date,mmddyy10.)
as newdate format=mmddyy10. from chardate;
quit;


****************************************************************;
source: http://www.support.sas.com/ Usage Note 24527...

CLASS Statement

When should I put a variable in the CLASS statement? What does the CLASS statement do?

The CLASS statement is used to indicate which variables in the model are categorical variables. In most modeling procedures, such a variable is then treated as a nominal (unordered) categorical predictor variable. A set of numeric indicator ("dummy") variables is created internally to represent the levels of the variable. Because the indicator variables are used for fitting the model, the original variable does not need to be numeric. The resulting model has multiple parameter estimates (one for each indicator variable). Each parameter compares one level of the predictor with a reference level, typically the last level in sorted order. A joint test of all the estimated parameters for the predictor is a test for any differences among the levels and is therefore a test of the predictor's overall effect.

In contrast, a variable name that appears in the MODEL statement but not in the CLASS statement is treated as a continuous predictor variable. The variable itself is used in fitting the model. Therefore, the variable must be a numeric SAS variable and should be continuous or at least be ordered with assigned numeric scores. The resulting model typically has one parameter estimate (there might be more for models with multiple response variables or functions) that estimates the linear effect of the predictor.

Note that if the predictor was unordered, it would not be useful to test for its "linear" effect because you cannot talk about the effect of "increasing" an unordered variable. So, all nominal, categorical variables should be listed in the CLASS statement. On the other hand, you might choose to ignore the ordering in a continuous predictor variable and treat it as a nominal predictor by specifying it in the CLASS statement. But remember that a parameter will be added to the model for each additional level of the variable and this could result in a very large model if the variable has many distinct values in the data set.

Some procedures (such as PROC LOGISTIC) offer many options in the CLASS statement that enable you to designate how the internally generated variables are coded. Each coding method imposes a different interpretation on the estimated parameters. For instance, the GLM (or indicator or dummy) coding method that was mentioned earlier creates parameter estimates that compare the effect of each level to the effect of the reference level. Another coding method for nominal predictors is effects coding, which results in parameter estimates that compare the effect of each level to the average effect of all the levels. There is a coding method appropriate for variables that are ordinal but with unknown spacing between the levels. And there is a coding method for continuous variables that decomposes the variable's effect into linear, quadratic, cubic, and other components.

source:
www.support.sas.com

List of the Domains (datasets) and the variables in it: (CDISC perspective)

List of the Variables in Each Dataset:

TLF samples

TABLES:
Screening Failures

Subject Disposition

Subject Disposition by Visit

Premature Discontinuation from Study Medication

Subject Disposition by Center

Protocol Deviations

Demographics and Baseline Characteristics

Medical and Surgical History

Gynecological History

Screening Pap Smear, Mammography, and Serum Pregnancy Test Results

Number (%) of Subjects Who Took Concomitant Medication

Duration of Treatment in Days and Study Drug Compliance

Number (%) of Subjects with Treatment-Emergent Adverse Events by Body System and Preferred Term

Number (%) of Subjects with Common (>=2%) Treatment-Emergent Adverse Events by Preferred Term, sorted by Descending Order of Incidence in the Trt-x

Number (%) of Subjects with Treatment-Emergent Adverse Events by Body System, Preferred Term, and Maximum Intensity

Number (%) of Subjects with Treatment-Emergent Thromboembolic Events

Number (%) of Subjects with Drug-Related Treatment-Emergent Adverse Events by Body System and Preferred Term

Number (%) of Subjects with Treatment-Emergent Adverse Events Causing Discontinuation

Physical Exam - Number (%) of Subjects with Transitions from Baseline to End of Treatment

Vital Signs Values and Change from Baseline by Visit and Treatment Group

Hematology Values and Change from Baseline to Final Visit - Descriptive Statistics by Visit and Treatment

Hematology Values and Change from Baseline by Visit and Treatment Group

Number (%) of Subjects with Transitions from Baseline to Postbaseline Visits by Treatment

Lead ECG: Number (%) of Subjects with Transitions from Screening - Overall Interpretation by Visit

Continuous Response - Descriptive Statistics by Visit and Treatment

Analysis of Continuous Response

Responder Analysis

Continuous Response - Descriptive Statistics by Pooled Center


LISTINGS:

Listing of Hematology Parameters with Low and High Flags

Listing of Blood Chemistry Parameters with Low and High Flags

Subject Listing for Adverse Events/ Serious Adverse Events

Subjects with Death Information


FIGURES/GRAPH"s

Bar Chart of Change in Diastolic & Systolic Blood Pressure at Final Visit

Bar Chart of Urine Drug Concentration Post-Dose Collections at 0 to 72 Hours in PK Studies (Standard Deviation plotted over Vertical Bars using annotation data sets in SAS)

Plot of Systolic Blood Pressure at 0 to 24 hours After Dosing of Study Drug

Plot of Plasma Drug Concentration at 0 to 72 Hours After Dosing in PK Studies (Mean and S.D. plotted at each point using annotation data sets in SAS)

Kaplan-Meire Survival Curve

Power Curve for Study Design Sample Size

source: www.njbiostatistics.com

Different phases I-IV of a clinical trial



Clinical Trials/Studies in Humans
The clinical testing (investigation) of experimental drugs (previously unproven in humans, therefore "experimental") in humans is normally done in three phases (Phase I, II and III) with more and more people included in each subsequent phase. Before moving to the next phase of development the data are carefully analyzed to ensure the experimental drug is at least safe and well tolerated. After successful completion of Phase I-III testing, a company will submit the results of all of the studies to the FDA or TPD to obtain a New Drug Approval (NDA). Once the FDA or TPD grants a company with a NDA, the company can market the drug (medication) to the public. Additional testing (post-marketing or late phase III/phase IV) to look at the ongoing-term safety continues.

Phase I Studies

Phase I studies are primarily concerned with the drug's safety, and are the first time the drug is tested in humans. These studies are typically done in a small number of healthy volunteers (20-100), usually in a hospital setting where they can be closely watched and treated should there be any side effects. These volunteers are usually paid for their participation and for the most part tend to be men approximately 30 years of age on average. (Women and children would be involved only in latest phases of clinical trial and only if substance in question is designed to be used in this groups of population.) The purpose of Phase I studies is to determine how the experimental drug is absorbed, metabolized, and excreted in humans. Additionally, they seek to determine what types of side effects occur as the dosage of the drug is increased. Any beneficial effects of the drug are also noted. Phase I studies test a particular treatment in humans after it has been studied in the laboratory. The purpose of Phase I studies is to determine the maximum tolerated dose or amount of the treatment and answer questions about the best way to give the new treatment. Drugs that can cause serious side effects are not tested on healthy humans, for example drugs for treating cancer. Phase I of these studies is carefully controlled by Cancer Therapy Evaluation Programs.

The following pre-clinical studies must be completed before phase 1 studies can begin in the United States.


Single dose toxicity in two mammalian species.Safety pharmacology studies to include assessment of effects on vital functions.Pharmacokinetic studies (ADME)Repeated dose toxicity studies in two species (one non-rodent) for two to four weeks, providing phase 1 studies will not exceed two weeks.Local tolerance studies using route of administration relative to propose clinical administration.In vitro tests for evaluation of mutations and chromosomal damage (genotoxicity)Carcinogenicity studies (only if there is cause for concern)

Source: Safety Studies for the Conduct of Human Clinical Trials for Pharmaceuticals, U.S. Department of Health and Human Services and the FDA (CDER and CBER), July 1997, ICH.

Phase II Studies

Once Phase I studies have been completed and a dosage level is known, Phase II studies can start. Once an experimental drug has been proven to be safe and well tolerated in healthy volunteers, it must be tested in the patients that have the disease or condition that the experimental drug is expected to improve/cure. In addition to ensuring that the experimental drug is safe and effective in the patient population of interest, Phase II studies are also designed to evaluate the effectiveness of the drug. The second phase of testing may last from several months to a few years and may involve up to several hundred patients. Most Phase II studies are well controlled, randomized trials. That is, one group of patients (subjects) receives the experimental drug, while a second "control" group receives a standard treatment or placebo. Placement of the subject into the drug treatment or placebo group is by random chance (as if by the flip of a coin). Often these studies are "double-blinded", that is, the patient nor the researchers (investigator, coordinator, etc.) know who is getting the experimental drug. Additionally, Phase II studies are often designed to determine the correct dosage, that is the dosage with the least number of side effects that is most effective. These are often referred to as dose-ranging studies. In general, the purpose of Phase II studies is to provide the pharmaceutical company and the FDA in USA and TPP/TPD in Canada with comparative information about the relative safety of the experimental drug, the proper dosage needed to treat the condition, and the drug's effectiveness. Only about one-third of experimental drugs successfully complete both Phase I and Phase II testing.

Pre-clinical requirements before initiating phase II studies in the U.S.:Repeated dose toxicity studies in two species (one non-rodent) for a period of time equivalent to the length of the phase II studies. Six-month rodent and chronic non-rodent studies will support clinical trials of six months’ duration in the U.S. Studies of longer treatment duration are supported by nine- to twelve-month long pre-clinical studies.

Source: Safety Studies for the Conduct of Human Clinical Trials for Pharmaceuticals, U.S. Department of Health and Human Services and the FDA (CDER and CBER), July 1997, ICH.
Phase III Clinical Studies
If the treatment is found to be effective, Phase III studies compare it to the standard treatment. This is done by having two or more "arms" of treatment in which patients are randomly selected to participate. The arm in which the patient participates is decided by chance (by a computer), not choice. This randomization assists in making the groups as equal as possible so that sound conclusions can be drawn from study results. Patients are randomized by a number of factors that may affect the outcome of the study (age, performance status, stage of disease, etc.). In all treatment arms, patients should receive the best care available. The Data Monitoring Committee oversees all Phase III studies conducted by Sponsor. In a Phase III study, an experimental drug is tested in several hundred to several thousand patients with the disease/condition of interest. Most Phase III studies continue to be randomized and blinded. The large-scale testing provides the pharmaceutical company as well as the FDA with a more thorough understanding of the drug's effectiveness, benefits/risks, and range/severity of possible adverse side effects. Phase III studies typically last several years. Seventy to 90 percent of drugs that enter Phase III studies successfully complete this phase of testing.

Pre-clinical requirements for initiation of phase III studies in the U.S.:

Repeated dose toxicity studies in two species (one non-rodent) for a period of time equivalent to the length of the phase III studies. Six-month rodent and chronic non-rodent studies would support clinical trials exceeding six months.Carcinogenicity studies if the duration of treatment of the drug is expected to be six months or longer or if intermittent exposure is equal to six months of continuous exposure, or if there is a cause for concern. Carcinogenicity studies are not required if the patients receiving the drug have life expectancy of less than two years.Fertility studies in males.Repeated dose toxicology studies that include an evaluation of female reproductive organs must be done if women of non-childbearing potential are used.Assessment of female fertility and embryo-fetal development if women of childbearing potential will be included.All reproduction toxicity studies and the standard and the standard genotoxicity tests should be completed if pregnant women will be included.

Source: Safety Studies for the Conduct of Human Clinical Trials for Pharmaceuticals, U.S. Department of Health and Human Services and the FDA (CDER and CBER), July 1997, ICH.

Phase IV Marketing of New Drugs and Post - Marketing Surveillance
After successful completion of Phase I-III testing, a company submits the results of all of the studies to the FDA to obtain a New Drug Application (NDA). Once the FDA grants a company with a NDA, the company can market the drug (medication) to the public. Additional testing (post-marketing or late phase III/phase IV) to look at the long-term safety continues. Kriger Research Center has a couple of full scale independent projects and studies on several groups of new drugs in post-marketing stage. FDA (in USA) or TPD (in Canada) may require that sponsor would do a long term safety study - epidemiological post-marketing surveillance, as a condition of approval, These may be required because there have been seen problems with similar compounds in the past, or because the compound is novel and additional safety information will be beneficial.

Sourse: www.krctraining.com


Clinical Trials

A clinical trial is designed to answer specific questions about new drugs, medical devices, or new ways of using known treatments.

Clinical trials are used to determine whether the new drug or treatment is safe, and whether it works.

Clinical trials consist of four phases:

Phase I tests a new treatment on a small group, and concentrates on safety;

Phase II deals with safety and efficacy, and expands the study to a larger group of people (several hundred);

Phase III expands the study to an even larger group of people (thousands), and is designed to determine conclusively whether or not the treatment is effective;

Phase IV takes place after the drug has been licensed, to monitor the drug for long-term effects.

The randomized, double-blind, placebo-controlled (or active-comparator-controlled) trial offers the strongest evidence that a treatment is effective. The number of participants also considerably effects how reliably the trial can determine the effects of a treatment.

Clinical trials must be consistent with good clinical practice (GCP), a rigorous set of guidelines designed to protect the participants’ safety and the integrity of the trial data. The FDA requires pharmaceutical companies and contract research organizations to conduct rigorous clinical trials verifying the safety and efficacy of the new drugs before granting approval for marketing.

The trial objectives and design are usually documented in clinical trial protocols. Once the objectives are determined, case report forms must be carefully designed to gather complete, unambiguous data from the trial.

During the trial, the data management team must continually monitor and verify the data to ensure that they are accurate and consistent. Any missing or inconsistent data must be investigated and corrected.

source: www.entrypointplus.com


Phase (in relation to Drug Development): Drug development is divided into phases that are determined by the main objectives of the drug development process.
  • Preclinical: Laboratory or animal studies to show biological activity of the compound against the targeted disease, with the compound evaluated for safety and possible formulations.
  • Phase 1 : A Phase 1 clinical trial is the first step in testing a new investigational medication (or new use of a marketed drug) in humans. Phase 1 studies are mainly concerned with evaluating a drug’s safety profile, including the safe dosage range. The studies also determine how the drug is absorbed and broken down by the body, what is the best way to give the drug to a patient (for example by mouth, or by injection), what side effects may be likely, and how the drug is absorbed, distributed, metabolized, and excreted as well as its duration of action. Except for drugs used to treat cancer, Phase 1 clinical trials are usually conducted in healthy individuals and are not intended to treat disease or illness. Because cancer can be such a life-threatening condition, Phase 1 trials with anti-cancer drugs are usually carried out in patients who already have the disease.
  • Phase 1b : Phase 1b studies are usually conducted in patients diagnosed with the disease, or condition for which the study drug is intended, who demonstrate some biomarker, surrogate, or possibly clinical outcome that could be considered for "proof of concept." Proof of concept in a Phase 1b study typically confirms the hypothesis that the current prediction of biomarker, or outcome benefit is compatible with the mechanism of action.
  • Phase 1/2 : Phase 1/2 trials combine a Phase 1 and a Phase 2 trial of the same treatment into a single protocol. First the Phase 1 part of the trial is done - to determine the Maximum Tolerable Dose (MTD). Then, more patients are treated at the MTD, in the Phase 2 portion of the study, to further evaluate safety and/or efficacy.
  • Phase 2 : Phase 2 clinical trials involve volunteers who have the disease or condition to be treated. These trials help physicians and researchers begin to learn more about the safety of the new drug treatment and how well the drug treats the targeted disease or condition. Several different doses of the drug may be looked at to see which dose has the desired effects. Patients are monitored for side effects and for any improvement in their illness, symptoms, or both.
  • Phase 3 : After a drug has been shown to have positive results in small groups of patients, it may be studied in a larger Phase 3 trial to confirm efficacy and identify adverse events from long-term use. A Phase 3 trial usually compares how well the study drug works compared with an inactive placebo and/or another approved medication. One group of patients may receive the new drug being tested, while another group of patients may receive the comparator drug (already-approved drug for the disease being studied), or placebo.
  • Phase 4 : Phase 4 clinical trials are sometimes called "post-marketing" trials because these studies begin after the Phase 1 – 3 study results have been given to the FDA for evaluation. These studies may be done to determine if the drug is effective against other disease states, or to test different ways of taking the drug such as tablets, time-release capsules or syrups, or to look for adverse events in larger populations over longer periods of time.
Investigational New Drug Application (IND): After completing preclinical testing, a company files an IND with the U.S. Food and Drug Administration to begin to test the drug in humans. The IND becomes effective if FDA does not disapprove it within 30 days. The IND shows results of previous experiments; how, where and by whom the new studies will be conducted; the chemical structure of the investigational drug; how it is thought to work in the human body; any toxic effects found in the animal studies; and how the compound is manufactured. All clinical trials must be reviewed and approved by the Institutional Review Board (IRB) where the trials will be conducted (in Europe, the Ethics Review Board, or ERB).

New Drug Application (NDA): Following the completion of all three phases of clinical trial development, a company analyzes all of the data and files an NDA with the U.S. Food and Drug Administration if the data successfully demonstrate both safety and effectiveness. The NDA contains all the scientific information that the company has gathered on the investigational drug. NDAs typically run 100,000 pages or more and can take up to a year or longer to review by the FDA, on average. Once FDA approves an NDA, the new medicine becomes available for physicians to prescribe.

Open-label Study: A study in which the investigator and the study participant are aware of the drug therapy received during the study.

Single-Blind Study: A study in which the treatment group assignment is not revealed to the study participant but is known by the investigator.

Double-Blind Study: A clinical trial in which neither the investigator nor the study participant is aware of the treatment received.
Unblinding: The act of providing visual or verbal access to the study participant, study drug treatment.
Washout: A period of time during a clinical study when a participant is taken off of a study drug or a medication that is not allowed during the study.
Food and Drug Administration (FDA): A branch of the U.S. Department of Health and Human Services primarily responsible for regulating the approval and use of drugs, medical devices, cosmetics, and foods.
Good Clinical Practice (GCP): A standard for the design, conduct, performance, monitoring, auditing, recording, analysis, and reporting of clinical trials that provides assurance that the data and reported results are credible and accurate, and that the rights, integrity, and confidentiality of trial participants are respected and protected.
Institutional Review Board, or IRB: Also known in some countries as Ethics Review Board, or ERB. This is the scientific institution that is independent of the sponsor of the clinical trial that reviews and approves the study protocol on ethical treatment grounds and ensures that each participant enrolled in the trial has given their informed consent to participate. The IRB or ERB monitors the clinical trial from inception through completion.




Source: http://www.lillytrials.com/docs/terminology.html#P

How to Read Delimited Text Files into SAS

This sample program shows you how to read a delimited text file into SAS. A text file is often referred as raw data that can be prepared in a variety of formats (e.g., csv [comma-separated values], tab-delimited, or spacedelimited). When reading text files with the DATA step,
two statements are used: INFILE and INPUT statements. The INFILE statement is used to specify the physical file being read.

You can use a FILENAME statement in conjunction with an INFILE statement (see Example.1).
Or, you can specify the full path to the file in the INFILE statement (see example.2). A set of options are available in the INFILE statement: DLM, DSD, and LRECL.

The DLM option allows you to tell SAS what character is used as the delimiter in the text file. If this option is not specified, SAS assumes the delimiter is a space. Common delimiters include commas, vertical pipes, semi-colons, and tabs.

The DSD option has three functions when reading delimited files.

The first function is to remove any quotes that surround values in the text file.

The second function deals with missing values. When SAS encounters consecutive delimiters in a file, the default action is to treat the delimiters as one unit. If a file has consecutive delimiters, it is usually because there are missing values between them. DSD tells SAS to treat consecutive
delimiters separately; therefore, a value that is missing between consecutive delimiters will be read as a missing value when DSD is specified.

The third function assumes the delimiter is a comma. If DSD is specified and the delimiter is a comma, the DLM option is not necessary. The LRECL option should be used when the
length of the records in a file is greater than 256 bytes. The input buffer is 256 bytes by default, and records that are greater than this length will be truncated when they are
read.

(Source: SAS Technical Support DOC: TS-673).

[Example 1.]

FILENAME myfile “/full/path/testfile.csv” DLM=’,’
LRECL=400;
DATA a;
INFILE myfile;
INPUT var1 var2 var3;
DATALINES;

[Example 2.]

DATA a;
INFILE “/full/path/testfile.csv” DLM=’,’ LRECL=400;
INPUT var1 var2 var3;
DATALINES;

EXAMPLE:
DATA a;
INFILE datalines DLM=’,’ DSD;
INPUT sex $ age educ;
DATALINES;
F, 40, 20
M, , 16
M, 35, 16
F, ,
, 50, 16
F, 45,
M, 50, 12
M, 42, 16
F, 36, 20
M, 38, 9
;
PROC PRINT;
Here are the results of PROC PRINT:
Obs sex age educ
1 F 40 20
2 M . 16
3 M 35 16
4 F . .
5 50 16
6 F 45 .
7 M 50 12
8 M 42 16
9 F 36 20
10 M 38 9


Courtesy: Created by Hui-shien Tsao of Center for Social and Demographic Analysis, 04/17/02

How to Determine if a variable exists in the dataset or not

Programatically determine if a variable exists in a data set

Use the OPEN and VARNUM functions to determine if a variable exists in a SAS data set. The OPEN function opens a SAS data set and returns a data set identifier. The VARNUM function returns the variable's position in the data set. If the variable does not exist then VARNUM returns a 0.

/* Create sample data */
data test;
input fruit $ count;
datalines;
apple 12
banana 4
coconut 5
date 7
eggs 9
fig 5
;
/* Specify the data set to open in the OPEN function. Specify the */
/* variable to be located in the second parameter of the VARNUM */
/* function. If the variable does not exist, the value of check */
/* will be zero. */

data _null_;
dsid=open('test');
check=varnum(dsid,'count');
if check=0 then put 'Variable does not exist';
else put 'Variable is located in column ' check +(-1) '.';
run;

OUTPUT:

/* Partial LOG output */
311 data _null_;
312 dsid=open('test');
313 check=varnum(dsid,'count');
314 if check=0 then put 'Variable does not exist';
315 else put 'Variable is located in column ' check +(-1) '.';
316 run;
Variable is located in column 2.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

source:www.support.sas.com

What to do when we want only the even number observations

Output only even number observations.

Note: The MOD function returns the remainder when values are divided. In this sample, when dividing i by 2, there will be no remainder for the even observations (2,4,6,8 and 10). If there is a remainder, the current observation has to be an odd numbered observation. This program outputs only the even numbered observations to a new data set.

/* Create sample data set */
data one;
do i=1 to 10;
output;
end;
run;
data two;
set one;

/* The MOD function returns the remainder from the division of */
/* argument-1 by argument-2. If the remainder is zero, when */
/* the second argument is 2, then the first argument must be */
/* even...therefore, output the observation. */


if mod(i,2)=0 then output;
run;

proc print;
run;

OUTPUT:
Obs i
1 2
2 4
3 6
4 8
5 10

source:www.support.sas.com

How to verify the existence of the external file:

Verify the existence of an external file

Conditionally execute code to read in a file only when the file exists.
Note: Although your operating environment utilities may recognize partial physical filenames, you must always use fully qualified physical filenames with FILEEXIST.
This example verifies the existence of an external file. If the file exists, read in the file using INFILE and INPUT statements. If the file does not exist, display a message in the SAS log that states the file does not exist.
Note that in a macro statement you do not enclose character strings in quotation marks.


/* If the file passed to the macro does exist, read in the file and create */
/* a character variable called VAR with a default length of 8 bytes. If */
/* file named in the macro call does not exist, write "FILE DOES NOT EXIST..." */
/* to the log. */


%macro in_file(file);
%if %sysfunc(fileexist(&file))=1 %then %do;
data a;
infile "&file";
input var $;
run;
%end;
%else %do;

data _null_;
put "FILE DOES NOT EXIST: &file";
run;
%end;
%mend;

/* Call the macro with file that does not exist */
%in_file(c:\bloom.txt);
/* Call the macro with file that does exist */
%in_file(c:\tracks\x1.txt);

OUTPUT

LOG when file does not exist
FILE DOES NOT EXIST: c:\bloom.txt

LOG when the file does exist ...typical notes seen when a file is read

NOTE: The infile "c:\tracks\x1.txt" is:
File Name=c:\tracks\x1.txt,
RECFM=V,LRECL=256

NOTE: 2 records were read from the infile "c:\tracks\x1.txt".
The minimum record length was 14.
The maximum record length was 20.
NOTE: The data set WORK.A has 2 observations and 1 variables.

source:www.support.sas.com

Accurately Calculating Age with Only One Line of Code


This tip was written by William Kreuter, a senior computer specialist at the University of Washington in Seattle. He has used SAS software in public health research since 1981, and now specializes in manipulating large data sets for the School of Public Health and the School of Medicine. He can be reached at billyk@u.washington.edu.

A frequent need of SAS software users is to determine a person's age, based on a given date and the person's birthday. Although no simple arithmetic expression can flawlessly return the age according to common usage, efficient code to do so can be written using SAS software's date functions. This article, by SAS software user William Kreuter, presents a one-line solution for this purpose.
Put SAS Date Functions to Work for You

Many kinds of work require the calculation of elapsed anniversaries. The most obvious application is finding a person's age on a given date. Others might include finding the number of years since any event has occurred, such as an index date for medical treatment or the start of a magazine subscription.

However, because of the way the modern Gregorian calendar is constructed, there is no straightforward arithmetic method that produces a person's age, stated according to common usage -- common usage meaning that a person's age should always be an integer that increases exactly on a birthday. (Persons born on a February 29 are a special case that will be addressed later.) It is often important to compute an age that exactly conforms to this usage, for example so that the data will be consistent with the age written on a medical record.

Exact computation of ages and elapsed anniversaries must take into account all rules governing lengths of months and lengths of years. While the rich set of functions and programming constructions available in the SAS DATA step language allows many possible solutions, this article presents a concise solution that relies on letting the SAS date functions do all the work.

What Doesn't Work

Often, SAS software users attempt to compute age using an expression such as:

age = (somedate - birth) / 365.25;

where somedate and birth are SAS date variables (or constants or expressions). Clearly this usually doesn't return an integer and therefore it is not stating an age according to colloquial usage. That problem can be addressed by:

age = floor( (somedate - birth) / 365.25);

Now we're at least getting integers. In fact, for most dates in a given year this statement does produce the correct result. But in most years, age will increment on the wrong day. To account for the Gregorian calendar's idiosyncrasies, some users make attempts such as:

age = floor( (somedate - birth) / 365.2422);

However, extending the denominator to any number of significant digits doesn't help. Astronomers define several kinds of "years" for various technical uses, but the Gregorian calendar uses a different concept of "year" in which there are always either 365 or 366 days. No algorithm of this kind perfectly models such an interval.

The Julian calendar, which was introduced in 46 BC, gave every fourth year 366 days. Because this slowly causes a discrepancy between the calendar and the seasons, Pope Gregory XIII proclaimed the Gregorian calendar in 1582. The new rule provided that every fourth year will have 366 days, except for years divisible by 100 but not 400. Thus the year 2000 will be a leap year, but 2100 will not.

How SAS Date Functions Can Help

As we've seen, the Gregorian calendar, and hence an integer count incremented on an anniversary date, cannot be modeled with simple arithmetic. A completely accurate approach requires coding all of the rules for which years are leap years and all the rules for the number of days in each month.

This is where SAS software's date functions help. Date functions such as intck and intnx have the needed rules built in. Particularly, intck returns intervals that correctly account for the idiosyncrasies of the Gregorian calendar. However, a little tweaking is necessary to get exactly what we need.

Because intck alone won't produce the number of years between successive anniversaries given an arbitrary birth date or starting date, a tweak is needed to find how old the person is in months. Then, simple arithmetic will turn this number into what we need -- years that always increment on the anniversary date.

Again, consistent with common usage, we want the number of months always to be an integer and we want it to increment exactly on the same day each month (or on the first day following a month that is too short for the same day to occur). Generally, the expression

intck('month',birth,somedate);

returns the number of times the first day of a month is passed between birth and somedate. An enhancement is needed to alter this into the number of times the same day of the starting month is passed. This simply consists of subtracting one month if the day number of somedate is earlier than the day number of birth. Although one could program this concept using a separate if-then statement, it can be calculated more concisely as a logical expression returning a 0 or 1 value. The 0 or 1 is then subtracted from the result of intck, as in the following example.

intck('month',birth,somedate) - (day(somedate) < day(birth));

This now gives exactly the correct number of months for any pair of dates.

A one-line solution

Converting months to years, we get:

age = floor((intck('month',birth,somedate)- (day(somedate) < day(birth))) / 12);

This can be conveniently set up as a macro:

%macro age(date,birth);
floor ((intck('month',&birth,&date)- (day(&date) < day(&birth))) / 12)
%mend age;

This is an example of how the macro is used in a SAS DATA step:

age = %age(somedate,birth);

For example, the following lines:

age = %age('28aug1998'd,'24mar1955'd);
put age=;
will cause the following message to be placed on the log:

AGE=43

When This Won't Work

There are only two instances where this approach might fail to yield the expected result.

1.The birthday is February 29, and during non-leap years the person celebrates the birthday on February 28. The solution described here would treat the birthday during non-leap years as March 1. In a random population this should affect at most one out of 1,461 persons, or less than 0.07 percent of the population. If desired, extra lines of code can accommodate this situation.

2.A person's age is to be calculated at a time in history when, in some particular country, the Gregorian calendar was not in use. Beginning with the earliest date that is valid in SAS software -- January 1, 1582 -- SAS software uses the Gregorian calendar. That is the year that France, Italy, Luxembourg, Portugal, and Spain replaced the Julian calendar with the Gregorian. (The Gregorian calendar was first implemented so that the day after October 4, 1582 was October 15, 1582. Nevertheless, SAS software recognizes 31 days in the month of October, 1582.) While the rest of Roman Catholic Europe switched shortly after1582, the United Kingdom and its colonies did not move to the Gregorian calendar until 1752. Many other countries switched even later, including the Soviet Union in 1918 and Greece in 1923. Some historic dates therefore might be handled in a misleading manner -- a problem which, it should be noted, is true of any use of SAS dates in such instances. Nevertheless, given likely scenarios, age will be computed correctly in every country and era.

Search a character expression for a string, specific character, or word:INDEX/INDEXC Functions

Choose appropriate INDEX function to find target strings, individual letters, or strings on word boundaries.

Note:
Sample 1 uses INDEX to search for the first occurrence of a 'word' anywhere in a string. If the string is not found, the result is zero.

Sample 2 uses INDEXC to locate the first occurence of any character specified in the excerpt. If no target is found, the result is zero.**

Sample 3 uses INDEXW to find the target excerpt in a string on a word boundary. If the word is not found, the result is zero.






RESULTS:

Sample 1: INDEX



Sample 2: INDEXC



Sample 3: INDEXW



source:www.support.sas.com

How to use the MISSING function when you don't know if the variable is characer or numeric

Using the MISSING function from SAS Functions by Example By Ron Cody

Ever need to check for a missing value, but you're not sure if the variable is character or numeric? No problem when you use the MISSING function. This function takes either character or numeric variables and it checks for the .A, .B, ._ numeric missing values as well.

For example:

DATA MISSING;
INPUT CHAR $ X Y;
IF MISSING(CHAR) THEN N_CHAR + 1;
IF MISSING(X) THEN N_X + 1;
IF MISSING(Y) THEN N_Y + 1;
DATALINES;
CODY 5 6
. . .
WHATLEY .A ._
LAST 10 20
;


PROC PRINT DATA=MISSING NOOBS;
TITLE "Listing of MISSING";
RUN;
A listing of MISSING, below, shows that the MISSING function works correctly with character and numeric values, including all the alternative numeric missing values:

Listing of MISSING


source: www.support.sas.com

HOW TO USE THE SCAN FUNCTION:

USING THE SCAN FUNCTION:

SCAN(string,n,delimiters): returns the nth word from the character string string, where words are delimited by the characters in delimiters. 

It is used to extract words from a  character value when the relative order of words is known, but their starting positions are not.

NewVar=SCAN(string,n<,delimiters>); -returns the nth ‘word’ in the string
 
When the SCAN function is used:
􀂃 the length of the created variable is 200 bytes if it is not previously defined with a LENGTH statement
􀂃 delimiters before the first word have no effect When the SCAN function is used,
􀂃 any character or set of characters can serve as delimiters

Points to remember while using SCAN Function:
􀂃 a missing value is returned if there are fewer than n words in string
􀂃 two or more contiguous delimiters are treated as a single delimiter
ô€‚ƒ if  n is negative, SCAN selects the word in the character string starting from the end of string.

􀂃 If you omit delimiters , default is blank . < ( + & ! $ * ) ; ^ - / , %
Source: http://www.biostat.jhsph.edu/bstcourse/bio632/SummerInst/Class2/class2.pdf

Example of use:

Name= 'StudySAS, Blog';
Last= scan(Name,2,',');  *Note : Comma ',' is Delimiter here;

Results in Last returns to Blog

Using the SCAN function from SAS Functions by Example:

Suppose you want to produce an alphabetical list by last name, but your NAME variable contains FIRST, possibly a middle initial, and LAST name. The SCAN function makes quick work of this. Note that the LAST_NAME variable in PROC REPORT has the attribute of ORDER and NOPRINT, so that the list is in alphabetical order of last name but all that shows up is the original NAME variable in First, Middle, and Last name order.

DATA FIRST_LAST;
INPUT @1 NAME $20.@21 PHONE $13.;

***Extract the last name from NAME;


LAST_NAME = SCAN(NAME,-1,' '); /* Scans from the right */
DATALINES;
Jeff W. Snoker (908)782-4382
Raymond Albert (732)235-4444
Steven J. Foster (201)567-9876
Jose Romerez (516)593-2377
;

PROC REPORT DATA=FIRST_LAST NOWD;
TITLE "Names and Phone Numbers in Alphabetical Order (by Last Name)";
COLUMNS NAME PHONE LAST_NAME;
DEFINE LAST_NAME / ORDER NOPRINT WIDTH=20;
DEFINE NAME / DISPLAY 'Name' LEFT WIDTH=20;
DEFINE PHONE / DISPLAY 'Phone Number' WIDTH=13 FORMAT=$13.;
RUN;

source: www.support.sas.com

How to use the PROPCASE function

Using the PROPCASE function

The "old" way to capitalize the first letter of words was to use LOWCASE, UPCASE, and the SUBSTR function, like this:

DATA CAPITALIZE;
INFORMAT FIRST LAST $30.;
INPUT FIRST LAST;
FIRST = LOWCASE(FIRST);
LAST = LOWCASE(LAST);


SUBSTR(FIRST,1,1) = UPCASE(SUBSTR(FIRST,1,1));
SUBSTR(LAST,1,1) = UPCASE(SUBSTR(LAST,1,1));

DATALINES;
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=CAPITALIZE NOOBS;
TITLE "Listing of Data Set CAPITALIZE";
RUN;

With the PROPCASE function in SAS 9.1, it's much easier.

DATA PROPER;
INPUT NAME $60.;
NAME = PROPCASE(NAME);
DATALINES;
ronald cODy
THomaS eDISON
albert einstein

;
PROC PRINT DATA=PROPER NOOBS;
TITLE "Listing of Data Set PROPER";
RUN;

source:www.support.sas.com


Example:

data test;
x=lowcase('MY NaMe iS SARaTh');
y=propcase(x);
z=propcase(lowcase('I AM A PROGRAMMER'));
a=lowcase(z);
b=upcase(x);
run;

How to capitalize the first letter of every word in a string



Capitalize the first letter of every word in a string

Convert a text string into mixed case.
Note: Beginning in SAS 9.1, this task is easily accomplished with the PROPCASE function. See Sample 2 on the Full Code tab.

/* Sample 1: COMPBL, LOWCASE, SCAN, INDEX, UPCASE, SUBSTR */

data sample;
input name $char50.;
/* Lowercase the entire string, remove consecutive blanks */
newname=compbl(lowcase(name));
length next $ 20;
i=0;
next=scan(newname,1,' ');
do while(next ne ' ');
i+1;

/* Scan off each 'word' based upon a space, locate the position */
/* of the first letter in the original string, UPCASE the first */
/* letter and use SUBSTR to replace the byte. */


pos=indexw(newname,trim(next));
substr(newname,pos,1)=upcase(substr(newname,pos,1));
next=scan(newname,i,' ');
end;
keep name newname;
datalines;
Jane DOE
min ning chou
HENRIK HANSSON
D ETCHEVERRY, Charo B
;

proc print;
run;


/* Sample 2: PROPCASE (available in SAS 9.1) */
data sample;
input name $char50.;
mixed=propcase(name," ',");
datalines;
tom o'shea
min ning chou
LENA HANSSON
murder, she wrote
;
proc print;
run;


OUTPUT:


source:support.sas.com

SOUNDEX function

Combine data sets based upon similar values

Encode character strings using SOUNDEX to aid in combining the data based upon similar but not exact values.

Note: The SOUNDEX algorithm is English-biased. For more details about SOUNDEX, please refer to the SAS Language Reference, Dictionary under Functions.













RESULT:


source: support.sas.com

Options in SAS' INFILE Statement

Options in SAS' INFILE Statement

There are a number of options available for the INFILE statement. Below you will find discussion of the following options: DLM='character', DSD, MISSOVER, and FIRSTOBS=value.

DLM='character'

When I prepare a data file for list input to SAS, I use a blank space as the delimiter. The delimiter is the character which must appear between the score for one variable and that for the next variable. One can, however, choose to use a delimiter other than a blank space. For example, the comma is a commonly used delimiter. If you are going to use a delimiter other than a blank space, you must tell SAS what the delimiter is.

Here is an example of a couple of data lines in a comma delimited file:

4,2,8010,2,4,2,4,4,2,2,2,2,2,2,4,4,2,4,2,2,CDFR,22,900,5,4,1
4,2,8011,1,2,3,1,3,4,4,4,1,2,2,4,2,3,4,3,1,psychology,24,360,4,3,1

Here is the INFILE statement which identified the delimiter as being a comma:
infile 'd:\Research-Misc\Hale\Hale.csv' dlm=',' dsd;

DSD

DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank for missing data. Today, however, pc software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma delimited values) to illustrated:

Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.

MISSOVER
I was reading a data file (from mainland China) which was very messy to read with list input, as it was not only comma delimited, but some subjects were missing data at the end of the data records, without commas marking the fields with missing data. I had to use the MISSOVER option, which prevents SAS from going to a new input line when it does not find values in the current line for some of the variables declared in the input statement. With the MISSOVER option, when SAS reaches the end of the current record, variables without any values assigned are set to missing.

Here is the INFILE statement I used:
data china; infile 'D:\Chia\beij93 data *' missover dlm=',';

FIRSTOBS=value
Sometimes the data file will have nondata on the first n lines of the file. You can use the FIRSTOBS command to tell SAS where to start reading the file. For example, consider the following infile statement:
infile 'D:\StatData\dlm.txt' dlm=',' dsd firstobs=7;

The data started on the seventh line of the file, so FIRSTOBS=7 was used to skip over the first six lines. The delimiter in the original file was a tab character, but I was unable to figure out how to use INFILE to set the delimiter to the tab character, so I used Word to replace every tab with a comma.

source: www.core.ecu.edu

Finding the number of observations in SAS dataset

There are a number of ways of finding out the number of observations in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter a pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
set test end=eof;
count+1;
if eof then call symput("nobs",count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets but you are still reading some data here.

proc sql noprint;
select nobs into :nobs from dictionary.tables where libname="WORK" and memname="TEST";
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
if 0 then set test nobs=nobs;
call symputx("nobs",nobs);
stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

I’ll finish with the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option and you don’t need a SAS/AF licence to do it either.

%LET DSID=%SYSFUNC(OPEN(WORK.TEST,IN));
%LET NOBS=%SYSFUNC(ATTRN(&DSID,NOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));

The first line opens the data set and the last one closes it; this is needed because you are not using data step or SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

source: technologytales.com

Another Simple way of doing this is to take help from PROC SQL automatic macro variable SQLOBS. Proc SQL automatically creates SALOBS macro variable, when it runs a program. SQLOBS macro variable will have the number of observations count of last proc SQL statement.

Here is how...

The following code will find the number of observations in the dataset AE in the Library name called SAS.

Note: Don't use noprint option in the Proc SQL statement.

libname sas 'C:\Users\Sarath Annapareddy\Desktop\*******;

proc sql;
select * from sas.ae;
quit;
%put SQLOBS=&sqlobs;



9 %put SQLOBS=&sqlobs;
SQLOBS=224



You can also get the number of observations value using Proc Contents. Here is how…

proc contents data=work.dataset out=nobs(where=(varnum=1)keep=memname nobs varnum)noprint;
run;

Convert values from character to numeric or from numeric to character.

Convert values from character to numeric or from numeric to character\Convert variable values using either the INPUT or PUT function.
Convert a character value to a numeric value by using the INPUT function. Specify a numeric informat that best describes how to Read the data value into the numeric variable. When changing types a new variable name is required. If you need to keep the original variable name for the new type, use the RENAME= option as illustrated in Sample 2.


data char;
input string :$8. date :$6.;
numeric=input(string,8.);
sasdate=input(date,mmddyy6.);
format sasdate mmddyy10.;
datalines;
1234.56 031704
3920 123104;


proc print;
run;








data now_num;
input num date: mmddyy6.;
datalines;
123456 110204
1000 120504;
 

run;




data now_char;
set now_num (rename=(num=oldnum date=olddate));
num=put(oldnum,6. -L);
date=put(olddate,date9.);
run;

proc print;
run;



Source: support.sas.com


Here is how to convert Numeric date with  yymmdd8. format to mmddyy10. format.

data datec;

input numdate;
date =input(put(numdate,z8.),yymmdd8.);
format date mmddyy10.;
datalines;
20040625
20081219
;
run;

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.