Wednesday, July 29, 2009

Dictionary Tables and SASHELP Views:

Most of the SAS programmers think that the Metadata concepts (Dictionary tables and SASHELP views) are advanced SAS topics, in fact it is not.

I tried here to explain the concept precise and clear……

Here, I will cover the following topics……

Introduction to SAS metadata
How to see what’s in the tables
How to use the tables, using several real-world examples
Advantages

What are Dictionary Tables4?

Dictionary tables are created and automatically maintained by SAS system at run time to store information related to SAS data libraries, SAS system options, SAS catalogs, and external files associated with the currently running SAS session. CATALOGS, COLUMNS, EXTFILES, INDEXES, MEMBERS, MACRO, OPTIONS, TABLES, TITLES, AND VIEW are these objects which provide detailed information about the SAS files.

DICTIONARY tables are special read-only PROC SQL tables. They retrieve information about all the SAS data libraries, SAS data sets, SAS system options, and external files that are associated with the current SAS session.

PROC SQL automatically assigns the DICTIONARY libref. To get information from DICTIONARY tables, specify DICTIONARY.table-name in the FROM clause.

DICTIONARY.table-name is valid in PROC SQL only. However, SAS provides PROC SQL views, based on the DICTIONARY tables that can be used in other SAS procedures and in the DATA step. These views are stored in the SASHELP library and are commonly called "SASHELP views."

The following table lists some of the DICTIONARY tables and the names of their corresponding views3. Ref..... SAS help and Documentation for complete details:


Where to find/ how do we access Dictionary Tables?

Start a session. Open an interactive SAS session and use PRINT to display SASHELP.-VTABLE.

1. Start a SAS session
2. Using SAS explorer, navigate to LIBRARIES SASHELP VTABLE
3. Double-click to open it. Examine contents briefly
4. Close the VIEWTABLE window (don’t terminate the SAS session)

Note: Available only in Version 6.07 or later SAS versions

Dictionary Tables
􀂄 Always available
􀂄 Maintained by SAS
􀂄 Read only
􀂄 Have information on virtually every aspect of your SAS session

Two ways to access them
1. Dictionary tables
2. SASHELP views

What Tables are Available:
􀂄 V8 11 DICTIONARY tables • 17 SASHELP views

􀂄 V9.1.3• 22 DICTIONARY tables• 30 SASHELP views

COLUMNS/VCOLUMN:

Provides information about all variables in all data sets in all specified libraries. It gives us a lot of flexibility to investigate data sets.

Example 1:

Reading all variable names of one data set into one macro variable i.e &VAR

proc sql noprint;
Select name into: VAR separated by ","
from columns where libname="WORK” & memname="DEMO";
quit;


All variable names in data set DEMO are read into string &VAR &VAR = usubjid, subjid, age, gender, race, ..... , ...

TABLES/VTABLE:

Being Parallel with MEMBERS, TABLES provides more detailed information about SAS data files.

The following code is one example of outputting a summary table of SAS data files under certain specified directory.


TITLES/VTITLE:
TITLES: Contains messages about titles and footnotes which are used by current program.
TITLES: Provides a useful source of managing titles and footnotes in reports.

The TITLES table has an entry for each title and footnote line currently in effect. See the example below on how to use this table to save the current titles, and then restore them after running a report.

create table DICTIONARY.TITLES
(type char(1) label='Title Location',
number num label='Title Number',
text char(256) label='Title Text');



EXTFILES/VEXTFL:
The EXTFILES table has an entry for each external file registered (filerefs) in the session.

create table DICTIONARY.EXTFILES
(fileref char(8) label='Fileref',
xpath char(1024) label='Path Name',
xengine char(8) label='Engine Name');


To check What Tables are Available:

proc sql;
create table tables as
select * from dictionary.dictionaries;
quit;



What Views are Available:

Execute the following program to see what views are available:

proc sql;
create table views as
select distinct memname
from sashelp.vsview
where libname='SASHELP'and memname like 'V%';
quit;


Here are the SASHELP views which are available in SAS version 9.1.3.

MEMNAME
VALLOPT
VCATALG
VCFORMAT
VCHKCON
VCNCOLU
VCNTABU
VCOLUMN
VDCTNRY
VENGINE
VEXTFL
VFORMAT
VGOPT
VINDEX
VLIBNAM
VMACRO
VMEMBER
VOPTION
VREFCON
VREMEMB
VSACCES
VSCATLG
VSLIB
VSTABLE
VSTABVW
VSTYLE
VSVIEW
VTABCON
VTITLE
VVIEW


What is the Table Structure?

proc sql;
describe table DICTIONARY.OPTIONS;
quit;


NOTE: SQL table DICTIONARY.OPTIONS was created like:

create table DICTIONARY.OPTIONS
(optname char(32) label='Option Name',
opttype char(8) label='Option type',
setting char(1024) label='Option Setting',
optdesc char(160) label='Option Description',
level char(8) label='Option Location',
group char(32) label='Option Group');


What can we do with these dictionary tables and or SASHELP views?

Check Conflicting Variable Attributes:
(Comparing the test and production directories)

Identify Current Directory:
(Retrieving the executing filename and path of program programmatically)

Use Variables Meeting Criteria or and Identify Features Incompatible with Transport Format:

/*Example #1: Check if all variables in the analysis data sets did have names and*/
/*labels, and the lengths of these should not exceed 8 and 40 characters, respectively.*/

/*Example #2: Check if all analysis data sets did have a label, and the length of any label cannot exceed 40 characters. Also, the length of data set names should not*/

/*Example #3: Check if the length of any character value of variable exceeds 200 characters*/
Refer:

http://www.studysas.blogspot.com/2009/04/proc-comparesas-program-used-to-compare.html
http://www.studysas.blogspot.com/2009/04/maxvarlenmacro-check-length-of-all.html


/*Example #4: Check if variables with the same name across multiple data sets have the*/
/*same attributes, in order to ensure consistency*/

*Select variable names that are in multiple data sets;
ODS RTF FILE="columns.rtf";
TITLE "List of Columns in Multiple Data Sets";
PROC SQL;
select name length=10, memname length=28,
label length=50, type length=8,
length length=8, format length=8,
informat length=8
from dictionary.columns
where libname="PRODN" group by name
having count(*) > 1
order by name, memname;
QUIT;
ODS RTF CLOSE;

/*If we need to see variables with differences only, use the following:*/

OPTIONS orientation=Landscape;
ODS RTF FILE="columnsDiff.rtf";
TITLE "List of Columns in Multiple Data Sets with Discrepancy";
PROC SQL;
select name length=10, memname length=28, label length=50,
type length=8, length length=8, format length=8,
informat length=8
from dictionary.columns
where libname="PRODN"
group by name
having count(distinct label) > 1 or
count(distinct type) > 1 or
count(distinct length) > 1 or
count(distinct format) >1 or
count(distinct informat) > 1
order by name, memname;
QUIT;
ODS RTF CLOSE;



Put all the variables in Alphabetical Order:
/*Example #5: Put all the variables into Alphabetical order;*/

proc sql noprint;
select name into: new
separated by ','
from dictionary.columns where libname='WORK' and memname='MH'
order by name;
create table mh1 as select &new.
from mh;
run;


Note: Memname is the name of the dataset(MH). The above program reorder the variables alphabetically. We are taking help from dictionary.columns using proc sql to know and create a macro variable new to each variable in the dataset. Upon creating the macro variable, use the 2nd proc sql step…. to reorder the variables.

Count Observations/Variables in a Data Set:

Renaming all variables in a SAS dataset:

http://studysas.blogspot.com/2009/07/renaming-all-variables-in-sas-data-set.html


Resetting Titles:

data test;
a=1;
b=1;
c=123;
run;
ods listing close;
ods trace on;
ods escapechar='*';

ods rtf file="Title Check.rtf" bodytitle style=rtfout;

title1 j=r 'page 1 of 1';
title2 'Title check';
footnote1 program.sas;
footnote2 July 22nd 2009;

proc print data=test;
run;

ods rtf close;
ods trace off;
ods listing;

DATA __oldtitles;
set sashelp.vtitle;
RUN;

Proc Contents vs Dictionary Tables/SASHELP views2:

Getting the information by extracting directly from SASHELP /DICTIONARY TABLES can be very helpful. These allow us to have more power with controlling the output than with proc contents. The standard output of PROC CONTENTS provides information from one dataset or all the datasets in a single library. Some of the information provided by PROC CONTENTS, like location of dataset, position of the columns in a dataset, may not be necessary for the final user of the information, and it is not easy to eliminate. With the use of PROC SQL, selected information about one or more dataset can be retrieved from views and create datasets.

Advantages1:
Multiple libraries can be analyzed in a single SQL query. Each PROC CONTENTS step is limited to a single library reference. In addition, a second step is not required to reduce variable-level information to the table level.

This approach may be more efficient for libraries containing many variables. PROC CONTENTS may take much longer because of the variable-level nature of its output data set.

All SAS files - not just SAS data sets - can be analyzed through the DICTIONARY.MEMBERS table by selecting the appropriate values of the Memtype column. PROC CONTENTS only processes SAS data sets when creating an output data set.

Disadvantages:
Knowledge of SQL syntax is required. We can access SASHELP views using a simple datastep but it is much slower in generating the required information.

References:
1) Don't Be a SAS® Dinosaur: Modernize Your SAS Programs by Warren Repole : SAS Library Information: SAS Files

2) A data dictionary, a more complete alternative to Proc contents: Pharmasug 2002 proceedings (By Angelina Cecilia Casas, M.S. PPD Development, Research Triangle Park, NC):

3) SAS Help and Documentation

4) Where, When, and How to Use Dictionary Tables in SAS: NESUG 1999 Proceedings (By Jiang Jin, EDP Contract Services)

5) Simple Ways to Use PROC SQL and SAS DICTIONARY TABLES to Verify Data Structure of the Electronic Submission Data Sets: Pharmasug 2006 tutorials (By Christine Teng and Wenjie Wang, Merck Research Labs, Merck & Co., Inc., Rahway, NJ)

Wednesday, July 22, 2009

Even you can Use HASH and DOUBLE DASH: It’s that Simple……

In order to understand HASH and DOUBLE HASH concept in SAS you need to know about two different ranges of variables:

1) Numbered list:

When a set of variables have the same prefix, and the rest of the name is a consecutive set of numbers, we can use a single dash (-) to refer to an entire range. Some exs.. are….

VAR1 VAR2 VAR3 VAR4 VAR5

Shortcut list you can use to access all 5 variables is VAR1-VAR5

COL1 COL2 COL3 COL4 COL5 COL6 COL7

Shortcut list you can use to access all 7 variables is COL1-COL7


2) Name range list:
When you refer to a list of variables in the order in which they were defined in the SAS dataset, you can use a double dash (--) to refer to the entire range of variables in between them.

Ex: VAR1 VAR2 VAR3 COUNT VAR4 COL1 VAR5

Shortcut list you can use to access all 7 variables (including COL1, which has different prefix name than others) is VAR1- -VAR5

The general rule you should always remember for dash and double dash is:

Single dash is useful to access the consequently numbered variables.

Double dash is useful to access variables based on their order/position regardless of variable name.

Execute following program in SAS to understand the concept better.


data test;
var1=1;
var2=2;
var3=3;
COL=1;
var4=4;
var5=5;
run;

ods listing close;
ods rtf file="Single DASH output.rtf" style=rtfout;
title1 'Using Single DASH';

proc print data=test;
var var1-var5;
run;

ods rtf close;
ods trace off;
ods listing;


ods listing close;
ods rtf file="Double DASH output.rtf" style=rtfout;
title1 'Using DOUBLE DASH';

proc print data=test;
var var1--var5;
run;

ods rtf close;
ods trace off;
ods listing;


SINGLE DASH OUTPUT:

Obs var1 var2 var3 var4 var5


1 1 2 3 4 5

DOUBLE DASH OUTPUT:

Obs var1 var2 var3 COL var4 var5
1 1 2 3 1 4 5

Renaming All Variables in a SAS Data Set Using the SASHELP VIEWS

*Create a temporary dataset... DSN;
data dsn;
a=1;
b=2;
c=3;
d=4;
e=5;
f=6;
run;


%macro test(lib,dsn);

*/1)*/ data _null_;
set sashelp.vtable(where=(libname="&LIB" and memname="&DSN"));
call symput('nvars',nvar);
run;

*/2)*/ data dsn;
set sashelp.vcolumn(where=(libname="&LIB" and memname="&DSN"));
call symput(cats("var",_n_),name);
run;


*/3)*/ proc datasets library=&LIB;
modify &DSN;
rename
%do i = 1 %to &nvars;
&&var&i=Rename_&&var&i.
%end;
;
quit;
run;
%mend;

%test(WORK,DSN);

After submitting the above program... the output looks like this....

Output:
Rename_a Rename_b Rename_c Rename_d Rename_e Rename_f
1 2 3 4 5 6

Here is a way I know of.. to rename all the variables in the dataset;

It can be done using the SASHELP views as follows:

1) The 1st step of the program determines the total number of variables inside the dataset with the help of SASHELP.Vtable, Data _null_ step used to do the same …. A macro variable NVAR (Number of Variables) will be created after this step which can be accessed in the following steps.

2) What the 2nd step does is….assigns a unique IDNUM to all the variables of the dataset with the help of SASHELP.Vcolumn.

3) What the 3rd step does is … it uses Proc Datasets with the MODIFY statement to rename all the variables in the dataset. The DO LOOP is been used to resolve the 6 macro variables in a macro of a temporary dataset.

Thursday, July 9, 2009

Subscript or Superscript in the footers/titles of RTF output

Wondering how to create a subscripts and superscipt in title or footnotes of the rtf output... here is a way to do it....

Execute the following program to get an Idea about how to keep SUBSCRIPTS and SUPERSCRIPTS in footnotes and titles of rtf output..

************************************************************;
data test;
length
secnam $15;
input
sortord secnam $ pvalue;
cards;
1 demog 0.8812
2 ae 0.7112
3 disposition 0.8112
4 medicalhistory 0.9112
;

run;

ods listing close;
ods rtf file="Test output.rtf" style=rtfout;
ods escapechar='\';

proc report data = test missing split="$" spacing=0 headline nowd;
column sortord secnam pvalue;
define sortord / order noprint;
define secnam / order flow "Demographics$Variable\{super a}";
define pvalue / display flow "ANOVA$P-Value\{sub p}";
run;


ods rtf close;
ods trace off;
ods listing;
*************************************************************;

Variable\{super a}";
*Adds a superscript a to Demographics variable;
ANOVA$P-Value\{sub p}";*Adds a subscript p to ANOVA P-Value variable;

Make sure to use the same kind of brackets { }I have used in the sample code to get the superscripts and subscripts to get printed in the rtf output.

ESCAPECHAR need not be '\' as it is mentioned in the sample code... It could be '*' or '^' or '#' anything we can think of...

Snapshot of output:


Saturday, June 27, 2009

%EVAL AND %SYSEVALF MACRO FUNCTIONS: GETTING TO KNOW THEM BETTER.

With the exception of %sysevalf function, integer arithmetic is the only way macro statements perform arithmetic calculations.
Following are the few examples of macro statements performing integer arithmetic calculations:
%let one=%eval (3+5);
%let two=%eval (5*2);
%let three=%eval (9/3);
%let four=%eval (5/2);
%put The value of one is &one;
%put The value of two is &two;
%put The value of three is &three;
%put The value of four is & four;

Open the Log file and see the results as follows:
The value of one is 8
The value of two is 10
The value of three is 3
The value of four is 2
The value for macro variable four, should be 2.5, instead it shows only two. That happens because if we perform division on integers, integer arithmetic doesn’t take the fractional part into account.
When we try to execute the integer arithmetic calculations of values with functional part, :

%let last= %eval (5.0+3.0); /*INCORRECT*/

%EVAL function only supports integer arithmetic values. The values here in the above statement have a period character to numeric values and because of that the macro processor stops evaluating and produces the following error message: “ ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: 5.0+3.0 “

So our next question comes is… how to perform these type of calculations:
Here comes %sysevalf to rescue you, because this function is capable of evaluating the floating point operands.

Evaluating Floating Point Operands
The %SYSEVALF function can perform arithmetic calculations with operands that have the floating point values.
Here are some of the examples where %SYSEVALF function becomes handy:
%let test= %sysevalf(1.0*3.0);
%let final= %sysevalf(1.5+2.8);
%let last= %sysevalf(5/3);
%put The value of test is &test;
%put The value of final is &final;
%put The value of last is &last;
The %PUT statements display the following messages in the log:
The value of test is 3
The value of final is 4.3
The value of last is 1.66666666666666
%SYSEVALF function perform arithmetic calculations and the result of the evaluation can be a floating point value like in the final and last macro variable case, but as in integer arithmetic calculations, the result is always a text.
The %SYSEVALF function be used in conjugation with other functions like, INTEGER, CEIL, and FLOOR.
For example, the following %PUT statements return 3, 4 and 3 respectively:
%let val=3.8;
%put %sysevalf(&val,integer);
*Value returns in the log is 3;
%put %sysevalf(&val, ceil);
*Value returns in the log is 4;
%put %sysevalf(&val,floor);
*Value returns in the log is 3;

Difference between %eval and %sysevalf functions can be understand better with the following example;
%let value=9;
%let value2=5;
%let newval=%sysevalf(&value/&value2);
%let newval1=%eval(&value/&value2);
%put &newval;
%put &newval1;
*Ans: newval=1.8;
*Ans: newval1=1;

Saturday, June 20, 2009

DEXPORT and DIMPORT: DISPLAY MANAGER commands used to IMPORT and EXPORT the Tab delimited (Excel and .CSV) files;

One of my favorite methods of exporting excel or .csv file is to use the ‘DEXPORT’ command-line command. This certainly reduces the amount of typing required to export the SAS dataset. Another interesting point is DEXPORT command works fine in UNIX and PC.

 
Syntax: dmDEXPORT libref.dsn 'filename.xls' replace;
 
"libref" is a library, "dsn" is the name of a SAS data set, and "filename.xls" is the name of the 
tab delimited text file(excel) being created. If we don’t specify the Libname or it is work then 
the dataset ‘dsn’ from the WORK directory is exported in a excel format to a specified location. 
Replace option … replaces the file if it already exists. 
 

Use DIMPORT command-line command to convert/import a tab delimited (excel or .csv etc) into a
SAS dataset.
Syntax: dm “DIMPORT ‘filename.csv’ exc" replace;



DIMPORT command tells SAS to import or convert the tab delimited file (filename.csv) to a SAS
dataset named ‘exc’; Replace option … replaces the dataset named ‘exc’ if it already exists by
any chance.

IFC and IFN functions: New IF functions:

Objective: To Reduce the amount of typing required achieving an objective

Syntax: IFN (condition, true, false, missing): ‘N’ stands for Numeric
IFN returns a numeric value. It returns the true, false or missing value depending on whether the condition is true, false or missing.

Syntax: IFC (condition, true, false, missing): ‘C’ stands for character
IFC function has four parameters:
1) a logical expression
2) character value returned when true
3) value returned when false
4) value retuned when missing, which is optional.

IFC (logical-expression, Character-value-returned-when-true, Character-value-returned-when-false, Character-value-returned-when-missing);
IFC returns a character value. It returns the true, false or missing value depending on whether the condition is true, false or missing.

Example: Assign a value to the VISIT variable (new) as per the VTYPE variable value.
We can certainly achieve this task in diff. ways.. here are they...

data old;
input sitesub $ vtype vdate $;
cards;
01-303 1.4 12/23/2005
01-304 1.5 09/03/2005
01-305 1.4 10/09/2005
01-306 1.5 11/17/2005
01-307 1.5 05/29/2005
01-308 . 04/30/2005
;
run;

1) * Using Proc Format:
proc format;
value vt
1.4='Baseline'
1.5='Retreat'
.='Missing'
;
run;
data new;
set old;
length visit $20;
visit=put(vtype,vt.);
run;

2) *Using the IF-THEN/ELSE statements;
data new;
set old;
length visit $20;
if vtype=1.4 then visit='Baseline';
else if vtype=1.5 then visit='Retreat';
else if vtype=. then visit='Missing';
run;

3) *Using the Proc SQL;
Proc sql;
Create table new4 as
Select *,
case
when vtype=1.4 then 'Baseline'
when vtype=1.5 then 'Retreat'
else 'Missing'
end as visit
from old;
Quit;

All three above methods required significant amount of typing when we compared with the below ones......
*Using the IFC function in Datastep;
data new;
set old;
length visit $20;
visit=ifc(vtype=1.4,'Baseline','Retreat','Missing');
run;


*Using the IFC function in Proc SQl;
Proc sql;
Create table new as
Select *,
Ifc(vtype=1.4,'Baseline','Retreat','Missing') as visit
from old;
Quit;

In a DATA step, if the IFC function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes.

Note: IFN and IFN functions cannot be used if we want to assign more than 3 values (including missing) to a variable.

Learn how to view SAS dataset labels without opening the dataset directly in a SAS session. Easy methods and examples included!

Quick Tip: See SAS Dataset Labels Without Opening the Data Quick Tip: See SAS Dataset Labels With...