Wednesday, July 29, 2009

Dictionary Tables and SASHELP Views:

Most of the SAS programmers think that the Metadata concepts (Dictionary tables and SASHELP views) are advanced SAS topics, in fact it is not.

I tried here to explain the concept precise and clear……

Here, I will cover the following topics……

Introduction to SAS metadata
How to see what’s in the tables
How to use the tables, using several real-world examples
Advantages

What are Dictionary Tables4?

Dictionary tables are created and automatically maintained by SAS system at run time to store information related to SAS data libraries, SAS system options, SAS catalogs, and external files associated with the currently running SAS session. CATALOGS, COLUMNS, EXTFILES, INDEXES, MEMBERS, MACRO, OPTIONS, TABLES, TITLES, AND VIEW are these objects which provide detailed information about the SAS files.

DICTIONARY tables are special read-only PROC SQL tables. They retrieve information about all the SAS data libraries, SAS data sets, SAS system options, and external files that are associated with the current SAS session.

PROC SQL automatically assigns the DICTIONARY libref. To get information from DICTIONARY tables, specify DICTIONARY.table-name in the FROM clause.

DICTIONARY.table-name is valid in PROC SQL only. However, SAS provides PROC SQL views, based on the DICTIONARY tables that can be used in other SAS procedures and in the DATA step. These views are stored in the SASHELP library and are commonly called "SASHELP views."

The following table lists some of the DICTIONARY tables and the names of their corresponding views3. Ref..... SAS help and Documentation for complete details:


Where to find/ how do we access Dictionary Tables?

Start a session. Open an interactive SAS session and use PRINT to display SASHELP.-VTABLE.

1. Start a SAS session
2. Using SAS explorer, navigate to LIBRARIES SASHELP VTABLE
3. Double-click to open it. Examine contents briefly
4. Close the VIEWTABLE window (don’t terminate the SAS session)

Note: Available only in Version 6.07 or later SAS versions

Dictionary Tables
􀂄 Always available
􀂄 Maintained by SAS
􀂄 Read only
􀂄 Have information on virtually every aspect of your SAS session

Two ways to access them
1. Dictionary tables
2. SASHELP views

What Tables are Available:
􀂄 V8 11 DICTIONARY tables • 17 SASHELP views

􀂄 V9.1.3• 22 DICTIONARY tables• 30 SASHELP views

COLUMNS/VCOLUMN:

Provides information about all variables in all data sets in all specified libraries. It gives us a lot of flexibility to investigate data sets.

Example 1:

Reading all variable names of one data set into one macro variable i.e &VAR

proc sql noprint;
Select name into: VAR separated by ","
from columns where libname="WORK” & memname="DEMO";
quit;


All variable names in data set DEMO are read into string &VAR &VAR = usubjid, subjid, age, gender, race, ..... , ...

TABLES/VTABLE:

Being Parallel with MEMBERS, TABLES provides more detailed information about SAS data files.

The following code is one example of outputting a summary table of SAS data files under certain specified directory.


TITLES/VTITLE:
TITLES: Contains messages about titles and footnotes which are used by current program.
TITLES: Provides a useful source of managing titles and footnotes in reports.

The TITLES table has an entry for each title and footnote line currently in effect. See the example below on how to use this table to save the current titles, and then restore them after running a report.

create table DICTIONARY.TITLES
(type char(1) label='Title Location',
number num label='Title Number',
text char(256) label='Title Text');



EXTFILES/VEXTFL:
The EXTFILES table has an entry for each external file registered (filerefs) in the session.

create table DICTIONARY.EXTFILES
(fileref char(8) label='Fileref',
xpath char(1024) label='Path Name',
xengine char(8) label='Engine Name');


To check What Tables are Available:

proc sql;
create table tables as
select * from dictionary.dictionaries;
quit;



What Views are Available:

Execute the following program to see what views are available:

proc sql;
create table views as
select distinct memname
from sashelp.vsview
where libname='SASHELP'and memname like 'V%';
quit;


Here are the SASHELP views which are available in SAS version 9.1.3.

MEMNAME
VALLOPT
VCATALG
VCFORMAT
VCHKCON
VCNCOLU
VCNTABU
VCOLUMN
VDCTNRY
VENGINE
VEXTFL
VFORMAT
VGOPT
VINDEX
VLIBNAM
VMACRO
VMEMBER
VOPTION
VREFCON
VREMEMB
VSACCES
VSCATLG
VSLIB
VSTABLE
VSTABVW
VSTYLE
VSVIEW
VTABCON
VTITLE
VVIEW


What is the Table Structure?

proc sql;
describe table DICTIONARY.OPTIONS;
quit;


NOTE: SQL table DICTIONARY.OPTIONS was created like:

create table DICTIONARY.OPTIONS
(optname char(32) label='Option Name',
opttype char(8) label='Option type',
setting char(1024) label='Option Setting',
optdesc char(160) label='Option Description',
level char(8) label='Option Location',
group char(32) label='Option Group');


What can we do with these dictionary tables and or SASHELP views?

Check Conflicting Variable Attributes:
(Comparing the test and production directories)

Identify Current Directory:
(Retrieving the executing filename and path of program programmatically)

Use Variables Meeting Criteria or and Identify Features Incompatible with Transport Format:

/*Example #1: Check if all variables in the analysis data sets did have names and*/
/*labels, and the lengths of these should not exceed 8 and 40 characters, respectively.*/

/*Example #2: Check if all analysis data sets did have a label, and the length of any label cannot exceed 40 characters. Also, the length of data set names should not*/

/*Example #3: Check if the length of any character value of variable exceeds 200 characters*/
Refer:

http://www.studysas.blogspot.com/2009/04/proc-comparesas-program-used-to-compare.html
http://www.studysas.blogspot.com/2009/04/maxvarlenmacro-check-length-of-all.html


/*Example #4: Check if variables with the same name across multiple data sets have the*/
/*same attributes, in order to ensure consistency*/

*Select variable names that are in multiple data sets;
ODS RTF FILE="columns.rtf";
TITLE "List of Columns in Multiple Data Sets";
PROC SQL;
select name length=10, memname length=28,
label length=50, type length=8,
length length=8, format length=8,
informat length=8
from dictionary.columns
where libname="PRODN" group by name
having count(*) > 1
order by name, memname;
QUIT;
ODS RTF CLOSE;

/*If we need to see variables with differences only, use the following:*/

OPTIONS orientation=Landscape;
ODS RTF FILE="columnsDiff.rtf";
TITLE "List of Columns in Multiple Data Sets with Discrepancy";
PROC SQL;
select name length=10, memname length=28, label length=50,
type length=8, length length=8, format length=8,
informat length=8
from dictionary.columns
where libname="PRODN"
group by name
having count(distinct label) > 1 or
count(distinct type) > 1 or
count(distinct length) > 1 or
count(distinct format) >1 or
count(distinct informat) > 1
order by name, memname;
QUIT;
ODS RTF CLOSE;



Put all the variables in Alphabetical Order:
/*Example #5: Put all the variables into Alphabetical order;*/

proc sql noprint;
select name into: new
separated by ','
from dictionary.columns where libname='WORK' and memname='MH'
order by name;
create table mh1 as select &new.
from mh;
run;


Note: Memname is the name of the dataset(MH). The above program reorder the variables alphabetically. We are taking help from dictionary.columns using proc sql to know and create a macro variable new to each variable in the dataset. Upon creating the macro variable, use the 2nd proc sql step…. to reorder the variables.

Count Observations/Variables in a Data Set:

Renaming all variables in a SAS dataset:

http://studysas.blogspot.com/2009/07/renaming-all-variables-in-sas-data-set.html


Resetting Titles:

data test;
a=1;
b=1;
c=123;
run;
ods listing close;
ods trace on;
ods escapechar='*';

ods rtf file="Title Check.rtf" bodytitle style=rtfout;

title1 j=r 'page 1 of 1';
title2 'Title check';
footnote1 program.sas;
footnote2 July 22nd 2009;

proc print data=test;
run;

ods rtf close;
ods trace off;
ods listing;

DATA __oldtitles;
set sashelp.vtitle;
RUN;

Proc Contents vs Dictionary Tables/SASHELP views2:

Getting the information by extracting directly from SASHELP /DICTIONARY TABLES can be very helpful. These allow us to have more power with controlling the output than with proc contents. The standard output of PROC CONTENTS provides information from one dataset or all the datasets in a single library. Some of the information provided by PROC CONTENTS, like location of dataset, position of the columns in a dataset, may not be necessary for the final user of the information, and it is not easy to eliminate. With the use of PROC SQL, selected information about one or more dataset can be retrieved from views and create datasets.

Advantages1:
Multiple libraries can be analyzed in a single SQL query. Each PROC CONTENTS step is limited to a single library reference. In addition, a second step is not required to reduce variable-level information to the table level.

This approach may be more efficient for libraries containing many variables. PROC CONTENTS may take much longer because of the variable-level nature of its output data set.

All SAS files - not just SAS data sets - can be analyzed through the DICTIONARY.MEMBERS table by selecting the appropriate values of the Memtype column. PROC CONTENTS only processes SAS data sets when creating an output data set.

Disadvantages:
Knowledge of SQL syntax is required. We can access SASHELP views using a simple datastep but it is much slower in generating the required information.

References:
1) Don't Be a SAS® Dinosaur: Modernize Your SAS Programs by Warren Repole : SAS Library Information: SAS Files

2) A data dictionary, a more complete alternative to Proc contents: Pharmasug 2002 proceedings (By Angelina Cecilia Casas, M.S. PPD Development, Research Triangle Park, NC):

3) SAS Help and Documentation

4) Where, When, and How to Use Dictionary Tables in SAS: NESUG 1999 Proceedings (By Jiang Jin, EDP Contract Services)

5) Simple Ways to Use PROC SQL and SAS DICTIONARY TABLES to Verify Data Structure of the Electronic Submission Data Sets: Pharmasug 2006 tutorials (By Christine Teng and Wenjie Wang, Merck Research Labs, Merck & Co., Inc., Rahway, NJ)

Wednesday, July 22, 2009

Even you can Use HASH and DOUBLE DASH: It’s that Simple……

In order to understand HASH and DOUBLE HASH concept in SAS you need to know about two different ranges of variables:

1) Numbered list:

When a set of variables have the same prefix, and the rest of the name is a consecutive set of numbers, we can use a single dash (-) to refer to an entire range. Some exs.. are….

VAR1 VAR2 VAR3 VAR4 VAR5

Shortcut list you can use to access all 5 variables is VAR1-VAR5

COL1 COL2 COL3 COL4 COL5 COL6 COL7

Shortcut list you can use to access all 7 variables is COL1-COL7


2) Name range list:
When you refer to a list of variables in the order in which they were defined in the SAS dataset, you can use a double dash (--) to refer to the entire range of variables in between them.

Ex: VAR1 VAR2 VAR3 COUNT VAR4 COL1 VAR5

Shortcut list you can use to access all 7 variables (including COL1, which has different prefix name than others) is VAR1- -VAR5

The general rule you should always remember for dash and double dash is:

Single dash is useful to access the consequently numbered variables.

Double dash is useful to access variables based on their order/position regardless of variable name.

Execute following program in SAS to understand the concept better.


data test;
var1=1;
var2=2;
var3=3;
COL=1;
var4=4;
var5=5;
run;

ods listing close;
ods rtf file="Single DASH output.rtf" style=rtfout;
title1 'Using Single DASH';

proc print data=test;
var var1-var5;
run;

ods rtf close;
ods trace off;
ods listing;


ods listing close;
ods rtf file="Double DASH output.rtf" style=rtfout;
title1 'Using DOUBLE DASH';

proc print data=test;
var var1--var5;
run;

ods rtf close;
ods trace off;
ods listing;


SINGLE DASH OUTPUT:

Obs var1 var2 var3 var4 var5


1 1 2 3 4 5

DOUBLE DASH OUTPUT:

Obs var1 var2 var3 COL var4 var5
1 1 2 3 1 4 5

Renaming All Variables in a SAS Data Set Using the SASHELP VIEWS

*Create a temporary dataset... DSN;
data dsn;
a=1;
b=2;
c=3;
d=4;
e=5;
f=6;
run;


%macro test(lib,dsn);

*/1)*/ data _null_;
set sashelp.vtable(where=(libname="&LIB" and memname="&DSN"));
call symput('nvars',nvar);
run;

*/2)*/ data dsn;
set sashelp.vcolumn(where=(libname="&LIB" and memname="&DSN"));
call symput(cats("var",_n_),name);
run;


*/3)*/ proc datasets library=&LIB;
modify &DSN;
rename
%do i = 1 %to &nvars;
&&var&i=Rename_&&var&i.
%end;
;
quit;
run;
%mend;

%test(WORK,DSN);

After submitting the above program... the output looks like this....

Output:
Rename_a Rename_b Rename_c Rename_d Rename_e Rename_f
1 2 3 4 5 6

Here is a way I know of.. to rename all the variables in the dataset;

It can be done using the SASHELP views as follows:

1) The 1st step of the program determines the total number of variables inside the dataset with the help of SASHELP.Vtable, Data _null_ step used to do the same …. A macro variable NVAR (Number of Variables) will be created after this step which can be accessed in the following steps.

2) What the 2nd step does is….assigns a unique IDNUM to all the variables of the dataset with the help of SASHELP.Vcolumn.

3) What the 3rd step does is … it uses Proc Datasets with the MODIFY statement to rename all the variables in the dataset. The DO LOOP is been used to resolve the 6 macro variables in a macro of a temporary dataset.

Thursday, July 9, 2009

Subscript or Superscript in the footers/titles of RTF output

Wondering how to create a subscripts and superscipt in title or footnotes of the rtf output... here is a way to do it....

Execute the following program to get an Idea about how to keep SUBSCRIPTS and SUPERSCRIPTS in footnotes and titles of rtf output..

************************************************************;
data test;
length
secnam $15;
input
sortord secnam $ pvalue;
cards;
1 demog 0.8812
2 ae 0.7112
3 disposition 0.8112
4 medicalhistory 0.9112
;

run;

ods listing close;
ods rtf file="Test output.rtf" style=rtfout;
ods escapechar='\';

proc report data = test missing split="$" spacing=0 headline nowd;
column sortord secnam pvalue;
define sortord / order noprint;
define secnam / order flow "Demographics$Variable\{super a}";
define pvalue / display flow "ANOVA$P-Value\{sub p}";
run;


ods rtf close;
ods trace off;
ods listing;
*************************************************************;

Variable\{super a}";
*Adds a superscript a to Demographics variable;
ANOVA$P-Value\{sub p}";*Adds a subscript p to ANOVA P-Value variable;

Make sure to use the same kind of brackets { }I have used in the sample code to get the superscripts and subscripts to get printed in the rtf output.

ESCAPECHAR need not be '\' as it is mentioned in the sample code... It could be '*' or '^' or '#' anything we can think of...

Snapshot of output:


Learn how to view SAS dataset labels without opening the dataset directly in a SAS session. Easy methods and examples included!

Quick Tip: See SAS Dataset Labels Without Opening the Data Quick Tip: See SAS Dataset Labels With...