Wednesday, July 29, 2009

Dictionary Tables and SASHELP Views:

Most of the SAS programmers think that the Metadata concepts (Dictionary tables and SASHELP views) are advanced SAS topics, in fact it is not.

I tried here to explain the concept precise and clear……

Here, I will cover the following topics……

Introduction to SAS metadata
How to see what’s in the tables
How to use the tables, using several real-world examples
Advantages

What are Dictionary Tables4?

Dictionary tables are created and automatically maintained by SAS system at run time to store information related to SAS data libraries, SAS system options, SAS catalogs, and external files associated with the currently running SAS session. CATALOGS, COLUMNS, EXTFILES, INDEXES, MEMBERS, MACRO, OPTIONS, TABLES, TITLES, AND VIEW are these objects which provide detailed information about the SAS files.

DICTIONARY tables are special read-only PROC SQL tables. They retrieve information about all the SAS data libraries, SAS data sets, SAS system options, and external files that are associated with the current SAS session.

PROC SQL automatically assigns the DICTIONARY libref. To get information from DICTIONARY tables, specify DICTIONARY.table-name in the FROM clause.

DICTIONARY.table-name is valid in PROC SQL only. However, SAS provides PROC SQL views, based on the DICTIONARY tables that can be used in other SAS procedures and in the DATA step. These views are stored in the SASHELP library and are commonly called "SASHELP views."

The following table lists some of the DICTIONARY tables and the names of their corresponding views3. Ref..... SAS help and Documentation for complete details:


Where to find/ how do we access Dictionary Tables?

Start a session. Open an interactive SAS session and use PRINT to display SASHELP.-VTABLE.

1. Start a SAS session
2. Using SAS explorer, navigate to LIBRARIES SASHELP VTABLE
3. Double-click to open it. Examine contents briefly
4. Close the VIEWTABLE window (don’t terminate the SAS session)

Note: Available only in Version 6.07 or later SAS versions

Dictionary Tables
􀂄 Always available
􀂄 Maintained by SAS
􀂄 Read only
􀂄 Have information on virtually every aspect of your SAS session

Two ways to access them
1. Dictionary tables
2. SASHELP views

What Tables are Available:
􀂄 V8 11 DICTIONARY tables • 17 SASHELP views

􀂄 V9.1.3• 22 DICTIONARY tables• 30 SASHELP views

COLUMNS/VCOLUMN:

Provides information about all variables in all data sets in all specified libraries. It gives us a lot of flexibility to investigate data sets.

Example 1:

Reading all variable names of one data set into one macro variable i.e &VAR

proc sql noprint;
Select name into: VAR separated by ","
from columns where libname="WORK” & memname="DEMO";
quit;


All variable names in data set DEMO are read into string &VAR &VAR = usubjid, subjid, age, gender, race, ..... , ...

TABLES/VTABLE:

Being Parallel with MEMBERS, TABLES provides more detailed information about SAS data files.

The following code is one example of outputting a summary table of SAS data files under certain specified directory.


TITLES/VTITLE:
TITLES: Contains messages about titles and footnotes which are used by current program.
TITLES: Provides a useful source of managing titles and footnotes in reports.

The TITLES table has an entry for each title and footnote line currently in effect. See the example below on how to use this table to save the current titles, and then restore them after running a report.

create table DICTIONARY.TITLES
(type char(1) label='Title Location',
number num label='Title Number',
text char(256) label='Title Text');



EXTFILES/VEXTFL:
The EXTFILES table has an entry for each external file registered (filerefs) in the session.

create table DICTIONARY.EXTFILES
(fileref char(8) label='Fileref',
xpath char(1024) label='Path Name',
xengine char(8) label='Engine Name');


To check What Tables are Available:

proc sql;
create table tables as
select * from dictionary.dictionaries;
quit;



What Views are Available:

Execute the following program to see what views are available:

proc sql;
create table views as
select distinct memname
from sashelp.vsview
where libname='SASHELP'and memname like 'V%';
quit;


Here are the SASHELP views which are available in SAS version 9.1.3.

MEMNAME
VALLOPT
VCATALG
VCFORMAT
VCHKCON
VCNCOLU
VCNTABU
VCOLUMN
VDCTNRY
VENGINE
VEXTFL
VFORMAT
VGOPT
VINDEX
VLIBNAM
VMACRO
VMEMBER
VOPTION
VREFCON
VREMEMB
VSACCES
VSCATLG
VSLIB
VSTABLE
VSTABVW
VSTYLE
VSVIEW
VTABCON
VTITLE
VVIEW


What is the Table Structure?

proc sql;
describe table DICTIONARY.OPTIONS;
quit;


NOTE: SQL table DICTIONARY.OPTIONS was created like:

create table DICTIONARY.OPTIONS
(optname char(32) label='Option Name',
opttype char(8) label='Option type',
setting char(1024) label='Option Setting',
optdesc char(160) label='Option Description',
level char(8) label='Option Location',
group char(32) label='Option Group');


What can we do with these dictionary tables and or SASHELP views?

Check Conflicting Variable Attributes:
(Comparing the test and production directories)

Identify Current Directory:
(Retrieving the executing filename and path of program programmatically)

Use Variables Meeting Criteria or and Identify Features Incompatible with Transport Format:

/*Example #1: Check if all variables in the analysis data sets did have names and*/
/*labels, and the lengths of these should not exceed 8 and 40 characters, respectively.*/

/*Example #2: Check if all analysis data sets did have a label, and the length of any label cannot exceed 40 characters. Also, the length of data set names should not*/

/*Example #3: Check if the length of any character value of variable exceeds 200 characters*/
Refer:

http://www.studysas.blogspot.com/2009/04/proc-comparesas-program-used-to-compare.html
http://www.studysas.blogspot.com/2009/04/maxvarlenmacro-check-length-of-all.html


/*Example #4: Check if variables with the same name across multiple data sets have the*/
/*same attributes, in order to ensure consistency*/

*Select variable names that are in multiple data sets;
ODS RTF FILE="columns.rtf";
TITLE "List of Columns in Multiple Data Sets";
PROC SQL;
select name length=10, memname length=28,
label length=50, type length=8,
length length=8, format length=8,
informat length=8
from dictionary.columns
where libname="PRODN" group by name
having count(*) > 1
order by name, memname;
QUIT;
ODS RTF CLOSE;

/*If we need to see variables with differences only, use the following:*/

OPTIONS orientation=Landscape;
ODS RTF FILE="columnsDiff.rtf";
TITLE "List of Columns in Multiple Data Sets with Discrepancy";
PROC SQL;
select name length=10, memname length=28, label length=50,
type length=8, length length=8, format length=8,
informat length=8
from dictionary.columns
where libname="PRODN"
group by name
having count(distinct label) > 1 or
count(distinct type) > 1 or
count(distinct length) > 1 or
count(distinct format) >1 or
count(distinct informat) > 1
order by name, memname;
QUIT;
ODS RTF CLOSE;



Put all the variables in Alphabetical Order:
/*Example #5: Put all the variables into Alphabetical order;*/

proc sql noprint;
select name into: new
separated by ','
from dictionary.columns where libname='WORK' and memname='MH'
order by name;
create table mh1 as select &new.
from mh;
run;


Note: Memname is the name of the dataset(MH). The above program reorder the variables alphabetically. We are taking help from dictionary.columns using proc sql to know and create a macro variable new to each variable in the dataset. Upon creating the macro variable, use the 2nd proc sql step…. to reorder the variables.

Count Observations/Variables in a Data Set:

Renaming all variables in a SAS dataset:

http://studysas.blogspot.com/2009/07/renaming-all-variables-in-sas-data-set.html


Resetting Titles:

data test;
a=1;
b=1;
c=123;
run;
ods listing close;
ods trace on;
ods escapechar='*';

ods rtf file="Title Check.rtf" bodytitle style=rtfout;

title1 j=r 'page 1 of 1';
title2 'Title check';
footnote1 program.sas;
footnote2 July 22nd 2009;

proc print data=test;
run;

ods rtf close;
ods trace off;
ods listing;

DATA __oldtitles;
set sashelp.vtitle;
RUN;

Proc Contents vs Dictionary Tables/SASHELP views2:

Getting the information by extracting directly from SASHELP /DICTIONARY TABLES can be very helpful. These allow us to have more power with controlling the output than with proc contents. The standard output of PROC CONTENTS provides information from one dataset or all the datasets in a single library. Some of the information provided by PROC CONTENTS, like location of dataset, position of the columns in a dataset, may not be necessary for the final user of the information, and it is not easy to eliminate. With the use of PROC SQL, selected information about one or more dataset can be retrieved from views and create datasets.

Advantages1:
Multiple libraries can be analyzed in a single SQL query. Each PROC CONTENTS step is limited to a single library reference. In addition, a second step is not required to reduce variable-level information to the table level.

This approach may be more efficient for libraries containing many variables. PROC CONTENTS may take much longer because of the variable-level nature of its output data set.

All SAS files - not just SAS data sets - can be analyzed through the DICTIONARY.MEMBERS table by selecting the appropriate values of the Memtype column. PROC CONTENTS only processes SAS data sets when creating an output data set.

Disadvantages:
Knowledge of SQL syntax is required. We can access SASHELP views using a simple datastep but it is much slower in generating the required information.

References:
1) Don't Be a SAS® Dinosaur: Modernize Your SAS Programs by Warren Repole : SAS Library Information: SAS Files

2) A data dictionary, a more complete alternative to Proc contents: Pharmasug 2002 proceedings (By Angelina Cecilia Casas, M.S. PPD Development, Research Triangle Park, NC):

3) SAS Help and Documentation

4) Where, When, and How to Use Dictionary Tables in SAS: NESUG 1999 Proceedings (By Jiang Jin, EDP Contract Services)

5) Simple Ways to Use PROC SQL and SAS DICTIONARY TABLES to Verify Data Structure of the Electronic Submission Data Sets: Pharmasug 2006 tutorials (By Christine Teng and Wenjie Wang, Merck Research Labs, Merck & Co., Inc., Rahway, NJ)

0 comments:

Post a Comment

ShareThis