Thursday, September 4, 2008

SAS free study tutorials

Data step:
getting started 1: windows SAS code
getting started 2: data step SAS codeautomatic _N_ variable SAS code
drop & delete SAS codeformating: dates and numbers SAS code date sal.txt
(also see the format procedure below to create your own formats)
functions SAS codeimport: Bringing in data from Excel SAS code
Excel import file Excel export file text fileinput:
length statement SAS code infile options.txtlong SAS code long.txt
missing data SAS code
output option SAS code
pointers SAS code ex7.txt ex8.txt ex9.txtmore about pointers
SAS code pointers.SAS ex10.txt
missover & delimiter SAS code delimiter.txtmore on the delimiter SAS code
retain SAS codeset SAS code
simulations:
random numbers SAS code
sum SAS code
statistical functions SAS code

Logic:
do loops SAS codemore about do loops SAS codenested do loops SAS codeif then statements SAS code score.txt

Combining Data sets:
concatenating and interleaving SAS code
one-to-one merging SAS code
match merging SAS codeupdating SAS code

Character functions:
substring function SAS code
trim and left functions SAS code
compress and index functions SAS code record.txt
indexc and indexw functions SAS code
implicit character-to-numeric conversion SAS code
explicit character-to-numeric conversion SAS code
implicit and explicit numeric-to-character conversion SAS code

Arrays:
introduction to arrays SAS code
using arrays to count SAS code
using arrays to order observations SAS code
using arrays to transpose data SAS code ratsdose.txt
two dimensional arrays SAS code temp.txt fin.txt

Permanent SAS Data sets:
(great for large data sets)introduction: using libname SAS code
put and file statements SAS code survey.dat data1.dat data2.dat data3.txt fruit.dat data4.dat data5.dat income.dat

Procedures:
ANOVA SAS code incommed.datanalysis of equal vars: B-P for anova SAS code
contents: Great for large data sets SAS code
sheep.dat
correlation SAS code
import: Bringing in data from Excel SAS code Excel import file Excel export file text fileformat SAS code incommed.dat (also see formating above for SAS' date and number formats)
frequency tables SAS code incommed.dat freq.xlsmeans SAS code
incommed.dat
more about means SAS code incommed.datgcharts: Bar and Pie charts SAS code incommed.datgplot: a prettier plot SAS code
more about gplot SAS codeplot SAS codeprint SAS code account.txt
sort SAS code account.txt
more about sorting SAS code compt.txt t-test SAS code incommed.dat
transpose SAS code
univariate SAS code test.dat

Programming outside the Data step or Procedures:
Getting started 3: options SAS codemore options SAS code

Macros:
introduction:
macro variables (%let statement)
SAS code number.dat contest.dat%put statement SAS code score.dat
basic macros SAS code
macros with parameters SAS code ranks.dat
macro do loops SAS code
macro if/then/else statements SAS code makeup.dat
nested macros SAS codesimulations example SAS code reg.dat

Index to Statistics Tutorials (source:www.stattutorials.com)
PROC MEANS Tutorial (Descriptive statistics)
PROC UNIVARIATE Tutorial (Distribution analysis)
New: PROC UNIVARIATE - Advanced Tutorial
PROC CORR Tutorial (Correlation)
PROC FREQ Tutorial 1 (Frequency Tables/Goodness of Fit)
PROC FREQ Tutorial 2 (Two-way tables)
PROC TTEST Tutorial (Two sample and paired t-tests)
New: A comparison of Paired & Independent Sample t-tests
PROC ANOVA & GLM Tutorial (One-Way ANOVA)
PROC GLM Tutorial (Repeated measures ANOVA using PROC GLM)
New: Survival Analysis & comparison of groups using PROC LIFEREG
Bland-Altman Analysis (Comparing two measures)
Inter-Rater Reliability, Kappa, Weighted Kappa (PROC FREQ)
New: SAS Functions (2-part tutorial)

Special SAS Topics
New: Setting the SAS Initial Folder (default directory)
Using SAS ODS Output, Styles, Graphics, Data
Data files and SAS code for tutorials

General Statistical Tutorials
Interpreting p-values
Understanding hypothesis testing
Statistical comparison of two groups


Tuesday, September 2, 2008

SAS Interview Questions & Answers:Clinical trials

1.Describe the phases of clinical trials?

Ans:- These are the following four phases of the clinical trials:

Phase 1: Test a new drug or treatment to a small group of people (20-80) to evaluate its safety.

Phase 2: The experimental drug or treatment is given to a large group of people (100-300) to see that the drug is effective or not for that treatment.

Phase 3: The experimental drug or treatment is given to a large group of people (1000-3000) to see its effectiveness, monitor side effects and compare it to commonly used treatments.

Phase 4: The 4 phase study includes the post marketing studies including the drug's risk, benefits etc.



2. Describe the validation procedure? How would you perform the validation for TLG as well as analysis data set?

Ans:- Validation procedure is used to check the output of the SAS program generated by the source programmer. In this process validator write the program and generate the output. If this output is same as the output generated by the SAS programmer's output then the program is considered to be valid. We can perform this validation for TLG by checking the output manually and for analysis data set it can be done using PROC COMPARE.

3. How would you perform the validation for the listing, which has 400 pages?

Ans:- It is not possible to perform the validation for the listing having 400 pages manually. To do this, we convert the listing in data sets by using PROC REPORT and then after that we can compare it by using PROC COMPARE.

4. Can you use PROC COMPARE to validate listings? Why?

Ans:- Yes, we can use PROC COMPARE to validate the listing because if there are many entries (pages) in the listings then it is not possible to check them manually. So in this condition we use PROC COMPARE to validate the listings.

5. How would you generate tables, listings and graphs?

Ans:- We can generate the listings by using the PROC REPORT. Similarly we can create the tables by using PROC FREQ, PROC MEANS, and PROC TRANSPOSE and PROC REPORT. We would generate graph, using proc Gplot etc.

6. How many tables can you create in a day?

Ans:- Actually it depends on the complexity of the tables if there are same type of tables then, we can create 1-2-3 tables in a day.

7. What are all the PROCS have you used in your experience?

Ans:- I have used many procedures like proc report, proc sort, proc format etc. I have used proc report to generate the list report, in this procedure I have used subjid as order variable and trt_grp, sbd, dbd as display variables.

8. Describe the data sets you have come across in your life?

Ans:- I have worked with demographic, adverse event , laboratory, analysis and other data sets.

9. How would you submit the docs to FDA? Who will submit the docs?

Ans:- We can submit the docs to FDA by e-submission. Docs can be submitted to FDA using

Define.pdf or define.Xml formats. In this doc we have the documentation about macros and program and E-records also. Statistician or project manager will submit this doc to FDA.

10. What are the docs do you submit to FDA?

Ans:- We submit ISS and ISE documents to FDA.

11. Can u share your CDISC experience? What version of CDISC SDTM have you used?

Ans: I have used version 3.1.1 of the CDISC SDTM.

12. Tell me the importance of the SAP?

Ans:- This document contains detailed information regarding study objectives and statistical methods to aid in the production of the Clinical Study Report (CSR) including summary tables, figures, and subject data listings for Protocol. This document also contains documentation of the program variables and algorithms that will be used to generate summary statistics and statistical analysis.

13. Tell me about your project group? To whom you would report/contact?

My project group consisting of six members, a project manager, two statisticians, lead programmer and two programmers.

I usually report to the lead programmer. If I have any problem regarding the programming I would contact the lead programmer.

If I have any doubt in values of variables in raw dataset I would contact the statistician. For example the dataset related to the menopause symptoms in women, if the variable sex having the values like F, M. I would consider it as wrong; in that type of situations I would contact the statistician.


14. Explain SAS documentation.

SAS documentation includes programmer header, comments, titles, footnotes etc. Whatever we type in the program for making the program easily readable, easily understandable are in called as SAS documentation.

15. How would you know whether the program has been modified or not?

I would know the program has been modified or not by seeing the modification history in the program header.

16. Project status meeting?

It is a planetary meeting of all the project managers to discuss about the present Status of the project in hand and discuss new ideas and options in improving the Way it is presently being performed.

17. Describe clin-trial data base and oracle clinical

Clintrial, the market's leading Clinical Data Management System (CDMS).Oracle Clinical or OC is a database management system designed by Oracle to provide data management, data entry and data validation functionalities to Clinical Trials process.18. Tell me about MEDRA and what version of MEDRA did you use in your project?Medical dictionary of regulatory activities. Version 10

19. Describe SDTM?

CDISC’s Study Data Tabulation Model (SDTM) has been developed to standardize what is submitted to the FDA.

20. What is CRT?

Case Report Tabulation, Whenever a pharmaceutical company is submitting an NDA, conpany has to send the CRT's to the FDA.

21. What is annotated CRF?

Annotated CRF is a CRF(Case report form) in which variable names are written next the spaces provided to the investigator. Annotated CRF serves as a link between the raw data and the questions on the CRF. It is a valuable toll for the programmers and statisticians..
22. What do you know about 21CRF PART 11?

Title 21 CFR Part 11 of the Code of Federal Regulations deals with the FDA guidelines on electronic records and electronic signatures in the United States. Part 11, as it is commonly called, defines the criteria under which electronic records and electronic signatures are considered to be trustworthy, reliable and equivalent to paper records.

23 What are the contents of AE dataset? What is its purpose?

What are the variables in adverse event datasets?The adverse event data set contains the SUBJID, body system of the event, the preferred term for the event, event severity. The purpose of the AE dataset is to give a summary of the adverse event for all the patients in the treatment arms to aid in the inferential safety analysis of the drug.

24 What are the contents of lab data? What is the purpose of data set?

The lab data set contains the SUBJID, week number, and category of lab test, standard units, low normal and high range of the values. The purpose of the lab data set is to obtain the difference in the values of key variables after the administration of drug.

25.How did you do data cleaning? How do you change the values in the data on your own?

I used proc freq and proc univariate to find the discrepancies in the data, which I reported to my manager.

26.Have you created CRT’s, if you have, tell me what have you done in that?

Yes I have created patient profile tabulations as the request of my manager and and the statistician. I have used PROC CONTENTS and PROC SQL to create simple patient listing which had all information of a particular patient including age, sex, race etc.

27. Have you created transport files?

Yes, I have created SAS Xport transport files using Proc Copy and data step for the FDA submissions. These are version 5 files. we use the libname engine and the Proc Copy procedure, One dataset in each xport transport format file. For version 5: labels no longer than 40 bytes, variable names 8 bytes, character variables width to 200 bytes. If we violate these constraints your copy procedure may terminate with constraints, because SAS xport format is in compliance with SAS 5 datasets.

Libname sdtm “c:\sdtm_data”;Libname dm xport “c:\dm.xpt”;
Proc copy;
In = sdtm;
Out = dm;
Select dm;
Run;

28. How did you do data cleaning? How do you change the values in the data on your own?

I used proc freq and proc univariate to find the discrepancies in the data, which I reported to my manager.

29. Definitions?

CDISC- Clinical data interchange standards consortium.They have different data models, which define clinical data standards for pharmaceutical industry.

SDTM – It defines the data tabulation datasets that are to be sent to the FDA for regulatory submissions.

ADaM – (Analysis data Model)Defines data set definition guidance for creating analysis data sets.

ODM – XML – based data model for allows transfer of XML based data .

Define.xml – for data definition file (define.pdf) which is machine readable.

ICH E3: Guideline, Structure and Content of Clinical Study Reports

ICH E6: Guideline, Good Clinical Practice

ICH E9: Guideline, Statistical Principles for Clinical Trials

Title 21 Part 312.32: Investigational New Drug Application

30. Have you ever done any Edit check programs in your project, if you have, tell me what do you know about edit check programs?

Yes I have done edit check programs .Edit check programs – Data validation.

1.Data Validation – proc means, proc univariate, proc freq.Data Cleaning – finding errors.

2.Checking for invalid character values.Proc freq data = patients;Tables gender dx ae / nocum nopercent;Run;Which gives frequency counts of unique character values.

3. Proc print with where statement to list invalid data values.[systolic blood pressure - 80 to 100][diastolic blood pressure – 60 to 120]

4. Proc means, univariate and tabulate to look for outliers.Proc means – min, max, n and mean.Proc univariate – five highest and lowest values[ stem leaf plots and box plots]

5. PROC FORMAT – range checking

6. Data Analysis – set, merge, update, keep, drop in data step.

7. Create datasets – PROC IMPORT and data step from flat files.

8. Extract data – LIBNAME.9. SAS/STAT – PROC ANOVA, PROC REG.

10. Duplicate Data – PROC SORT Nodupkey or NoduplicateNodupkey – only checks for duplicates in BYNoduplicate – checks entire observation (matches all variables)For getting duplicate observations first sort BY nodupkey and merge it back to the original dataset and keep only records in original and sorted.

11.For creating analysis datasets from the raw data sets I used the PROC FORMAT, and rename and length statements to make changes and finally make a analysis data set.

31. What is Verification?

The purpose of the verification is to ensure the accuracy of the final tables and the quality of SAS programs that generated the final tables. According to the instructions SOP and the SAP I selected the subset of the final summary tables for verification.
E.g Adverse event table, baseline and demographic characteristics table.The verification results were verified against with the original final tables and all discrepancies if existed were documented.

32. What is Program Validation?

Its same as macro validation except here we have to validate the programs i.e according to the SOP I had to first determine what the program is supposed to do, see if they work as they are supposed to work and create a validation document mentioning if the program works properly and set the status as pass or fail.Pass the input parameters to the program and check the log for errors.

33. What do you lknow about ISS and ISE, have you ever produced these reports?

ISS (Integrated summary of safety):Integrates safety information from all sources (animal, clinical pharmacology, controlled and uncontrolled studies, epidemiologic data). "ISS is, in part, simply a summation of data from individual studies and, in part, a new analysis that goes beyond what can be done with individual studies."ISE (Integrated Summary of efficacy)ISS & ISE are critical components of the safety and effectiveness submission and expected to be submitted in the application in accordance with regulation. FDA’s guidance Format and Content of Clinical and Statistical Sections of Application gives advice on how to construct these summaries. Note that, despite the name, these are integrated analyses of all relevant data, not summaries.

34. Explain the process and how to do Data Validation?

I have done data validation and data cleaning to check if the data values are correct or if they conform to the standard set of rules.A very simple approach to identifying invalid character values in this file is to use PROC FREQ to list all the unique values of these variables. This gives us the total number of invalid observations. After identifying the invalid data …we have to locate the observation so that we can report to the manager the particular patient number.Invalid data can be located using the data _null_ programming.

Following is e.g

DATA _NULL_;

INFILE "C:PATIENTS,TXT" PAD;FILE PRINT; ***SEND OUTPUT TO THE OUTPUT WINDOW;

TITLE "LISTING OF INVALID DATA";

***NOTE: WE WILL ONLY INPUT THOSEVARIABLES OF INTEREST;INPUT @1 PATNO $3.@4 GENDER $1.@24 DX $3.@27 AE $1.;

***CHECK GENDER;IF GENDER NOT IN ('F','M',' ') THEN PUT PATNO= GENDER=;

***CHECK DX;
IF VERIFY(DX,' 0123456789') NE 0
THEN PUT PATNO= DX=;
***CHECK AE;
IF AE NOT IN ('0','1',' ') THEN PUT PATNO= AE=;
RUN;

For data validation of numeric values like out of range or missing values I used proc print with a where statement.

PROC PRINT DATA=CLEAN.PATIENTS;
WHERE HR NOT BETWEEN 40 AND 100 AND
HR IS NOT MISSING OR
SBP NOT BETWEEN 80 AND 200 AND
SBP IS NOT MISSING OR
DBP NOT BETWEEN 60 AND 120 AND
DBP IS NOT MISSING;TITLE "OUT-OF-RANGE VALUES FOR NUMERICVARIABLES";
ID PATNO;
VAR HR SBP DBP;
RUN;

If we have a range of numeric values ‘001’ – ‘999’ then we can first use user defined format and then use proc freq to determine the invalid values.

PROC FORMAT;
VALUE $GENDER 'F','M' = 'VALID'' ' = 'MISSING'OTHER = 'MISCODED';
VALUE $DX '001' - '999'= 'VALID'' ' = 'MISSING'OTHER = 'MISCODED';
VALUE $AE '0','1' = 'VALID'' ' = 'MISSING'OTHER = 'MISCODED';
RUN;

One of the simplest ways to check for invalid numeric values is to run either PROC MEANS or PROC UNIVARIATE.We can use the N and NMISS options in the Proc Means to check for missing and invalid data. Default (n nmiss mean min max stddev).The main advantage of using PROC UNIVARIATE (default n mean std skewness kurtosis) is that we get the extreme values i.e lowest and highest 5 values which we can see for data errors. If u want to see the patid for these particular observations …..state and ID patno statement in the univariate procedure.

35. Roles and responsibilities?
Programmer:
Develop programming for report formats (ISS & ISE shell) required by the regulatory authorities.Update ISS/ISE shell, when required.
Clinical Study Team:
Provide information on safety and efficacy findings, when required.Provide updates on safety and efficacy findings for periodic reporting.
Study Statistician
Draft ISS and ISE shell.Update shell, when appropriate.Analyze and report data in approved format, to meet periodic reporting requirements.

36. Explain Types of Clinical trials study you come across?
Single Blind Study
When the patients are not aware of which treatment they receive.
Double Blind Study
When the patients and the investigator are unaware of the treatment group assigned.
Triple Blind Study
Triple blind study is when patients, investigator, and the project team are unaware of the treatments administered.

37. What are the domains/datasets you have used in your studies?
Demog
Adverse Events
Vitals
ECG
Labs
Medical History
PhysicalExam etc

38. Can you list the variables in all the domains?
Demog: Usubjid, Patient Id, Age, Sex, Race, Screening Weight, Screening Height, BMI etc

Adverse Events: Protocol no, Investigator no, Patient Id, Preferred Term, Investigator Term, (Abdominal dis, Freq urination, headache, dizziness, hand-food syndrome, rash, Leukopenia, Neutropenia) Severity, Seriousness (y/n), Seriousness Type (death, life threatening, permanently disabling), Visit number, Start time, Stop time, Related to study drug?

Vitals: Subject number, Study date, Procedure time, Sitting blood pressure, Sitting Cardiac Rate, Visit number, Change from baseline, Dose of treatment at time of vital sign, Abnormal (yes/no), BMI, Systolic blood pressure, Diastolic blood pressure.

ECG: Subject no, Study Date, Study Time, Visit no, PR interval (msec), QRS duration (msec), QT interval (msec), QTc interval (msec), Ventricular Rate (bpm), Change from baseline, Abnormal.

Labs: Subject no, Study day, Lab parameter (Lparm), lab units, ULN (upper limit of normal), LLN (lower limit of normal), visit number, change from baseline, Greater than ULN (yes/no), lab related serious adverse event (yes/no).Medical History: Medical Condition, Date of Diagnosis (yes/no), Years of onset or occurrence, Past condition (yes/no), Current condition (yes/no).

PhysicalExam: Subject no, Exam date, Exam time, Visit number, Reason for exam, Body system, Abnormal (yes/no), Findings, Change from baseline (improvement, worsening, no change), Comments

39. Give me the example of edit ckecks you made in your programs?Examples of Edit Checks

Demog:Weight is outside expected rangeBody mass index is below expected
( check weight and height)
Age is not within expected range.
DOB is greater than the Visit date or not..
Gender value is a valid one or invalid. etc
Adverse Event
Stop is before the start or visit Start is before birthdate Study medicine discontinued due to adverse event but completion indicated (COMPLETE =1)

Labs
Result is within the normal range but abnormal is not blank or ‘N’Result is outside the normal range but abnormal is blank

Vitals
Diastolic BP > Systolic BP

Medical History
Visit date prior to Screen datePhysicalPhysical exam is normal but comment included

40. What are the advantages of using SAS in clinical data management? Why should not we use other software products in managing clinical data?
ADVANTAGES OF USING A SAS®-BASED SYSTEM
Less hardware is required.
A Typical SAS®-based system can utilize a standard file server to store its databases and does not require one or more dedicated servers to handle the application load. PC SAS® can easily be used to handle processing, while data access is left to the file server. Additionally, as presented later in this paper, it is possible to use the SAS® product SAS®/Share to provide a dedicated server to handle data transactions.
Fewer personnel are required.
Systems that use complicated database software often require the hiring of one ore more DBA’s (Database Administrators) who make sure the database software is running, make changes to the structure of the database, etc. These individuals often require special training or background experience in the particular database application being used, typically Oracle. Additionally, consultants are often required to set up the system and/or studies since dedicated servers and specific expertise requirements often complicate the process.Users with even casual SAS® experience can set up studies. Novice programmers can build the structure of the database and design screens. Organizations that are involved in data management almost always have at least one SAS® programmer already on staff. SAS® programmers will have an understanding of how the system actually works which would allow them to extend the functionality of the system by directly accessing SAS® data from outside of the system.Speed of setup is dramatically reduced. By keeping studies on a local file server and making the database and screen design processes extremely simple and intuitive, setup time is reduced from weeks to days.All phases of the data management process become homogeneous. From entry to analysis, data reside in SAS® data sets, often the end goal of every data management group. Additionally, SAS® users are involved in each step, instead of having specialists from different areas hand off pieces of studies during the project life cycle.No data conversion is required. Since the data reside in SAS® data sets natively, no conversion programs need to be written.Data review can happen during the data entry process, on the master database. As long as records are marked as being double-keyed, data review personnel can run edit check programs and build queries on some patients while others are still being entered.Tables and listings can be generated on live data. This helps speed up the development of table and listing programs and allows programmers to avoid having to make continual copies or extracts of the data during testing.43. Have you ever had to follow SOPs or programming guidelines?SOP describes the process to assure that standard coding activities, which produce tables, listings and graphs, functions and/or edit checks, are conducted in accordance with industry standards are appropriately documented.It is normally used whenever new programs are required or existing programs required some modification during the set-up, conduct, and/or reporting clinical trial data.44. Describe the types of SAS programming tasks that you performed: Tables? Listings? Graphics? Ad hoc reports? Other?Prepared programs required for the ISS and ISE analysis reports. Developed and validated programs for preparing ad-hoc statistical reports for the preparation of clinical study report. Wrote analysis programs in line with the specifications defined by the study statistician. Base SAS (MEANS, FREQ, SUMMARY, TABULATE, REPORT etc) and SAS/STAT procedures (REG, GLM, ANOVA, and UNIVARIATE etc.) were used for summarization, Cross-Tabulations and statistical analysis purposes. Created Statistical reports using Proc Report, Data _null_ and SAS Macro. Created, derived and merged and pooled datasets,listings and summary tables for Phase-I and Phase-II of clinical trials.45. Have you been involved in editing the data or writing data queries?If your interviewer asks this question, the u should ask him what he means by editing the data… and data queries…

41. Are you involved in writing the inferential analysis plan? Table’s specifications?

42. What do you feel about hardcoding?
Programmers sometime hardcode when they need to produce report in urgent. But it is always better to avoid hardcoding, as it overrides the database controls in clinical data management. Data often change in a trial over time, and the hardcode that is written today may not be valid in the future.Unfortunately, a hardcode may be forgotten and left in the SAS program, and that can lead to an incorrect database change.

43. How do you write a test plan?

Before writing "Test plan" you have to look into on "Functional specifications". Functional specifications itself depends on "Requirements", so one should have clear understanding of requirements and functional specifications to write a test plan.

44. What is the difference between verification and validation?
Although the verification and validation are close in meaning, "verification" has more of a sense of testing the truth or accuracy of a statement by examining evidence or conducting experiments, while "validate" has more of a sense of declaring a statement to be true and marking it with an indication of official sanction.

45.What other SAS features do you use for error trapping and data validation?
Conditional statements, if then else.
Put statement
Debug option.

46. What is PROC CDISC?
It is new SAS procedure that is available as a hotfix for SAS 8.2 version and comes as a part withSAS 9.1.3 version.
PROC CDISC is a procedure that allows us to import (and export XML files that are compliant with the CDISC ODM version 1.2 schema.
For more details refer SAS programming in the Pharmaceutical Industry text book.

47) What is LOCF?
Pharmaceutical companies conduct longitudinalstudies on human subjects that often span several months. It is unrealistic to expect patients to keep every scheduled visit over such a long period of time.Despite every effort, patient data are not collected for some time points. Eventually, these become missing values in a SAS data set later. For reporting purposes,the most recent previously available value is substituted for each missing visit. This is called the Last Observation Carried Forward (LOCF).LOCF doesn't mean last SAS dataset observation carried forward. It means last non-missing value carried forward. It is the values of individual measures that are the "observations" in this case. And if you have multiple variables containing these values then they will be carried forward independently.

48) ETL process:

Extract, transform and Load:
Extract:

The 1st part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems.

Each separate system may also use a different data organization / format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as IMS or other data structures such as VSAM or ISAM.

Extraction converts the data into a format for transformation processing.An intrinsic part of the extraction is the parsing of extracted data, resulting in a check if the data meets an expected pattern

Transform:The transform stage applies a series of rules or functions to the extracted data from the source to derive the data to be loaded to the end target. Some data sources will require very little or even no manipulation of data. In other cases, one or more of the following transformations types to meet the business and technical needs of the end target may be required:·

Selecting only certain columns to load (or selecting null columns not to load) · Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female), this is called automated data cleansing; no manual cleansing occurs during ETL · Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to M) ·

Joining together data from multiple sources (e.g., lookup, merge, etc.) · Generating surrogate key values · Transposing or pivoting (turning multiple columns into multiple rows or vice versa) · Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns) ·

Applying any form of simple or complex data validation; if failed, a full, partial or no rejection of the data, and thus no, partial or all the data is handed over to the next step, depending on the rule design and exception handling. Most of the above transformations itself might result in an exception, e.g. when a code-translation parses an unknown code in the extracted data.Load:The load phase loads the data into the end target, usually being the data warehouse (DW).

Depending on the requirements of the organization, this process ranges widely. Some data warehouses might weekly overwrite existing information with cumulative, updated data, while other DW (or even other parts of the same DW) might add new data in a historized form, e.g. hourly. The timing and scope to replace or append are strategic design choices dependent on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data loaded in the DW.

As the load phase interacts with a database, the constraints defined in the database schema as well as in triggers activated upon data load apply (e.g. uniqueness, referential integrity, mandatory fields), which also contribute to the overall data quality performance of the ETL process.
source: wikipedia

SAS interview Q & A: PROC SQl and SAS GRAPH and ODS

PROC SQL:

1. What are the types of join avaiable with Proc SQL ?
A.  Different Proc SQL JOINs
JOIN: Return rows when there is at least one match in both tables

•LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table

•RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table

•FULL JOIN: Return rows when there is a match in one of the tables (source: www.w3schools.com)

There are 2 types of SQL JOINS – INNER JOINS and OUTER JOINS. If you don't put INNER or OUTER keywords in front of the SQL JOIN keyword, then INNER JOIN is used. In short "INNER JOIN" = "JOIN" (note that different databases have different syntax for their JOIN clauses).
The INNER JOIN will select all rows from both tables as long as there is a match between the columns we are matching on. In case we have a customer in the Customers table, which still hasn't made any orders (there are no entries for this customer in the Sales table), this customer will not be listed in the result of our SQL query above. 
The second type of SQL JOIN is called SQL OUTER JOIN and it has 2 sub-types called LEFT OUTER JOIN and RIGHT OUTER JOIN.

The LEFT OUTER JOIN or simply LEFT JOIN (you can omit the OUTER keyword in most databases), selects all the rows from the first table listed after the FROM clause, no matter if they have matches in the second table.

The RIGHT OUTER JOIN or just RIGHT JOIN behaves exactly as SQL LEFT JOIN, except that it returns all rows from the second table (the right table in our SQL JOIN statement).  source: sql-tutorial.net

Proc SQL INNER JOIN :



Proc SQL LEFT JOIN :




Proc SQL RIGHT JOIN :



Proc SQL FULL JOIN:


Proc SQL UNION Operator:




Proc SQL tutorials:




2. Have you ever used PROC SQL for data summarization?
A. Yes, I have used it for summarization at times…For e.g if I have to calculate the max value of BP for patients 101 102 and 103 then I use the max (bpd) function to get the maximum value and use group by statement to group the patients accordingly.

3. Tell me about your SQL experience?
A. I have used the SAS/ACCESS SQL pass thru facility for connection with external databases and importing tables from them and also Microsoft access and excel files.Besides this, lot of times I have used PROC SQL for joining tables.

4. Once you have had the data read into SAS datasets are you more of a data step programmer or a PROC SQL programmer?
A. It depends on what types of analysis datasets are required for creating tables but I am more of a data step programmer as it gives me more flexibility.For e.g creating a change from baseline data set for blood pressure sometimes I have to retain certain values …use arrays ….or use the first. -and last. variables.

5. What types of programming tasks do you use PROC SQL for versus the data step?
A. Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.PROC SQL can save a great deal of time if u want to filter the variables while selecting or u can modify them …apply format….creating new variables , macrovariables…as well as subsetting the data.PROC SQL offers great flexibility for joining tables.

6. Have u ever used PROC SQL to read in a raw data file?
A. No. I don’t think it can be used.

7. How do you merge data in Proc SQL?
The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.

PROC SQL;
CREATE TABLE BOTH ASSELECT A.PATIENT,A.DATE FORMAT=DATE7. AS DATE, A.PULSE,B.MED, B.DOSES, B.AMT FORMAT=4.1
FROM VITALS A
INNER JOIN DOSING BON (A.PATIENT = B.PATIENT)
AND(A.DATE = B.DATE)
ORDER BY PATIENT, DATE;
QUIT;

8. What are the statements in Proc SQl?
Select, From, Where, Group By, Having, Order.

PROC SQL;
CREATE TABLE HIGHBPP2 ASSELECT PATIENT, COUNT (PATIENT) AS N,DATE FORMAT=DATE7., MAX(BPD) AS BPDHIGH
FROM VITALS
WHERE PATIENT IN (101 102 103)
GROUP BY PATIENTHAVING BPD = CALCULATED BPDHIGH
ORDER BY CALCULATED BPDHIGH;
Quit;

9. Why and when do you use Proc SQl?
Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.

PROC SQL can save a great deal of time if u want to filter the variables while selecting or we can modify them, apply format and creating new variables, macrovariables…as well as subsetting the data. PROC SQL offers great flexibility for joining tables.

SAS GRAPH:

1. What type of graphs have you have generated using SAS?
A. I have used Proc GPLOT where I have created change from baseline scatter plots. I have also used Proc LIFETEST to create Kaplan-Meier survival estimates plots for survival analysis to determine which treatment displays better time-to-event distribution.

2. Have you ever used GREPLAY?
A. YES, I have used the PROC GREPLAY point and click interface to integrate 4 graphs in one page. Which were produced by the reg procedure.

3. What is the symbol statement used for?
A. Symbol statement is used for placing symbols in the graphics output.Associated variables can specify the color, font and heights of the symbols displayed.

4. Have you ever used the annotate facility? What is a situation when you have had to use the ANNOTATE facility in the past?
A. Yes, I have used the annotate facility for graphs. I have used the annotate facility to position labels in the Kaplan-Meier survival estimates, where I had to specify the function as ‘label’ and give the x and y co-ordinates and the position where this label is to be placed.

ODS (OUTPUT DELIVERY SYSTEM):
1. What are all the ODS procedure have u encountered?
Tracing and selecting the procedure Output;

ODS Trace on;
Proc steps…;
Run;
ODS Trace off;
ODS Select statement,
Proc steps…;
ODS Select output-object-list;
Run;
ODS Output statement,
ODS output output-object= new SAS dataset;
ODS html body = “path\marinebody.html”Contents = “path\marineTOC.html”Page = “ path\marinepage.html”Frame= “path\marineframe.html”;
…..ODS html close;

ODS rtf file = "filename.rtf" options;
Options like columns=n, bodytitle, SASdate and style.ODS rtf close;


SimilarlyODS Pdf file = "filename.pdf" options;
…….. ODS pdf close;

2. What is your experience with ODS?
A. I have used ODS for creating files output formats RTF HTML and PDF as per the requirement of my manager. HTML files could be posted on the web site for viewing or can also be imported into word processors.
ODS HTML body = ‘path’ Contents= ‘path’Frame = ‘path’ODS HTML close;
ODS RTF FILE = ‘path’;
ODS RTF close;

When we create RTF output we can copy it into word document and edit and resize it like word tables.

3. What does the trace option do?
A. ODS Trace is used to find the names of the particular output objects when several of them are created by some procedure.
ODS TRACE ON;
ODS TRACE OFF;


SAS ODS


ODS, YES! Odious, NO! – An Introduction to the SAS Output Delivery System



Using ODS to Create Customised Output




Controlling SAS output using ODS



The Output Delivery System (ODS) from Scratch




SAS 9 ODS Tip Sheet:

DISCLAIMER


This blog is not responsible for any kind of copyright violation. My blog just collects the links hosted or posted by other server/people/search engines. The creator of this page or the ISP(s) hosting any content on this site take no responsibility for the way you use the information provided on this site.

If anybody has any copyright claim on it and doesn’t wish the information provided to be shown on our site, please do respond to learnsasonline@gmail.com.We shall remove them off immediately. Any inconvenience is regretted. Please do mention your exact grievance/problems with respect to certain third party links.

We assure you that appropriate action will be taken. Thank you..We cannot guarantee the authenticity or trueness of the study material provided in this blog. We aren't responsible for any such issues.SAS® is trademark of SAS Inc.www.studysas.blogspot.com is not endorsed, associated or affiliated in any way with SAS Inc. All other trademarks, company names, products names and logos mentioned are the property of their respective owners.

SAS interview questions: Macros

SAS Macros Interview Questions and Answers

SAS Macros Interview Questions and Answers

This document provides a collection of interview questions and answers related to SAS Macros, which are a powerful feature in SAS for automating repetitive tasks and dynamically generating code.

1. What is a macro in SAS?

A macro in SAS is a code that generates SAS statements dynamically. Macros allow for automation, reusability, and parameterization of code, making it more flexible and efficient.

2. How do you create a simple macro in SAS?

You can create a macro in SAS using the %MACRO and %MEND statements. Here's an example:

%macro simple_macro;
    data new_data;
        set old_data;
        /* Your SAS code here */
    run;
%mend simple_macro;
%simple_macro;

3. What is the difference between a macro variable and a data step variable?

Macro variables are defined and used in the macro language (outside of the DATA step) and are available throughout the SAS session. Data step variables are used within a DATA step and are not accessible outside of that step unless explicitly passed to a macro.

4. How do you define a macro variable using %LET?

A macro variable is defined using the %LET statement in SAS.

%let varname = value;

5. How do you define a macro variable using CALL SYMPUT?

CALL SYMPUT is used within a DATA step to assign a value to a macro variable based on data or calculations within that step.

data _null_;
    call symput('varname', value);
run;

6. What is the use of the %IF-%THEN statement in macros?

The %IF-%THEN statement in a macro is used to perform conditional logic based on the value of macro variables.

%macro check_value(val);
    %if &val = 1 %then %do;
        %put Value is 1;
    %end;
    %else %do;
        %put Value is not 1;
    %end;
%mend check_value;
%check_value(1);

7. How do you create a macro with eters?

You can create a macro with parameters by specifying the parameters in parentheses after the macro name.

%macro example_macro(test1, test2);
    %put &test1 &test2;
%mend example_macro;
%example_macro(value1, value2);

8. Explain the difference between positional and keyword parameters in macros.

Positional parameters are passed in the order they are defined, while keyword parameters are specified by name and can be passed in any order. Keyword parameters can also have default values.

%macro example_macro(test1=default1, test2=default2);
    %put &test1 &test2;
%mend example_macro;
%example_macro(test2=value2);

9. How do you debug a macro in SAS?

Macro debugging can be done using options such as MPRINT, MLOGIC, and SYMBOLGEN. These options display macro execution details in the SAS log.

options mprint mlogic symbolgen;
%example_macro;

10. What is the purpose of the %DO-%END statement in a macro?

The %DO-%END statement is used to create a block of code that repeats a certain number of times or executes conditionally within a macro.

%macro loop_macro;
    %do i = 1 %to 5;
        %put Iteration &i;
    %end;
%mend loop_macro;
%loop_macro;

11. How do you pass a macro variable to a DATA step?

You can pass a macro variable to a DATA step by referencing it with an ampersand (&) within the DATA step.

%let threshold = 100;
data filtered_data;
    set original_data;
    if value > &threshold;
run;

12. What is the difference between %GLOBAL and %LOCAL macro variables?

%GLOBAL macro variables are accessible throughout the entire SAS session, while %LOCAL macro variables are only available within the macro in which they are defined.

%macro test_macro;
    %local local_var;
    %global global_var;
    %let local_var = Local;
    %let global_var = Global;
    %put &local_var &global_var;
%mend test_macro;
%test_macro;
%put &global_var; /* Global variable is accessible */
%put &local_var; /* Local variable is not accessible outside the macro */

13. How do you concatenate macro variables?

Macro variables can be concatenated by placing them next to each other or by using the %STR() function to prevent unwanted interpretation of special characters.

%let var1 = Hello;
%let var2 = World;
%put &var1 &var2; /* Outputs: Hello World */
%put %str(&var1&var2); /* Outputs: HelloWorld */

14. Explain the use of the %EVAL function in macros.

%EVAL is used to evaluate arithmetic and logical expressions in macro language.

%let sum = %eval(5 + 3);
%put ∑ /* Outputs: 8 */

15. How do you create a dynamic macro that adapts to different datasets?

You can create a dynamic macro by using macro parameters and conditional logic to adapt the code to different datasets or requirements.

%macro dynamic_macro(dataset, var);
    proc print data=&dataset;
        var &var;
    run;
%mend dynamic_macro;
%dynamic_macro(sashelp.class, age);

16. What is the purpose of the %SYSFUNC function?

%SYSFUNC allows you to execute DATA step functions within a macro. It is useful for performing tasks like formatting dates or manipulating strings within macro code.

%let current_date = %sysfunc(today(), date9.);
%put ¤t_date; /* Outputs the current date in DATE9 format */

17. How do you create a macro that conditionally executes code?

You can create a macro that conditionally executes code using %IF-%THEN statements within the macro definition.

%macro conditional_macro(flag);
    %if &flag = Y %then %do;
        %put Executing block 1;
    %end;
    %else %do;
        %put Executing block 2;
    %end;
%mend conditional_macro;
%conditional_macro(Y);

18. How do you use the %INCLUDE statement in a macro?

The %INCLUDE statement is used to include external SAS code files within a macro or program.

%macro include_macro;
    %include 'path-to-file.sas';
%mend include_macro;
%include_macro;

19. How do you debug a complex macro?

For debugging complex macros, you can use the debugging options (MPRINT, MLOGIC, SYMBOLGEN) and break the macro into smaller parts, testing each part individually.

20. What is the CALL EXECUTE statement and how is it used with macros?

CALL EXECUTE is used to dynamically generate SAS code that is executed after the current DATA step completes. It is often used in conjunction with macros to dynamically generate and execute code.

data _null_;
    set dataset;
    call execute('%my_macro(' || var || ')');
run;

21. How do you use the %DO loop to repeat a block of code in a macro?

The %DO loop allows you to repeat a block of code multiple times within a macro.

%macro repeat_macro;
    %do i = 1 %to 5;
        %put This is iteration &i;
    %end;
%mend repeat_macro;
%repeat_macro;

22. How do you conditionally include or exclude code in a macro?

Conditional inclusion or exclusion of code in a macro can be done using %IF-%THEN statements.

%macro condition_macro(flag);
    %if &flag = 1 %then %do;
        proc print data=sashelp.class;
        run;
    %end;
    %else %do;
        proc contents data=sashelp.class;
        run;
    %end;
%mend condition_macro;
%condition_macro(1);

23. How do you use the %PUT statement in SAS macros?

The %PUT statement is used to write text or the value of macro variables to the SAS log.

%macro log_macro;
    %let var = Hello;
    %put The value of var is &var;
%mend log_macro;
%log_macro;

24. How do you handle errors in SAS macros?

Error handling in SAS macros can be managed by checking the status of macro variables, using %IF-%THEN logic, and writing appropriate messages to the SAS log using %PUT.

%macro error_handling(flag);
    %if &flag ne 1 %then %do;
        %put ERROR: The flag is not set to 1;
    %end;
%mend error_handling;
%error_handling(0);

25. How do you use the %SYSGET function?

%SYSGET retrieves the value of an operating system environment variable and assigns it to a macro variable.

%let home_dir = %sysget(HOME);
%put The home directory is &home_dir;

26. How do you use the %SUPERQ function in SAS macros?

%SUPERQ is used to prevent the resolution of macro variables, especially when dealing with special characters or unresolved macro variables.

%let var = %nrstr(&var);
%put %superq(var);

27. How do you use the %BQUOTE function?

%BQUOTE masks special characters or mnemonics in the macro variable without fully resolving the macro expression.

%let text = This is a test %bquote(%str(%)macro);
%put &text;

28. How do you ensure that a macro variable is resolved only once in a loop?

To ensure that a macro variable is resolved only once in a loop, you can store the resolved value in another macro variable before entering the loop.

%macro single_resolution;
    %let initial_value = %sysfunc(today(), date9.);
    %do i = 1 %to 5;
        %put Iteration &i: Date is &initial_value;
    %end;
%mend single_resolution;
%single_resolution;

29. How do you include comments within a macro?

Comments within a macro can be included using either macro comments (%*) or regular SAS comments (*; or /* */).

%macro commented_macro;
    %* This is a macro comment;
    /* This is a standard comment */
    data _null_;
        x = 1; * Another comment;
    run;
%mend commented_macro;
%commented_macro;

30. How do you dynamically create multiple datasets using a macro?

You can dynamically create multiple datasets using a macro by looping through a set of values and using them in the DATA step.

%macro create_datasets;
    %do i = 1 %to 3;
        data dataset&i;
            x = &i;
        run;
    %end;
%mend create_datasets;
%create_datasets;

31. How do you pass parameters to a macro that need to be processed further?

You can pass parameters to a macro and process them using macro functions or conditional logic.

%macro process_params(test1, test2);
    %let combined = %upcase(&test1)_%lowcase(&test2);
    %put &combined;
%mend process_params;
%process_params(HELLO, WORLD);

32. How do you store the result of a macro execution in a dataset?

You can store the result of a macro execution in a dataset by using a DATA step and assigning the macro result to a dataset variable.

%macro store_result;
    data results;
        result = "&sysdate9.";
        output;
    run;
%mend store_result;
%store_result;

33. How do you execute a macro conditionally based on the existence of a dataset?

You can conditionally execute a macro based on the existence of a dataset using the EXIST function in a macro.

%macro check_dataset(dsname);
    %if %sysfunc(exist(&dsname)) %then %do;
        %put Dataset &dsname exists.;
    %end;
    %else %do;
        %put Dataset &dsname does not exist.;
    %end;
%mend check_dataset;
%check_dataset(sashelp.class);

34. How do you create a macro that generates code for multiple procedures?

You can create a macro that generates code for multiple procedures by including each procedure within the macro and using conditional logic or loops to control their execution.

%macro multi_proc;
    proc means data=sashelp.class;
        var age height weight;
    run;

    proc freq data=sashelp.class;
        tables sex;
    run;
%mend multi_proc;
%multi_proc;

35. How do you reset a macro variable to its original value after modifying it?

You can reset a macro variable to its original value by storing the original value in another macro variable before modifying it, and then reassigning it.

%macro reset_var;
    %let original_var = &var;
    %let var = NewValue;
    %put &var;
    %let var = &original_var;
    %put &var;
%mend reset_var;

36. How do you create a macro that generates a report based on user input?

You can create a macro that generates a report based on user input by passing the user inputs as parameters to the macro.

%macro generate_report(dataset, var);
    proc print data=&dataset;
        var &var;
    run;
%mend generate_report;
%generate_report(sashelp.class, age height weight);

37. How do you use the %SCAN function in a macro?

The %SCAN function extracts words from a string based on the specified delimiter.

%let text = SAS Macro Programming;
%let first_word = %scan(&text, 1);
%let second_word = %scan(&text, 2);
%put First word: &first_word;
%put Second word: &second_word;

38. How do you create a macro that generates a summary report?

You can create a macro that generates a summary report by including a PROC MEANS or PROC SUMMARY step within the macro.

%macro summary_report(dataset);
    proc means data=&dataset;
        var age height weight;
    run;
%mend summary_report;
%summary_report(sashelp.class);

39. How do you use the %UPCASE function in a macro?

The %UPCASE function converts a string to uppercase.

%let text = sas;
%put %upcase(&text); /* Outputs: SAS */

40. How do you create a macro that loops through a list of variables?

You can create a macro that loops through a list of variables by using a macro loop with the variable names as a list.

%macro loop_variables;
    %let vars = age height weight;
    %do i = 1 %to %sysfunc(countw(&vars));
        %let var = %scan(&vars, &i);
        proc print data=sashelp.class;
            var &var;
        run;
    %end;
%mend loop_variables;
%loop_variables;

41. How do you use the %QUOTE function in a macro?

The %QUOTE function masks special characters or mnemonics in the macro variable, similar to %BQUOTE, but %QUOTE is used during macro execution.

%let text = This is a test %quote(%str(%)macro);
%put &text;

42. How do you create a macro that runs multiple procedures conditionally?

You can create a macro that runs multiple procedures conditionally by using %IF-%THEN logic within the macro.

%macro multi_conditional(proc_type);
    %if &proc_type = means %then %do;
        proc means data=sashelp.class;
            var age height weight;
        run;
    %end;
    %else %if &proc_type = freq %then %do;
        proc freq data=sashelp.class;
            tables sex;
        run;
    %end;
%mend multi_conditional;
%multi_conditional(means);

43. How do you use the %NRQUOTE function?

%NRQUOTE masks special characters or mnemonics in the macro variable and prevents them from being resolved.

%let text = %nrstr(&var);
%put %nrquote(&text);

44. How do you create a macro that exports data to different formats?

You can create a macro that exports data to different formats by including PROC EXPORT steps within the macro, with format options as parameters.

%macro export_data(dataset, format);
    proc export data=&dataset
        outfile="output_file.&format"
        dbms=&format
        replace;
    run;
%mend export_data;
%export_data(sashelp.class, csv);

45. How do you handle missing values in a macro?

Handling missing values in a macro can be done by checking for missing values using %IF statements or DATA step logic within the macro.

%macro check_missing(var);
    %if &var = . %then %do;
        %put WARNING: Missing value detected for &var;
    %end;
%mend check_missing;
%check_missing(.);

46. How do you use the %SYMDEL statement to delete macro variables?

The %SYMDEL statement is used to delete one or more macro variables.

%let var1 = value1;
%let var2 = value2;
%symdel var1 var2;
%put &var1 &var2; /* Variables are deleted and no value is printed */

47. How do you create a macro that generates a histogram for different variables?

You can create a macro that generates a histogram for different variables by including a PROC SGPLOT step within the macro and looping through the variables.

%macro histogram(dataset, vars);
    %do i = 1 %to %sysfunc(countw(&vars));
        %let var = %scan(&vars, &i);
        proc sgplot data=&dataset;
            histogram &var;
        run;
    %end;
%mend histogram;
%histogram(sashelp.class, age height weight);

48. How do you use the %SUBSTR function in a macro?

The %SUBSTR function extracts a substring from a macro variable value.

%let text = SASMacro;
%let sub_text = %substr(&text, 1, 3);
%put &sub_text; /* Outputs: SAS */

49. How do you use the %INDEX function in a macro?

The %INDEX function searches for a substring within a string and returns the position of the first occurrence.

%let text = SASMacro;
%let position = %index(&text, Macro);
%put &position; /* Outputs: 4 */

50. How do you create a macro that conditionally stops execution?

You can create a macro that conditionally stops execution using the %ABORT statement within the macro.

%macro stop_execution(condition);
    %if &condition = stop %then %do;
        %abort cancel;
    %end;
    %else %do;
        %put Execution continues;
    %end;
%mend stop_execution;
%stop_execution(stop);

SAS Interview Questions: Base SAS

Very Basic:

What SAS statements would you code to read an external raw data file into a DATA step?
You would use the INFILE statement to specify the location of the external raw data file, and the INPUT statement to read in the data into the variables in a SAS dataset.

How do you read in the variables that you need?
You use the INPUT statement with specific column pointers, such as @5 or 12-17, to define where each variable is located in the raw data file.

Are you familiar with special input delimiters? How are they used?
Yes, special input delimiters like DLM and DSD are used to specify how fields are separated in the data. They are included in the INFILE statement. For example, the DLM option can be used to specify a comma as a delimiter for CSV files. The DSD option is used for comma-separated values (CSV) files and treats consecutive delimiters as missing values and ignores delimiters enclosed in quotation marks.

If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
You can use the MISSOVER option in the INFILE statement to prevent SAS from reading the next record when the last variable on the line doesn't have a value. If some data lines are shorter than expected, you can use the TRUNCOVER option to ensure that SAS reads the remaining part of the line without moving to the next record.

What is the difference between an informat and a format? Name three informats or formats.
- Informats are used to read data into SAS variables. They define how raw data is read into SAS variables. - Formats are used to write data or to control how the data appears in output. Examples of informats: - MMDDYYw.: Reads dates in MM/DD/YY format. - DATEw.: Reads dates in various formats. - COMMAw.: Reads numeric values with commas. Examples of formats: - MMDDYY10.: Displays dates as MM/DD/YYYY. - DOLLAR12.2: Displays numbers as currency with commas and two decimal places. - WORDDATE18.: Displays dates as fully spelled-out words.

Name and describe three SAS functions that you have used, if any.
1. LENGTH: Returns the length of a character string, excluding trailing blanks. - Example:
data _null_;
   a = 'my cat';
   x = length(a);
   put x=;
run;
- Result: x = 6 2. SUBSTR: Extracts a substring from a character variable starting at a specified position and of a specified length. - Example:
data _null_;
   a = '(916)734-6241';
   x = substr(a, 2, 3);
   put x=;
run;
- Result: x = '916' 3. TRIM: Removes trailing blanks from a character string. - Example:
data _null_;
   a = 'my ';
   b = 'cat';
   x = trim(a) || b;
   put x=;
run;
- Result: x = 'mycat' 4. SUM: Returns the sum of non-missing values. - Example:
data _null_;
   x = sum(3, 5, .);
   put x=;
run;
- Result: x = 8 5. INT: Returns the integer portion of a numeric value. - Example:
data _null_;
   x = int(5.67);
   put x=;
run;
- Result: x = 5

How would you code the criteria to restrict the output to be produced?
Use the WHERE statement or clause to filter the dataset based on specific criteria before output.

What is the purpose of the trailing @ and the @@? How would you use them?
- The single trailing @ holds the input line for continued processing across multiple INPUT statements. - The double trailing @@ holds the input line for continued reading across multiple iterations of the DATA step until the end of the input line is reached. Example of double trailing @@ when reading multiple observations per line:
data dsn;
   input sex $ days @@;
   datalines;
F 53 F 56 F 60 M 46 M 52 M 58
;
run;

Under what circumstances would you code a SELECT construct instead of IF statements?
You would use a SELECT statement when you have a long series of mutually exclusive conditions, especially when the conditions involve comparisons between numeric values. The SELECT construct can be more efficient than multiple IF-THEN/ELSE statements because it reduces CPU time. Example:
select;
   when (condition1) do;
      /* actions */
   end;
   when (condition2) do;
      /* actions */
   end;
   otherwise do;
      /* actions */
   end;
end;

What statement do you code to tell SAS that it is to write to an external file?
Use the FILE statement to specify the external file, and the PUT statement to write data to that file.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
Use the PUT _ALL_; statement or the PUT _INFILE_; statement to write the entire record without coding each variable individually.

If you're not wanting any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a dataset?
Use the DATA _NULL_; statement to execute a DATA step without creating an output dataset.

What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement can be used in any SAS procedure or DATA step to filter data based on specific criteria.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
Yes, SAS code can be linked using macros or by including external files with the %INCLUDE statement, allowing for reusable code across multiple programs.

How would you include common or reuse code to be processed along with your statements?
Common or reusable code can be included in SAS programs using SAS Macros or the %INCLUDE statement.

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX, or INDEXC?
The INDEX function is best for locating a specific substring within a character string.

If you have a dataset that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Use the KEEP statement or the KEEP= dataset option to select only the variables you need.

Code a PROC SORT on a dataset containing State, District, and County as the primary variables, along with several numeric variables.
proc sort data=one;
   by State District County;
run;

How would you delete duplicate observations?
Use the NODUPRECS option in the PROC SORT procedure.

How would you delete observations with duplicate keys?
Use the NODUPKEY option in the PROC SORT procedure.

How would you code a merge that will keep only the observations that have matches from both sets?
Use an IF statement in the merge process to check for matches, for example:
data matched;
   merge dataset1(in=a) dataset2(in=b);
   by key;
   if a and b;
run;

How would you code a merge that will write the matches of both to one dataset, the non-matches from the left-most data?
Use IF statements to differentiate between matches and non-matches, outputting them to different datasets:
data both_matches left_only;
   merge dataset1(in=a) dataset2(in=b);
   by key;
   if a and b then output both_matches;
   else if a and not b then output left_only;
run;

What is the Program Data Vector (PDV)? What are its functions?
The Program Data Vector (PDV) is an area in memory where SAS builds a dataset, one observation at a time. During the DATA step's execution, SAS reads data into the PDV, processes it, and then writes it to the output dataset. The PDV includes all variables in the DATA step, and two automatic variables: _N_, which counts the number of iterations, and _ERROR_, which flags data errors.

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
SAS both compiles and interprets code. During the compilation phase, SAS translates the code into machine language and checks for syntax errors. During execution, SAS interprets the compiled code to perform the data processing.

At compile time when a SAS dataset is read, what items are created?
At compile time, SAS creates the Input Buffer, the PDV, and Descriptor Information, including the attributes of variables.

Name statements that are recognized at compile time only.
The PUT statement is recognized at compile time for the purpose of writing data to the log or to external files.

Name statements that are execution only.
The INFILE and INPUT statements are execution-only because they read data during the execution phase.

Identify statements whose placement in the DATA step is critical.
Statements such as DATA, INPUT, and RUN must be correctly placed in the DATA step to control the flow and execution of the program.

Name statements that function at both compile and execution time.
The INPUT statement functions at both compile time and execution time.

In the flow of DATA step processing, what is the first action in a typical DATA Step?
The first action in a DATA step is the creation of the DATA statement, which initializes the process. Each time the DATA statement executes, SAS increments the _N_ variable by 1.

What is _N_?
_N_ is an automatic variable in SAS that counts the number of iterations in the DATA step. It is particularly useful for subsetting data, controlling loops, or performing operations based on iteration counts. Example:
data new_dataset;
   set old_dataset;
   if mod(_n_, 3) = 1;
run;
This example selects every third record from the dataset.

How do I convert a numeric variable to a character variable?
Use the PUT function to convert a numeric variable to a character variable.

How do I convert a character variable to a numeric variable?
Use the INPUT function to convert a character variable to a numeric variable.

How can I compute the age of something?
To compute the age, given two SAS date variables birthdate and calcdate, you can use:
age = int(intck('month', birthdate, calcdate) / 12);
if month(birthdate) = month(calcdate) then age = age - (day(birthdate) > day(calcdate));

How can I compute the number of months between two dates?
Given two SAS date variables startdate and enddate, you can compute the number of months between them as:
months = intck('month', startdate, enddate) - (day(enddate) < day(startdate));

How can I determine the position of the nth word within a character string?
Use a combination of the INDEXW and SCAN functions:
pos = indexw(string, scan(string, n));

I need to reorder characters within a string... should I use SUBSTR?
You can use the TRANSLATE function for simple reordering, which may be more efficient than using multiple SUBSTR calls. For example, to move the first character of a 4-character string to the last:
reorder = translate('2341', string, '1234');

How can I format a SAS date variable so that December 25, 1995, appears as '19951225' (with no separator)?
You can use a combination of the YEAR and MMDDYY formats to display the value:
put sasdate year4. sasdate mmddyy4.;
Or use the PUT and COMPRESS functions to store the value:
newvar = compress(put(sasdate, yymmdd10.), '/');

How can I format a SAS time variable with a leading zero for hours 1-9?
Use a combination of the Z and MMSS formats:
hrprint = hour(sastime);
put hrprint z2. ':' sastime mmss5.;

INFILE OPTIONS

The INFILE statement in SAS has several useful options to handle different types of external files: 1. FLOWOVER: This is the default option. If the INPUT statement reaches the end of a non-blank line without filling all variables, FLOWOVER reads the next line into the Input Buffer and attempts to fill the remaining variables starting from column one. 2. MISSOVER: Prevents SAS from moving to the next line if a line is shorter than expected. Instead, it assigns missing values to the remaining variables. 3. TRUNCOVER: This option is used to handle variable-length records. If the data line ends in the middle of a variable field, TRUNCOVER takes as much as is available, whereas MISSOVER would assign the variable a missing value. Example:
data trun;
   infile 'external_file.txt' truncover;
   input chr $3.;
run;
This would read up to 3 characters into the variable `chr`, even if the line is shorter than expected.

Different Ways to Use the PUT Statement
The PUT statement is versatile in SAS, used for writing data to the log, external files, or controlling the format of output data.

%INCLUDE Statement:
The %INCLUDE statement is used to include the contents of an external file or another SAS program into the current program. This is particularly useful for including common code snippets or reusable macro definitions across multiple programs.