Discover More Tips and Techniques on This Blog

How to Create a new data set from multiple data sets based upon sorted order

Create a new data set from multiple data sets based upon sorted order

Use the SET and BY statements to interleave data sets.

Note: Interleaving uses a SET statement and a BY statement to create a new, sorted data set from multiple data sets. The number of observations in the new data set is the sum of the number of observations in the original data sets. The observations in the new data set are arranged by the values of the BY variable(s) and, within each BY-Group, the order of the data sets in which they occur, including duplicates.

To interleave, data sets need to be in sorted order or indexed on the BY variables.



Output:
source: support.sas.com

Dynamically generate SET statement to combine multiple data sets

Dynamically generate SET statement to combine multiple data sets

You can manually enter the data set names or use MACRO logic to generate the repetitive data set names when combining many data sets on a SET statement.

Note: Variable name lists are not valid for data sets. In other words, if you have WORK.DS1, WORK.DS2, and WORK.DS3, you can NOT specify WORK.DS1-WORK.DS3 on the SET statement.

/* names use the naming convention of DSn, where n is an incrementing /* number.

data ds1;
x=1;
run;

data ds2;
x=2;
run;

data ds3;
x=3;
run;

/* Build a macro called NAMES with two parameters. The first parameter /
* is the 'prefix' of the naming pattern. The second parameter is the /
* maximum number of data sets you want to generate on the SET statement. */


%macro names(prefix,maxnum);
%do i=1 %to &maxnum;
&prefix&i
%end;
;

%mend names;

/* Call the macro on the SET statement */

data all;
set %names(DS,3);
run;

proc print data=all;
title "Appended results";
run;
run;

Appended results

Obs x
1 1
2 2
3 3

source: www.support.sas.com

How to determine which data set contributed an observation

Determine which data set contributed an observation

Use the IN= option to create a boolean variable that is set to one or 'true' to indicate whether the data set contributed data to the current observation. When the IN= variable's value is 1, assign the data set's name into a new variable.

data one;
input string $;datalines;
apple
banana
coconut; run;

data two;
input string $;
datalines;
anagram
bottle
clown
dog; run;

data combo;
set one(in=o) two(in=t);
if o then origin='one';
else origin='two';
run;

proc print data=combo;run;
run;

RESULTS:

Obs string origin
1 apple one
2 banana one
3 coconut one
4 anagram two
5 bottle two
6 clown two
7 dog two

source: www.support.sas.com

Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.