## Thursday, February 4, 2010

### How to extract year information from three formats of dates in a dataset? (IMPUTED Dates)

A question in the LinkedIn SAS Professionals group asked how to extract the year from a dataset whose dates (imputed dates) come in three formats, for example `1998`, `01-1998` and `01-01-1998`. Several good suggestions were submitted. Here is a summary:

```sas
**************************************************************;
* Method 1: use SUBSTR or SCAN, choosing the position from the LENGTH of the value;

data temp;
infile datalines;
input date:$10.;
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;

data temp;
set temp;
if length(date) = 4 then year = input(substr(date,1,4),best32.);
else if length(date) = 7 then year = input(substr(date,4,4),best32.);
else if length(date) = 10 then year = input(substr(date,7,4),best32.);
run;
* (or);
data temp;
set temp;
if length(date) = 4 then year = input(date,best.);
else if length(date) = 7 then year = input(scan(date,2,'-'),best.);
else if length(date) = 10 then year = input(scan(date,3,'-'),best.);
run;
```
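As a sketch (not one of the submitted answers), the length tests from Method 1 can also be written as a SELECT group. When no WHEN clause matches and there is no OTHERWISE, SAS stops the DATA step with an error, so the OTHERWISE branch below sets YEAR to missing for any other layout. The `98` row and the data set name `years_select` are illustrative only.

```sas
* Sketch: Method 1 as a SELECT group. OTHERWISE catches any layout
  other than the three expected ones (the value 98 is only an illustration);
data years_select;
infile datalines;
input date:$10.;
select (length(date));
  when (4)  year = input(date, 4.);
  when (7)  year = input(substr(date, 4, 4), 4.);
  when (10) year = input(substr(date, 7, 4), 4.);
  otherwise year = .;   /* unexpected layout: leave the year missing */
end;
datalines;
1998
01-2008
01-01-1998
98
;
run;
```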
```sas
*************************************************************;
* Method 2: starting with Version 8, SCAN accepts a negative count, which scans from the end of the string backwards;
data temp;
infile datalines ;
length date $10 ;
input @1 date $char10. ;
year = input(scan(date,-1),best.) ;
datalines ;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
```
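To see why Method 2 handles all three layouts, here is a minimal sketch with the delimiter passed explicitly as the third argument of SCAN. A value with no hyphen, such as `1998`, is a single word, so a count of -1 returns the whole value; for the other layouts it returns the text after the last hyphen. In ASCII environments the hyphen is already one of the default SCAN delimiters, so naming it mainly documents the intent (and means only hyphens split the value). `years_scan` and `last_word` are illustrative names.

```sas
* Sketch: Method 2 with the hyphen named as the only delimiter;
data years_scan;
infile datalines;
input date:$10.;
last_word = scan(date, -1, '-');   /* text after the last hyphen, or the whole value */
year = input(last_word, 4.);
datalines;
1998
01-2008
01-01-1998
;
run;
```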
```sas
**************************************************************;
* Method 3: another option is the REVERSE function. First reverse the date
  so that the year is always in the first four characters, then take those
  four characters with SUBSTR and reverse them again;
data temp;
infile datalines;
input date:$10.;
year =input(reverse(substr(reverse(compress(date)),1,4)),?? 8.);
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
```
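The REVERSE trick in Method 3 is easier to follow with the intermediate values kept. This sketch stores them in the illustrative variables `step1` and `step2` and writes them to the log. It also shows why COMPRESS is there: DATE is padded with blanks to 10 characters, and without COMPRESS, REVERSE would move those blanks to the front, so the first four characters of a short value like `1998` would be blank.

```sas
* Sketch: Method 3 split into its intermediate steps (step1 and step2
  are illustrative names);
data years_reverse;
infile datalines;
input date:$10.;
length step1 $10 step2 $4;
step1 = reverse(compress(date));      /* 01-01-1998 becomes 8991-10-10 */
step2 = substr(step1, 1, 4);          /* first four characters: 8991 */
year  = input(reverse(step2), ?? 4.); /* reversed again: 1998 */
put date= step1= step2= year=;
datalines;
1998
01-1998
01-01-1998
;
run;
```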
```sas
**************************************************************;
* Method 4: assumes DATE is a character variable holding the raw text, not a formatted SAS date value;
data temp;
infile datalines;
input date:$10.;
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;

data temp;
length year $4;
set temp;
lgth =length(date); /*** possible values: 1 (when blank), 4, 7, 10 ***/
start =lgth-3;
if start <1 then /*** when blank ***/
year =' ';
else year =substr(date,start,4);
drop lgth start;
run;
```
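Method 4 leaves YEAR as a `$4` character variable, while the other methods produce a number. If a numeric year is needed for arithmetic or comparisons, INPUT can convert it. A self-contained sketch (the names `years_char` and `year_num` are illustrative):

```sas
* Sketch: the Method 4 extraction followed by a numeric copy of YEAR;
data years_char;
length year $4;
infile datalines;
input date:$10.;
year = substr(date, length(date) - 3, 4);   /* character year, as in Method 4 */
year_num = input(year, 4.);                 /* numeric year */
datalines;
1998
01-1998
01-01-2008
;
run;
```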
```sas
**************************************************************;
* Method 5: the last four characters are always the year, so SUBSTR starts at LENGTH(date)-3;
data temp;
infile datalines;
input date:$10.;
year = input(substr(date,length(date)-3,4),4.);
datalines;
1998
2008
01-1998
01-2008
01-01-1998
01-01-2008
;
run;
**************************************************************;
```
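Method 5, like Method 4, depends on LENGTH. For a blank value LENGTH returns 1, so the SUBSTR start position would be -2 and SAS would flag an invalid argument in the log. A minimal sketch with a guard, assuming blank dates should simply get a missing year (the DO loop only supplies three test values, one of them blank; `years_last4` is an illustrative name):

```sas
* Sketch: Method 5 with a guard so blank dates get a missing year;
data years_last4;
length date $10;
do date = '1998', ' ', '01-01-2008';
  if not missing(date) then year = input(substr(date, length(date) - 3, 4), 4.);
  else year = .;   /* reset, otherwise YEAR keeps the value from the previous loop pass */
  output;
end;
run;
```

The explicit `else year = .;` matters here because all three rows are written within one DATA step iteration, so the PDV is not reset between OUTPUT statements.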