Friday, March 12, 2010

PRXMATCH Function

Prxmatch () function is very useful in locating the matching strings. Prxmatch() function has 2 parameters, the first parameter is the regular expression ID (i.e what you are looking in a string for a match) and the second parameter is the character string to be searched. PRXMATCH () function returns the start position of the matching string.
Syntax:

PRXMATCH (perl-regular-expression, source);

Even though PRXMATCH function can be used when....
1) When you want to identify if there is alphanumeric (has any letter from A to Z) in a variable.
2) If you need to search a character variable for multiple different substrings.

Here is how PRXMATCH works in the Ist case.

*Prxmatch () function is very useful in locating the matching strings;

DATA finda2z;
INPUT ID \$ 1-3 string \$ 5-10;
prxmatch=prxmatch("/[a-zA-Z]/",string);
DATALINES;
001 ACBED
002 11
003 12
004 zx
005 11 2c
006 abc123
;
run;

proc print;
run;

Output:

*Here PRXMATCH function will return the start position of matching string. In this case, a to z or A to Z.

If you want to find out which observation has matching string for a specified variable.

Use the following code.

prxmatch=prxmatch("/[a-zA-Z]/",string)>0;

*If match found, the value returned is 1 or else 0.

To keep those records that do not match this pattern you will look for those records where PRXMATCH returns a zero.

PRXMATCH () function is very helpful If you need to search a character variable for multiple different substrings in a variable.

Problem: Select observations where AETERM has substrings ‘ nausea’, ‘vomiting’ and ‘fever’.

The old method is to combine several INDEX function statements together with OR conditions like as …..

if index(aeterm,'nausea') > 0 or
if index(aeterm,'vomiting') > 0 or
if index(aeterm,'fever') > 0 ;

The PRXMATCH function can do this all in one statement. Less typing….few lines of code.

if prxmatch ("m/nausea|vomiting|fever/i",aeterm) > 0 ;

The 'm' option in perl-regular-expression means, PRXMATCH is going to start a matching operation.
The 'i' option tells SAS not to worry about the case, i.e, consider  "NAUSEA" as same as "nausea" while searching for a match.
Another advantage of using this 'i' modifier is we can make parts of a string case sensitive and insensitive using  ( i:)  or (-i:).
( i:)  turns ON the case insensitive search
(-i:) - turn OFF the case insensitive search

Pipes ‘|’ should be used to separate the search strings.

Please refer PRXMATCH in the Functions section of the SAS Language Reference: Dictionary in the Online SAS Documentation for more information.

PRXMATCH special characters and it's meaning:

\$ - end with
\D - any non digits
\d - digits
? - may or may not have?
| - or
* - repeating
( i:) - turns ON the case insensitive search

(-i:) - turn OFF the case insensitive search