ENCODING= Data Set Option
Let me explain the reason for writing this post.
The SAS reference guide describes the ENCODING= option this way:
The value for ENCODING= indicates that the SAS data set has a different encoding
from the current session encoding. When you read data from a data set, SAS
transcodes the data from the specified encoding to the session encoding. When
you write data to a data set, SAS transcodes the data from the session encoding
to the specified encoding.
My coworker was having a problem reading in a SAS dataset that he got from the sponsor. The dataset was encoded in UTF-8, which did not match the encoding of our SAS session.
He tried to read in the raw data using a LIBNAME statement and a simple DATA step:
libname rawdata '/sas/SAS913/SASDATA/CLIENT /ABC123/raw';
data datasetname;
set rawdata.datasetname;
run;
When he ran the code above, SAS stopped at that step and returned an error that looked like this:
ERROR: Some character data was lost during transcoding in the dataset RAWDATA.DATASETNAME.
NOTE: The data step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements. This may cause
NOTE: No observations in data set.
NOTE: There were 20314 observations read from the data set RAWDATA.DATASETNAME.
WARNING: The data set WORK.DATASETNAME may be incomplete. When this step was stopped there were 20314 observations and 67 variables.
NOTE: DATA statement used (Total process time):
      real time 0.53 seconds
      cpu time 0.46 seconds
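An error like this usually points to a mismatch between the encoding stored with the dataset and the encoding of the SAS session. A minimal sketch for checking both sides is shown here; it assumes the rawdata libref from the LIBNAME statement above is still assigned, and the exact layout of the output may differ between SAS releases.

```sas
/* Session encoding: the encoding SAS transcodes to when it reads data */
proc options option=encoding;
run;

/* Dataset encoding: look for the "Encoding" attribute in the output */
proc contents data=rawdata.datasetname;
run;
```

If the two values differ, SAS transcodes the character data on every read, and that is the step where characters can be lost.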
When he asked me why SAS stopped in the middle of the step, we turned to Google right away, because neither of us had seen this kind of error message in a log before. The search returned plenty of links full of technical detail, but none of the few options we found in them worked. After many trials, we stumbled upon a solution: specifying ASCIIANY as the encoding, through the INENCODING= option in the LIBNAME statement.
libname rawdata '/sas/SAS913/SASDATA/CLIENT /ABC123/raw'
inencoding=asciiany;
If you only need one dataset, or you know which dataset has the encoding problem, you can use the ENCODING= dataset option in a simple DATA step instead. Here is how:
data datasetname;
set rawdata.datasetname (encoding='asciiany');
run;
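Because ASCIIANY switches transcoding off, any multi-byte UTF-8 characters in the data come through as raw bytes and may display incorrectly in a session that is not UTF-8. The following is only a rough sketch of one way to find the affected values. The names nonascii_check, flagged_var, and i are made up for illustration and are assumed not to clash with variables already in the dataset. The byte-range pattern assumes a single-byte session encoding.

```sas
/* Sketch: list observations that contain bytes outside 7-bit ASCII */
data nonascii_check;
   set rawdata.datasetname (encoding='asciiany');

   /* Declare the array first so it holds only the dataset's own character variables */
   array chars {*} _character_;
   length flagged_var $32;

   do i = 1 to dim(chars);
      /* Bytes 0x80-0xFF are not 7-bit ASCII */
      if prxmatch('/[\x80-\xFF]/', chars{i}) then do;
         flagged_var = vname(chars{i});   /* name of the variable that matched */
         output;
      end;
   end;
   drop i;
run;
```

Each row in nonascii_check corresponds to one variable in one observation that holds a non-ASCII byte, so an observation can appear more than once. An empty result would suggest the data is plain ASCII and unaffected by skipping the transcoding.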
The SAS reference guide lists the values the option accepts and explains how each one works:
ENCODING= ANY | ASCIIANY | EBCDICANY | encoding-value
ANY
specifies that no transcoding occurs.
Note: ANY is a synonym for binary. Because the data is binary, the actual encoding is irrelevant.
ASCIIANY
specifies that no transcoding occurs when the mixed encodings are ASCII encodings.
Transcoding normally occurs when SAS detects that the session encoding and data set encoding are different. ASCIIANY enables you to create a data set that SAS will not transcode if the SAS session that accesses the data set has a session encoding value of ASCII. If you transfer the data set to a machine that uses EBCDIC encoding, transcoding occurs.
EBCDICANY
specifies that no transcoding occurs when the mixed encodings are EBCDIC encodings.
For more details, refer to the SAS documentation.