Discover More Tips and Techniques on This Blog

Showing posts with label ENCODING=option. Show all posts
Showing posts with label ENCODING=option. Show all posts

Transcoding Problem: Option (correctencoding=wlatin1)

Have you ever tried to convert the default encoding to Wlatin1 (Windows SAS Session Encoding)?

Let me tell you the story behind writing this post….

Today I was asked to send SAS datasets to one of the client. I transferred the SAS datasets to the client and immediately after, I got an email from the so called client saying the encoding of SAS datasets is different this time when compared with the last transfer. He said It’s causing problems in Proc compare process.

Opps… bummer…. Client’s email got me little worried ...
I checked the Proc contents details and saw the change in the encoding. I investigated the issue and found out that Unicode SAS with UTF-8 encoding uses 1 to 4 bytes to handle Unicode data. It is possible for the 8-bit characters to be expanded by 2 to 3 bytes when transcoding occurs, which causes the truncation error.

Because of the truncation problem I was asked to change the unicoding back to WLATIN1 so that the character data present in the SAS datasets represents the US and Europe characters in windows.

Here is the code to do that.

proc datasets lib=SDTM;
modify supplb/correctencoding=wlatin1;
run;

Unicode Basics:

Unicode is the universal character encoding that supports the interchange, processing, and display of
characters and symbols found in the world’s writing systems. Other character encodings are limited to
subsets of all languages. 

Transcoding problem:

Currently, SAS/ACCESS Interface to ODBC cannot support Unicode. DBCS data and single-byte
non-ASCII characters cannot be correctly processed.

In a SAS UTF-8 session, data imported appear as question marks. Although the encoding property of
imported data is UTF-8, the real encoding of the data is not changed to UTF-8.

Workaround:

/* Set the correct encoding to ‘WLATIN1’ */
proc datasets lib=stagings;
modify custtype_w1/correctencoding=wlatin1;
run;

/* Transcode data from ‘Wlatin1’ to ‘UTF-8’ */
data stagings.custtype_u8;
set stagings.custtype_w1;
run;


Ref:







Disclosure:

In the spirit of transparency and innovation, I want to share that some of the content on this blog is generated with the assistance of ChatGPT, an AI language model developed by OpenAI. While I use this tool to help brainstorm ideas and draft content, every post is carefully reviewed, edited, and personalized by me to ensure it aligns with my voice, values, and the needs of my readers. My goal is to provide you with accurate, valuable, and engaging content, and I believe that using AI as a creative aid helps achieve that. If you have any questions or feedback about this approach, feel free to reach out. Your trust and satisfaction are my top priorities.