Display Unicode Strings in Visual Basic 6.0

Displaying Unicode strings in VB6 seems impossible, but it's not. A common complaint is: "My strings are displayed incorrectly, with question marks where non-US-ASCII characters should appear. What's going on???"

VB6 can be maddening when it comes to Unicode strings and displaying characters in different languages. The first step to enlightenment (and an end to hair-pulling) is to understand three things:

  1. Internally, VB6 stores strings as Unicode.
  2. When displaying a string, the standard VB6 textbox and label controls do an implicit (and internal) conversion from Unicode to ANSI.
  3. The standard VB6 textbox and label controls display the ANSI bytes according to a character encoding that you can specify.

I'll explain each point in more detail:

First, make sure you understand what "ANSI" means. ANSI is not an actual charset name. It's simply a way of saying: "Use the default charset for this computer". If your program is running on a computer in France, ANSI is probably "windows-1252", as is the case for other Western European countries, the USA, Australia, etc. If you're working on a computer in Japan, ANSI is probably "Shift_JIS". For the Czech Republic: "windows-1250". And on and on and on...
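If you want to see which code page "ANSI" means on a given machine, one option is to ask Windows directly. The following is a minimal sketch that assumes the Win32 GetACP function from kernel32 is declared in the form's declarations section; it is not part of the downloadable example.

```vb
' Declarations section of a form.
' GetACP is a Win32 API that returns the system's ANSI code page identifier.
Private Declare Function GetACP Lib "kernel32" () As Long

Private Sub Form_Load()
    ' Typical values: 1252 (Western European), 932 (Japanese Shift_JIS),
    ' 1250 (Central European).
    Debug.Print "ANSI code page: " & GetACP()
End Sub
```

The number printed is the code page that the implicit Unicode-to-ANSI conversion described in this article will target.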

Internally, VB6 stores strings as Unicode. Your VB6 program is capable of manipulating strings in any language containing any character -- whether it's Chinese, Japanese, Icelandic, Arabic, etc. It's fully Unicode capable. A single string may contain characters in multiple languages. You can save these strings to databases, files, etc., and there shouldn't be a problem. Problems arise only when trying to display (i.e. render the glyphs for) foreign characters in the standard VB6 controls.
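You can convince yourself of the 2-bytes-per-character storage without displaying anything. This small sketch builds a mixed-language string with ChrW and inspects it with Len, LenB and AscW; the variable name and the particular characters are my own choices for illustration.

```vb
Dim s As String

' ChrW builds a character from its Unicode value:
' KATAKANA LETTER PA, HEBREW LETTER ALEF, then a plain "p".
s = ChrW(&H30D1) & ChrW(&H5D0) & "p"

Debug.Print Len(s)                      ' 3 characters
Debug.Print LenB(s)                     ' 6 bytes, 2 per character
Debug.Print Hex$(AscW(Mid$(s, 1, 1)))   ' 30D1
Debug.Print Hex$(AscW(Mid$(s, 2, 1)))   ' 5D0
```

Nothing is lost here because no control, and therefore no ANSI conversion, is involved.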

When displaying a string, the standard VB6 textbox and label controls do an implicit (and internal) conversion from Unicode to ANSI. This is the confounding behavior that causes all the trouble. Internally, the VB6 runtime converts the Unicode string to the operating system's current Windows ANSI code page. There is no way to change this conversion short of changing the ANSI code page for the system.
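The conversion inside the controls is hidden, but StrConv with vbFromUnicode performs a Unicode-to-ANSI conversion that you can inspect. The sketch below assumes StrConv targets the same system code page the controls use, which is my understanding rather than something the controls expose.

```vb
Dim s As String
Dim b() As Byte

s = ChrW(&H30D1)                 ' KATAKANA LETTER PA
b = StrConv(s, vbFromUnicode)    ' Unicode -> system ANSI code page

' On a Windows-1252 system this character has no ANSI equivalent,
' so the byte is expected to be 3F, the code for "?".
' On a Japanese (Shift_JIS) system the first byte would be 83 instead.
Debug.Print Hex$(b(0))
```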

The standard VB6 textbox and label controls display the ANSI bytes according to a character encoding that you can specify. After the Unicode-to-ANSI conversion, VB6 then attempts to display the character data according to the control's Font.Charset property, which if left unchanged is equal to the ANSI charset. Changing the control's Font.Charset changes the way VB6 interprets the "ANSI" bytes. In other words, you're telling VB6 to treat the bytes as some other character encoding instead of "ANSI". Note: VB6 is capable of displaying characters in all the major languages. It simply needs to be told to do so, and the correct bytes need to be in place internally for it to happen.
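Font.Charset can also be set in code rather than in the property window. This is a minimal sketch that assumes a form with a textbox named Text1 and that the "MS UI Gothic" font is installed; the charset numbers are the standard Windows font charset values.

```vb
Private Sub Form_Load()
    ' Set the font name first, then the charset, so that the
    ' charset is applied to the font that is actually in use.
    Text1.Font.Name = "MS UI Gothic"
    Text1.Font.Charset = 128        ' Japanese (Shift_JIS)

    ' Other common values:
    '   0   = ANSI (Western)
    '   161 = Greek
    '   177 = Hebrew
    '   204 = Russian
    '   238 = Eastern European
End Sub
```

Remember that this only changes how the ANSI bytes are interpreted. It does nothing to change the Unicode-to-ANSI conversion that happens first.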

Given the above explanation, it is easy to see how VB6 works fine when displaying Japanese on Japanese computers, displaying Hebrew on Hebrew computers, etc. In those cases, the internal Unicode-to-ANSI conversion doesn't ruin the text rendering process that follows. The problems arise when trying to display Japanese on a USA computer, or Hebrew on a Greek computer, etc. As an example, consider trying to display a Unicode Japanese string on an English computer: You set Font.Charset = 128 (for Japanese), but your Unicode string displays as all question mark characters. That's because VB6 first tries to convert your Japanese Unicode string to ANSI, which is Windows-1252 for English computers. Japanese characters are not representable in Windows-1252. Each character fails to convert and is replaced with a question mark.
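A quick way to predict whether a string will turn into question marks on the current machine is to round-trip it through ANSI and compare. SurvivesAnsi below is a hypothetical helper name of my own, and the sketch assumes StrConv uses the same system code page as the controls.

```vb
' Returns True if s is unchanged after Unicode -> ANSI -> Unicode.
Private Function SurvivesAnsi(ByVal s As String) As Boolean
    Dim roundTrip As String

    ' Convert to the system ANSI code page and straight back again.
    roundTrip = StrConv(StrConv(s, vbFromUnicode), vbUnicode)

    ' Binary comparison, so a substituted "?" counts as a change.
    SurvivesAnsi = (StrComp(s, roundTrip, vbBinaryCompare) = 0)
End Function
```

On an English (Windows-1252) computer, SurvivesAnsi(ChrW(&H30D1)) should come back False, while SurvivesAnsi("abc") should come back True.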

So how do you do it? How can you make your Japanese string display correctly on any computer in any country? It's possible, and I'll show you how.

But first, let me demonstrate that what I've said so far is correct.

Consider this simple example, which can be downloaded at: http://www.example-code.com/downloads/vb6UnicodeExample1.zip

    Dim s1 As String
    s1 = "ƒp"
    
    ' In the VB6 IDE, the Font for the Text1 textbox is set to
    ' MS UI Gothic w/ the Japanese script selected.
    ' (Selecting the Japanese script in the TextBox's property
    ' settings is the same as setting the Font.Charset at runtime.)
    ' It displays a single Japanese character: パ
    Text1.Text = s1
    
    ' The Font for Text2 is set to Arial w/ the
    ' Western script selected.
    ' It displays the two characters as you see them
    ' in the literal string above: ƒp
    Text2.Text = s1

How could it be that "ƒp" displays a single Japanese character? Which Japanese character is displayed and why?

Let's look at "ƒp" in Unicode. After all, that's how VB stores strings internally in memory. "ƒ" is the "LATIN SMALL LETTER F WITH HOOK". Its 2-byte Unicode value is 0x0192. "p" is the "LATIN SMALL LETTER P" and its Unicode value is 0x0070.

The first thing a standard VB6 control will do when displaying a string is convert the Unicode to ANSI -- and you have no control over this. In this case, on Western European and USA computers, 0x0192 0x0070 is converted to 0x83 0x70. (You can refer to the Windows-1252 code page here: http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx Look for the character at 0x83 and you'll see our "ƒ" and that the Unicode value is 0x0192.)
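To check these byte values yourself, you can dump the ANSI bytes as hex. This sketch passes the optional locale argument of StrConv (1033, English United States) so that the Windows-1252 conversion is requested even on a non-Western machine; I am assuming the locale argument selects that locale's ANSI code page.

```vb
Dim s1 As String
Dim b() As Byte
Dim i As Long
Dim dump As String

s1 = ChrW(&H192) & "p"                  ' the same two characters as "ƒp"
b = StrConv(s1, vbFromUnicode, 1033)    ' 1033 = English (United States)

' Build a two-digit hex dump of each byte.
For i = LBound(b) To UBound(b)
    dump = dump & Right$("0" & Hex$(b(i)), 2) & " "
Next i

Debug.Print dump    ' expected: 83 70
```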

So... the bytes VB6 will display are: 0x83 0x70. The textbox control displays them according to its Font.Charset. Text2's Font.Charset hasn't been changed, and since we're on a computer in the USA it renders just as we expect. (Note: make sure the font you select actually contains the glyphs you need. As an example, the "MS Sans Serif" font does not render "ƒ", so you'll see a thin solid rectangular box in its place.)

The Text1 textbox is more interesting. Its Font.Charset has effectively been set to 128 (Japanese Shift_JIS) by setting the script to Japanese in the font properties dialog. This means that VB6 will interpret 0x83 0x70 according to Shift_JIS. If we examine the characters for that code page at http://www.microsoft.com/globaldev/reference/dbcs/932/932_83.mspx you will find this:

8370 = U+30D1 : KATAKANA LETTER PA

You would expect this single Japanese character to be displayed, and that's exactly what happens. You see this: パ
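The same lookup can be reproduced in code by decoding the two bytes as Shift_JIS. This is a sketch under the assumption that the optional locale argument of StrConv (1041, Japanese) selects code page 932 and that Japanese language support is present on the machine.

```vb
Dim b(0 To 1) As Byte
Dim s As String

b(0) = &H83
b(1) = &H70

' Interpret the bytes as Shift_JIS instead of the system ANSI code page.
s = StrConv(b, vbUnicode, 1041)   ' 1041 = Japanese

Debug.Print Len(s)                ' expected: 1
Debug.Print Hex$(AscW(s))         ' expected: 30D1
```

In effect this does by hand what Text1 does when its Font.Charset is 128: it treats the ANSI bytes 0x83 0x70 as one double-byte Japanese character.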

This example is continued at: How to display Japanese string in Visual Basic 6.0 correctly on any computer in any country.

© 2000-2013 Chilkat Software, Inc. All Rights Reserved.