(PowerBuilder) Convert utf-8 Text File to Windows-1252

Demonstrates how to convert a text file using the utf-8 byte representation to windows-1252.

Note: This example requires Chilkat v11.0.0 or greater.
integer li_rc
integer li_Success
oleobject loo_Bd
integer li_AllOrNone
string ls_FromCharset
string ls_ToCharset

li_Success = 0

// Converts a file containing the following to windows-1252:

// <greetings>
//     <message>Hello, world!</message>
//     <message>¡Hola, mundo!</message>
//     <message>Bonjour, le monde!</message>
//     <message>Hallo, Welt!</message>
//     <message>Olá, mundo!</message>
//     <message>Привет, мир!</message>
//     <message>你好,世界!</message>
//     <message>こんにちは、世界!</message>
//     <message>안녕하세요, 세계!</message>
//     <message>😊🌍</message>
// </greetings>

// --------------------------------------------------------------------------
// Note:
// Windows-1252 is an 8-bit single-byte encoding. It can only encode:
//
//     The basic ASCII set (0x00–0x7F).
//     Latin-1 Supplement (0xA0–0xFF), plus some extra printable characters (like curly quotes, €, etc.).
//     In total: 256 possible code points, covering most Western European languages but nothing outside of Latin script.

// --------------------------------------------------------------------------
// Characters in your XML that are representable
//
//     Hello, world! ✅ (ASCII only)
//     ¡Hola, mundo! ✅ (inverted exclamation mark U+00A1 is in Windows-1252)
//     Bonjour, le monde! ✅
//     Hallo, Welt! ✅
//     Olá, mundo! ✅ (U+00E1 á and U+00F3 ó are in Windows-1252)

// --------------------------------------------------------------------------
// Characters that break conversion
//
//     Russian / Cyrillic: Привет, мир!
//     → These are Cyrillic characters (U+041F … U+0440). Not representable in Windows-1252.
//       Conversion would require replacement (e.g. with ? or XML character references).
//     Chinese: 你好,世界!
//     → CJK ideographs (U+4F60, U+597D, etc.). Not in Windows-1252.
//     Japanese: こんにちは、世界!
//     → Hiragana + CJK. Not in Windows-1252.
//     Korean: 안녕하세요, 세계!
//     → Hangul syllables. Not in Windows-1252.
//     Emoji: 😊🌍
//     → Unicode Supplementary Multilingual Plane (U+1F60A, U+1F30D). Windows-1252 cannot encode any emoji.

loo_Bd = create oleobject
li_rc = loo_Bd.ConnectToNewObject("Chilkat.BinData")
if li_rc < 0 then
    destroy loo_Bd
    MessageBox("Error","Connecting to COM object failed")
    return
end if

// Load the utf-8 bytes.
li_Success = loo_Bd.LoadFile("qa_data/xml/utf8test.xml")
if li_Success = 0 then
    Write-Debug loo_Bd.LastErrorText
    destroy loo_Bd
    return
end if

// If allOrNone = 1, the conversion fails and the contents of the BinData
// are left unchanged if any char cannot be converted.
// If allOrNone = 0, chars that cannot be converted are discarded.
li_AllOrNone = 0
ls_FromCharset = "utf-8"
ls_ToCharset = "windows-1252"
li_Success = loo_Bd.CharsetConvert(ls_FromCharset,ls_ToCharset,li_AllOrNone)

// The return value will be 0 if any utf-8 chars were discarded because they could not be converted.
if li_Success = 0 then
    Write-Debug "Some utf-8 chars could not be converted to windows-1252"
else
    Write-Debug "All utf-8 chars were converted to windows-1252"
end if

// Write the converted bytes and report a failure instead of ignoring it.
li_Success = loo_Bd.WriteFile("c:/temp/qa_output/out.xml")
if li_Success = 0 then
    Write-Debug loo_Bd.LastErrorText
    destroy loo_Bd
    return
end if

// The output file contains the following, where all chars that could not be converted were discarded

// <greetings>
//     <message>Hello, world!</message>
//     <message>¡Hola, mundo!</message>
//     <message>Bonjour, le monde!</message>
//     <message>Hallo, Welt!</message>
//     <message>Olá, mundo!</message>
//     <message>, !</message>
//     <message></message>
//     <message></message>
//     <message>, !</message>
//     <message></message>
// </greetings>

destroy loo_Bd
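
The allOrNone behavior described in the comments above can also be illustrated outside of Chilkat. What follows is a minimal sketch in Python, using only the standard library's cp1252 codec. The convert helper is a hypothetical name introduced here for illustration; it mimics the semantics stated in the comments (fail and leave the data unchanged, or discard what cannot be converted) and is not how Chilkat implements CharsetConvert.

```python
from typing import Tuple


def convert(utf8_bytes: bytes, all_or_none: bool) -> Tuple[bool, bytes]:
    """Convert utf-8 bytes to windows-1252 (hypothetical helper, not Chilkat).

    Returns (success, data). success is False if any character has no
    windows-1252 equivalent. In that case the data is either the unchanged
    input (all_or_none=True) or the converted bytes with the offending
    characters dropped (all_or_none=False).
    """
    text = utf8_bytes.decode("utf-8")
    try:
        # Strict encoding raises as soon as one character is not representable.
        return True, text.encode("cp1252")
    except UnicodeEncodeError:
        if all_or_none:
            return False, utf8_bytes
        # errors="ignore" silently drops characters that cannot be encoded.
        return False, text.encode("cp1252", errors="ignore")


if __name__ == "__main__":
    samples = ["Hello, world!", "¡Hola, mundo!", "Привет, мир!", "😊🌍"]
    for sample in samples:
        ok, data = convert(sample.encode("utf-8"), all_or_none=False)
        print(ok, data)
```

With all_or_none set to False, the Cyrillic greeting is reduced to its ASCII punctuation and space, which matches the `, !` line shown in the output file above.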
© 2000-2025 Chilkat Software, Inc. All Rights Reserved.