Perl
Convert HTML to XML with Auto-Correction
Simple HTML to XML conversion. Demonstrates how the HTML is auto-corrected to create well-formed XML. In this example, the closing </p> tag is missing. Also, text is encapsulated in <text> nodes with the intent to make it easy for programs to identify and extract the text parts.
use chilkat();
# This example assumes the Chilkat API has already been unlocked.
# See the Global Unlock Sample for sample code.
$htmlToXml = chilkat::CkHtmlToXml->new();
# Indicate the charset of the output XML we'll want.
$htmlToXml->put_XmlCharset("utf-8");
# Set the HTML:
$htmlToXml->put_Html("<html><body><p>This is a test <a href=\"http://www.chilkatsoft.com/\">Chilkat Software</a></body></html>");
# Get the XML:
print $htmlToXml->toXml() . "\r\n";
# This is the output:
# <?xml version="1.0" encoding="utf-8" ?>
#
# <root>
# <html>
# <body>
# <p>
# <text>This is a test </text>
# <a href="http://www.chilkatsoft.com/">
# <text>Chilkat Software</text>
# </a>
# </p>
# </body>
# </html>
# </root>
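
Because every run of text ends up inside its own <text> node, a program can collect the text parts without walking the rest of the markup. The following is a minimal sketch of that idea, not part of the Chilkat example: it uses only core Perl, the sample XML is copied by hand from the output shown above, and extract_text_nodes is a hypothetical helper name. A regular expression is enough for this small, predictable output; it does not decode entities, and a real XML parser would likely be a better fit for arbitrary documents.

```perl
use strict;
use warnings;

# Sample XML in the shape produced by the example above (pasted by hand for this sketch).
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8" ?>
<root>
<html>
<body>
<p>
<text>This is a test </text>
<a href="http://www.chilkatsoft.com/">
<text>Chilkat Software</text>
</a>
</p>
</body>
</html>
</root>
XML

# extract_text_nodes is a hypothetical helper, not a Chilkat method.
# It returns the content of every <text>...</text> element in document order.
sub extract_text_nodes {
    my ($doc) = @_;
    my @parts;
    # Non-greedy match so each <text> element is captured separately;
    # /s lets a text run span multiple lines.
    while ($doc =~ m{<text>(.*?)</text>}gs) {
        push @parts, $1;
    }
    return @parts;
}

my @parts = extract_text_nodes($xml);

# Prints: This is a test Chilkat Software
print join('', @parts), "\n";
```

In a real script, $xml would presumably be the string returned by $htmlToXml->toXml() rather than a pasted literal.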