HTML Text Formatting Tags

HTML text formatting tags (b, font, i, u, br, center, em, strong, big, tt, s, small, strike, sub, and sup) are dropped by default when converting HTML to XML. The formatting tags can be kept by calling UndropTextFormattingTags.
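To make the drop-versus-keep behavior concrete, here is a minimal sketch built on Python 3's standard `html.parser`. It is not Chilkat's implementation: the class `SketchHtmlToXml`, the function `html_to_xml`, and its `keep_formatting` flag are hypothetical names chosen for illustration. The sketch assumes well-nested input, ignores attributes (such as those on `<font>`), and does not reproduce Chilkat's whitespace handling or pretty-printing. Dropped tags simply disappear, so their text merges into the surrounding `<text>` run, while kept tags become ordinary elements that wrap their own `<text>` children.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The text formatting tags listed above (dropped by default).
FORMATTING_TAGS = {"b", "font", "i", "u", "br", "center", "em", "strong",
                   "big", "tt", "s", "small", "strike", "sub", "sup"}
# Void tags never receive an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img"}


class SketchHtmlToXml(HTMLParser):
    """Hypothetical converter illustrating the drop/keep behavior only."""

    def __init__(self, keep_formatting=False):
        super().__init__()
        self.keep_formatting = keep_formatting
        self.root = ET.Element("root")
        self.stack = [self.root]   # currently open elements
        self.pending = ""          # text accumulated for the next <text> run

    def _flush(self):
        # Emit accumulated text as a <text> child of the open element.
        if self.pending.strip():
            ET.SubElement(self.stack[-1], "text").text = self.pending
        self.pending = ""

    def handle_starttag(self, tag, attrs):
        if tag in FORMATTING_TAGS and not self.keep_formatting:
            return  # dropped: its text joins the surrounding run
        self._flush()
        element = ET.SubElement(self.stack[-1], tag)  # attributes ignored
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        if tag in FORMATTING_TAGS and not self.keep_formatting:
            return
        self._flush()
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.pending += data

    def result(self):
        # Flush any trailing text and serialize the tree.
        self._flush()
        return ET.tostring(self.root, encoding="unicode")


def html_to_xml(html, keep_formatting=False):
    parser = SketchHtmlToXml(keep_formatting)
    parser.feed(html)
    parser.close()
    return parser.result()


html = "<html><body>This <b>is</b> a <i>test</i></body></html>"
print(html_to_xml(html))
print(html_to_xml(html, keep_formatting=True))
```

With the default, the whole sentence ends up in one `<text>` element; with `keep_formatting=True`, the text is split into separate runs around `<b>` and `<i>` elements, mirroring the two outputs of the Chilkat example.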
# file: TextFormattingTags.py
import sys
import chilkat
# Demonstrates how HTML text formatting tags are (by default) dropped
# during the HTML to XML conversion process. To keep text formatting
# tags, call UndropTextFormattingTags
htmlConv = chilkat.CkHtmlToXml()
success = htmlConv.UnlockComponent("anything for 30-day trial")
if not success:
    print("component is locked!")
    sys.exit(1)
html = "<html><body>This <b>is</b> a <i>test</i></body></html>"
# To convert, set the HTML and get the XML:
htmlConv.put_Html(html)
xml = htmlConv.xml()
print(xml)
# The output is this:
#
# <?xml version="1.0" encoding="utf-8" ?>
#
# <root>
# <html>
# <body>
# <text>This is a test</text>
# </body>
# </html>
# </root>
#
#
# What happened to the <b> and <i> tags?
# By default, text formatting tags are dropped.
# If we call UndropTextFormattingTags, the tags will remain:
htmlConv.UndropTextFormattingTags()
xml = htmlConv.xml()
print(xml)
# We now get this:
#
# <?xml version="1.0" encoding="utf-8" ?>
#
# <root>
# <html>
# <body>
# <text>This </text>
# <b>
# <text>is</text>
# </b>
# <text>a </text>
# <i>
# <text>test</text>
# </i>
# </body>
# </html>
# </root>
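Because kept formatting tags become ordinary XML elements, downstream code can tell which runs of text were formatted. The sketch below parses the second output shown in the listing above using Python 3's standard `xml.etree.ElementTree`. The function name `formatted_runs` is hypothetical, and the code assumes the `root/html/body` layout of that sample output.

```python
import xml.etree.ElementTree as ET

# Second output from the listing above (after UndropTextFormattingTags).
SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<root>
<html>
<body>
<text>This </text>
<b>
<text>is</text>
</b>
<text>a </text>
<i>
<text>test</text>
</i>
</body>
</html>
</root>"""


def formatted_runs(xml_text):
    """Return (formatting_tag_or_None, text) pairs for the children of <body>."""
    # Parse from bytes so the encoding declaration is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    body = root.find("html/body")
    runs = []
    for child in body:
        if child.tag == "text":
            runs.append((None, child.text))
        else:
            # A kept formatting element such as <b> wraps its own <text> children.
            inner = "".join(t.text or "" for t in child.iter("text"))
            runs.append((child.tag, inner))
    return runs


for tag, text in formatted_runs(SAMPLE):
    print(tag, repr(text))
```

If the default conversion were used instead, `<body>` would contain a single `<text>` child and every run would report `None` as its tag.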
© 2000-2007 Chilkat Software, Inc. All Rights Reserved.