Python Examples

Avoiding Outbound Links Matching Patterns

As it crawls, the spider accumulates the outbound links it finds on each page. Your program may add any number of "avoid patterns": a link that matches at least one of these wildcard patterns is not added to the outbound link list.

Download Chilkat Python Library

import chilkat

#  The Chilkat Spider component/library is free.
spider = chilkat.CkSpider()

#  First, we'll get the outbound links for a page in the
#  Google directory.  Then we'll add some avoid patterns
#  and then re-fetch, to see it work...

spider.Initialize("directory.google.com")
spider.AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/")

success = spider.CrawlNext()

#  Display the outbound links

for i in range(0, spider.get_NumOutboundLinks()):
    print(spider.getOutboundLink(i))

#  The output:
#  http://www.cheese.com/
#  http://www.cheesediaries.com/
#  http://www.WisDairy.com/
#  http://www.newenglandcheese.com
#  http://www.ilovecheese.com
#  http://www.cheesefromspain.com
#  http://www.realcaliforniacheese.com/
#  http://www.frencheese.co.uk/
#  http://www.cheesesociety.org/
#  http://www.specialcheese.com/queso.htm
#  http://www.franceway.com/cheese/intro.htm
#  http://www.foodsubs.com/Chesfirm.html
#  http://www.cheeseboard.co.uk/
#  http://www.thecheeseweb.com/
#  http://www.vtcheese.com/
#  http://www.coldbacon.com/cheese.html
#  http://www.norwegiancheeses.co.uk/
#  http://www.reluctantgourmet.com/cheese.htm
#  http://www.lancewood.co.za/
#  http://www.switzerlandcheese.ca
#  http://www.frenchcheese.dk/
#  http://www.dolcevita.com/cuisine/cheese/cheese.htm
#  http://cheeseisland.net/
#  http://www.cheestrings.ca/
#  http://www.dreamcheese.co.uk
#  http://hgic.clemson.edu/factsheets/HGIC3506.htm
#  http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815
#  http://www.mousetrapcheese.co.uk
#  http://taquitos.net/yum/gc.shtml
#  http://www.greek-recipe.com/static/greek-cheese
#  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
#  http://www.dairyfarmers.org/engl/recipes/4_1.asp
#  http://www.prairieridgecheese.com/wischeesguid.html
#  http://dmoz.org/cgi-bin/add.cgi?where=Recreation/Food/Cheese
#  http://dmoz.org/about.html
#  http://dmoz.org/cgi-bin/apply.cgi?where=Recreation/Food/Cheese

#  Do it again, but this time with avoid patterns.
spider.Initialize("directory.google.com")
spider.AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/")

#  Add some avoid patterns:
spider.AddAvoidOutboundLinkPattern("*dmoz.org*")
spider.AddAvoidOutboundLinkPattern("*?id=*")
spider.AddAvoidOutboundLinkPattern("*.co.uk*")
success = spider.CrawlNext()

print("-----------------------")

#  Display the outbound links that remain after filtering
for i in range(0, spider.get_NumOutboundLinks()):
    print(spider.getOutboundLink(i))

#  Output:
#  http://www.cheese.com/
#  http://www.cheesediaries.com/
#  http://www.WisDairy.com/
#  http://www.newenglandcheese.com
#  http://www.ilovecheese.com
#  http://www.cheesefromspain.com
#  http://www.realcaliforniacheese.com/
#  http://www.cheesesociety.org/
#  http://www.specialcheese.com/queso.htm
#  http://www.franceway.com/cheese/intro.htm
#  http://www.foodsubs.com/Chesfirm.html
#  http://www.thecheeseweb.com/
#  http://www.vtcheese.com/
#  http://www.coldbacon.com/cheese.html
#  http://www.reluctantgourmet.com/cheese.htm
#  http://www.lancewood.co.za/
#  http://www.switzerlandcheese.ca
#  http://www.frenchcheese.dk/
#  http://www.dolcevita.com/cuisine/cheese/cheese.htm
#  http://cheeseisland.net/
#  http://www.cheestrings.ca/
#  http://hgic.clemson.edu/factsheets/HGIC3506.htm
#  http://taquitos.net/yum/gc.shtml
#  http://www.greek-recipe.com/static/greek-cheese
#  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
#  http://www.dairyfarmers.org/engl/recipes/4_1.asp
#  http://www.prairieridgecheese.com/wischeesguid.html
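
To make the matching rule concrete, the sketch below approximates the avoid-pattern filter in plain Python, without Chilkat. It rests on assumptions this page does not confirm: that "*" is the only wildcard (so "?" is a literal character), that a pattern must match the whole URL, and that matching is case-sensitive. The helper names matches_pattern and filter_links are hypothetical and are not part of the Chilkat API. The links are a small sample taken from the first output above.

```python
import re

def matches_pattern(url, pattern):
    """Return True if url matches pattern; '*' matches any run of characters.

    Assumption: '*' is the only wildcard and the match is case-sensitive.
    """
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z anchors the end so the pattern must cover the whole URL.
    return re.match(regex + r"\Z", url) is not None

def filter_links(links, avoid_patterns):
    """Keep only the links that match none of the avoid patterns."""
    return [url for url in links
            if not any(matches_pattern(url, p) for p in avoid_patterns)]

# A small sample of the outbound links from the first crawl.
links = [
    "http://www.cheese.com/",
    "http://www.frencheese.co.uk/",
    "http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815",
    "http://dmoz.org/about.html",
    "http://www.lancewood.co.za/",
]
avoid = ["*dmoz.org*", "*?id=*", "*.co.uk*"]

kept = filter_links(links, avoid)
for url in kept:
    print(url)
```

Under these assumptions only the cheese.com and lancewood.co.za links survive, which agrees with the second output above: the .co.uk, ?id= and dmoz.org links are dropped, while the .co.za link is kept because it matches none of the three patterns.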

 

© 2000-2008 Chilkat Software, Inc. All Rights Reserved.