Perl Examples

Avoiding Outbound Links Matching Patterns

The spider accumulates outbound links as it crawls. Your program may add any number of "avoid patterns". An outbound link that matches at least one of these wildcard patterns is not added to the list of outbound links.
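To make the matching rule concrete, here is a minimal sketch in plain Perl that does not use Chilkat at all. The helpers `wildcard_to_regex` and `is_avoided` are hypothetical names invented for this illustration, and the sketch assumes that `*` is the only wildcard character, that every other character (including `?`) is literal, and that matching is case-sensitive. The spider's real matching rules may differ in these details.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chilkat): turn a wildcard pattern
# into an anchored regex in which "*" matches any run of characters.
sub wildcard_to_regex {
    my ($pattern) = @_;
    # Escape each literal piece, then rejoin the pieces with ".*".
    # The -1 limit keeps trailing empty pieces, so a trailing "*" survives.
    my $body = join '.*', map { quotemeta($_) } split /\*/, $pattern, -1;
    return qr/\A$body\z/;
}

# Returns 1 if the URL matches at least one avoid pattern, else 0.
sub is_avoided {
    my ($url, @patterns) = @_;
    for my $pattern (@patterns) {
        return 1 if $url =~ wildcard_to_regex($pattern);
    }
    return 0;
}

my @avoid = ('*dmoz.org*', '*?id=*', '*.co.uk*');
my @links = (
    'http://www.cheese.com/',
    'http://www.frencheese.co.uk/',
    'http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815',
    'http://dmoz.org/about.html',
);

# Keep only the links that no avoid pattern rejects.
my @kept = grep { !is_avoided($_, @avoid) } @links;
print "$_\n" for @kept;
# Prints: http://www.cheese.com/
```

A link is dropped as soon as one pattern matches, so the order in which the patterns are added does not change the result.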

Download Chilkat Perl Module

use chilkat;

#  The Chilkat Spider component/library is free.
$spider = chilkat::CkSpider->new();

#  First, we'll get the outbound links for a page in the
#  Google directory.  Then we'll add some avoid patterns
#  and then re-fetch, to see it work...

$spider->Initialize("directory.google.com");
$spider->AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/");

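$success = $spider->CrawlNext();
if (!$success) {
    #  The crawl did not succeed; show the details and stop.
    print $spider->lastErrorText() . "\r\n";
    exit;
}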

#  Display the outbound links

for ($i = 0; $i < $spider->get_NumOutboundLinks(); $i++) {
    print $spider->getOutboundLink($i) . "\r\n";
}

#  The output:
#  http://www.cheese.com/
#  http://www.cheesediaries.com/
#  http://www.WisDairy.com/
#  http://www.newenglandcheese.com
#  http://www.ilovecheese.com
#  http://www.cheesefromspain.com
#  http://www.realcaliforniacheese.com/
#  http://www.frencheese.co.uk/
#  http://www.cheesesociety.org/
#  http://www.specialcheese.com/queso.htm
#  http://www.franceway.com/cheese/intro.htm
#  http://www.foodsubs.com/Chesfirm.html
#  http://www.cheeseboard.co.uk/
#  http://www.thecheeseweb.com/
#  http://www.vtcheese.com/
#  http://www.coldbacon.com/cheese.html
#  http://www.norwegiancheeses.co.uk/
#  http://www.reluctantgourmet.com/cheese.htm
#  http://www.lancewood.co.za/
#  http://www.switzerlandcheese.ca
#  http://www.frenchcheese.dk/
#  http://www.dolcevita.com/cuisine/cheese/cheese.htm
#  http://cheeseisland.net/
#  http://www.cheestrings.ca/
#  http://www.dreamcheese.co.uk
#  http://hgic.clemson.edu/factsheets/HGIC3506.htm
#  http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815
#  http://www.mousetrapcheese.co.uk
#  http://taquitos.net/yum/gc.shtml
#  http://www.greek-recipe.com/static/greek-cheese
#  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
#  http://www.dairyfarmers.org/engl/recipes/4_1.asp
#  http://www.prairieridgecheese.com/wischeesguid.html
#  http://dmoz.org/cgi-bin/add.cgi?where=Recreation/Food/Cheese
#  http://dmoz.org/about.html
#  http://dmoz.org/cgi-bin/apply.cgi?where=Recreation/Food/Cheese

#  Do it again, but this time with avoid patterns.
$spider->Initialize("directory.google.com");
$spider->AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/");

#  Add some avoid patterns:
$spider->AddAvoidOutboundLinkPattern("*dmoz.org*");
$spider->AddAvoidOutboundLinkPattern("*?id=*");
$spider->AddAvoidOutboundLinkPattern("*.co.uk*");
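
$success = $spider->CrawlNext();
if (!$success) {
    #  The crawl did not succeed; show the details and stop.
    print $spider->lastErrorText() . "\r\n";
    exit;
}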

print "-----------------------" . "\r\n";

#  Display the outbound links
for ($i = 0; $i < $spider->get_NumOutboundLinks(); $i++) {
    print $spider->getOutboundLink($i) . "\r\n";
}

#  Output:
#  http://www.cheese.com/
#  http://www.cheesediaries.com/
#  http://www.WisDairy.com/
#  http://www.newenglandcheese.com
#  http://www.ilovecheese.com
#  http://www.cheesefromspain.com
#  http://www.realcaliforniacheese.com/
#  http://www.cheesesociety.org/
#  http://www.specialcheese.com/queso.htm
#  http://www.franceway.com/cheese/intro.htm
#  http://www.foodsubs.com/Chesfirm.html
#  http://www.thecheeseweb.com/
#  http://www.vtcheese.com/
#  http://www.coldbacon.com/cheese.html
#  http://www.reluctantgourmet.com/cheese.htm
#  http://www.lancewood.co.za/
#  http://www.switzerlandcheese.ca
#  http://www.frenchcheese.dk/
#  http://www.dolcevita.com/cuisine/cheese/cheese.htm
#  http://cheeseisland.net/
#  http://www.cheestrings.ca/
#  http://hgic.clemson.edu/factsheets/HGIC3506.htm
#  http://taquitos.net/yum/gc.shtml
#  http://www.greek-recipe.com/static/greek-cheese
#  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
#  http://www.dairyfarmers.org/engl/recipes/4_1.asp
#  http://www.prairieridgecheese.com/wischeesguid.html

Need a specific example? Send a request to support@chilkatsoft.com

© 2000-2007 Chilkat Software, Inc. All Rights Reserved.