PHP Examples

Avoid URLs Matching Any of a Set of Patterns

Demonstrates how to use "avoid patterns" to keep the spider from crawling any URL that matches a wildcard pattern. In each pattern, "*" matches any run of characters, so this example skips URLs containing the substrings "java", "python", or "perl".

<?php

//  The Chilkat Spider component/library is free.
$spider = new COM("Chilkat.Spider");

//  The spider object crawls a single web site at a time.  As you'll see
//  in later examples, you can collect outbound links and use them to
//  crawl the web.  For now, we'll simply spider up to 10 pages of
//  chilkatsoft.com.
$spider->Initialize('www.chilkatsoft.com');

//  Add the 1st URL:
$spider->AddUnspidered('http://www.chilkatsoft.com/');

//  Avoid URLs matching these patterns:
$spider->AddAvoidPattern('*java*');
$spider->AddAvoidPattern('*python*');
$spider->AddAvoidPattern('*perl*');

//  Begin crawling the site by calling CrawlNext repeatedly.

for ($i = 0; $i <= 9; $i++) {

    $success = $spider->CrawlNext();
    if ($success) {
        //  Show the URL of the page just spidered.
        print $spider->LastUrl . "\n";
        //  The HTML is available in the LastHtml property
    }
    else {
        //  Did we get an error or are there no more URLs to crawl?
        if ($spider->NumUnspidered == 0) {
            print 'No more URLs to spider' . "\n";
            //  Nothing is left in the queue, so stop instead of looping on.
            break;
        }
        else {
            print $spider->LastErrorText . "\n";
        }

    }

    //  Sleep 1 second before spidering the next URL.
    $spider->SleepMs(1000);
}


?>
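
To see in advance which URLs a set of wildcard patterns would exclude, the matching can be approximated in plain PHP without the Chilkat component. The following is only a sketch under stated assumptions: the helper names matchesWildcard and isAvoided are hypothetical and not part of the Chilkat API, the sample URLs are made up, and the comparison is assumed to be case-sensitive and anchored to the whole URL. The spider's own matching rules may differ in detail.

```php
<?php
// Hypothetical helper, not part of the Chilkat API: reports whether $url
// matches a wildcard pattern in which '*' stands for any run of characters.
// Assumption: matching is case-sensitive and must cover the whole URL.
function matchesWildcard(string $pattern, string $url): bool
{
    // Escape regex metacharacters, then turn each escaped '*' into '.*'.
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/s';
    return preg_match($regex, $url) === 1;
}

// Hypothetical helper: true if any of the avoid patterns matches the URL.
function isAvoided(array $patterns, string $url): bool
{
    foreach ($patterns as $pattern) {
        if (matchesWildcard($pattern, $url)) {
            return true;
        }
    }
    return false;
}

// The same three patterns passed to AddAvoidPattern in the example.
$avoidPatterns = ['*java*', '*python*', '*perl*'];

// Sample URLs invented for illustration only.
$urls = [
    'http://www.chilkatsoft.com/',
    'http://www.chilkatsoft.com/java-examples.html',
    'http://www.chilkatsoft.com/refdoc/python.html',
    'http://www.chilkatsoft.com/php-spider.html',
];

foreach ($urls as $url) {
    print (isAvoided($avoidPatterns, $url) ? 'avoid  ' : 'crawl  ') . $url . "\n";
}
```

Because each pattern begins and ends with "*", it behaves as a substring test: the second and third sample URLs are flagged, while the first and last are left for crawling. A pattern without a leading or trailing "*" would, under the same assumption, have to match from the very start or to the very end of the URL.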

© 2000-2010 Chilkat Software, Inc. All Rights Reserved.