MFC Examples

Avoiding Outbound Links Matching Patterns

The spider accumulates outbound links as it crawls. Your program may register any number of "avoid patterns": a link that matches at least one of these wildcard patterns is not added to the list of outbound links.
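To make the matching rule concrete, here is a minimal standalone sketch of wildcard filtering. It is not Chilkat's implementation. The helper names `wildcardMatch`, `shouldAvoid` and `filterLinks` are hypothetical, and the sketch assumes that `*` matches any run of characters, that every other character (including `?`) is literal, and that matching is case-sensitive.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Chilkat): returns true if text matches
// pattern. Assumption: '*' matches any run of characters, including none,
// and every other character, '?' included, must match literally.
bool wildcardMatch(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t mark = 0;                  // text position when that '*' was seen
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;
            mark = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            // Backtrack: let the last '*' absorb one more character.
            p = star + 1;
            t = ++mark;
        } else {
            return false;
        }
    }
    // Any trailing '*' in the pattern can match the empty string.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// A link is avoided if it matches at least one pattern.
bool shouldAvoid(const std::vector<std::string>& patterns, const std::string& url) {
    for (const std::string& pat : patterns) {
        if (wildcardMatch(pat, url)) return true;
    }
    return false;
}

// Returns only the links that match none of the avoid patterns.
std::vector<std::string> filterLinks(const std::vector<std::string>& patterns,
                                     const std::vector<std::string>& links) {
    std::vector<std::string> kept;
    for (const std::string& url : links) {
        if (!shouldAvoid(patterns, url)) kept.push_back(url);
    }
    return kept;
}
```

Under these assumptions, calling `filterLinks` with the patterns `*dmoz.org*`, `*?id=*` and `*.co.uk*` would drop `http://dmoz.org/about.html` and `http://www.cheeseboard.co.uk/` but keep `http://www.cheese.com/` and `http://www.lancewood.co.za/`.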

// Needs #include <CkSpider.h> and #include <CkString.h>

    CkString strOut;

    //  The Chilkat Spider component/library is free.
    CkSpider spider;

    //  First, we'll get the outbound links for a page in the
    //  Google directory.  Then we'll add some avoid patterns
    //  and then re-fetch, to see it work...

    spider.Initialize("directory.google.com");
    spider.AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/");

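    bool success = spider.CrawlNext();
    if (!success) {
        //  Report the failure instead of silently showing no links.
        strOut.append(spider.lastErrorText());
        strOut.append("\r\n");
    }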

    //  Display the outbound links
    long i;
    for (i = 0; i < spider.get_NumOutboundLinks(); i++) {
        strOut.append(spider.getOutboundLink(i));
        strOut.append("\r\n");
    }

    //  The output:
    //  http://www.cheese.com/
    //  http://www.cheesediaries.com/
    //  http://www.WisDairy.com/
    //  http://www.newenglandcheese.com
    //  http://www.ilovecheese.com
    //  http://www.cheesefromspain.com
    //  http://www.realcaliforniacheese.com/
    //  http://www.frencheese.co.uk/
    //  http://www.cheesesociety.org/
    //  http://www.specialcheese.com/queso.htm
    //  http://www.franceway.com/cheese/intro.htm
    //  http://www.foodsubs.com/Chesfirm.html
    //  http://www.cheeseboard.co.uk/
    //  http://www.thecheeseweb.com/
    //  http://www.vtcheese.com/
    //  http://www.coldbacon.com/cheese.html
    //  http://www.norwegiancheeses.co.uk/
    //  http://www.reluctantgourmet.com/cheese.htm
    //  http://www.lancewood.co.za/
    //  http://www.switzerlandcheese.ca
    //  http://www.frenchcheese.dk/
    //  http://www.dolcevita.com/cuisine/cheese/cheese.htm
    //  http://cheeseisland.net/
    //  http://www.cheestrings.ca/
    //  http://www.dreamcheese.co.uk
    //  http://hgic.clemson.edu/factsheets/HGIC3506.htm
    //  http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815
    //  http://www.mousetrapcheese.co.uk
    //  http://taquitos.net/yum/gc.shtml
    //  http://www.greek-recipe.com/static/greek-cheese
    //  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
    //  http://www.dairyfarmers.org/engl/recipes/4_1.asp
    //  http://www.prairieridgecheese.com/wischeesguid.html
    //  http://dmoz.org/cgi-bin/add.cgi?where=Recreation/Food/Cheese
    //  http://dmoz.org/about.html
    //  http://dmoz.org/cgi-bin/apply.cgi?where=Recreation/Food/Cheese

    //  Do it again, but this time with avoid patterns.
    spider.Initialize("directory.google.com");
    spider.AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/");

    //  Add some avoid patterns:
    spider.AddAvoidOutboundLinkPattern("*dmoz.org*");
    spider.AddAvoidOutboundLinkPattern("*?id=*");
    spider.AddAvoidOutboundLinkPattern("*.co.uk*");
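    success = spider.CrawlNext();
    if (!success) {
        //  Report the failure instead of silently showing no links.
        strOut.append(spider.lastErrorText());
        strOut.append("\r\n");
    }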

    strOut.append("-----------------------");
    strOut.append("\r\n");

    //  Display the outbound links
    for (i = 0; i < spider.get_NumOutboundLinks(); i++) {
        strOut.append(spider.getOutboundLink(i));
        strOut.append("\r\n");
    }

    //  Output:
    //  http://www.cheese.com/
    //  http://www.cheesediaries.com/
    //  http://www.WisDairy.com/
    //  http://www.newenglandcheese.com
    //  http://www.ilovecheese.com
    //  http://www.cheesefromspain.com
    //  http://www.realcaliforniacheese.com/
    //  http://www.cheesesociety.org/
    //  http://www.specialcheese.com/queso.htm
    //  http://www.franceway.com/cheese/intro.htm
    //  http://www.foodsubs.com/Chesfirm.html
    //  http://www.thecheeseweb.com/
    //  http://www.vtcheese.com/
    //  http://www.coldbacon.com/cheese.html
    //  http://www.reluctantgourmet.com/cheese.htm
    //  http://www.lancewood.co.za/
    //  http://www.switzerlandcheese.ca
    //  http://www.frenchcheese.dk/
    //  http://www.dolcevita.com/cuisine/cheese/cheese.htm
    //  http://cheeseisland.net/
    //  http://www.cheestrings.ca/
    //  http://hgic.clemson.edu/factsheets/HGIC3506.htm
    //  http://taquitos.net/yum/gc.shtml
    //  http://www.greek-recipe.com/static/greek-cheese
    //  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
    //  http://www.dairyfarmers.org/engl/recipes/4_1.asp
    //  http://www.prairieridgecheese.com/wischeesguid.html


    SetDlgItemText(IDC_EDIT1,strOut.getUnicode());

Need a specific example? Send a request to support@chilkatsoft.com

© 2000-2008 Chilkat Software, Inc. All Rights Reserved.