Java Examples

Avoiding Outbound Links Matching Patterns

The spider accumulates outbound links as it crawls. Your program may add any number of "avoid patterns": a link that matches at least one of these wildcard patterns is not added to the outbound link list.

Chilkat Java Library Downloads for Windows, Linux, and Mac OS X

import com.chilkatsoft.*;

public class ChilkatExample {

  static {
    try {
      System.loadLibrary("chilkat");
    } catch (UnsatisfiedLinkError e) {
      System.err.println("Native code library failed to load.\n" + e);
      System.exit(1);
    }
  }

  public static void main(String[] args)
  {
    //  The Chilkat Spider component/library is free.
    CkSpider spider = new CkSpider();

    //  First, we'll get the outbound links for a page in the
    //  Google directory.  Then we'll add some avoid patterns
    //  and then re-fetch, to see it work...

    spider.Initialize("directory.google.com");
    spider.AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/");

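    //  Crawl the page. If the crawl fails, report the error and stop.
    boolean success;
    success = spider.CrawlNext();
    if (!success) {
        System.out.println(spider.lastErrorText());
        return;
    }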

    //  Display the outbound links
    int i;
    for (i = 0; i < spider.get_NumOutboundLinks(); i++) {
        System.out.println(spider.getOutboundLink(i));
    }

    //  The output:
    //  http://www.cheese.com/
    //  http://www.cheesediaries.com/
    //  http://www.WisDairy.com/
    //  http://www.newenglandcheese.com
    //  http://www.ilovecheese.com
    //  http://www.cheesefromspain.com
    //  http://www.realcaliforniacheese.com/
    //  http://www.frencheese.co.uk/
    //  http://www.cheesesociety.org/
    //  http://www.specialcheese.com/queso.htm
    //  http://www.franceway.com/cheese/intro.htm
    //  http://www.foodsubs.com/Chesfirm.html
    //  http://www.cheeseboard.co.uk/
    //  http://www.thecheeseweb.com/
    //  http://www.vtcheese.com/
    //  http://www.coldbacon.com/cheese.html
    //  http://www.norwegiancheeses.co.uk/
    //  http://www.reluctantgourmet.com/cheese.htm
    //  http://www.lancewood.co.za/
    //  http://www.switzerlandcheese.ca
    //  http://www.frenchcheese.dk/
    //  http://www.dolcevita.com/cuisine/cheese/cheese.htm
    //  http://cheeseisland.net/
    //  http://www.cheestrings.ca/
    //  http://www.dreamcheese.co.uk
    //  http://hgic.clemson.edu/factsheets/HGIC3506.htm
    //  http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815
    //  http://www.mousetrapcheese.co.uk
    //  http://taquitos.net/yum/gc.shtml
    //  http://www.greek-recipe.com/static/greek-cheese
    //  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
    //  http://www.dairyfarmers.org/engl/recipes/4_1.asp
    //  http://www.prairieridgecheese.com/wischeesguid.html
    //  http://dmoz.org/cgi-bin/add.cgi?where=Recreation/Food/Cheese
    //  http://dmoz.org/about.html
    //  http://dmoz.org/cgi-bin/apply.cgi?where=Recreation/Food/Cheese

    //  Do it again, but this time with avoid patterns.
    spider.Initialize("directory.google.com");
    spider.AddUnspidered("http://directory.google.com/Top/Recreation/Food/Cheese/");

    //  Add some avoid patterns:
    spider.AddAvoidOutboundLinkPattern("*dmoz.org*");
    spider.AddAvoidOutboundLinkPattern("*?id=*");
    spider.AddAvoidOutboundLinkPattern("*.co.uk*");
    success = spider.CrawlNext();
    if (!success) {
        System.out.println(spider.lastErrorText());
        return;
    }

    System.out.println("-----------------------");

    //  Display the outbound links
    for (i = 0; i < spider.get_NumOutboundLinks(); i++) {
        System.out.println(spider.getOutboundLink(i));
    }

    //  Output:
    //  http://www.cheese.com/
    //  http://www.cheesediaries.com/
    //  http://www.WisDairy.com/
    //  http://www.newenglandcheese.com
    //  http://www.ilovecheese.com
    //  http://www.cheesefromspain.com
    //  http://www.realcaliforniacheese.com/
    //  http://www.cheesesociety.org/
    //  http://www.specialcheese.com/queso.htm
    //  http://www.franceway.com/cheese/intro.htm
    //  http://www.foodsubs.com/Chesfirm.html
    //  http://www.thecheeseweb.com/
    //  http://www.vtcheese.com/
    //  http://www.coldbacon.com/cheese.html
    //  http://www.reluctantgourmet.com/cheese.htm
    //  http://www.lancewood.co.za/
    //  http://www.switzerlandcheese.ca
    //  http://www.frenchcheese.dk/
    //  http://www.dolcevita.com/cuisine/cheese/cheese.htm
    //  http://cheeseisland.net/
    //  http://www.cheestrings.ca/
    //  http://hgic.clemson.edu/factsheets/HGIC3506.htm
    //  http://taquitos.net/yum/gc.shtml
    //  http://www.greek-recipe.com/static/greek-cheese
    //  http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
    //  http://www.dairyfarmers.org/engl/recipes/4_1.asp
    //  http://www.prairieridgecheese.com/wischeesguid.html

  }
}
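
To make the "matches at least one pattern" rule concrete, the following is a minimal sketch in plain Java that approximates the filtering, without using Chilkat at all. It is not the library's implementation. It assumes that "*" matches any run of characters (including none), that every other character (including "?") is literal, and that matching is case-sensitive. The class name AvoidPatternSketch and the methods toRegex and filter are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class AvoidPatternSketch {

    // Convert a '*' wildcard pattern into a regex. Literal pieces are quoted,
    // and each '*' becomes ".*" (assumption: '*' is the only wildcard).
    static Pattern toRegex(String wildcard) {
        String[] pieces = wildcard.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) sb.append(".*");
            sb.append(Pattern.quote(pieces[i]));
        }
        return Pattern.compile(sb.toString());
    }

    // Keep only the links that match none of the avoid patterns.
    static List<String> filter(List<String> links, List<String> avoidPatterns) {
        List<Pattern> compiled = new ArrayList<>();
        for (String p : avoidPatterns) compiled.add(toRegex(p));

        List<String> kept = new ArrayList<>();
        for (String link : links) {
            boolean avoided = false;
            for (Pattern p : compiled) {
                // matches() tests the whole link against the pattern
                if (p.matcher(link).matches()) { avoided = true; break; }
            }
            if (!avoided) kept.add(link);
        }
        return kept;
    }

    public static void main(String[] args) {
        // A few of the links from the example output above
        List<String> links = Arrays.asList(
            "http://www.cheese.com/",
            "http://www.frencheese.co.uk/",
            "http://www.lancewood.co.za/",
            "http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815",
            "http://dmoz.org/about.html");
        List<String> avoid = Arrays.asList("*dmoz.org*", "*?id=*", "*.co.uk*");

        for (String link : filter(links, avoid)) {
            System.out.println(link);
        }
    }
}
```

With these five sample links, only http://www.cheese.com/ and http://www.lancewood.co.za/ are printed. The .co.za link survives because "*.co.uk*" requires the literal text ".co.uk" somewhere in the URL.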

 

© 2000-2010 Chilkat Software, Inc. All Rights Reserved.