Java

Fetch robots.txt for a Site

The Chilkat Spider library complies with robots.txt: it automatically fetches a site's robots.txt file and will not download pages that the file denies. Pages excluded by robots.txt also do not appear in the Spider's "unspidered" list. This example shows how to explicitly download and review the robots.txt for a given site.

import com.chilkatsoft.*;

public class ChilkatExample {

  static {
    try {
      System.loadLibrary("chilkat");
    } catch (UnsatisfiedLinkError e) {
      System.err.println("Native code library failed to load.\n" + e);
      System.exit(1);
    }
  }

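  public static void main(String[] args)
  {
    CkSpider spider = new CkSpider();

    // Set the domain whose robots.txt will be fetched.
    spider.Initialize("www.chilkatsoft.com");

    // Download the site's robots.txt; a null return indicates failure.
    String robotsText = spider.fetchRobotsText();
    if (robotsText == null) {
      System.out.println(spider.lastErrorText());
      return;
    }

    System.out.println(robotsText);
  }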
}
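
Once the robots.txt text has been fetched, it can help to list the paths it denies when reviewing it. The following is a minimal sketch in plain Java and does not use Chilkat. The class name RobotsReview, the method disallowedPaths, and the sample text are hypothetical names chosen for illustration. It is a simplification and not a description of how the Spider itself interprets the file: it merges the "*" group with any group naming the given agent, and it ignores Allow lines and wildcard patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsReview {

    // Returns the Disallow paths from every group whose User-agent is "*"
    // or equals `agent` (case-insensitive). Simplified: Allow lines and
    // wildcard patterns are ignored, and matching groups are merged.
    static List<String> disallowedPaths(String robotsText, String agent) {
        List<String> paths = new ArrayList<>();
        boolean inMatchingGroup = false;
        boolean lastWasAgent = false;

        for (String raw : robotsText.split("\\R")) {
            // Drop any trailing comment, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();

            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank line or not a "field: value" line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                boolean matches = value.equals("*") || value.equalsIgnoreCase(agent);
                // Consecutive User-agent lines belong to the same group.
                inMatchingGroup = lastWasAgent ? (inMatchingGroup || matches) : matches;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value denies nothing, so it is skipped.
                if (field.equals("disallow") && inMatchingGroup && !value.isEmpty()) {
                    paths.add(value);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical sample; in practice pass the text returned by fetchRobotsText.
        String sample = "User-agent: *\n"
                + "Disallow: /private/\n"
                + "Disallow: /tmp/ # scratch\n"
                + "\n"
                + "User-agent: otherbot\n"
                + "Disallow: /\n";

        for (String path : disallowedPaths(sample, "mybot")) {
            System.out.println(path);
        }
    }
}
```

For the sample above, an agent named "mybot" is covered only by the "*" group, so the program prints /private/ and /tmp/ on separate lines.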