Java
GetBaseDomain
The GetBaseDomain method is a utility function that converts a domain name into its "base domain", which is useful for grouping URLs. For example, abc.chilkatsoft.com, xyz.chilkatsoft.com, and blog.chilkatsoft.com all have the same base domain: chilkatsoft.com. Things get more complicated with country-code domains (.au, .uk, .se, .cn, etc.) and with government, state, and .us domains. In addition, hosting domains such as blogspot and wordpress are treated specially, so that "xyz.blogspot.com" has a base domain of "xyz.blogspot.com". Note: If you find other domains that should be treated similarly to blogspot.com, send a request to support@chilkatsoft.com.
import com.chilkatsoft.*;

public class ChilkatExample {

    // Load the Chilkat native library once, when the class is first loaded.
    static {
        try {
            System.loadLibrary("chilkat");
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Native code library failed to load.\n" + e);
            System.exit(1);
        }
    }

    public static void main(String[] argv)
    {
        CkSpider spider = new CkSpider();

        // Each call prints the base domain of the given host name.
        System.out.println(spider.getBaseDomain("www.chilkatsoft.com"));
        System.out.println(spider.getBaseDomain("blog.chilkatsoft.com"));
        System.out.println(spider.getBaseDomain("www.news.com.au"));
        System.out.println(spider.getBaseDomain("blogs.bbc.co.uk"));
        System.out.println(spider.getBaseDomain("xyz.blogspot.com"));
        System.out.println(spider.getBaseDomain("www.heaids.org.za"));
        System.out.println(spider.getBaseDomain("www.hec.gov.pk"));
        System.out.println(spider.getBaseDomain("www.e-mrs.org"));
        System.out.println(spider.getBaseDomain("cra.curtin.edu.au"));

        // Prints:
        // chilkatsoft.com
        // chilkatsoft.com
        // news.com.au
        // bbc.co.uk
        // xyz.blogspot.com
        // heaids.org.za
        // hec.gov.pk
        // e-mrs.org
        // curtin.edu.au
    }
}
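
To make the rules described above more concrete, here is a minimal pure-Java sketch of the general idea. It is not Chilkat's implementation and does not use the Chilkat library: the class name BaseDomainSketch, the baseDomain helper, and the two small rule tables are all hypothetical and cover only the suffixes that appear in the example above. A real implementation would need a far more complete list of country-code and hosting suffixes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class BaseDomainSketch {

    // Hypothetical, deliberately tiny rule tables (assumptions for illustration only).
    // Two-label suffixes under which organizations register their names.
    private static final Set<String> TWO_LABEL_SUFFIXES = new HashSet<>(Arrays.asList(
            "com.au", "edu.au", "co.uk", "org.za", "gov.pk"));

    // Hosting domains where each subdomain belongs to a different owner.
    private static final Set<String> HOSTING_DOMAINS = new HashSet<>(Arrays.asList(
            "blogspot.com", "wordpress.com"));

    // Returns the last two labels of the host, or the last three when the
    // last two form a known two-label suffix or a known hosting domain.
    public static String baseDomain(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        String[] labels = lower.split("\\.");
        int n = labels.length;
        if (n <= 2) {
            return lower; // already a base domain (or a single label)
        }
        String lastTwo = labels[n - 2] + "." + labels[n - 1];
        if (TWO_LABEL_SUFFIXES.contains(lastTwo) || HOSTING_DOMAINS.contains(lastTwo)) {
            return labels[n - 3] + "." + lastTwo;
        }
        return lastTwo;
    }

    public static void main(String[] args) {
        String[] hosts = {
            "www.chilkatsoft.com", "blog.chilkatsoft.com", "www.news.com.au",
            "blogs.bbc.co.uk", "xyz.blogspot.com", "cra.curtin.edu.au"
        };
        for (String host : hosts) {
            System.out.println(host + " -> " + baseDomain(host));
        }
    }
}
```

The sketch treats both kinds of special case the same way: when the last two labels are a known suffix or hosting domain, one more label is kept. Any host whose suffix is missing from the tables falls back to its last two labels, which is why the completeness of the rule list matters in practice.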