(Visual Basic) Avoiding Outbound Links Matching Patterns

The spider accumulates outbound links as it crawls. Your program can register any number of wildcard "avoid patterns". A link that matches at least one of these patterns is not added to the spider's list of outbound links.
' The Chilkat Spider component/library is free.
Dim spider As New Spider

' --------------------------------------------------------------------
' Note: The URLs in this example are no longer valid.
' You should replace the URLs with URLs from a site of your
' own choosing -- preferably your own site if testing.
' (Google's Directory no longer exists.)
' --------------------------------------------------------------------

' First, get the outbound links for a page in the
' Google directory. Then add some avoid patterns
' and re-fetch the page to see them take effect.
spider.Initialize "directory.google.com"
spider.AddUnspidered "http://directory.google.com/Top/Recreation/Food/Cheese/"

Dim success As Long
success = spider.CrawlNext()
If success <> 1 Then
    ' CrawlNext returns 0 on failure; LastErrorText describes the problem.
    Text1.Text = Text1.Text & spider.LastErrorText & vbCrLf
End If

' Display the outbound links
Dim i As Long
For i = 0 To spider.NumOutboundLinks - 1
    Text1.Text = Text1.Text & spider.GetOutboundLink(i) & vbCrLf
Next

' The output:
' http://www.cheese.com/
' http://www.cheesediaries.com/
' http://www.WisDairy.com/
' http://www.newenglandcheese.com
' http://www.ilovecheese.com
' http://www.cheesefromspain.com
' http://www.realcaliforniacheese.com/
' http://www.frencheese.co.uk/
' http://www.cheesesociety.org/
' http://www.specialcheese.com/queso.htm
' http://www.franceway.com/cheese/intro.htm
' http://www.foodsubs.com/Chesfirm.html
' http://www.cheeseboard.co.uk/
' http://www.thecheeseweb.com/
' http://www.vtcheese.com/
' http://www.coldbacon.com/cheese.html
' http://www.norwegiancheeses.co.uk/
' http://www.reluctantgourmet.com/cheese.htm
' http://www.lancewood.co.za/
' http://www.switzerlandcheese.ca
' http://www.frenchcheese.dk/
' http://www.dolcevita.com/cuisine/cheese/cheese.htm
' http://cheeseisland.net/
' http://www.cheestrings.ca/
' http://www.dreamcheese.co.uk
' http://hgic.clemson.edu/factsheets/HGIC3506.htm
' http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815
' http://www.mousetrapcheese.co.uk
' http://taquitos.net/yum/gc.shtml
' http://www.greek-recipe.com/static/greek-cheese
' http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
' http://www.dairyfarmers.org/engl/recipes/4_1.asp
' http://www.prairieridgecheese.com/wischeesguid.html
' http://dmoz.org/cgi-bin/add.cgi?where=Recreation/Food/Cheese
' http://dmoz.org/about.html
' http://dmoz.org/cgi-bin/apply.cgi?where=Recreation/Food/Cheese

' Do it again, but this time with avoid patterns.
spider.Initialize "directory.google.com"
spider.AddUnspidered "http://directory.google.com/Top/Recreation/Food/Cheese/"

' Add some avoid patterns:
spider.AddAvoidOutboundLinkPattern "*dmoz.org*"
spider.AddAvoidOutboundLinkPattern "*?id=*"
spider.AddAvoidOutboundLinkPattern "*.co.uk*"

success = spider.CrawlNext()
If success <> 1 Then
    Text1.Text = Text1.Text & spider.LastErrorText & vbCrLf
End If

Text1.Text = Text1.Text & "-----------------------" & vbCrLf

' Display the outbound links
For i = 0 To spider.NumOutboundLinks - 1
    Text1.Text = Text1.Text & spider.GetOutboundLink(i) & vbCrLf
Next

' Output:
' http://www.cheese.com/
' http://www.cheesediaries.com/
' http://www.WisDairy.com/
' http://www.newenglandcheese.com
' http://www.ilovecheese.com
' http://www.cheesefromspain.com
' http://www.realcaliforniacheese.com/
' http://www.cheesesociety.org/
' http://www.specialcheese.com/queso.htm
' http://www.franceway.com/cheese/intro.htm
' http://www.foodsubs.com/Chesfirm.html
' http://www.thecheeseweb.com/
' http://www.vtcheese.com/
' http://www.coldbacon.com/cheese.html
' http://www.reluctantgourmet.com/cheese.htm
' http://www.lancewood.co.za/
' http://www.switzerlandcheese.ca
' http://www.frenchcheese.dk/
' http://www.dolcevita.com/cuisine/cheese/cheese.htm
' http://cheeseisland.net/
' http://www.cheestrings.ca/
' http://hgic.clemson.edu/factsheets/HGIC3506.htm
' http://taquitos.net/yum/gc.shtml
' http://www.greek-recipe.com/static/greek-cheese
' http://www.park.org/Netherlands/pavilions/food_and_markets/cheese/introduction.html
' http://www.dairyfarmers.org/engl/recipes/4_1.asp
' http://www.prairieridgecheese.com/wischeesguid.html
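The filtering rule used above (a link is dropped if it matches at least one avoid pattern) can be approximated in plain VB6 with the built-in Like operator. This is a minimal sketch under stated assumptions, not Chilkat's implementation. The helper names EscapeLikePattern, MatchesAnyAvoidPattern and DemoAvoidPatterns are hypothetical. Only "*" is treated as a wildcard, so the "?" in "*?id=*" is escaped to match a literal question mark (Like would otherwise read it as "any single character"). Matching is case-sensitive under VB's default Option Compare Binary; the example does not say whether Chilkat compares case-insensitively.

```vb
' Hypothetical helper (not part of the Chilkat API).
' Converts an avoid pattern into a Like pattern in which only "*" is a
' wildcard: Like's other metacharacters (?, #, [) are wrapped in brackets
' so they match themselves literally.
Private Function EscapeLikePattern(ByVal pattern As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String
    For i = 1 To Len(pattern)
        ch = Mid$(pattern, i, 1)
        Select Case ch
            Case "?", "#", "["
                result = result & "[" & ch & "]"
            Case Else
                result = result & ch
        End Select
    Next
    EscapeLikePattern = result
End Function

' Hypothetical helper: returns True if url matches at least one pattern,
' i.e. the link would be excluded from the outbound-link list.
Private Function MatchesAnyAvoidPattern(ByVal url As String, patterns() As String) As Boolean
    Dim i As Long
    For i = LBound(patterns) To UBound(patterns)
        If url Like EscapeLikePattern(patterns(i)) Then
            MatchesAnyAvoidPattern = True
            Exit Function
        End If
    Next
    MatchesAnyAvoidPattern = False
End Function

' Usage with the same three patterns as the Chilkat example above.
Private Sub DemoAvoidPatterns()
    Dim patterns(2) As String
    patterns(0) = "*dmoz.org*"
    patterns(1) = "*?id=*"
    patterns(2) = "*.co.uk*"

    Debug.Print MatchesAnyAvoidPattern("http://dmoz.org/about.html", patterns)            ' True
    Debug.Print MatchesAnyAvoidPattern( _
        "http://www.epicurious.com/cooking/how_to/food_dictionary/entry?id=1815", patterns) ' True
    Debug.Print MatchesAnyAvoidPattern("http://www.mousetrapcheese.co.uk", patterns)      ' True
    Debug.Print MatchesAnyAvoidPattern("http://www.lancewood.co.za/", patterns)           ' False
End Sub
```

Applied to the first output list, these three patterns flag exactly nine links (the three dmoz.org links, the epicurious.com "entry?id=1815" link, and the five .co.uk links), which are the links missing from the second list.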
© 2000-2013 Chilkat Software, Inc. All Rights Reserved.