Is there a way to prevent Nutch from fetching the outlinks of pages that I
judge to be irrelevant? I make that decision while the page is being parsed,
in my parse filter. I realize that I can stop Nutch from indexing such pages,
but I believe the index is separate from the structure that determines which
new pages get fetched.
Best,
John