Concept: Harvest Depth

Harvest Depth controls how far the harvester follows links from one page to new pages, and therefore how many pages are harvested.

Guidelines
  • The harvester tracks the depth level of each page it harvests.

  • Each ruleset has a max harvest depth.

  • When a page is harvested, the best-matching ruleset is used to harvest it.

  • When a page is at the max harvest depth for the assigned ruleset:

    • the page is harvested, but

    • links on that page are not followed.

  • Within a ruleset, you can mark a link to be followed and harvested. By default, following a link increases the harvest depth by one. If you select the Keep Same Depth option, the link is followed without increasing the harvest depth. This is useful when harvesting a related group of pages, such as the pages that make up a Facebook profile. See Keep Depth.
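The guidelines above can be illustrated with a short Python sketch that harvests pages breadth-first and records the depth of each one. This is not the harvester's implementation. The Ruleset class, its url_prefix field, the longest-prefix rule for choosing the best-matching ruleset, the links dictionary standing in for fetched pages, and a start depth of 0 are all hypothetical. Keep Same Depth is left out to keep the sketch short.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Ruleset:
    # Hypothetical model: a ruleset applies to URLs that start with url_prefix.
    name: str
    url_prefix: str
    max_depth: int


def best_matching_ruleset(url, rulesets):
    """Assumption: the best match is the ruleset with the longest matching prefix."""
    matches = [r for r in rulesets if url.startswith(r.url_prefix)]
    return max(matches, key=lambda r: len(r.url_prefix), default=None)


def harvest(start_url, links, rulesets):
    """Breadth-first harvest that records the depth of every harvested page.

    links maps a page URL to the URLs it links to (a stand-in for fetching pages).
    Returns {url: depth} for every harvested page.
    """
    harvested = {}
    queue = deque([(start_url, 0)])  # assumption: the start page is at depth 0
    while queue:
        url, depth = queue.popleft()
        if url in harvested:
            continue  # breadth-first order means the first visit has the smallest depth
        ruleset = best_matching_ruleset(url, rulesets)
        if ruleset is None:
            continue  # assumption: pages that match no ruleset are skipped
        harvested[url] = depth  # the page is harvested even at max depth
        if depth >= ruleset.max_depth:
            continue  # at (or past) max depth: links on this page are not followed
        for target in links.get(url, []):
            if target not in harvested:
                queue.append((target, depth + 1))  # following a link adds one level
    return harvested


# Usage: with a max depth of 1, page /a is harvested but its link to /b is not followed.
rulesets = [Ruleset("site", "https://example.com/", max_depth=1)]
links = {
    "https://example.com/": ["https://example.com/a"],
    "https://example.com/a": ["https://example.com/b"],
}
print(harvest("https://example.com/", links, rulesets))
# → {'https://example.com/': 0, 'https://example.com/a': 1}
```

Because each page's ruleset is looked up separately, a page reached through one ruleset can stop link-following earlier if its own best-matching ruleset has a smaller max depth.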

Diagram: Track harvest depth

The diagram below illustrates how the harvester tracks harvest depth and determines when to follow links:

Diagram: Follow links without increasing harvest depth

The diagram below illustrates how related pages can be followed without triggering an increase in harvest depth:
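As a rough sketch of how Keep Same Depth changes the depth count, the Python function below treats a Keep Same Depth link as adding zero levels and a normal link as adding one. It is an illustration under stated assumptions, not the harvester's code: the (target, keep_same_depth) link pairs, the single max_depth parameter standing in for the assigned ruleset's limit, and the example paths are hypothetical. Following the guidelines literally, no links at all are followed from a page at max depth. The sketch uses a 0-1 breadth-first search so that each page keeps the smallest depth at which it is reached, and the visited check stops Keep Same Depth links that point back at each other from looping forever.

```python
from collections import deque


def harvest_with_keep_depth(start_url, links, max_depth):
    """Harvest pages, letting Keep Same Depth links skip the depth increment.

    links maps a URL to a list of (target_url, keep_same_depth) pairs
    (a hypothetical stand-in for a ruleset's link-following settings).
    Returns {url: depth} for every harvested page.
    """
    harvested = {}
    queue = deque([(start_url, 0)])  # assumption: the start page is at depth 0
    while queue:
        url, depth = queue.popleft()
        if url in harvested:
            continue  # already harvested at an equal or smaller depth
        harvested[url] = depth  # the page is harvested even at max depth
        if depth >= max_depth:
            continue  # assumption: no links, Keep Same Depth or not, are followed here
        for target, keep_same_depth in links.get(url, []):
            if target in harvested:
                continue
            if keep_same_depth:
                queue.appendleft((target, depth))  # same depth: handled before deeper pages
            else:
                queue.append((target, depth + 1))  # normal link: one level deeper
    return harvested


# Usage: the profile's photos and friends pages stay at the profile's depth.
links = {
    "/": [("/profile", False)],
    "/profile": [("/profile/photos", True), ("/profile/friends", True)],
    "/profile/photos": [("/other", False)],
    "/other": [("/deeper", False)],
}
result = harvest_with_keep_depth("/", links, max_depth=2)
print(result["/profile/photos"], result["/other"], "/deeper" in result)  # → 1 2 False
```

Without the Keep Same Depth flags, the photos and friends pages would sit at depth 2, the max depth in this example, so /other would never be reached.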