Crawler Coding: determine if pages have been crawled?

Raqqeajuewbdp · Sep 17, 2012

I am working on a crawler in PHP that is given m URLs. At each of these URLs it finds a set of n links to n internal pages, which are then crawled for data. Links may be added to or removed from that set between crawls, so I need to keep track of the links/pages to know which have been crawled, which have been removed, and which are new.

How should I keep track of which of the m and n pages have been crawled, so that the next crawl fetches new URLs, re-checks URLs that still exist, and ignores obsolete ones?
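To make the requirement concrete, the bookkeeping for a single listing page (one of the m URLs) might be sketched roughly as below. This is only an illustration, assuming the links seen on the previous crawl are stored somewhere (a database table, for instance) and can be loaded as a plain array. The function name classifyLinks and the example.com URLs are hypothetical, not part of any existing code.

```php
<?php
// Hypothetical helper: compare the links stored from the previous crawl
// with the links found on the listing page during the current crawl.
function classifyLinks(array $previous, array $current): array
{
    // Drop duplicate links so each URL is classified once.
    $previous = array_unique($previous);
    $current  = array_unique($current);

    return [
        // On the page now, but not seen before: fetch for the first time.
        'new'      => array_values(array_diff($current, $previous)),
        // Seen before and still on the page: re-check.
        'existing' => array_values(array_intersect($current, $previous)),
        // Seen before, but no longer on the page: obsolete, ignore.
        'removed'  => array_values(array_diff($previous, $current)),
    ];
}

// Example data (assumed): links stored last time vs. links found now.
$previous = ['http://example.com/a', 'http://example.com/b'];
$current  = ['http://example.com/b', 'http://example.com/c'];

$result = classifyLinks($previous, $current);
```

After each crawl, the stored list would presumably be replaced by the current one, so the next run compares against the latest state.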
