How to fetch information in XML format from Nutch spidered webpages database

incelpize · Apr 9, 2013

I'm trying to build a book aggregation portal. Nutch gives me an excellent web crawler, but I need very specific information such as the book title, price, ISBN and author. How can I extract that information from the crawled pages? If possible, I would like to get it in XML format. I would also like to ask whether this is the right approach at all. Can it be done better with other open source software?
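As a rough illustration of the extraction step being asked about, here is a minimal sketch that turns the raw HTML of one already-crawled page into a small XML record. It is not a Nutch API and does not read Nutch's storage. It assumes that the book pages mark up their data with schema.org-style `itemprop` attributes, which many shops do but not all. The names `FIELDS`, `BookFieldParser` and `page_to_xml` are hypothetical and exist only for this example. It uses only the Python standard library (`html.parser` and `xml.etree.ElementTree`).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: pages use schema.org-style "itemprop" attributes.
# Hypothetical mapping of itemprop value -> XML element name.
FIELDS = {"name": "title", "author": "author", "isbn": "isbn", "price": "price"}
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class BookFieldParser(HTMLParser):
    """Collects the text (or content attribute) of elements with a known itemprop."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._field = None  # XML name of the field currently being captured
        self._tag = None    # tag name that opened the capture
        self._depth = 0     # nesting depth of that tag while capturing
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Inside a captured element: only track nesting of the same tag.
            if tag == self._tag:
                self._depth += 1
            return
        attrs = dict(attrs)
        field = FIELDS.get(attrs.get("itemprop"))
        if field is None or field in self.fields:
            return  # unknown property, or first occurrence already kept
        if attrs.get("content") is not None:
            # e.g. <meta itemprop="price" content="12.99">
            self.fields[field] = attrs["content"].strip()
        elif tag not in VOID_TAGS:
            self._field, self._tag, self._depth, self._text = field, tag, 1, []

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace in the captured text.
                self.fields[self._field] = " ".join("".join(self._text).split())
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)


def page_to_xml(html):
    """Parse one crawled HTML page and return a <book> XML string."""
    parser = BookFieldParser()
    parser.feed(html)
    parser.close()
    book = ET.Element("book")
    for name in FIELDS.values():
        if name in parser.fields:
            ET.SubElement(book, name).text = parser.fields[name]
    return ET.tostring(book, encoding="unicode")


if __name__ == "__main__":
    sample = """<div itemscope>
      <h1 itemprop="name">Example Book</h1>
      <span itemprop="author"><a href="/a">Jane</a> Doe</span>
      <span itemprop="isbn">978-0-00-000000-0</span>
      <meta itemprop="price" content="12.99">
    </div>"""
    print(page_to_xml(sample))
```

For sites that do not use this kind of markup, the `itemprop` lookup would have to be replaced by per-site rules. That maintenance cost is the main trade-off of extracting fields after a generic crawl rather than scraping each site with dedicated selectors.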
