How to fetch information in XML format from Nutch's database of crawled web pages

incelpize

I'm trying to build a book aggregation portal. Nutch gives me an excellent web crawler, but I need very specific information such as book title, price, ISBN and author. How can I extract that information from the crawled pages? If possible, I would like to get it in XML format. I would also like to know whether this is the right approach at all. Could it be done in a better way with other open source software?
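To make the question more concrete, here is a rough sketch of the per-page extraction and XML output I have in mind. It is not Nutch code and uses no Nutch API. It assumes the raw HTML of one crawled page is already available as a string, and that book pages mark each field with a hypothetical `class` attribute (`title`, `author`, `price`, `isbn`). The regular expressions are only a placeholder; a real HTML parser would cope far better with real-world markup.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BookExtractor {
    // Hypothetical markup: each field is assumed to sit in an element whose
    // class attribute names the field, e.g. <span class="isbn">...</span>.
    static final String[] FIELDS = {"title", "author", "price", "isbn"};

    // Pull the text of the first element carrying each field's class.
    // Regex is a stand-in for a proper HTML parser.
    static Map<String, String> extract(String html) {
        Map<String, String> book = new LinkedHashMap<>();
        for (String field : FIELDS) {
            Pattern p = Pattern.compile(
                "<[^>]*class=\"" + Pattern.quote(field) + "\"[^>]*>([^<]*)<",
                Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(html);
            if (m.find()) {
                book.put(field, m.group(1).trim());
            }
        }
        return book;
    }

    // Serialize the extracted fields as <book><field>value</field>...</book>
    // using only the JDK's DOM and Transformer classes.
    static String toXml(Map<String, String> book) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("book");
        doc.appendChild(root);
        for (Map.Entry<String, String> e : book.entrySet()) {
            Element child = doc.createElement(e.getKey());
            child.setTextContent(e.getValue()); // escapes &, < etc. on output
            root.appendChild(child);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page, for illustration only.
        String html = "<div><h1 class=\"title\">Example Book</h1>"
            + "<span class=\"author\">Jane Doe</span>"
            + "<span class=\"price\">$19.99</span>"
            + "<span class=\"isbn\">978-0-00-000000-0</span></div>";
        System.out.println(toXml(extract(html)));
    }
}
```

The part I'm unsure about is where such logic should live: run as a separate step over pages exported from Nutch, or hooked into Nutch's own parsing so the fields are stored alongside each crawled page.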
 