Python Scrapy: customize the XML format of crawled items

vurbinnenia · Apr 7, 2013

I am using Scrapy to crawl a webpage, and I want the output written to an XML file in a specific format. My code is below.

Item class

\[code\]
class Item(Item):
    # define the fields for your item here like:
    id = Field()
    name = Field()
    address = Field()
    birthdate = Field()
    review = Field()
\[/code\]

Spider class

\[code\]
class FriendSpider(BaseSpider):
    # identifies the spider
    name = "friend"
    count = 0
    allowed_domains = ["example.com.us"]
    start_urls = [
        "http://example.com.us/biz/friendlist/"
    ]

    def start_requests(self):
        for i in range(0, 1722, 40):
            yield self.make_requests_from_url("http://example.com.us/biz/friendlist/?start=%d" % i)

    def parse(self, response):
        response = response.replace(body=response.body.replace('<br />', '\n'))
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []
        for site in sites:
            item = Item()
            self.count += 1
            item['id'] = str(self.count)
            item['name'] = site.select('.//div/div/h4/text()').extract()
            item['address'] = site.select('h4/span/text()').extract()
            item['review'] = ''.join(site.select('.//div[@class="review"]/p/text()').extract())
            item['birthdate'] = site.select('.//div/div/h5/text()').extract()
            items.append(item)
        return items
\[/code\]

The output was in this format:

\[code\]
<?xml version="1.0" encoding="utf-8"?>
<items>
  <item>
    <id>1</id>
    <name><value>Keith</value></name>
    <review>txt............</review>
    <address><value>United States</value></address>
    <birthdate><value>1988-04-03</value></birthdate>
  </item>
  .....
</items>
\[/code\]

How can I customize the XML format to look like the following, that is, remove the value tags and move the id onto the item's root element as an attribute?

\[code\]
<?xml version="1.0" encoding="utf-8"?>
<items>
  <friend id="1">
    <name>Keith</name>
    <review>txt............</review>
    <address>United States</address>
    <birthdate>1988-04-03</birthdate>
  </friend>
  .....
</items>
\[/code\]
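For reference, the `<value>` wrappers appear on exactly those fields that hold the list returned by `.extract()`; `review` is joined into a string and comes out flat. Below is a minimal sketch of the target transformation using only the standard library, not Scrapy itself. The helper names `first_or_text` and `items_to_xml` are hypothetical, and the sample data is taken from the output shown above. In Scrapy, the same logic would presumably go into a custom item exporter (for example a subclass of `XmlItemExporter` that overrides `export_item`), but that wiring is an assumption and is not shown here.

```python
import xml.etree.ElementTree as ET


def first_or_text(value):
    """Collapse a list such as the one .extract() returns into one string."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else ""
    return value


def items_to_xml(items, root_tag="items", item_tag="friend"):
    """Build <items><friend id="...">...</friend></items> from dict-like items."""
    root = ET.Element(root_tag)
    for item in items:
        fields = dict(item)
        # Move the id field onto the element as an attribute.
        node = ET.SubElement(root, item_tag, id=str(fields.pop("id")))
        # Write every remaining field as a plain child element, no <value> tag.
        for name, value in fields.items():
            ET.SubElement(node, name).text = first_or_text(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample item shaped like the spider's output above.
sample = [{
    "id": "1",
    "name": ["Keith"],
    "review": "txt",
    "address": ["United States"],
    "birthdate": ["1988-04-03"],
}]
xml_text = items_to_xml(sample)
print(xml_text)
```

Taking the first element of each extracted list (or storing a plain string in the item to begin with) is what removes the `<value>` nesting; moving `id` out of the field loop is what turns it into an attribute.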
