Parse xml with lxml - extract element value

IvanVeretko

Suppose we have an XML file with the following structure:

```xml
<?xml version="1.0" ?>
<searchRetrieveResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/zing/srw/ http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd" xmlns="http://www.loc.gov/zing/srw/">
  <records xmlns:ns1="http://www.loc.gov/zing/srw/">
    <record>
      <recordData>
        <record xmlns="">
          <datafield tag="000">
            <subfield code="a">123</subfield>
            <subfield code="b">456</subfield>
          </datafield>
          <datafield tag="001">
            <subfield code="a">789</subfield>
            <subfield code="b">987</subfield>
          </datafield>
        </record>
      </recordData>
    </record>
    <record>
      <recordData>
        <record xmlns="">
          <datafield tag="000">
            <subfield code="a">123</subfield>
            <subfield code="b">456</subfield>
          </datafield>
          <datafield tag="001">
            <subfield code="a">789</subfield>
            <subfield code="b">987</subfield>
          </datafield>
        </record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
```

I need to extract:
  • The text content of each "subfield" element (e.g. 123 in the example above), and
  • The values of the "tag" attribute (e.g. 000 or 001).
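For the two kinds of values listed above, one possible approach is to let XPath return strings directly: an attribute step such as `@tag` yields attribute values, and `text()` yields text nodes. The sketch below is a minimal illustration, assuming the XML is available as a byte string; the name `SAMPLE` is hypothetical, and the sample is trimmed to one record, with the `xsi:schemaLocation` and unused `ns1` declarations dropped since they do not affect these queries. Because the inner `record` declares `xmlns=""`, the `datafield` and `subfield` elements are in no namespace, so these particular expressions need no prefix.

```python
from lxml import etree

# Hypothetical sample: the question's document trimmed to a single record.
SAMPLE = b"""<?xml version="1.0" ?>
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <records>
    <record>
      <recordData>
        <record xmlns="">
          <datafield tag="000">
            <subfield code="a">123</subfield>
            <subfield code="b">456</subfield>
          </datafield>
          <datafield tag="001">
            <subfield code="a">789</subfield>
            <subfield code="b">987</subfield>
          </datafield>
        </record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""

root = etree.fromstring(SAMPLE)

# datafield/subfield are in no namespace (xmlns=""), so no prefix is needed.
tags = root.xpath("//datafield/@tag")               # attribute values
values = root.xpath("//datafield/subfield/text()")  # text of every subfield
# Predicates narrow the match, here to subfield "a" of datafield "000".
only_a = root.xpath("//datafield[@tag='000']/subfield[@code='a']/text()")

print(tags)    # ['000', '001']
print(values)  # ['123', '456', '789', '987']
print(only_a)  # ['123']
```

The results are lists of string subclasses, so they can be compared and concatenated like ordinary strings.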
How can I do this with lxml and XPath? Below is my initial code; could someone explain how to extract these values?

```python
from urllib.request import urlopen

from lxml import etree

url = "https://dl.dropbox.com/u/540963/short_test.xml"
with urlopen(url) as fp:
    doc = etree.parse(fp)

# Map a prefix to the SRW namespace URI, which is the default namespace of the outer elements.
ns = {'xsi': 'http://www.loc.gov/zing/srw/'}
for record in doc.xpath('//xsi:record', namespaces=ns):
    # Prints a list of matching <datafield> Element objects, not their values.
    print(record.xpath("xsi:recordData/record/datafield[@tag='000']", namespaces=ns))
```
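The loop above already selects the right `datafield` elements; what it prints is a list of `Element` objects. A sketch of one way to turn those elements into values is to read attributes with `.get()` and text with `.text`. This assumes the same namespace mapping (the prefix is arbitrary, so it is renamed to `srw` here, since `xsi` conventionally denotes the XML Schema instance namespace) and parses a hypothetical one-record `SAMPLE` string instead of the Dropbox URL so the example is self-contained. The nested-dict result is only one illustrative layout.

```python
import io

from lxml import etree

# Hypothetical one-record sample with the same namespace structure as the question.
SAMPLE = b"""<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <records>
    <record><recordData><record xmlns="">
      <datafield tag="000"><subfield code="a">123</subfield><subfield code="b">456</subfield></datafield>
      <datafield tag="001"><subfield code="a">789</subfield><subfield code="b">987</subfield></datafield>
    </record></recordData></record>
  </records>
</searchRetrieveResponse>"""

# Any prefix works; it only has to map to the SRW namespace URI.
ns = {"srw": "http://www.loc.gov/zing/srw/"}
doc = etree.parse(io.BytesIO(SAMPLE))

results = []
# Outer <record> elements are in the SRW namespace, so they need the prefix.
for record in doc.xpath("//srw:record", namespaces=ns):
    fields = {}
    # The inner <record> and its children have xmlns="", so they take no prefix.
    for datafield in record.xpath("srw:recordData/record/datafield", namespaces=ns):
        tag = datafield.get("tag")  # attribute value, e.g. "000"
        # Map each subfield's "code" attribute to its text content.
        fields[tag] = {sf.get("code"): sf.text for sf in datafield.findall("subfield")}
    results.append(fields)

print(results)
# [{'000': {'a': '123', 'b': '456'}, '001': {'a': '789', 'b': '987'}}]
```

To keep only one field, the predicate from the original query can be restored, e.g. `srw:recordData/record/datafield[@tag='000']`.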