Python: I cannot understand how XML iteration works

nathangranny

New Member
Thanks to the brilliant help on my XML parsing problem, I got to a point where I am lost in how XML elements are actually processed (with lxml).

My data is the output of an nmap scan, made up of many records like the ones below:

    <?xml version="1.0"?>
    <?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
    <nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
      <host>
        <status state="down" reason="no-response"/>
        <address addr="10.232.0.1" addrtype="ipv4"/>
      </host>
      <host starttime="1340201455" endtime="1340201930">
        <status state="up" reason="echo-reply"/>
        <address addr="10.232.49.2" addrtype="ipv4"/>
        <hostnames>
          <hostname name="host1.example.com" type="PTR"/>
        </hostnames>
        <ports>
          <port protocol="tcp" portid="135">
            <state state="open" reason="syn-ack" reason_ttl="123"/>
            <service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"/>
          </port>
          <port protocol="tcp" portid="12345">
            <state state="open" reason="syn-ack" reason_ttl="123"/>
            <service name="http" product="Trend Micro OfficeScan Antivirus http config" method="probed" conf="10"/>
          </port>
        </ports>
        <times srtt="890" rttvar="2835" to="100000"/>
      </host>
    </nmaprun>

I am looking at generating a line when

port 12345 is open, or
port 135 is open and 12345 is open

I use the following code for this, which I commented with my understanding of how things go:

    from lxml import etree
    import time

    scanTime = str(int(time.time()))
    d = etree.parse("10.233.85.0.22.xml")

    # find all hosts records
    for el_host in d.findall("host"):
        # only process hosts UP
        if el_host.find("status").attrib["state"] == "up":
            # here comes a piece of code which sets the variable hostname
            # used later - that part works fine (removed for clarity)
            # get the status of port 135 and 12345
            Open12345 = Open135 = False
            for el_port in el_host.findall("ports/port"):
                # we are now looping through the <port> records for a given <host>
                if el_port.attrib["portid"] == "135":
                    Open135 = el_host.find("ports/port/state").attrib["state"] == "open"
                if el_port.attrib["portid"] == "12345":
                    Open12345 = el_host.find("ports/port/state").attrib["state"] == "open"
                    # I want to get for port 12345 the description, so I search
                    # for <service> within a given port - only 12345 in my case
                    # I just search the first one as there is only one
                    # this is the place I am not sure I get right
                    el_service = el_host.find("ports/port/service")
                    if el_service.get("product") is not None:
                        Type12345 = el_host.find("ports/port/service").attrib["product"]
            if Open12345:
                print "%s %s \"%s\"\n" % (scanTime,hostname,Type12345)
            if not Open12345 and Open135:
                print "%s %s \"%s\"\n" % (scanTime,hostname,"NO_OfficeScan")

The place I am not sure of is highlighted in the comments. With this code I always match Microsoft Windows RPC, as if I were within the record for port 135 (it comes first in the XML file, before port 12345).

I am sure that the problem is somewhere in the way I understand the find function. It probably matches everything, independently of the place I am in. In other words there is no recursion (as far as I can tell).

In that case, how can I code the concept of "get the service name when you are in the record for port 12345"?

Thank you.

EDIT & SOLUTION:

I found the problem in my code.
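To make the difference concrete, here is a minimal sketch (not part of the original script) comparing a lookup made from the host element with one made from the current port element. The trimmed-down `SAMPLE` record and the names `SAMPLE` and `results` are illustrative assumptions. The snippet falls back to the standard library's `xml.etree.ElementTree` when lxml is not installed, since `find` and `findall` behave the same way for these simple paths.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib module is an acceptable stand-in here
    import xml.etree.ElementTree as etree

# Hypothetical, trimmed-down <host> record modelled on the nmap output above
SAMPLE = """
<host>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open"/>
      <service name="msrpc" product="Microsoft Windows RPC"/>
    </port>
    <port protocol="tcp" portid="12345">
      <state state="open"/>
      <service name="http" product="Trend Micro OfficeScan Antivirus http config"/>
    </port>
  </ports>
</host>
""".strip()

el_host = etree.fromstring(SAMPLE)

results = []
for el_port in el_host.findall("ports/port"):
    # Path evaluated from the host: always the first <service> in document order
    from_host = el_host.find("ports/port/service").get("product")
    # Path evaluated from the current port: the <service> of this port only
    from_port = el_port.find("service").get("product")
    results.append((el_port.get("portid"), from_host, from_port))
    print("port %s | from host: %s | from port: %s"
          % (el_port.get("portid"), from_host, from_port))
```

For both ports the host-relative lookup yields "Microsoft Windows RPC", while the port-relative lookup yields the product of the port currently being visited. `find` always evaluates its path relative to the element it is called on and returns the first match, regardless of which loop the call sits in.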
I repost the whole script in case someone someday stumbles upon this problem (the output comes from nmap so it could be interesting for someone to reuse; this is to explain the big chunk of code below :) ):

    #!/usr/bin/python
    from lxml import etree
    import time
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("file", help="XML file to parse")
    args = parser.parse_args()

    scanTime = str(int(time.time()))
    d = etree.parse(args.file)
    f = open("OfficeScanComplianceDSCampus."+scanTime,"w")
    print "Parsing "+ args.file

    # find all hosts records
    for el_host in d.findall("host"):
        # only process hosts UP
        if el_host.find("status").attrib["state"] == "up":
            # get the first hostname if it exists, otherwise IP
            el_hostname = el_host.find("hostnames/hostname")
            if el_hostname is not None:
                hostname = el_hostname.attrib["name"]
            else:
                hostname = el_host.find("address").attrib["addr"]
            # get the status of port 135 and 12345
            Open12345 = Open135 = False
            for el_port in el_host.findall("ports/port"):
                # we are now looping through the <port> records for a given <host>
                if el_port.attrib["portid"] == "135":
                    Open135 = el_port.find("state").attrib["state"] == "open"
                if el_port.attrib["portid"] == "12345":
                    Open12345 = el_port.find("state").attrib["state"] == "open"
                    # if port open get info about service
                    if Open12345:
                        el_service = el_port.find("service")
                        if el_service is None:
                            Type12345 = "UNKNOWN"
                        elif el_service.get("method") == "probed":
                            Type12345 = el_service.get("product")
                        else:
                            Type12345 = "UNKNOWN"
            if Open12345:
                f.write("%s %s \"%s\"\n" % (scanTime,hostname,Type12345))
            if not Open12345 and Open135:
                f.write("%s %s \"%s\"\n" % (scanTime,hostname,"NO_OfficeScan"))
            if Open12345 and not Open135:
                f.write("%s %s \"%s\"\n" % (scanTime,hostname,"Non-Windows with 12345"))

    f.close()

I will also explore the xpath idea given by Dikei and Ignacio Vazquez-Abrams.

Thank you everyone!
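As a rough idea of what a path-based lookup could look like (this is a sketch, not the code suggested in the answers): the port can be selected directly with an attribute predicate such as `port[@portid='12345']`, which both lxml's `find` and the standard library's `xml.etree.ElementTree` accept. The helper names `is_open` and `product_for_port` and the `SAMPLE` record are hypothetical. lxml additionally offers an `xpath()` method for full XPath 1.0 expressions, which is not used here so that the snippet also runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib module is an acceptable stand-in here
    import xml.etree.ElementTree as etree

# Hypothetical, trimmed-down <host> record modelled on the nmap output above
SAMPLE = """
<host>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open"/>
      <service name="msrpc" product="Microsoft Windows RPC" method="probed"/>
    </port>
    <port protocol="tcp" portid="12345">
      <state state="open"/>
      <service name="http" product="Trend Micro OfficeScan Antivirus http config" method="probed"/>
    </port>
  </ports>
</host>
""".strip()


def is_open(el_host, portid):
    """Return True if the host has a <port> with this portid whose state is open."""
    el_state = el_host.find("ports/port[@portid='%s']/state" % portid)
    return el_state is not None and el_state.get("state") == "open"


def product_for_port(el_host, portid):
    """Return the product attribute of the port's <service>, or None if absent."""
    el_service = el_host.find("ports/port[@portid='%s']/service" % portid)
    if el_service is None:
        return None
    return el_service.get("product")


el_host = etree.fromstring(SAMPLE)
print("135 open: %s" % is_open(el_host, "135"))
print("12345 product: %s" % product_for_port(el_host, "12345"))
```

With this approach the inner loop over `<port>` elements is no longer needed, since the predicate picks the right port. The trade-off is one path lookup per question asked instead of a single pass over the ports.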
 