Why is elementtree.ElementTree.iterparse using so much memory?

I am using elementtree.ElementTree.iterparse to parse a large (371 MB) XML file. My code is basically this:

\[code\]
outf = open('out.txt', 'w')
context = iterparse('copyright.xml')
context = iter(context)
dummy, root = context.next()
for event, elem in context:
    if elem.tag == 'foo':
        author = elem.text
    elif elem.tag == 'bar':
        if elem.text is not None and 'bat' in elem.text.lower():
            outf.write(elem.text + '\n')
    elem.clear()  #line A
    root.clear()  #line B
\[/code\]

My question is two-fold:

First - Do I need both A and B (see the comments in the code snippet)? I was told that root.clear() clears unnecessary children so memory isn't devoured, but here are my observations: using B and not A is the same as using neither in terms of memory consumption (plotted with Task Manager). Using only A seems to be the same as using both.

Second - Why is this still consuming so much memory? As the program runs, it uses about 100 MB of RAM near the end. I assume it has something to do with outf, but why? Isn't it just writing to disk? And if it is storing that data before outf closes, how can I avoid that?

Other information: I am using Python 2.7.3 on Windows.
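For comparison, here is a minimal sketch of the arrangement usually suggested for this kind of loop. It is not from the original post, and I have not measured it against the 371 MB file. One thing worth knowing: by default iterparse only reports 'end' events, so the first pair it yields is the first element that finishes parsing, not the document root. Asking for 'start' events as well makes the first pair refer to the real root. The sketch assumes the standard library's xml.etree.ElementTree and the Python 3 spelling next(context) (on 2.7 that would be context.next()); the function name extract_matching and its parameters are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract_matching(source, out, tag='bar', needle='bat'):
    """Stream `source` and write the text of every <tag> element whose
    lowercased text contains `needle`. Returns the number of lines written.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Request 'start' events too, so the very first event is the start of
    # the document root rather than the end of some inner element.
    context = iter(ET.iterparse(source, events=('start', 'end')))
    _, root = next(context)

    written = 0
    for event, elem in context:
        if event != 'end':
            continue  # an element's text is only complete at its 'end' event
        if elem.tag == tag and elem.text is not None and needle in elem.text.lower():
            out.write(elem.text + '\n')
            written += 1
        # Drop the root's references to children that were already handled,
        # so finished subtrees can be garbage collected.
        root.clear()
    return written
```

Called as extract_matching('copyright.xml', outf), this would play the role of the loop above. Whether it actually lowers the peak memory for a given file is something to confirm by measuring.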
 