Exclude Nested Elements While Doing Tree Traversal

kSEOBristoln

I am using Nokogiri to parse an XML file that has (roughly) the following structure:

\[code\]
<diag>
  <name>A00</name>
  <desc>Cholera</desc>
  <diag>
    <name>A00.0</name>
    <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
  </diag>
  <diag> ... </diag>
  ...
</diag>
\[/code\]

As you can see this tree has \[code\]diag\[/code\] nodes that can be nested arbitrarily deep, yet each nesting is a more specific description of the parent node.

I want to "flatten" this tree so that rather than having \[code\]A00.0\[/code\] nested within \[code\]A00\[/code\] I can just have a list going something like

\[code\]
A00
A00.0
A00.1
...
A00.34
...
A01
...
\[/code\]

What I have so far looks like this:

\[code\]
require 'nokogiri'

icd10 = File.new("icd10.xml", "r")
doc = Nokogiri::XML(icd10.read) do |config|
  config.strict.noblanks
end
icd10.close

@diags = {}
@diag_count = 0

def get_diags(node)
  node.children.each do |n|
    if n.name == "diag"
      @diags[@diag_count] = n
      @diag_count += 1
      get_diags(n)
    end
  end
end

# The xml file has sections but what I really want are the contents of the sections
doc.xpath('.//section').each do |n|
  get_diags(n)
end
\[/code\]

So far this works in that I do get all the \[code\]diag\[/code\] elements within the file, but the problem is that the parent nodes still contain all the content that is found in later nodes (e.g. \[code\]@diags[0]\[/code\] contains the \[code\]A00\[/code\], \[code\]A00.0\[/code\], \[code\]A00.1\[/code\], etc. nodes while \[code\]@diags[1]\[/code\] contains just the \[code\]A00.0\[/code\] content).

How can I exclude nested elements from the parent element while traversing the xml content in \[code\]get_diags\[/code\]?
Thanks in advance!

== EDIT ==

So I added this to my \[code\]get_diags\[/code\] method

\[code\]
def get_diags(node)
  node.children.each do |n|
    if n.name == "diag"
      f = Nokogiri::XML.fragment(n.to_s)
      f.search('.//diag').children.each do |d|
        if d.name == "diag"
          d.remove
        end
      end
      @diags[@diag_count] = f
      @diag_count += 1
      get_diags(n)
    end
  end
end
\[/code\]

Now \[code\]@diags\[/code\] holds a fragment of xml where all the nested \[code\]<diag>...</diag>\[/code\] are removed, which in one sense is what I want, but overall this is very very ugly, and I was wondering if anyone could share a better way to go about this. Thanks
 