Parsing XML with duplicate root elements

dark nite10

New Member
I am trying to programmatically clean up invalid XML with duplicate root elements in C# .NET 4.0. What I want to do is consolidate all of the inner elements into one root element and remove the duplicate roots, so that

\[code\]<a> <b></b></a><a> <c></c></a>\[/code\]

becomes

\[code\]<a> <b></b> <c></c></a>\[/code\]

However, an element with the same name as the root could also appear in the inner XML. In that case we would not want to touch it, so that

\[code\]<a> <a></a> <b></b></a><a> <c></c> <a></a></a>\[/code\]

becomes

\[code\]<a> <a></a> <b></b> <c></c> <a></a></a>\[/code\]

Also, the duplicated root element isn't guaranteed to always be \[code\]<a>\[/code\]; it could have any name.

Thus far I've been trying to think of some sort of elegant Regex to accomplish this task, such as \[code\]/<((.|\n|\r)*?)>(.|\n|\r)*<\/\1>/\[/code\], but the problem with this is that a greedy match on the inner XML matches too much, and a non-greedy match on the inner XML matches too little.

I was hoping I wouldn't have to resort to creating a stack to count open and close tags to identify when I was back at the root of the document. I'm looking for a simple and elegant way of solving this problem.

Open source, third-party libraries are potentially acceptable solutions if one of them handles this kind of situation, but I'd rather avoid them.

Does anyone have any ideas?
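For illustration, here is a minimal sketch of one possible approach that avoids both regex and a hand-written tag stack. It wraps the text in a temporary root so that LINQ to XML (`System.Xml.Linq`, part of the .NET Framework) can parse it, then moves the children of every top-level element under a single root. The names `RootMerger`, `MergeRoots` and `wrapper` are made up for this example. It assumes the input has no XML declaration or DOCTYPE, that every top-level element has the same name, and that keeping only the first root's attributes is acceptable.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
static class RootMerger
{
    public static XElement MergeRoots(string fragment)
    {
        // Assumption: no XML declaration or DOCTYPE in the input, since
        // those would not be legal inside the temporary wrapper element.
        XElement wrapper = XElement.Parse("<wrapper>" + fragment + "</wrapper>");

        // Only the direct children of the wrapper are the duplicated roots;
        // nested elements with the same name are left where they are.
        var roots = wrapper.Elements().ToList();
        if (roots.Count == 0)
            throw new InvalidOperationException("No root element found.");
        if (roots.Any(r => r.Name != roots[0].Name))
            throw new InvalidOperationException("Top-level elements have different names.");

        // Build one root with the first root's name and attributes, followed
        // by the child nodes of every original root in document order.
        // LINQ to XML clones nodes that already have a parent.
        return new XElement(
            roots[0].Name,
            roots[0].Attributes(),
            roots.SelectMany(r => r.Nodes()));
    }
}
```

Calling `RootMerger.MergeRoots(text).ToString()` would then give the consolidated document. Because the parser decides where each top-level element ends, an inner element that shares the root's name is never mistaken for a duplicate root.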
 