Diffing large XML files in C# (.NET 2.0)

InnaVoll

I'm kind of stuck having to use .NET 2.0, so LINQ to XML isn't available (although I would be interested in how it would compare).

I had to write an internal program to download, extract, and compare some large XML files (about 10 MB each) that are essentially build configurations. I first tried existing libraries, such as Microsoft's XML Diff and Patch tool, but comparing the files took 2-3 minutes, even when ignoring whitespace, namespaces, etc. (I tested each ignore option one at a time to try to figure out which was fastest).

Then I tried to implement my own ideas: lists of nodes from XmlDocument objects, and dictionaries keyed on the root's direct descendants (45,000 children, by the way) that mapped to ints indicating each node's position in the XML document. All of them took at least 2 minutes to run.

My final implementation finishes in 1-2 seconds: I made a system process call to diff with a few lines of context and saved the results for display (our development machines include Cygwin, thank goodness).

I can't help but think there is a better, XML-specific way to do this that would be just as fast as a plain-text diff, especially since all I'm really interested in is the name element that is the child of each direct descendant. I could throw away 4/5 of the file for my purposes (we only need to know which files were included, not anything involving language or version).

So, as popular as XML is, I'm sure somebody out there has had to do something similar. What is a fast, efficient way to compare these large XML files? (Preferably open source or free.)

Edit: here is a sample of the nodes. I only need to find missing name elements (there are over 45k of these nodes):

[code]
<file>
  <name>SomeFile</name>
  <version>10.234</version>
  <countries>CA,US</countries>
  <languages>EN</languages>
  <types>blah blah</types>
  <internal>N</internal>
</file>
[/code]
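One XML-specific approach that stays within .NET 2.0 is to stream each file with `XmlReader` instead of building an `XmlDocument`, keep only the `<name>` values, and compare those as sets. The sketch below is a minimal illustration under stated assumptions: it assumes the layout from the sample (a single root element whose children are `<file>` elements, each holding a `<name>`), and the class and method names (`NameDiff`, `ReadNames`, `Missing`) are hypothetical. It has not been benchmarked against the real 10 MB files, so whether it matches the speed of the Cygwin diff is an open question; what it does avoid is loading a full DOM.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper (not from the original post): streams the XML with
// XmlReader instead of loading an XmlDocument, and keeps only the text of
// each <name> element that sits two levels below the root
// (root -> <file> -> <name>), as in the sample above.
// C# 2.0 style only (no LINQ, no var, no HashSet<T>) so it targets .NET 2.0.
public static class NameDiff
{
    // Returns the <name> values in document order.
    public static List<string> ReadNames(TextReader input)
    {
        List<string> names = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;
        settings.IgnoreProcessingInstructions = true;

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (!reader.EOF)
            {
                // Depth 0 = root, 1 = <file>, 2 = children of <file>.
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Depth == 2
                    && reader.LocalName == "name")
                {
                    // Reads the text and moves past </name>, so no extra Read() here.
                    names.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return names;
    }

    // Returns the names in 'from' that do not occur anywhere in 'other'.
    // Dictionary<string, bool> stands in for HashSet<T>, which needs .NET 3.5.
    public static List<string> Missing(List<string> from, List<string> other)
    {
        Dictionary<string, bool> lookup = new Dictionary<string, bool>(StringComparer.Ordinal);
        foreach (string name in other)
        {
            lookup[name] = true;
        }

        List<string> result = new List<string>();
        foreach (string name in from)
        {
            if (!lookup.ContainsKey(name))
            {
                result.Add(name);
            }
        }
        return result;
    }
}
```

To produce a report, read both files (for example by passing `new StreamReader(path)` to `ReadNames`), then call `Missing(oldNames, newNames)` for names that were dropped and `Missing(newNames, oldNames)` for names that were added. Each lookup in the dictionary is roughly constant time, so the comparison step is linear in the number of `<file>` nodes rather than quadratic.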
 