Remove duplicated sections in an XML file

theduimans

New Member
I need to import an XML file from StarUML into Sparx's Enterprise Architect. Both programs support importing and exporting through the XMI interface (XML).

When I import the file into EA, all the class attributes are duplicated (I don't know why). Since there are a lot of classes (the file is over 30K lines), I'm looking for a way to remove the duplicated attributes within each class automatically with Python (I'm a newbie, so this would also help me learn a bit more of the language).

What I do is export the imported model back to an XML file, which has more or less this format:

\[code\]
<?xml version="1.0" encoding="windows-1252"?>
<XMI xmi.version="1.1" xmlns:UML="omg.org/UML1.3" timestamp="2012-04-26 00:08:23">
  <XMI.header>
    <XMI.documentation>
      <XMI.exporter>Enterprise Architect</XMI.exporter>
      <XMI.exporterVersion>2.5</XMI.exporterVersion>
    </XMI.documentation>
  </XMI.header>
  <XMI.content>
    <UML:Model name="EA Model" xmi.id="MX_EAID_7E991363_115D_4f1c_BE75_50FE1A4CA215">
      <UML:Namespace.ownedElement>
        <UML:Class name="EARootClass" xmi.id="EAID_11111111_5487_4080_A7F4_41526CB0AA00" isRoot="true" isLeaf="false" isAbstract="false"/>
        <UML:Package xmi.id="EAPK_7E991363_115D_4f1c_BE75_50FE1A4CA215" isRoot="false" isLeaf="false" isAbstract="false" visibility="public">
          <UML:ModelElement.taggedValue>
            <UML:TaggedValue tag="parent" value="EAPK_2CAD2AC4_7032_4f1b_8ED2_CEC33EB2C11B"/>
            <UML:TaggedValue tag="created" value="2012-04-24 00:00:00"/>
            <UML:TaggedValue tag="modified" value="2012-04-24 00:00:00"/>
            <UML:TaggedValue tag="iscontrolled" value="FALSE"/>
            <UML:TaggedValue tag="version" value="1.0"/>
            <UML:TaggedValue tag="isprotected" value="FALSE"/>
            <UML:TaggedValue tag="usedtd" value="FALSE"/>
            <UML:TaggedValue tag="logxml" value="FALSE"/>
            <UML:TaggedValue tag="packageFlags" value="CRC=0;"/>
            <UML:TaggedValue tag="phase" value="1.0"/>
            <UML:TaggedValue tag="status" value="Proposed"/>
            <UML:TaggedValue tag="author" value="afu"/>
            <UML:TaggedValue tag="complexity" value="1"/>
            <UML:TaggedValue tag="ea_stype" value="Public"/>
            <UML:TaggedValue tag="tpos" value="0"/>
            <UML:TaggedValue tag="gentype" value="Java"/>
          </UML:ModelElement.taggedValue>
          <UML:Namespace.ownedElement>
            <UML:Class name="InstrumentoMusical" xmi.id="EAID_01C08C9A_48DF_4cb5_858A_7E4C7A5D0F5E" visibility="public" namespace="EAPK_7E991363_115D_4f1c_BE75_50FE1A4CA215" isRoot="false" isLeaf="false" isAbstract="false" isActive="false">
              <UML:ModelElement.taggedValue>
                <UML:TaggedValue tag="isSpecification" value="false"/>
                <UML:TaggedValue tag="ea_stype" value="Class"/>
                <UML:TaggedValue tag="ea_ntype" value="0"/>
                <UML:TaggedValue tag="package" value="EAPK_7E991363_115D_4f1c_BE75_50FE1A4CA215"/>
                <UML:TaggedValue tag="date_created" value="2012-04-24 22:12:54"/>
                <UML:TaggedValue tag="date_modified" value="2012-04-24 22:12:54"/>
                <UML:TaggedValue tag="tagged" value="0"/>
                <UML:TaggedValue tag="tpos" value="0"/>
                <UML:TaggedValue tag="ea_localid" value="4296"/>
                <UML:TaggedValue tag="ea_eleType" value="element"/>
                <UML:TaggedValue tag="$ea_attsclassified" value="{5765D265-43F8-4ff6-AD34-D355A3BB0E77},{0BF24813-B1E6-418e-8703-916678A8300C},{3F92AD91-F5FC-40d0-BC95-6D42E3E73454},{5DEB56A0-14FB-457c-A7A1-B2C61060B1B0},{286D76E1-1D68-45dc-AB3F-676721F9B939},{A1D709A6-EBD3-4036-A496-575B00CFDE29},{1BCD37C4-6D30-49d8-882D-156D5D4597D2},{AF83F3D2-2937-4c07-AD58-364A40F023FC}"/>
                <UML:TaggedValue tag="style" value="BackColor=-1;BorderColor=-1;BorderWidth=-1;FontColor=-1;VSwimLanes=1;HSwimLanes=1;BorderStyle=0;"/>
              </UML:ModelElement.taggedValue>
              <UML:Classifier.feature>
                <UML:Attribute name="codigo_item_estoque" changeable="none" visibility="private" ownerScope="instance" targetScope="instance">
                  <UML:Attribute.initialValue>
                    <UML:Expression/>
                  </UML:Attribute.initialValue>
                  <UML:StructuralFeature.type>
                    <UML:Classifier xmi.idref="eaxmiid0"/>
                  </UML:StructuralFeature.type>
                  <UML:ModelElement.taggedValue>
                    <UML:TaggedValue tag="type" value="Integer"/>
                    <UML:TaggedValue tag="length" value="0"/>
                    <UML:TaggedValue tag="ordered" value="0"/>
                    <UML:TaggedValue tag="precision" value="0"/>
                    <UML:TaggedValue tag="scale" value="0"/>
                    <UML:TaggedValue tag="collection" value="false"/>
                    <UML:TaggedValue tag="position" value="1"/>
                    <UML:TaggedValue tag="lowerBound" value="1"/>
                    <UML:TaggedValue tag="upperBound" value="1"/>
                    <UML:TaggedValue tag="ea_guid" value="{EA0798D1-0F3E-42fd-B12D-B11815249143}"/>
                    <UML:TaggedValue tag="ea_localid" value="3093"/>
                    <UML:TaggedValue tag="rose_uuid" value="DCE:523009DC-AFCE-4108-A49C-AA1F25DC4925" xmi.id="EAID_BBE8B8E1_E863_49cc_8630_E30E8859A33A"/>
                    <UML:TaggedValue tag="ordering" value="unordered" xmi.id="EAID_5D8FFBD6_659A_419f_B71A_28A4A0B3CDB3"/>
                  </UML:ModelElement.taggedValue>
                </UML:Attribute>
                <UML:Attribute name="codigo_item_estoque" changeable="none" visibility="private" ownerScope="instance" targetScope="instance">
                  <UML:Attribute.initialValue>
                    <UML:Expression/>
                  </UML:Attribute.initialValue>
                  <UML:StructuralFeature.type>
                    <UML:Classifier xmi.idref="eaxmiid0"/>
                  </UML:StructuralFeature.type>
                  <UML:ModelElement.taggedValue>
                    <UML:TaggedValue tag="type" value="Integer"/>
                    <UML:TaggedValue tag="length" value="0"/>
                    <UML:TaggedValue tag="ordered" value="0"/>
                    <UML:TaggedValue tag="precision" value="0"/>
                    <UML:TaggedValue tag="scale" value="0"/>
                    <UML:TaggedValue tag="collection" value="false"/>
                    <UML:TaggedValue tag="position" value="1"/>
                    <UML:TaggedValue tag="lowerBound" value="1"/>
                    <UML:TaggedValue tag="upperBound" value="1"/>
                    <UML:TaggedValue tag="ea_guid" value="{9B56A54E-62B2-451b-BB04-3C7A83B20663}"/>
                    <UML:TaggedValue tag="ea_localid" value="2615"/>
                    <UML:TaggedValue tag="rose_uuid" value="DCE:523009DC-AFCE-4108-A49C-AA1F25DC4925" xmi.id="EAID_5DFAA01C_9624_4fad_B31A_81206F56C849"/>
                  </UML:ModelElement.taggedValue>
                </UML:Attribute>
\[/code\]

For an extended version of this file, take a look here: http://www.peoplesliberationfront.n...#EFC8TmezE5OUj51zo8Pat67Zauacy9qaDXNHxSCYBBw=

This is the most similar post that I've found on the issue: "how to remove duplicate values in a XML file and keep the last one?", but in that case the XML is much simpler than this one, and unfortunately I haven't been able to solve my problem with it.

As you can see, the "codigo_item_estoque" attribute appears twice in the file, with different xmi.id values. For each attribute, I would like to keep just the first or the last occurrence, but not both.

I don't know whether it's easier to work with XML structures (ElementTree or similar) or to process the file as plain text.

Any help will be more than welcome. Thanks a lot!
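For reference, here is a minimal sketch of the structured approach with the standard library's `xml.etree.ElementTree`, not a tested solution for the full 30K-line file. It assumes that two `UML:Attribute` elements are duplicates when they share the same `name` inside the same `UML:Classifier.feature` block, and that the namespace URI is the `omg.org/UML1.3` string from the export format shown above. The function names `dedupe_attributes` and `clean_file` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:UML declaration in the exported file.
UML_NS = "omg.org/UML1.3"
FEATURE_TAG = f"{{{UML_NS}}}Classifier.feature"
ATTRIBUTE_TAG = f"{{{UML_NS}}}Attribute"


def dedupe_attributes(root, keep="first"):
    """Drop UML:Attribute elements whose name repeats within one class.

    Assumption: the "name" attribute alone identifies a duplicate.
    Returns the number of elements removed.
    """
    removed = 0
    # Materialise the list first so the tree is not modified while iterating.
    for feature in list(root.iter(FEATURE_TAG)):
        attributes = feature.findall(ATTRIBUTE_TAG)  # direct children only
        if keep == "last":
            attributes.reverse()
        seen = set()
        for attribute in attributes:
            name = attribute.get("name")
            if name is None:
                continue  # leave unnamed attributes untouched
            if name in seen:
                feature.remove(attribute)
                removed += 1
            else:
                seen.add(name)
    return removed


def clean_file(source_path, target_path, keep="first"):
    """Parse an XMI export, remove duplicated attributes, write the result."""
    # Keep the "UML:" prefix in the output instead of an auto-generated one.
    ET.register_namespace("UML", UML_NS)
    tree = ET.parse(source_path)
    removed = dedupe_attributes(tree.getroot(), keep=keep)
    tree.write(target_path, encoding="windows-1252", xml_declaration=True)
    return removed
```

A call such as `clean_file("exported.xml", "cleaned.xml", keep="last")` (both file names are placeholders) would write a cleaned copy and return how many attributes were dropped. Working on the parsed tree rather than on raw text means the nested `UML:TaggedValue` children are removed together with their parent attribute. Anything elsewhere in the file that refers to a removed attribute by its GUID is left as it is.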
 