Maintaining special characters when parsing XML with Python?

fuzzball · Jul 21, 2012

I've got an XML file that I'm parsing with Python and outputting as Python code to a file. Some of the XML contains regular expression values, and strings which will be shown as a dialog on screen, so there are a few special characters I need to maintain. The code follows, but how can this be done?

The XML looks a bit like this:

\[code\]
<variable id="passportnumber" value="" type="String">
    <validate>
        <regularExpression fieldID="passportnumber" errorID="3007162"><![CDATA[^[a-zA-Z+:?<>;*()%="!0-9./',&\s-]{1,35}$]]></regularExpression>
    </validate>
</variable>
\[/code\]

And for a dialog:

\[code\]
<if>
    <condition><![CDATA[$taxcode$ == $previousemergencytaxcode$ and $previousemergencytaxcode$ != $emergencytaxcode$]]></condition>
    <then>
        <dialog id="taxCodeOutdatedDialog" text="Are you sure this is the correct tax code?

The emergency code for the tax year 2011-12 was '$previousemergencytaxcode$'.
The emergency code for the tax year 2012-13 is '$emergencytaxcode$'.

Proceed?" type="YES|NO|CANCEL" />
    </then>
</if>
\[/code\]

The full Python script is here and the specifics to parse these two are:

\[code\]
def parse_regularExpression(self, elem):
    self.out('')
    self.out("item_regularExpression(fieldID='{0}', value='{1}')".format(elem.attrib['fieldID'], elem.text))

def parse_dialog(self, elem):
    self.out('')
    self.out("item_dialog(id='{0}', text='{1}', type='{2}')".format(elem.attrib['id'], elem.attrib['text'], elem.attrib['type']))
\[/code\]

The line feed (the line breaks inside the \[code\]text\[/code\] attribute of the dialog) is the main thing I'm unsure how to deal with. It seems that etree is outputting that as a line break even if it is triple quoted. It outputs the text value as:

\[code\]
item_dialog(id='taxCodeOutdatedDialog', text='Are you sure this is the correct tax code? The emergency code for the tax year 2011-12 was '$previousemergencytaxcode$'. The emergency code for the tax year 2012-13 is '$emergencytaxcode$'. Proceed?', type='YES|NO|CANCEL')
\[/code\]
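
One possible way to keep these characters intact in the generated file is to let Python escape them, using the `!r` conversion in `str.format` instead of wrapping `{0}` in hand-written quotes. The following is only a minimal sketch, not the original script. The input is a trimmed-down, hypothetical version of the XML above, the line feeds are assumed to be written as `&#10;` character references, and `emit_regular_expression` and `emit_dialog` are made-up stand-ins for the `parse_*` methods that return the line instead of calling `self.out`.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical input modelled on the XML in the question.
# Assumption: the line feeds are written as &#10; character references.
XML = r"""<root>
  <regularExpression fieldID="passportnumber" errorID="3007162"><![CDATA[^[a-zA-Z+:?<>;*()%="!0-9./',&\s-]{1,35}$]]></regularExpression>
  <dialog id="taxCodeOutdatedDialog" text="Is '$taxcode$' correct?&#10;&#10;Proceed?" type="YES|NO|CANCEL" />
</root>"""


def emit_regular_expression(elem):
    # {!r} formats each value with repr(), so quotes and backslashes in the
    # pattern come out escaped inside a valid Python string literal.
    return "item_regularExpression(fieldID={0!r}, value={1!r})".format(
        elem.attrib["fieldID"], elem.text)


def emit_dialog(elem):
    # The parser has already turned &#10; into a real newline; repr() writes
    # it back out as the two characters \n instead of breaking the line.
    return "item_dialog(id={0!r}, text={1!r}, type={2!r})".format(
        elem.attrib["id"], elem.attrib["text"], elem.attrib["type"])


root = ET.fromstring(XML)
regex_line = emit_regular_expression(root.find("regularExpression"))
dialog_line = emit_dialog(root.find("dialog"))
print(regex_line)
print(dialog_line)
```

Because `repr()` picks the quoting and escaping itself, the embedded single quotes around `$taxcode$` no longer end the string early, and each generated call stays on one line.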
