How do I write UTF-8 and characters in other encodings to a file in Python?

alanmulberry

I have a SharePoint library that captures data entered by users as an XML form. The form is encoded as UTF-8, but some of the characters entered by users are not ASCII (e.g. words from French, Spanish, Maori) and are not saved as UTF-8.

Here is an example of such data (abbreviated, sans metadata):

\[code\]
<?xml version="1.0" encoding="utf-8"?>
<my:myFields xmlns:my="http://schemas.microsoft.com/etc...">
  <my:title>Te whakaako i Te Reo Mäori -- Teaching Te Reo Mäori</my:title>
\[/code\]

I am using the parse function in ElementTree (xml.etree.ElementTree) to compile this information into a report, which I then export as CSV and send off as an Excel spreadsheet. As such, I would like to convert both the UTF-8 characters and all user input into a single format that works with Excel (cp1252?):

\[code\]
import os
import xml.etree.ElementTree as ET

course = ET.parse(os.path.join(path, filename))
\[/code\]

When I go to write the results of all my calculations to file, I get the following error (for the example XML above):

\[code\]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 48: ordinal not in range(128)
\[/code\]

When I look at the data, I see that the text from the tag has been converted to unicode, with '\xe4' in place of the 'ä':

\[code\]
u'Te whakaako i Te Reo M\xe4ori -- Teaching Te Reo M\xe4ori'
\[/code\]

I would like my Excel report to include the character 'ä', but I can't seem to get it to encode in a way that achieves this. I am probably missing some obvious encode/decode step, but I have been struggling with this for much of the day, so any help is appreciated :)
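For reference, the error message suggests the unicode string is being encoded implicitly as ASCII at write time, so one thing to try is choosing the output encoding explicitly. The following is only a minimal sketch, assuming Python 3 and that cp1252 really is the target encoding. The namespace URI, the sample XML, the output filename and the helper names `extract_title` and `write_report` are hypothetical, since the real schema is abbreviated above.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one is abbreviated in the post.
NS = {"my": "http://example.com/my-form"}

# Hypothetical stand-in for one SharePoint form, stored as UTF-8 bytes.
SAMPLE_XML = (
    u'<?xml version="1.0" encoding="utf-8"?>'
    u'<my:myFields xmlns:my="http://example.com/my-form">'
    u'<my:title>Te whakaako i Te Reo M\xe4ori</my:title>'
    u'</my:myFields>'
).encode("utf-8")


def extract_title(xml_bytes):
    """Parse the form and return the title text as a Unicode string."""
    root = ET.fromstring(xml_bytes)
    return root.find("my:title", NS).text


def write_report(rows, path, encoding="cp1252"):
    """Write rows of Unicode strings to a CSV file in the given encoding."""
    # errors="replace" writes "?" for any character the target encoding
    # cannot represent, instead of raising UnicodeEncodeError.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, errors="replace", newline="") as f:
        csv.writer(f).writerows(rows)


title = extract_title(SAMPLE_XML)
report_path = os.path.join(tempfile.gettempdir(), "report_cp1252.csv")
write_report([["title"], [title]], report_path)
```

On Python 2, which the `u'...'` output above suggests, the built-in csv module works on byte strings, so each value would instead need something like `value.encode('cp1252', 'replace')` before being passed to the writer. Note that cp1252 has no macron vowels such as 'ā', so those would come out as '?' with this approach.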
 