Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

DrireeglitS · Jul 20, 2012

I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

\[code\]
import sys
from cStringIO import StringIO
from xml.etree import ElementTree as ET
from htmlentitydefs import entitydefs

source = StringIO("""<html>
<body>
<p>Less than &lt;</p>
<p>Non-breaking space &nbsp;</p>
</body>
</html>""")

parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)
parser.entity.update(entitydefs)
etree = ET.ElementTree()

tree = etree.parse(source, parser=parser)
for p in tree.findall('.//p'):
    print ET.tostring(p, encoding='UTF-8')
\[/code\]

When I run this using Python 2.7 on Mac OS X 10.6, I get:

\[code\]
<p>Less than &lt;</p>
Traceback (most recent call last):
  File "bar.py", line 20, in <module>
    print ET.tostring(p, encoding='utf-8')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1120, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 931, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1067, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 19: ordinal not in range(128)
\[/code\]

I thought that specifying "encoding='UTF-8'" would take care of the non-breaking space character, but apparently it doesn't. What should I do instead?
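One possible direction, offered as a sketch rather than a confirmed fix: the traceback shows an 'ascii' decode error inside `text.encode(...)`, which suggests the text node is a byte string holding the raw byte 0xa0 (in Python 2, `htmlentitydefs.entitydefs` maps `nbsp` to that Latin-1 byte) rather than a unicode string. The example below sidesteps the custom entity table entirely, which is a different technique from the one in the question: it rewrites named HTML entities as numeric character references before parsing, so the parser produces unicode text on its own. It is written for Python 3, where the module is `html.entities`; the helper name `numeric_entities` and the regular expression are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML already defines; these are left untouched.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}


def numeric_entities(markup):
    """Replace named HTML entities (e.g. &nbsp;) with numeric references (&#160;).

    Hypothetical helper: unknown names and XML's predefined entities
    are passed through unchanged.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, markup)


source = (
    "<html><body>"
    "<p>Less than &lt;</p>"
    "<p>Non-breaking space &nbsp;</p>"
    "</body></html>"
)

# Parse the preprocessed markup with the default parser.
root = ET.fromstring(numeric_entities(source))

# Serialize each paragraph as UTF-8 bytes.
out = [ET.tostring(p, encoding="UTF-8") for p in root.findall(".//p")]
for chunk in out:
    print(chunk)
```

With this approach the non-breaking space reaches the serializer as the character U+00A0 and is written out as the UTF-8 byte pair 0xC2 0xA0. On Python 2.7 the equivalent table would be `htmlentitydefs.name2codepoint`, which was not tested here.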
