Unable to unmarshal \u0000 after successfully marshalling it

gammingempina

New Member
I have a \[code\]String\[/code\] containing a binary \[code\]0\[/code\] in UTF-8 (\[code\]"A\u0000B"\[/code\]). JAXB happily marshals an XML document containing this character but then fails to unmarshal it:

\[code\]
final JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
final Marshaller marshaller = jaxbContext.createMarshaller();
final Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

Root root = new Root();
root.value = "A\u0000B";

// Marshalling succeeds and writes the raw 0 byte into the output.
final ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(root, os);

// Unmarshalling the very same bytes throws SAXParseException.
unmarshaller.unmarshal(new ByteArrayInputStream(os.toByteArray()));
\[/code\]

The root class is simple:

\[code\]
@XmlRootElement
class Root {
    @XmlValue
    String value;
}
\[/code\]

The output XML also contains a binary \[code\]0\[/code\] between \[code\]A\[/code\] and \[code\]B\[/code\] (in hex: \[code\]41 00 42\[/code\]), which causes the following error during unmarshalling:

\[code\]
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 63; An invalid XML character (Unicode: 0x0) was found in the element content of the document.
\[/code\]

Interestingly, using the raw DOM API (example) produces an escaped \[code\]0\[/code\]: \[code\]A&#0;B\[/code\], but trying to read it back yields a similar error. Also, \[code\]0\[/code\] is not accepted, in either binary or escaped form, by any XML parser or by \[code\]xmllint\[/code\] (see also: Python + Expat: Error on entities).

My question: shouldn't a mature XML stack in Java (I'm using 1.7.0_05) handle this, either by default or through some simple setting? I'm looking for escaping, ignoring, or failing fast, but the default behavior of generating invalid XML is not acceptable. I believe such fundamental functionality should not require any extra coding on the client side.
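For reference, the "ignoring" and "failing fast" options mentioned above can be sketched in plain Java without any JAXB setting. This is only a minimal sketch under stated assumptions: the class name \[code\]XmlChars\[/code\] and its methods are hypothetical helpers (not part of JAXB or the JDK), and the range check follows the \[code\]Char\[/code\] production of XML 1.0. It would have to be called on the value before marshalling, for example from a setter or from an \[code\]XmlAdapter\[/code\].

```java
// Hypothetical helper, not part of JAXB: validates or strips characters
// that XML 1.0 does not allow in document content.
final class XmlChars {

    private XmlChars() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Fail fast: throws if the string contains a disallowed code point. */
    static String requireValid(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValid(cp)) {
                throw new IllegalArgumentException(
                        "Invalid XML 1.0 character U+"
                                + Integer.toHexString(cp).toUpperCase()
                                + " at index " + i);
            }
            i += Character.charCount(cp);
        }
        return s;
    }

    /** Ignore: returns a copy with every disallowed code point removed. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isValid(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The binary 0 between A and B is dropped.
        System.out.println(strip("A\u0000B")); // prints AB
    }
}
```

With this in place, \[code\]root.value = XmlChars.strip("A\u0000B");\[/code\] would marshal as \[code\]AB\[/code\], while \[code\]XmlChars.requireValid(...)\[/code\] would reject the value before any XML is written. Escaping is not covered here, because XML 1.0 has no valid escaped form for \[code\]0\[/code\]; keeping the byte would need an application-level encoding such as Base64.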
 