gammingempina
New Member
I have a \[code\]String\[/code\] containing a binary \[code\]0\[/code\] character, encoded as UTF-8 (\[code\]"A\u0000B"\[/code\]). JAXB happily marshals an XML document containing this character but then fails to unmarshal it:

\[code\]
final JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
final Marshaller marshaller = jaxbContext.createMarshaller();
final Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

Root root = new Root();
root.value = "A\u0000B";  // NUL character between A and B

// Marshalling succeeds and writes the raw 0x00 byte into the output
final ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(root, os);

// Unmarshalling the very same bytes fails with SAXParseException
unmarshaller.unmarshal(new ByteArrayInputStream(os.toByteArray()));
\[/code\]

The root class is very simple:

\[code\]
@XmlRootElement
class Root {
    @XmlValue
    String value;
}
\[/code\]

The output XML also contains the binary \[code\]0\[/code\] between \[code\]A\[/code\] and \[code\]B\[/code\] (in hex: \[code\]41 00 42\[/code\]), which causes the following error during unmarshalling:

\[code\]
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 63; An invalid XML character (Unicode: 0x0) was found in the element content of the document.
\[/code\]

Interestingly, using the raw DOM API (example) produces an escaped \[code\]0\[/code\]: \[code\]A&#0;B\[/code\], but trying to read it back yields a similar error. It seems that \[code\]0\[/code\], whether binary or escaped, is not accepted by any XML parser or by \[code\]xmllint\[/code\] (see also: Python + Expat: Error on entities).

My questions:
- Why do JAXB and the DOM API allow creating invalid XML documents that they cannot read back? Shouldn't they fail fast during marshalling?
- Is there an elegant, global solution? I have seen people tackle this problem by: