Validating XML doc results in “Invalid byte 1 of 1-byte UTF-8 sequence.”

Frozt

I'm validating some XML files against Schematron stylesheets using Probatron4j, which uses Saxon internally. Most of the time this works fine, but occasionally processing crashes with the error

\[quote\]org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.\[/quote\]

My research has shown that this message typically indicates one of the following (in no particular order):
  • blatantly invalid data (e.g. attempting to read a ZIP file as if it were an XML file);
  • the presence of byte order marks;
  • the presence of byte sequences that are not valid UTF-8; or
  • a document that is lying when it claims to be UTF-8-encoded.
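The last three causes can be checked mechanically on the raw bytes. What follows is only a minimal sketch, assuming the whole document is already available as a byte array. The class name `Utf8Check` and both method names are hypothetical and not part of Probatron4j or Saxon. It looks for a UTF-8 byte order mark and uses a strict `CharsetDecoder` to report the offset of the first byte that does not decode as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library mentioned in the post.
class Utf8Check {

    /** Returns true if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if all bytes decode. */
    static int firstInvalidOffset(byte[] data) {
        // REPORT makes the decoder stop on bad input instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the input position is left at the start of the bad sequence
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: the decoded characters are not needed, keep going
        }
    }

    public static void main(String[] args) {
        byte[] sample = "<Paragraph>Executive Summary</Paragraph>".getBytes(StandardCharsets.US_ASCII);
        System.out.println("BOM present: " + hasUtf8Bom(sample));
        System.out.println("First invalid offset: " + firstInvalidOffset(sample));
    }
}
```

A result of `-1` with no BOM only shows that the byte array itself is clean UTF-8. It says nothing about what the parser actually receives if the bytes are re-read or re-encoded somewhere along the way.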
None of these applies to the document I'm processing. I've inspected the input in byte array form during program execution, and it doesn't contain a BOM or any non-ASCII characters.

Processing gets about a fifth of the way through my 30 KB document before crashing on an unremarkable English sentence (by "unremarkable," I mean that, apart from a CR LF line break, all bytes are between 32 (space) and 122 (lowercase z); in other words, standard keyboard characters). Here's the supposedly offending element:

\[code\][60, 80, 97, 114, 97, 103, 114, 97, 112, 104, 62, 69, 120, 101, 99, 117, 116, 105, 118, 101, 32, 83, 117, 109, 109, 97, 114, 121, 58, 32, 70, 114, 111, 109, 32, 49, 55, 53, 52, 32, 116, 111, 32, 49, 55, 54, 51, 13, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 69, 117, 114, 111, 112, 101, 32, 97, 110, 100, 32, 116, 104, 101, 32, 65, 109, 101, 114, 105, 99, 97, 115, 32, 119, 101, 114, 101, 32, 99, 97, 117, 103, 104, 116, 32, 117, 112, 32, 105, 110, 32, 97, 32, 99, 111, 110, 102, 108, 105, 99, 116, 32, 98, 101, 116, 119, 101, 101, 110, 32, 69, 110, 103, 108, 97, 110, 100, 44, 32, 117, 110, 100, 101, 114, 32, 75, 105, 110, 103, 32, 71, 101, 111, 114, 103, 101, 32, 73, 73, 44, 32, 97, ...\[/code\]

Oddly, the failing document was generated by removing a few elements from a larger document that gets processed cleanly by the same code.

I know that the exception is being thrown in the \[code\]parse(InputSource input)\[/code\] method of an object that implements the \[code\]org.xml.sax.XMLReader\[/code\] interface. According to the Javadoc, \[code\]SAXException\[/code\] indicates

\[quote\]Any SAX exception, possibly wrapping another exception.\[/quote\]

Examining the exception in a debugger shows that there is no wrapped exception.

What could be causing this error?
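One way to narrow this down is to feed the same byte array to a plain JDK SAX parser, outside Probatron4j, and record where it stops. This is a minimal sketch under assumptions: the class `ParseProbe` and its `probe` method are hypothetical, and the JDK's default parser may not behave identically to the reader Probatron4j configures. Depending on the parser version, an encoding failure may surface as a `SAXParseException` or as a `CharConversionException`, so the sketch catches both.

```java
import java.io.ByteArrayInputStream;
import java.io.CharConversionException;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic helper; not part of Probatron4j or Saxon.
class ParseProbe {

    /** Parses the bytes with the JDK's default SAX parser; returns null on success, else a description. */
    static String probe(byte[] xml) throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DefaultHandler handler = new DefaultHandler(); // ignores content, rethrows fatal errors
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            // Passing a byte stream (not a Reader) lets the parser do its own encoding detection.
            reader.parse(new InputSource(new ByteArrayInputStream(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (CharConversionException e) {
            return "encoding error: " + e.getMessage();
        }
    }
}
```

If this probe accepts the exact bytes that fail inside the validation run, the bytes reaching the failing `parse(InputSource input)` call are probably not the ones that were inspected.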
 