I'm validating some XML files against Schematron stylesheets using Probatron4j, which uses Saxon internally. Most of the time this works fine, but occasionally processing crashes with the error
\[quote\]org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.\[/quote\]
My research has shown that this message typically indicates one of the following (in no particular order):
- blatantly invalid data (e.g. attempting to read a ZIP file as if it were an XML file);
- the presence of byte order marks;
- the presence of byte sequences that are not legal in UTF-8; or
- a document that is lying when it claims to be UTF-8-encoded.
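
To narrow down which of these causes applies to a given file, one option is to scan its raw bytes before handing it to the validator. The following is a minimal sketch and not part of Probatron4j or Saxon; the class name `Utf8Check` and both method names are made up for illustration. It uses the standard `java.nio.charset.CharsetDecoder` in strict mode to report the offset of the first byte that cannot be decoded as UTF-8, and checks separately for a UTF-8 byte order mark, since a BOM is itself valid UTF-8 and will not be flagged by the decoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Utf8Check {

    /**
     * Returns the offset of the first byte that cannot be decoded as UTF-8,
     * or -1 if the whole array is valid UTF-8.
     */
    public static int firstInvalidOffset(byte[] data) {
        // REPORT makes the decoder stop on bad input instead of replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position marks the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                // All input consumed without error.
                return -1;
            }
            // Overflow: the decoded characters are not needed, so reuse the buffer.
            out.clear();
        }
    }

    /** Returns true if the data starts with the UTF-8 byte order mark EF BB BF. */
    public static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }
}
```

The bytes of a file could be obtained with `java.nio.file.Files.readAllBytes` and passed to `firstInvalidOffset`; a non-negative result gives a position to inspect in a hex editor. A file that is really Latin-1 or Windows-1252 but declares UTF-8 will usually fail at its first accented character, whereas binary data that was misidentified as XML tends to fail within the first few dozen bytes.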