coolsitesnew
I'm trying to parse (fairly big) XML files using \[code\]javax.xml.stream.XMLStreamReader\[/code\]. The files are well-formed (validated with xmllint), but I still get the following exception:

\[code\]
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[12418,95]
Message: XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:592)
\[/code\]

This is a simplification of my code:

\[code\]
while (parser.hasNext()) {
    parser.next();
    if (parser.getEventType() == XMLStreamReader.START_ELEMENT) {
        // equals() compares string contents; == only compares references
        if ("s".equals(parser.getLocalName())) {
            // do stuff
        }
    }
    if (parser.getEventType() == XMLStreamReader.END_ELEMENT) {
        if ("s".equals(parser.getLocalName())) {
            // do more stuff
        }
    }
    if (parser.getEventType() == XMLStreamReader.CHARACTERS) {
        if (inSentenceElement) {
            // process text parser.getText()...
        }
    }
}
\[/code\]

I've checked the row/col given in the error message in the XML, and nothing there strikes me as unusual. I suspect the size of the files might be the problem: perhaps the input gets truncated, so that the parser reads an EOF before the root element is closed. Is that feasible, and if so, how can I avoid it?

Edit: the bzip2-compressed files are up to 1.5 GB in size with up to 7M lines, but relatively small files of 4 MB also crash after around 10K lines (although the number of lines after which the problem occurs tends to vary by some 3K lines).
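One way to test the truncation hypothesis might be to count how many bytes actually reach the parser before it fails, and compare that with the size of the fully decompressed file. The following is only a minimal sketch: the class `TruncationCheck`, the helper `CountingInputStream`, and the method `bytesReadBeforeFailure` are hypothetical names made up for illustration, and the sketch assumes the already-decompressed XML is passed in as a plain `InputStream`.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TruncationCheck {

    /** Hypothetical helper: counts every byte handed on to the parser. */
    static final class CountingInputStream extends FilterInputStream {
        private long count = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;          // -1 means end of stream, not a byte
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;        // n is -1 at end of stream
            return n;
        }

        long getCount() {
            return count;
        }
    }

    /**
     * Parses the stream to the end.
     * Returns -1 if parsing succeeded, otherwise the number of bytes
     * the parser had pulled from the stream when it failed.
     */
    static long bytesReadBeforeFailure(InputStream decompressedXml) {
        CountingInputStream counted = new CountingInputStream(decompressedXml);
        try {
            XMLStreamReader parser =
                    XMLInputFactory.newInstance().createXMLStreamReader(counted);
            while (parser.hasNext()) {
                parser.next();            // events are ignored; only the input size matters here
            }
            return -1;
        } catch (XMLStreamException e) {
            return counted.getCount();
        }
    }

    public static void main(String[] args) {
        // Assumption: the decompressed XML is piped in on standard input.
        long failedAt = bytesReadBeforeFailure(System.in);
        System.out.println(failedAt < 0
                ? "parsed to the end"
                : "parser failed after receiving " + failedAt + " bytes");
    }
}
```

If the reported byte count is well below the size of the decompressed file, the stream feeding the parser probably ended early. If piping the output of a command-line decompressor into this check parses cleanly while the in-program decompression fails, that would point at the decompression step rather than at the XML. Because the parser reads in buffered chunks, the count can run somewhat ahead of the row/col position in the error message.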