Infinite loop while parsing XHTML using DocumentBuilder “parse”

robadshead · Jul 20, 2012

I have this method, which loads an XHTML document from a \[code\]java.io.InputStream\[/code\] and returns an \[code\]org.w3c.dom.Document\[/code\]:

\[code\]
private Document loadDocFrom(InputStream is) throws SAXException, IOException, ParserConfigurationException {
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true); // never forget this
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse(is);
    is.close();
    return doc;
}
\[/code\]

The method works: I have tested it with some XHTML documents (e.g. \[code\]http://pastebin.com/L2kHwggU\[/code\]) and XHTML websites.

But for some documents, such as http://pastebin.com/v675yWSJ, or even websites like \[code\]www.w3.org\[/code\], it enters an infinite loop at \[code\]Document doc = builder.parse(is);\[/code\].

EDIT: @Michael Kay found the problem, but I am waiting for his solution. One other possible solution is to ignore the DTD:

\[code\]domFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);\[/code\]

Thank you for your help.
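For anyone trying the DTD workaround from the edit above, here is a minimal sketch of how it might fit into the original method. It assumes the JAXP implementation in use is Xerces or the Xerces-derived parser bundled with the JDK, since the feature URI is Xerces-specific and other implementations may reject it with a \[code\]ParserConfigurationException\[/code\]. The class name \[code\]XhtmlLoader\[/code\] and the sample document in \[code\]main\[/code\] are made up for illustration. The try-with-resources block is a small change of mine so the stream is closed even when parsing fails.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical wrapper class, only here to make the sketch self-contained.
class XhtmlLoader {

    static Document loadDocFrom(InputStream is)
            throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        // Xerces-specific feature (assumption: a Xerces-based parser is in use).
        // It tells a non-validating parser not to fetch the external DTD.
        domFactory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        // Close the stream even if parse() throws.
        try (InputStream in = is) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document whose DOCTYPE points at the W3C-hosted DTD.
        String xhtml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
              + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
              + "<head><title>t</title></head><body><p>hello</p></body></html>";
        Document doc = loadDocFrom(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

One caveat with this approach: with the DTD skipped, named entities that XHTML defines there (such as \[code\]&nbsp;\[/code\]) are no longer declared, so documents that use them will likely fail to parse.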
