Java - Reading XML file returns wrong characters

Sophia_Illinois

New Member
I have an XML file with thousands of tags whose text content I need to read, as in the screenshot below:
FN4C0.jpg
I am trying to read the text content of all the "word" tags using this code:
\[code\]
String filePath = "...";
File xmlFile = new File( filePath );
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document domObject = db.parse( xmlFile );
domObject.getDocumentElement().normalize();

NodeList categoryNodes = domObject.getElementsByTagName( "category" ); // Get all the <category> nodes.
for (int s = 0; s < categoryNodes.getLength(); s++) { // Loop on the <category> nodes.
    String categoryName = categoryNodes.item(s).getAttributes().getNamedItem( "name" ).getNodeValue();
    if( selectedCategoryName.equals( categoryName ) ) {
        // Get its words.
        NodeList wordsNodes = categoryNodes.item(s).getChildNodes();
        for( int i = 0; i < wordsNodes.getLength(); i++ ) {
            if( wordsNodes.item( i ).getNodeType() != Node.ELEMENT_NODE ) continue;
            String word = wordsNodes.item( i ).getTextContent();
            categoryWordsList.add( word ); // Some words are read wrong !!
        }
        break;
    }
}
\[/code\]
But for some reason many words are read incorrectly. Examples:
\[code\]
"AMK6780KBU" is read as "9826</word"
"ASSI.ABR30326" is read as "rd>ASSI.AEP26"
"ASSI.25066" is read as "SI.4268</6"
\[/code\]
It might be because the file is large. If I just add or remove some empty lines in the XML file, different words are read incorrectly instead of the ones mentioned above, which is strange!

You can download the XML file from here.
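To help narrow this down, here is a minimal self-contained sketch of the same traversal run against a small in-memory document instead of the large file. The class name `CategoryWordReader`, the root element `categories`, and the category names `first` and `second` are assumptions made up for illustration; only the `category` element, its `name` attribute, and the `word` children come from the code above. If the words come back intact here, the traversal logic is probably not the cause, and the parser's handling of the large file would be the next thing to look at.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original program.
class CategoryWordReader {

    // Parses an XML string held in memory (assumed to be UTF-8 encodable).
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        doc.getDocumentElement().normalize();
        return doc;
    }

    // Returns the text of every child element of the first <category>
    // whose "name" attribute equals selectedCategoryName.
    static List<String> readWords(Document doc, String selectedCategoryName) {
        List<String> words = new ArrayList<>();
        NodeList categoryNodes = doc.getElementsByTagName("category");
        for (int s = 0; s < categoryNodes.getLength(); s++) {
            Element category = (Element) categoryNodes.item(s);
            if (!selectedCategoryName.equals(category.getAttribute("name"))) {
                continue;
            }
            NodeList children = category.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between elements
                }
                words.add(child.getTextContent());
            }
            break; // only the first matching category is read
        }
        return words;
    }

    public static void main(String[] args) throws Exception {
        // Sample data: the structure is assumed, the word values are from the post.
        String xml =
            "<categories>\n"
            + "  <category name=\"first\">\n"
            + "    <word>AMK6780KBU</word>\n"
            + "    <word>ASSI.ABR30326</word>\n"
            + "  </category>\n"
            + "  <category name=\"second\">\n"
            + "    <word>ASSI.25066</word>\n"
            + "  </category>\n"
            + "</categories>";
        for (String word : readWords(parse(xml), "first")) {
            System.out.println(word);
        }
    }
}
```

Running the same `readWords` method on the document parsed from the real file, and comparing a few known words, would show whether the corruption appears only at parse time for the large file.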
 