Java SAX parser mangles attributes for XML 1.1

un0wn

I'm using Java's SAX classes to parse an XML file. If the XML file says version 1.0, everything works fine, but if it says version 1.1, some of the attributes get mangled. I get the wrong results, and no exception of any kind is thrown.

My XML file basically looks like this:

[code]
<?xml version="1.1" encoding="UTF-8" ?>
<gpx>
  <trk>
    <name>Name of the track</name>
    <trkseg>
      <trkpt lat="12.3456789" lon="1.2345678">
        <ele>1234</ele>
        <time>2013-03-26T12:34:56Z</time>
        <speed>0</speed>
      </trkpt>
      ... and then 419 further identical copies of this trkpt
    </trkseg>
  </trk>
</gpx>
[/code]

So when I parse this file with SAX, I expect to find 420 trkpt tags, each with lat and lon attributes. In particular, I expect to find 420 "lat" attributes, all with the value "12.3456789".

For the parsing, I construct a handler object and pass the parser a stream for this local file:

[code]
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
inStream = new FileInputStream(file);
saxParser.parse(inStream, handler);
System.out.println("done");
[/code]

The handler class extends [code]org.xml.sax.helpers.DefaultHandler[/code] and has just one method, [code]startElement[/code], which reacts to the opening of the trkpt tag:

[code]
public void startElement(String uri, String localName, String qName, Attributes attributes)
    throws SAXException
{
  if (qName.equals("trkpt") && attributes != null && attributes.getLength() == 2
      && attributes.getValue(0).charAt(0) != '1')
  {
    // The trkpt tag has two attributes
    // but the value of the first one doesn't begin with '1'
    System.out.println(attributes.getQName(0) + " = " + attributes.getValue(0));
  }
  super.startElement(uri, localName, qName, attributes);
}
[/code]

So what is the result?

If the XML file has version 1.0, all I see is "done". 420 trkpt tags were found. Every one of them had two attributes, the first was always called "lat", and its value always started with '1', as I expect.
Great!

If I change the XML file to specify [code]version="1.1"[/code] on the first line, I get the following output:

[code]
lat = :34.56Z</t
lat = :56Z</time
done
[/code]

So even though all my 420 points should be identical, two of them gave me a completely wrong attribute value. No exceptions are thrown. 420 trkpts were still found, and all had two attributes called "lat" and "lon". Oddly, the lon values are always OK.

I created this XML file in a text editor by copying and pasting the first trkpt directly. So I'm sure all the values are identical and no point in the file has a funny attribute value. I'm also sure the file contains no non-ASCII characters, entity references, or anything else odd.

I've tried it with Sun's JRE 6, OpenJDK 6 and OpenJDK 7, on three different machines running two different OSs. So there are three possibilities. I'm doing something wrong, this particular XML file is somehow incompatible with XML 1.1, or there's a widespread SAX bug (which seems unlikely, since I would expect it to affect lots of people). Again, please note that with XML 1.0 it all works fine.

Also note that there's nothing special about the number 420. If the file has only 100 entries, they all get parsed properly. With several thousand entries, a certain number of them get their first attribute value mangled in this way. The length of the attribute value always seems to be correct, but the characters are pulled from the wrong point in the file. An index overflow, perhaps?

I tried removing all the speed tags, but the problem still persists if there are enough trkpts. It's also sensitive to additional whitespace. If I add line breaks between the trkpts, the problem occurs at different points or returns different attribute values.
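To explore the size and whitespace dependence without editing files by hand, a small harness can generate the document and compare every lat and lon value in full, rather than only checking the first character. This is a sketch that assumes Java 7 or later and the JRE's default SAX parser. The names `MangledAttributeSweep`, `buildGpx`, `TrkptChecker` and `check` are made up for illustration. The harness builds the document in memory and parses it from a `ByteArrayInputStream` instead of a `FileInputStream`, and its whitespace will not match the original file byte for byte. Whether, and where, any XML 1.1 values come out mangled therefore depends on both the layout and the JRE under test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MangledAttributeSweep {

    static final String LAT = "12.3456789";
    static final String LON = "1.2345678";

    // Builds a GPX-like document with `count` identical trkpt elements,
    // separated either by a single space or by a line break.
    static String buildGpx(String xmlVersion, int count, boolean lineBreaks) {
        String sep = lineBreaks ? "\n" : " ";
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"").append(xmlVersion).append("\" encoding=\"UTF-8\" ?>")
          .append("<gpx> <trk> <name>Name of the track</name> <trkseg>");
        for (int i = 0; i < count; i++) {
            sb.append(sep)
              .append("<trkpt lat=\"").append(LAT).append("\" lon=\"").append(LON).append("\">")
              .append(" <ele>1234</ele> <time>2013-03-26T12:34:56Z</time> <speed>0</speed> </trkpt>");
        }
        return sb.append(sep).append("</trkseg> </trk></gpx>").toString();
    }

    // Counts trkpt elements and records the 1-based positions of those whose
    // lat or lon differs from the exact value that was written.
    static class TrkptChecker extends DefaultHandler {
        int trkpts = 0;
        final List<Integer> mangled = new ArrayList<Integer>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!"trkpt".equals(qName)) {
                return;
            }
            trkpts++;
            if (!LAT.equals(atts.getValue("lat")) || !LON.equals(atts.getValue("lon"))) {
                mangled.add(trkpts);
                System.out.println("  trkpt #" + trkpts + ": lat = " + atts.getValue("lat"));
            }
        }
    }

    // Parses the document with a fresh default SAX parser.
    static TrkptChecker check(String xml) throws Exception {
        TrkptChecker checker = new TrkptChecker();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), checker);
        return checker;
    }

    public static void main(String[] args) throws Exception {
        for (int count : new int[] {100, 420, 5000}) {
            for (boolean lineBreaks : new boolean[] {false, true}) {
                for (String version : new String[] {"1.0", "1.1"}) {
                    TrkptChecker c = check(buildGpx(version, count, lineBreaks));
                    System.out.println("XML " + version + ", " + count + " trkpts, lineBreaks="
                        + lineBreaks + ": found " + c.trkpts + ", mangled " + c.mangled.size());
                }
            }
        }
    }
}
```

Running `main` prints one summary line for each combination of size, separator and XML version. Any mangled lat values for that run are listed just before its summary line. On a parser that behaves correctly, every "mangled" count is 0.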
 