Hi,
I have recently had osmosis in --rri mode refuse to apply an update it had downloaded from OSM, claiming there was a UTF-8 error in the file. I looked and looked, but the file was fine: it passed UTF-8 and XML validity checks.

I tried to isolate the line that gave me the "error", but isolating it made the problem go away. The error only occurs if I include the 583379 previous lines as well. So I now have two .osc files, one with 583380 lines and one with 583379 lines:

$ wc -l x.osc y.osc
583380 x.osc
583379 y.osc

Their only difference is one line at the beginning of the longer file:

$ diff x.osc y.osc
2d1
< <node id="4585086821" version="1" timestamp="2017-01-02T09:18:33Z" uid="72020" user="Petr1868" changeset="44840247" lat="49.9957035" lon="14.2460943"/>

But the longer one fails to process in osmosis, and the shorter one works:

$ osmosis --read-xml-change x.osc --write-null-change
Jan 11, 2017 10:19:41 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.43.1
...
SEVERE: Thread for task 1-read-xml-change failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file x.osc. publicId=(null), systemId=(null), lineNumber=583379, columnNumber=90.
    at org.openstreetmap.osmosis.xml.v0_6.XmlChangeReader.run(XmlChangeReader.java:114)

$ osmosis --read-xml-change y.osc --write-null-change
Jan 11, 2017 10:20:34 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.43.1
...
Jan 11, 2017 10:20:35 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Total execution time: 1448 milliseconds.

Since the line that supposedly contains the "error" is identical in both files, it can't really be an error (and the line does not contain any non-ASCII characters). Re-formatting the XML file with "xmlstarlet fo" or "xmlstarlet c14n" makes the problem go away. I've reproduced this bug on different machines with different Osmosis versions.
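The UTF-8 validity check mentioned above can be repeated independently of Xerces. The sketch below uses a hypothetical helper class, `Utf8Check` (not part of Osmosis), that decodes the whole file with the JDK's own UTF-8 decoder, configured to report malformed input rather than silently replace it, and prints the byte offset of the first bad sequence, if there is one. It reads the file into memory, which should be fine for a file of this size but is an assumption worth revisiting for much larger ones.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Osmosis: strict UTF-8 check using the JDK decoder.
public class Utf8Check {

    /** Returns -1 if data is well-formed UTF-8, otherwise the byte offset of the first malformed sequence. */
    public static long firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192); // decoded chars are discarded
        while (true) {
            // endOfInput=true: a sequence truncated at the end of the data is reported as malformed
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the malformed sequence starts at the current position
            }
            if (result.isOverflow()) {
                out.clear(); // make room and keep decoding
                continue;
            }
            break; // underflow: all input consumed without error
        }
        decoder.flush(out);
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        long offset = firstMalformedOffset(data);
        System.out.println(offset < 0 ? "valid UTF-8" : "malformed UTF-8 at byte offset " + offset);
    }
}
```

Usage: `java Utf8Check x.osc`. If this reports "valid UTF-8" for a file the parser rejects, the problem is more likely in the parser's decoding than in the data.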
I've tried these Java versions with identical results:

$ java -showversion
java version "1.7.0_121"
OpenJDK Runtime Environment (IcedTea 2.6.8) (7u121-2.6.8-1ubuntu0.14.04.1)

$ java -showversion
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

I have uploaded the two .osc files here:
http://www.remote.org/frederik/tmp/osmosis-bug-try-read-xml-change-write-null-change-on-these-files-which-differ-only-by-one-line.zip

I'd be interested in any insights anyone has to share.

Bye
Frederik

--
Frederik Ramm  ##  eMail [hidden email]  ##  N49°00'09" E008°23'33"

_______________________________________________
osmosis-dev mailing list
[hidden email]
https://lists.openstreetmap.org/listinfo/osmosis-dev
Hi,
On 01/11/2017 10:30 AM, Frederik Ramm wrote:
> SEVERE: Thread for task 1-read-xml-change failed

I was a bit over-eager in shortening the stack trace. Full detail:

org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file x.osc. publicId=(null), systemId=(null), lineNumber=583379, columnNumber=90.
    at org.openstreetmap.osmosis.xml.v0_6.XmlChangeReader.run(XmlChangeReader.java:114)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.xml.sax.SAXParseException; lineNumber: 583379; columnNumber: 90; Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at org.openstreetmap.osmosis.xml.v0_6.XmlChangeReader.run(XmlChangeReader.java:109)
    ... 1 more
Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanLiteral(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    ... 11 more

--
Frederik Ramm  ##  eMail [hidden email]  ##  N49°00'09" E008°23'33"
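The exception names a 4-byte UTF-8 sequence, i.e. a lead byte in the range 0xF0-0xF4 followed by three continuation bytes of the form 10xxxxxx. Since the reported line is pure ASCII, one way to probe this (an untested hypothesis, not something established in the thread) is to list every 4-byte sequence in the file and see whether any of them sits near the failure point. `FourByteScan` is a hypothetical helper. Note that it counts columns in bytes, whereas the parser's `columnNumber` is probably counted in characters, so the two may not line up exactly on lines containing non-ASCII text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of Osmosis or Xerces.
public class FourByteScan {

    /**
     * Lists every byte in the 4-byte UTF-8 lead range 0xF0-0xF4 as "line:column status",
     * where line is 1-based, column is the 1-based byte position within the line, and
     * status says whether three continuation bytes (10xxxxxx) follow.
     */
    public static List<String> findFourByteLeads(byte[] data) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            if (b == '\n') {
                line++;
                column = 1;
                continue;
            }
            if (b >= 0xF0 && b <= 0xF4) {
                boolean ok = i + 3 < data.length
                        && isContinuation(data[i + 1])
                        && isContinuation(data[i + 2])
                        && isContinuation(data[i + 3]);
                hits.add(line + ":" + column + (ok ? " ok" : " BAD"));
            }
            column++;
        }
        return hits;
    }

    /** A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
    private static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) throws IOException {
        for (String hit : findFourByteLeads(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(hit);
        }
    }
}
```

Running it on both x.osc and y.osc and comparing the output would show whether the extra line shifts any supplementary characters relative to the failing position.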
If the file is valid, then perhaps it's a bug in the Xerces parser bundled with Osmosis. The JDK version you use shouldn't matter, because I don't use its XML parser (Java bundles an ancient version of Xerces with more serious Unicode bugs).
I don't have any suggestions other than to check if there's a later version of Xerces available. To change it, modify the following file:

Change this line:
dependencyVersionXerces=2.9.1

I see I added the following comments above that line, which explain why I haven't upgraded it yet:

# Remaining on 2.9.1 instead of 2.10.0 for now because the newer version
# depends on org.w3c.dom.ElementTraversal which is not being transitively
# included. This could be possibly be fixed by including a newer version
# of xml-apis but this hasn't been verified.

Perhaps it's currently using the JDK version of xml-apis, but we may need to explicitly include a later version of that as well. Ugh.

As an aside, I think Java 9 is supposed to be fixing some of this bundled dependency mess and allowing a newer XML library to be included.

I'd offer to help, but I just don't have time. Osmosis isn't getting much love from me any more :-(

On Wed, 11 Jan 2017 at 20:33 Frederik Ramm <[hidden email]> wrote:
> Hi,
Brett,
thank you for your comment. The issue is not an urgent one for me, since workarounds exist, and on the many osmosis-based OSM updating machines I've been running continuously for years, this is only the second time I have run into it. So it is a rare quirk, but of course I would feel better if I knew where it came from.

I've re-built osmosis with Xerces 2.11.0 and this doesn't change the situation.

Should I perhaps try to build a minimal "use Xerces to parse this XML file" program and, if I can replicate the problem with that, file a bug with Xerces? Or is the way in which Osmosis uses Xerces somehow special, so that a simple program like that would be very unlikely to trigger the bug?

Bye
Frederik

--
Frederik Ramm  ##  eMail [hidden email]  ##  N49°00'09" E008°23'33"
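A minimal test of the kind proposed here can be sketched with plain JAXP, which is what the reply below says Osmosis uses. The class `MinimalSaxParse` is hypothetical. It parses the same file twice with a do-nothing handler: once from a byte stream, so the parser decodes UTF-8 itself (the stack trace above shows Xerces' `UTF8Reader` doing exactly that), and once from a character stream already decoded by the JDK. If only the byte-stream run fails, that would point at the parser's own decoder; this is an assumption to test, not a confirmed diagnosis. JAXP normally selects Xerces when the Xerces jar (typically `xercesImpl.jar`) is on the classpath, and the JDK's built-in parser otherwise, so the program prints which factory it actually got.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical minimal reproduction program, not part of Osmosis.
public class MinimalSaxParse {

    /** Parses with a no-op handler; returns null on success, otherwise the exception text. */
    static String parse(InputSource source) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (Exception e) {
            return e.toString();
        }
    }

    /** The parser receives raw bytes and does its own UTF-8 decoding. */
    public static String parseViaStream(InputStream raw) {
        return parse(new InputSource(raw));
    }

    /** The JDK decodes first; a fresh CharsetDecoder reports malformed input instead of replacing it. */
    public static String parseViaReader(InputStream raw) {
        Reader reader = new InputStreamReader(raw, StandardCharsets.UTF_8.newDecoder());
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SAXParserFactory: " + SAXParserFactory.newInstance().getClass().getName());
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            String error = parseViaStream(in);
            System.out.println("byte stream: " + (error == null ? "OK" : error));
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            String error = parseViaReader(in);
            System.out.println("character stream: " + (error == null ? "OK" : error));
        }
    }
}
```

For example, `java -cp xercesImpl.jar:. MinimalSaxParse x.osc` would exercise the same Xerces jar that Osmosis bundles, assuming it is copied next to the compiled class.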
Oops, lost this in my inbox :-(
On Thu, 12 Jan 2017 at 19:22 Frederik Ramm <[hidden email]> wrote:
<snip>
> I've re-built osmosis with Xerces 2.11.0 and this doesn't change the

I think it'd be a great place to start, and I think it *should* trigger the bug. But I'm not sure what we'd do about it :-)

Osmosis doesn't do anything special that I can think of. It just uses the standard Java mechanisms to invoke XML parsing.

One possible thing to try would be the XML parser used by the "fast" XML processor. It uses XML stream parsing as opposed to SAX parsing (i.e. pull vs. push processing).

Brett
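The pull-parsing alternative mentioned here can be sketched with the standard StAX API (`javax.xml.stream`). The class name `MinimalStaxParse` is hypothetical, and the sketch does not claim to mirror how Osmosis' "fast" reader is implemented; it only illustrates the pull model, in which the caller requests each event with `next()`, as opposed to SAX, in which the parser pushes events into a handler. Which StAX implementation `XMLInputFactory.newInstance()` returns depends on the classpath, so the program prints it.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical pull-parsing test program; it does not reproduce Osmosis' fast reader itself.
public class MinimalStaxParse {

    /** Pulls events one at a time and counts start elements; returns -1 on an XMLStreamException. */
    public static long countStartElements(InputStream raw) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(raw);
            try {
                long count = 0;
                while (reader.hasNext()) {
                    // next() advances the cursor: the caller pulls events instead of receiving callbacks
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            System.err.println("StAX error: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("XMLInputFactory: " + XMLInputFactory.newInstance().getClass().getName());
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println("start elements: " + countStartElements(in));
        }
    }
}
```

If x.osc parses cleanly here but fails through SAX, that would narrow the problem to the SAX code path, though a shared decoder underneath both cannot be ruled out from this test alone.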