XML parsing, expat and UTF8

Nick Whitelegg

I have been looking around for the best XML parsing library to use to develop a C
or C++ SAX API for parsing OSM data, and expat looks like the best choice: it
seems more lightweight and easier to use than libxml.

The expat documentation says it supports UTF-8. ISTR that is the encoding OSM
uses, so expat should be safe for parsing OSM data containing non-English
text. Is this correct?

Thanks,
Nick

_______________________________________________
dev mailing list
[hidden email]
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
Re: XML parsing, expat and UTF8

Andreas Brauchli
> The expat documentation says it supports UTF8. ISTR that is the encoding which
> OSM is using, so using expat will be safe to parse OSM data containing
> non-English languages. Is this correct?
If it behaves as described in the documentation, then there shouldn't be any
problems.

Try parsing this (the name should display as "Thun Süd", with a u-umlaut, i.e.
two dots over the u of Süd):

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.3" generator="OpenStreetMap server">
  <way id="728459" timestamp="2006-05-14 16:47:29">
    <seg id="2635641"/>
    <seg id="2635642"/>
    <seg id="2635643"/>
    <seg id="2635644"/>
    <seg id="2635645"/>
    <tag k="name" v="Thun Süd"/>
    <tag k="highway" v="motorway_link"/>
    <tag k="created_by" v="JOSM"/>
  </way>
</osm>
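
A quick way to run this check before writing any C is Python's standard
xml.parsers.expat module, which is a binding to the same expat library. The
sketch below is only an illustration under a few assumptions: the sample is
shortened to a single seg, and parse_tags is a hypothetical helper name, not
part of expat or of any OSM code. In C the corresponding calls would be along
the lines of XML_ParserCreate, XML_SetElementHandler and XML_Parse, where
attribute values reach the start-element handler as UTF-8 strings.

```python
import xml.parsers.expat

# Shortened copy of the sample above, encoded as UTF-8 bytes, which is
# what a parser would read from a file or socket.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.3" generator="OpenStreetMap server">
  <way id="728459" timestamp="2006-05-14 16:47:29">
    <seg id="2635641"/>
    <tag k="name" v="Thun Süd"/>
    <tag k="highway" v="motorway_link"/>
  </way>
</osm>
""".encode("utf-8")


def parse_tags(data):
    """Collect the k/v pairs of every <tag> element (hypothetical helper)."""
    tags = {}

    def start_element(name, attrs):
        # SAX-style callback: expat calls this once per opening tag.
        if name == "tag":
            tags[attrs["k"]] = attrs["v"]

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(data, True)  # True marks this as the final chunk
    return tags


tags = parse_tags(SAMPLE)
# Print the UTF-8 bytes so the result is visible even on an ASCII terminal.
print(tags["name"].encode("utf-8"))  # b'Thun S\xc3\xbcd'
```

If the u-umlaut comes back as the two bytes C3 BC, the parser has handled the
UTF-8 input correctly; a display problem after that point would be down to the
terminal or font rather than to expat.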

andreas

