[osmosis-dev] Duplicate keys in replication diffs

Osmosis Development mailing list
I believe I've found a bug with how osmosis handles replication diffs,
control characters, and duplicate keys. Because it involves control
characters, this bug can't be seen by viewing the objects in the
browser, so I've supplied shell commands that will show them.

In changeset 90486873 a node was introduced with the tags

    <tag k="shelter_type" v="weather_shelter"/>
    <tag
k="shelter_type^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?"
v="lean_to"/>

where the ^? are the DEL character (U+0001). This was obviously not
what the user intended, but it is valid OSM XML and is correctly
reproduced by the API if you query for the node (e.g. with the shell command
`curl -s https://www.openstreetmap.org/api/0.6/node/7881523719/3 | less').
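
Because the control characters are invisible in most viewers, a small script can make them explicit. The following is a minimal sketch, not part of the original report: the function name `keys_with_control_chars` and the sample document are hypothetical, and it assumes the input is OSM XML as returned by the API. It flags any tag key containing a character in Unicode category Cc, which includes DEL.

```python
import unicodedata
import xml.etree.ElementTree as ET


def keys_with_control_chars(osm_xml: str) -> list[str]:
    """Return every tag key that contains a control character (category Cc)."""
    root = ET.fromstring(osm_xml)
    flagged = []
    for tag in root.iter("tag"):
        key = tag.get("k", "")
        if any(unicodedata.category(ch) == "Cc" for ch in key):
            flagged.append(key)
    return flagged


# Hypothetical sample: a shortened stand-in for the node described above,
# with three DEL characters appended to the second key.
SAMPLE = (
    '<osm version="0.6"><node id="1" version="1" lat="0" lon="0">'
    '<tag k="shelter_type" v="weather_shelter"/>'
    '<tag k="shelter_type\x7f\x7f\x7f" v="lean_to"/>'
    "</node></osm>"
)

for key in keys_with_control_chars(SAMPLE):
    # ascii() escapes the DEL characters so they show up in the terminal.
    print(ascii(key))
```

To check the real node, the output of the curl command above could be passed to the function instead of the sample string.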

This was sent out in the minutely replication feed with sequence
number 004183898. Examining that diff with
`curl -s https://planet.openstreetmap.org/replication/minute/004/183/898.osc.gz | zcat | less -p 90486873'
reveals the XML

       <tag k="shelter_type" v="weather_shelter"/>
       <tag k="shelter_type" v="lean_to"/>

This XML does not contain the DEL characters, so the node now has two
identical keys. We found this because it caused a duplicate key error,
but beyond that, Osmosis is producing output inconsistent with what is
in the database.
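
A diff consumer could look for this symptom before importing. The following is a minimal sketch under stated assumptions, not a description of how Osmosis or any consumer works: the function name `duplicate_keys` and the sample osmChange document are hypothetical. It reports every node, way, or relation in an osmChange document that carries the same key more than once.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_keys(osc_xml: str) -> dict[str, list[str]]:
    """Map 'type/id' to the tag keys that occur more than once on that element."""
    root = ET.fromstring(osc_xml)
    result = {}
    for elem in root.iter():
        if elem.tag not in ("node", "way", "relation"):
            continue
        counts = Counter(tag.get("k") for tag in elem.findall("tag"))
        dupes = [key for key, n in counts.items() if n > 1]
        if dupes:
            result[f"{elem.tag}/{elem.get('id')}"] = dupes
    return result


# Hypothetical sample mirroring the two tags seen in the replication diff.
SAMPLE_OSC = (
    '<osmChange version="0.6"><create>'
    '<node id="1" version="1" lat="0" lon="0">'
    '<tag k="shelter_type" v="weather_shelter"/>'
    '<tag k="shelter_type" v="lean_to"/>'
    "</node></create></osmChange>"
)

print(duplicate_keys(SAMPLE_OSC))
```

An empty result means no element in the diff repeats a key; it says nothing about control characters that do not collide with another tag.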


_______________________________________________
osmosis-dev mailing list
[hidden email]
https://lists.openstreetmap.org/listinfo/osmosis-dev

Re: [osmosis-dev] Duplicate keys in replication diffs

Stephan Knauss
Hello Paul,

Thanks for tracking down and reporting this bug.

On 17.09.2020 01:22, Paul Norman via osmosis-dev wrote:
> In changeset 90486873 a node was introduced with the tags
>
>     <tag k="shelter_type" v="weather_shelter"/>
>     <tag
> k="shelter_type^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?"
> v="lean_to"/>

You did not mention: Have you already filed a bug report against
Potlatch 2? I think it was not intended to push this key.

Have you checked whether we have more unintended tagging like this in
the database?
Taginfo has a report about potentially problematic characters, but it
does not seem to consider control characters.
https://taginfo.openstreetmap.org/reports/characters_in_keys#problem

Stephan




Re: [osmosis-dev] Duplicate keys in replication diffs

Osmosis Development mailing list
On 2020-09-16 11:04 p.m., Stephan Knauss wrote:

> Hello Paul,
>
> Thanks for tracking down and reporting this bug.
>
> On 17.09.2020 01:22, Paul Norman via osmosis-dev wrote:
>> In changeset 90486873 a node was introduced with the tags
>>
>>     <tag k="shelter_type" v="weather_shelter"/>
>>     <tag
>> k="shelter_type^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?"
>> v="lean_to"/>
>
> You did not mention: Have you already filed a bug report against
> Potlatch 2? I think it was not intended to push this key.

I mentioned it to RichardF, but have been following up on the
replication side first.

> Have you checked whether we have more unintended tagging like this in
> the database?

We've encountered about 50 other occurrences of duplicate keys which are
likely caused by this bug. I have no estimate on how many keys in the
database have control characters without conflicting with another tag.

> Taginfo has a report about potentially problematic characters, but it
> does not seem to consider control characters.
> https://taginfo.openstreetmap.org/reports/characters_in_keys#problem

The OSMF taginfo instance uses the replication diffs, so it is affected
by the osmosis bug like any other diff consumer.

For next steps, I'll write this up for the osmosis issue tracker.



Re: [osmosis-dev] Duplicate keys in replication diffs

Osmosis Development mailing list
On 2020-09-16 4:22 p.m., Paul Norman wrote:
> In changeset 90486873 a node was introduced with the tags
>
>    <tag k="shelter_type" v="weather_shelter"/>
>    <tag
> k="shelter_type^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?"
> v="lean_to"/>
>
> where the ^? are the DEL character (U+0001). This was obviously not

I mixed up code points here: the DEL character is U+007F, not U+0001.
This doesn't change the rest of the email.

