Data redundancy with "ref" tag on ways vs relations


Paweł Paprota
Hi all,

As part of the Poland remapping effort I have implemented a reporting
system called OSMonitor, which analyzes the road network in Poland in
OSM data and produces reports. Recently one user requested an additional
validation: checking whether the ways in a relation for a specific road
contain proper "ref" tag values (where "proper" means that "ref" on ways
includes "ref" from the relation).
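
To make the check concrete, here is a minimal sketch of it in Python. This is not OSMonitor's actual code. The function names, the plain tag dictionaries and the sample way IDs are all made up for illustration, and it assumes that multiple refs on one way are separated by semicolons.

```python
def split_refs(ref_value):
    """Split a semicolon-separated "ref" value into a set of refs."""
    if not ref_value:
        return set()
    return {part.strip() for part in ref_value.split(";") if part.strip()}


def way_has_proper_ref(way_tags, relation_tags):
    """Return True if the way's "ref" includes the relation's "ref"."""
    relation_refs = split_refs(relation_tags.get("ref"))
    way_refs = split_refs(way_tags.get("ref"))
    # A relation without a ref gives nothing to compare against.
    return bool(relation_refs) and relation_refs <= way_refs


# Hypothetical sample data: one road relation and four member ways.
relation = {"type": "route", "route": "road", "ref": "A1"}
ways = {
    101: {"highway": "motorway", "ref": "A1"},
    102: {"highway": "motorway", "ref": "A1;E75"},
    103: {"highway": "motorway", "ref": "A2"},
    104: {"highway": "motorway"},
}

# Member ways that would be reported as "wrong ref".
wrong = [way_id for way_id, tags in ways.items()
         if not way_has_proper_ref(tags, relation)]
print(wrong)
```

In this sample, ways 103 (different ref) and 104 (no ref at all) are the ones flagged.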

This is what came out of OSMonitor:

https://wiki.openstreetmap.org/w/index.php?title=OSMonitor/Poland_Major_Roads&oldid=791535

Note the error named "relation contains ways with wrong ref". So for
some roads the ways contain multiple variants of the "ref" value.
Moreover, the "ref" tag on ways is out of sync with relation membership;
see, for example, http://www.openstreetmap.org/browse/way/172192711 (I
am referring to version 2 of this way in case it has been fixed in the
meantime).

So the question is - why does "ref" on way level make sense at all when
there is another (better and more flexible) way (pun intended) of doing
things?

Of course there are no hard rules in OSM concerning tagging, but
http://wiki.openstreetmap.org/wiki/Key:ref does not say much about
the problem above. I think it should describe why relations should be
used instead of the "ref" tag on ways where possible.

I understand that software that consumes OSM data (renderers,
navigation) probably uses "ref" on ways but as you can see from the
report - it is useless for most of the roads (in Poland, don't know
about other countries) and relations contain more up-to-date information
so the software should use relations in the first place.

What do you think?

Paweł

_______________________________________________
Tagging mailing list
[hidden email]
http://lists.openstreetmap.org/listinfo/tagging

Re: Data redundancy with "ref" tag on ways vs relations

Peter Wendorff
On 30.07.2012 18:22, Paweł Paprota wrote:

> So the question is - why does "ref" on way level make sense at all when
> there is another (better and more flexible) way (pun intended) of doing
> things?
On the one hand, refs on ways are easier for users to add than route
relations are to maintain.
With that in mind, "allowing" this as one option enables even beginners
to add refs, too.

The other thing is that, for software, handling refs on single ways is
no more difficult than pulling them from relations: relations are often
broken, too, so unconnected routes have to be handled with both
options - from single ways as well as from relations.

What makes relations easier to use for data consumers (not mappers) is
that it's defined which ways belong to the relations and therefore it
may be easier to "guess" missing links between unconnected parts.

I think ref makes sense on both relations and ways, as this allows
mappers to easily add the tag where it belongs even if they cannot edit
the relation, whether because they are using a restricted online
editor, because they lack knowledge about route relations, or for some
other reason.

On the other hand, it makes it possible to find errors by checking for
conflicts, as you do now.
A conflict may be where a ref is on a way directly and on the relation
the way belongs to, and these refs mutually exclude each other.
Differing refs aren't always a conflict: a cycleway ref may be correct
in parallel to a county street ref, and so on. But sometimes it may in
fact be an error, and at the very least the data is "not complete" in
the sense that a ref is missing on the way when it's on a relation
where the way is a member.
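
A rough sketch of how a checker could separate these cases is shown below. The category names and the rule "only compare way refs against road route relations" are assumptions made for this example, not an agreed OSM convention, and the sample refs are invented.

```python
def classify(way_tags, relation_tags):
    """Classify a way's "ref" against one route relation it belongs to."""
    # Assumption: a way's "ref" describes the road, so only relations with
    # route=road are compared. A bicycle route ref may differ legitimately.
    if relation_tags.get("route") != "road":
        return "not comparable"
    relation_ref = relation_tags.get("ref", "").strip()
    if not relation_ref:
        return "not comparable"
    way_refs = {r.strip() for r in way_tags.get("ref", "").split(";")
                if r.strip()}
    if not way_refs:
        return "incomplete"          # ref only on the relation
    if relation_ref in way_refs:
        return "consistent"
    return "possible conflict"       # needs a human to decide who is right


# Hypothetical relations: a county road and a cycle route.
road = {"type": "route", "route": "road", "ref": "K 12"}
cycle = {"type": "route", "route": "bicycle", "ref": "D4"}

print(classify({"ref": "K 12"}, road))
print(classify({"ref": "K 12"}, cycle))
print(classify({}, road))
print(classify({"ref": "K 21"}, road))
```

Note that "possible conflict" is only a hint: the code cannot tell whether the way or the relation holds the correct value.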

regards
Peter

Re: Data redundancy with "ref" tag on ways vs relations

Paweł Paprota
Hi Peter,

I understand what you're saying about ease of use but at the same time I
am very concerned about the quality of data - it is clear from reports
that there are just so many errors that the ref data is virtually
useless for navigation or location purposes.

I feel like there is no clear contract between the data and the
consuming software - some people use "ref" on ways, some people add
relations (this is preferred now as I see from remapping efforts). I see
two ways to "fix" it:

* Invest time in QA - like reporting, auto fixing bots etc. so that the
relations and refs on ways are synced.
* Choose one way (relations are the clear "winner" here), invest time
into making consuming software support this way and clearly encourage it.

My feeling is that if there is no encouragement or "blueprint" for
tagging in this area then the data will always be a moving target and we
can endlessly do QA, fix it etc. And since the consuming software moves
slower than data (and maybe even slower than blueprints I guess?), the
data quality and end user experience for navigation, rendering is always
suffering.

Paweł

Re: Data redundancy with "ref" tag on ways vs relations

voschix
From a practical point of view, I have always considered this a two-stage approach.
My concern is cycle and walking routes, not so much the road network.

Especially for hiking networks, as a mapper you encounter the white and red labels, often with signposts and numbers, but you are unaware of the network (or you can see it from the map you have, but you cannot copy from it!). So you start out by putting the ref on the bits that you have walked. Gradually this grows into a more complete picture with bits from many mappers. At some point enough information has been accumulated so that someone can transfer the ref to a relation.
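
The second stage could be supported by a small helper like the sketch below, which groups ways by the refs mappers have put on them and lists the refs that have accumulated enough ways to be worth turning into a relation. The function name, the `min_ways` threshold and the sample data are hypothetical; deciding when there is "enough information" remains a human judgment.

```python
from collections import defaultdict


def candidate_routes(ways, min_ways=3):
    """Group way IDs by ref and keep refs that appear on enough ways."""
    by_ref = defaultdict(list)
    for way_id, tags in ways.items():
        # Assumption: several refs on one way are separated by semicolons.
        for ref in tags.get("ref", "").split(";"):
            ref = ref.strip()
            if ref:
                by_ref[ref].append(way_id)
    return {ref: sorted(ids) for ref, ids in by_ref.items()
            if len(ids) >= min_ways}


# Hypothetical sample: bits of two hiking routes mapped by different people.
ways = {
    1: {"highway": "path", "ref": "E5"},
    2: {"highway": "path", "ref": "E5"},
    3: {"highway": "track", "ref": "E5;12"},
    4: {"highway": "path", "ref": "12"},
    5: {"highway": "path"},
}

print(candidate_routes(ways))
```

Here only "E5" reaches the threshold; route "12" still needs more surveyed sections.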

Volker

Re: Data redundancy with "ref" tag on ways vs relations

Peter Wendorff
In reply to this post by Paweł Paprota
On 30.07.2012 18:58, Paweł Paprota wrote:
> Hi Peter,
>
> I understand what you're saying about ease of use but at the same time I
> am very concerned about the quality of data - it is clear from reports
> that there are just so many errors that the ref data is virtually
> useless for navigation or location purposes.
But what leads you to the assumption that the data gets better when we
agree to only use ref on relations or only use ref on ways?

I think, this would lead to a situation where the error count doesn't
decrease, but the remaining errors aren't detectable any more.

Having refs only on relations means for a data consumer: I have to use
this data and I have no idea if it's correct - I have to assume it is to
use it.
Same for refs only on ways.

Refs on both means: I am free to use this or that, which is no worse
than the two other options above. But on top of that I am able to check
whether the two taggings are in conflict, and if so I may, for example,
ask my users what's correct here. As OSM is free for everyone who
agrees to the contributor terms and license, it's very welcome for
errors to be fixed or reported by these consumers or their users.
> I feel like there is no clear contract between the data and the
> consuming software - some people use "ref" on ways, some people add
> relations (this is preferred now as I see from remapping efforts). I see
> two ways to "fix" it:
>
> * Invest time in QA - like reporting, auto fixing bots etc. so that the
> relations and refs on ways are synced.
While auto-fixing bots aren't a good approach here (you don't know for
sure whether the relation or the way is correct), reporting is how most
of the current QA tools work now: they use heuristics or validity
checks to guess where errors might be. Most of these tools are welcome,
and (some) mappers sometimes look into them to hunt down data errors.
> * Choose one way (relations is clear "winner" here), invest time into
> making consuming software support this way and clearly encourage it.
-1 as described above: we lose the possibility to check for clear bugs.

regards
Peter

Re: Data redundancy with "ref" tag on ways vs relations

Paweł Paprota
In reply to this post by voschix
Hi Volker,

Great example. Based on what you wrote I think my point is strictly
about road network then. For (major) roads refs are well defined, easily
obtained and verified. People are creating whole roads in one sitting
based on Bing imagery and this is great - they add relation, insert ways
into that relation, give backward/forward roles for relation members if
needed - it works, the road becomes green in the report. So why put the
ref information again into ways themselves?

Paweł

Re: Data redundancy with "ref" tag on ways vs relations

Paweł Paprota
In reply to this post by Peter Wendorff
> But what leads you to the assumption that the data gets better when we
> agree to only use ref on relations or only use ref on ways?

Well my logic is simple - less duplication = less data to maintain =
more time mappers can spend on checking the quality of the data.

If I understand your points correctly and follow the logic - introducing
"ref" on nodes would make users triple check the data and consuming
software would have a third source of ref data which would be better.

I think that data duplication is fundamentally wrong as it leads to
counterproductive work like fixing "ref" tags along the 1000 km motorway
which some users are doing right now based on OSMonitor reports...



> While fixing bots aren't a good approach here - you don't know for sure
> if the relation or the way are correct - this is the way most of the
> current QA tools work now: use heuristics or validity checks to guess
> where errors might be, and most of these tools are welcome and (some)
> mappers sometimes look into it to hunt down data errors.

This is exactly what OSMonitor is NOT trying to do. See the introduction
of https://wiki.openstreetmap.org/wiki/OSMonitor, which explains what
I mean. The report does not just list relations from OSM - it lists real
roads and, based on that, finds the relation, verifies the ways, etc. So
if the road is red, that means the OSM data is wrong (or OSMonitor has a
bug, but of course it doesn't have bugs ;-).

Paweł

Re: Data redundancy with "ref" tag on ways vs relations

Apollinaris Schöll

On Jul 30, 2012, at 10:32 AM, Paweł Paprota wrote:

>> But what leads you to the assumption that the data gets better when we
>> agree to only use ref on relations or only use ref on ways?
>
> Well my logic is simple - less duplication = less data to maintain =
> more time mappers can spend on checking the quality of the data.
>

This logic is completely flawed. Humans are not robots working through a list of problems to solve. As you learned from your experiment, there is an inconsistency, and now you can work to fix it. This is how OSM works, and it is great that you are helping to make it better. The more redundancy there is, the more automated checks can be done to find errors. BUT … see below.

> If I understand your points correctly and follow the logic - introducing
> "ref" on nodes would make users triple check the data and consuming
> software would have a third source of ref data which would be better.
>

No, that's again wrong logic. There is a lot of gray between black and white.
Humans will just stop being interested if you flood them with stupid, cumbersome work. It's all about motivation, and it's not possible to motivate by offering senseless challenges without any reward.



Re: Data redundancy with "ref" tag on ways vs relations

Paweł Paprota


On Mon, Jul 30, 2012, at 19:44, Apollinaris Schoell wrote:
>
> this logic is completely flawed. humans are not robots working on a list
> of problems to solve. As you learned from your experiment there is an
> inconsistency and now you can work to fix it. This is how osm works and
> it is great that you help to make it better. the more redundancy the more
> automated checks can be done to find errors.

Sorry if I am being too harsh, I am not trying to be mean or anything
but... I don't understand how this sentence would be true in any
context. More redundancy, especially redundancy in data entered by
humans, simply invites more opportunity for errors. So of course QA
tools will find more errors - simply because there is more data to
maintain!

> > If I understand your points correctly and follow the logic - introducing
> > "ref" on nodes would make users triple check the data and consuming
> > software would have a third source of ref data which would be better.
> >
>
> no, that's again wrong logic. there is a lot of gray between black and
> white.
> Humans will just stop being interested if you flood them with stupid
> cumbersome work. It's all about motivation and it's not possible to
> motivate by offering senseless challenges without any reward.
>

I agree completely. And I'm not trying to ruin that by defining some
"tagging by committee" scheme. It's just about improving
documentation, guidelines and infrastructure to help people work more
efficiently on OSM data.

Paweł

Re: Data redundancy with "ref" tag on ways vs relations

"Petr Morávek [Xificurk]"
In reply to this post by Peter Wendorff
Hi Peter,

Peter Wendorff wrote:
> I think, this would lead to a situation where the error count doesn't
> decrease, but the remaining errors aren't detectable any more.
>
> Having refs only on relations means for a data consumer: I have to use
> this data and I have no idea if it's correct - I have to assume it is to
> use it.
> Same for refs only on ways.

This is a somewhat absurd argument. We should _not_ duplicate the data
between relations and their members just so that we can cross-check it.
Almost any data duplication is wrong, because it makes it harder to keep
the data synchronized, and thus it leads to more errors.
If I create/modify a route relation, it's soo much fun to copy the ref
tags from the relation to the ways. I'm a lazy person, so if someone
tells me that this is what I'm supposed to do, I'll just write a script
to automate this enjoyable task (effectively canceling the benefit of
data duplication for cross-checks).

> refs on both means: I am free to use this or that

Wrong. You cannot use either, because as you wrote below - you don't
know for sure which of the values is correct.

Best regards,
Petr Morávek aka Xificurk



Re: Data redundancy with "ref" tag on ways vs relations

Tordanik
In reply to this post by Paweł Paprota
On 30.07.2012 20:08, Paweł Paprota wrote:
>> the more redundancy the more
>> automated checks can be done to find errors.
>
> Sorry if I am being too harsh, I am not trying to be mean or anything
> but... I don't understand how this sentence would be true in any
> context. More redundancy, especially redundancy in data entered by
> humans, simply invites more opportunity for errors. So of course QA
> tools will find more errors - simply because there is more data to
> maintain!

The reasoning is as follows:

If only one instance of the data is being created, then there is a
certain probability that this data is wrong.

If two instances are created at least somewhat independently*, then
there is a very small probability that both end up wrong, and a much
larger probability that one of them ends up wrong. The probability that
everything is correct is now smaller than before.

However, at this point we can begin to use automated error checking. The
idea is that errors that can be found automatically are much more
acceptable than those that cannot.

With only one instance of the data, none of the errors can be found
automatically.

With two instances, most errors can be found automatically; only the
(very rare) case that both instances are wrong cannot.

Therefore, according to this line of reasoning, redundancy will increase
the number of errors initially, but reduce the number of "bad" errors
that cannot be spotted by automated checks.

* Of course this reasoning depends on the assumption that these two
instances are created independently. If a small number of mappers trace
an entire route network mostly from scratch (e.g. during initial
remapping) using aerial imagery or other non-local sources, this
assumption is probably not justified. However, it might be justified
for a scenario where many contributors each add only small sections of a
route over a longer period of time.
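
To put numbers on the reasoning above, here is a small sketch. It assumes each instance is wrong independently with the same probability p, and pessimistically assumes that two wrong instances always agree with each other, so that case goes unnoticed. The function name and the value p = 0.05 are purely illustrative.

```python
def redundancy_outcomes(p):
    """Outcome probabilities for two independently entered copies of a value.

    p is the assumed probability that a single copy is wrong.
    """
    both_correct = (1 - p) ** 2
    one_wrong = 2 * p * (1 - p)   # copies disagree, so a cross-check flags it
    both_wrong = p ** 2           # worst case: copies agree on a wrong value
    return {
        "all_correct": both_correct,
        "detectable_error": one_wrong,
        "undetectable_error": both_wrong,
    }


outcomes = redundancy_outcomes(0.05)
for name, prob in outcomes.items():
    print(f"{name}: {prob:.4f}")
```

With p = 0.05, a single copy is silently wrong 5% of the time. With two copies, the chance that everything is correct drops to 90.25%, but the silent errors shrink to 0.25%; the remaining 9.5% are disagreements that a cross-check can report.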

Tobias


Re: Data redundancy with "ref" tag on ways vs relations

David K

Route relations are good because they offer a structured format to identify and describe a route, such as US Bike Route 25, or Fairfield County Highway 177.  Ref tags on ways are now a good place to use shorthand, like USBR 25 or CR 177.  When multiple routes overlap, the ref tag on the way is an opportunity to summarize and/or prioritize.

For example, it would make sense to me for all the ways of Interstate 465 around Indianapolis to have their ref tag say just I-465, even though different parts of it may also be parts of one or two Interstates and up to six US and/or state routes.  If you're making a simple map of, say, a neighborhood or campus adjacent to that highway, the summary I-465 will suffice. If you're generating driving directions, I-465 alone will suffice and in this case matches exactly what a local will tell you.  If you're making a roadgeeky map that uses correct highway symbols wherever possible, you look at route relations and draw up to 9 shields on part of the highway.  If you're making a map that highlights one of those routes specifically, you use its relation, which includes ways whose refs just say "I-465". That is okay because you're probably not labeling any roads besides highlighting the route of interest.

If you have a tool that says "US 136 includes ways whose ref tags say 'I-465' and this is an error" then you need to realize that route relations and way ref tags serve different purposes and use cases, or at least that sometimes it's impractical for them to match perfectly.
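
For illustration, a check of the kind discussed in this thread (the way's "ref" must include the relation's "ref") might look like the sketch below. It assumes multiple refs on a way are separated by semicolons; the function names, way ids and tag values are hypothetical, and this is not OSMonitor's actual code.

```python
def refs_of(tag_value):
    """Split a semicolon-separated OSM tag value into a set of trimmed refs."""
    return {part.strip() for part in tag_value.split(";") if part.strip()}


def ways_missing_relation_ref(relation_ref, way_refs):
    """Return ids of ways whose ref tag does not include the relation's ref.

    way_refs maps a way id to its raw ref tag value (or None if untagged).
    Untagged ways are skipped rather than flagged.
    """
    return [
        way_id
        for way_id, value in way_refs.items()
        if value is not None and relation_ref not in refs_of(value)
    ]


# Hypothetical members of a "US 136" route relation.
members = {101: "US 136", 102: "I-465", 103: "I-465;US 136", 104: None}
flagged = ways_missing_relation_ref("US 136", members)
print(flagged)
```

Way 102 is the only one reported. Under the summarizing convention described above, that report would be a false positive, which is why such a check only makes sense where ways are expected to carry every ref.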



Re: Data redundancy with "ref" tag on ways vs relations

Paweł Paprota
David,

As I wrote - I am only producing reports for Poland and for my country
there are very few complex situations and it really is straightforward
to clearly see data duplication - having ref on ways has no value
because everything is expressed in relations.

>
> If you have a tool that says "US 136 includes ways whose ref tags say
> 'I-465' and this is an error" then you need to realize that route
> relations
> and way ref tags serve different purposes and use cases, or at least that
> sometimes it's impractical for them to match perfectly.

Like I wrote - a user requested this validation, I implemented it and
after running it on the data I immediately noticed the data duplication
problem and shared my thoughts here on the list because this use case of
trying to sync relations with way tags is a mistake in my opinion.

If you're saying that you (as in - mappers in the US) are using it for
different purposes, that's perfectly fine - I'm not proposing removing
all "ref" tags from ways. I'm just raising an issue with blindly copying
data to different places - that is a road to nowhere.

Paweł


Re: Data redundancy with "ref" tag on ways vs relations

"Petr Morávek [Xificurk]"
In reply to this post by Tordanik
Tobias Knerr wrote:
> If two instances are created at least somewhat independently*

This is a really bold assumption. I'm having a hard time imagining a
real-life scenario where this is true.

On the other hand, I can imagine scenarios where the cross-check will
fail simply because someone who edited a way forgot to edit the relation
as well, or vice versa.

> However, at this point we can begin to use automated error checking. The
> idea is that errors that can be found automatically are much more
> acceptable than those that cannot.
>
> With only one instance of the data, none of the errors can found
> automatically.

You can spot a lot of errors just by doing a simple analysis of the
route graph - Are individual segments continuous? Is the resulting route
a simple linear feature? ...Yes, it's not 100% accurate, but neither is
the alternative (data duplication + cross-checks).
This way you can catch most of the important errors and don't have to
rely on duplicated data.
I think it's better to spend some time developing more sophisticated
QA tools than to waste it on data duplication.
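
A minimal sketch of such a route graph analysis, assuming each member way is given as a list of node ids and that ways only join at their end nodes (the input format and function name are made up for illustration):

```python
from collections import defaultdict


def analyse_route(ways):
    """Check whether ways form one connected, non-branching line.

    ways is a list of node-id lists, one per member way. Only the first
    and last node of each way are considered as connection points.
    """
    degree = defaultdict(int)      # how many way ends touch each node
    adjacency = defaultdict(set)   # end node -> end nodes reachable by one way
    for nodes in ways:
        start, end = nodes[0], nodes[-1]
        degree[start] += 1
        degree[end] += 1
        adjacency[start].add(end)
        adjacency[end].add(start)

    if not adjacency:
        return {"continuous": False, "linear": False}

    # Depth-first search from an arbitrary end node to test continuity.
    first = next(iter(adjacency))
    seen = {first}
    stack = [first]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    continuous = len(seen) == len(adjacency)

    # A simple line has exactly two loose ends and no branching node.
    loose_ends = sum(1 for d in degree.values() if d == 1)
    branching = any(d > 2 for d in degree.values())
    linear = continuous and loose_ends == 2 and not branching
    return {"continuous": continuous, "linear": linear}


print(analyse_route([[1, 2, 3], [3, 4], [4, 5, 6]]))  # one unbroken line
print(analyse_route([[1, 2, 3], [4, 5]]))             # gap between 3 and 4
print(analyse_route([[1, 2], [2, 3], [2, 4]]))        # fork at node 2
```

Real road routes with dual carriageways, roundabouts or ways that meet at interior nodes would need more care than this.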

--
Actually, we have talked about this issue in talk-cz (Czech Republic)
recently. One guy made a simple analysis tool for finding "holes" in our
road network left by the redaction bot - the tool simply collected all
ways with e.g. highway=primary+ref=## and ran some checks on them.
Consequently, the question was raised of why we add the ref tag to every
single way, and whether it would be a good idea to move it to some parent
relation. AFAIK, we don't use (m)any route relations in our road network
yet.

Best regards,
Petr Morávek



Re: Data redundancy with "ref" tag on ways vs relations

Apollinaris Schöll
In reply to this post by Paweł Paprota


On Mon, Jul 30, 2012 at 11:08 AM, Paweł Paprota <[hidden email]> wrote:



>> the more redundancy the more
>> automated checks can be done to find errors.
>
> Sorry if I am being too harsh, I am not trying to be mean or anything
> but... I don't understand how this sentence would be true in any
> context. More redundancy, especially redundancy in data entered by
> humans, simply invites more opportunity for errors. So of course QA
> tools will find more errors - simply because there is more data to
> maintain!


There are different types of errors and you focus on only one. I am not going to argue with examples or explanations. If you don't want to see it you won't see it.

The concept of OSM (and any crowdsourced project) is in no way similar to traditional knowledge collection. And for consumers it's even more different. It's a social project, not a technical one. Errors are a key ingredient to keep the ball rolling. The number of errors in the DB is not important. A data consumer has to decide what data is useful and try to make the best possible use of it. The number of errors remaining in the consumer application is what is important in the end.


Re: Data redundancy with "ref" tag on ways vs relations

Peter Wendorff
In reply to this post by "Petr Morávek [Xificurk]"
Am 30.07.2012 20:11, schrieb "Petr Morávek [Xificurk]":

> Hi Peter,
>
> Peter Wendorff wrote:
>> I think, this would lead to a situation where the error count doesn't
>> decrease, but the remaining errors aren't detectable any more.
>>
>> Having refs only on relations means for a data consumer: I have to use
>> this data and I have no idea if it's correct - I have to assume it is to
>> use it.
>> Same for refs only on ways.
> This is a bit absurd argument. We should _not_ duplicate the data
> between relations and their members just that we could cross-check them.
> Almost any data duplication is wrong, because it's harder to keep the
> data synchronized, and thus it leads to more errors.
Yes, I realize that I was a little bit fuzzy here.
I'm not talking about data duplication in the sense of "I add my data
twice in different ways", but about redundant (not duplicate) data in
the sense of "Sven added his data there, not knowing that it's possible
here too; I add the data here - and you can check whether the data we
both contributed is free of contradictions."
> If I create/modify a route relation, it's soo much fun to copy the ref
> tags from the relation to ways. I'm a lazy person, so if someone tells
> me that this is what I'm supposed to do, I'll just write script for
> automating this enjoyable task (effectively canceling the benefit of
> data duplication for cross-checks).
No, you are not supposed to do that - and I didn't say so.
On the contrary, doing that is exactly what would make the heuristic
approach to finding errors less productive.
Nobody forbids it, sure; and I'm fine with any mapper deciding to put
refs only on ways or only on relations.
But I am opposed to deciding that one of these should not be done any
more - because I don't see the benefit in it.

If you create a route relation and add a ref there, that's fine. It's
correct (as long as you provide correct data of course), and it can be
used by data consumers.
If Emil draws his ways and adds a ref tag to it, that's fine too - it's
correct (...) and can be used by data consumers.
Neither you nor Emil did wrong stuff, and even if we afterwards have the
ref on both, that's fine - as explained before.

You (may) complain that now it's hard to "fix" a bug in it.
Sure: if the route's ref gets changed, someone has to fix that both in
the ways and in the relation, probably; but if they don't, we have a
contradiction that at least can be found by QA tools; and this kind of
change doesn't usually happen very often, so it's not that much work later.
>> refs on both means: I am free to use this or that
> Wrong. You cannot use either, because as you wrote below - you don't
> know for sure which of the values is correct.
I don't know which is correct - nor do I know whether either of the
values is correct, but that's no different from before.
Now I know that contradicting values cannot both be correct - before I
didn't know, and took the value to be correct for lack of alternatives.

What's better: to know that something is wrong, or to believe that
something is right as long as nobody claimed otherwise?
If we come to a point where most data is contradicting as soon as it's
mapped twice, we have a much bigger problem with OSM as a whole, because
then it's roulette everywhere our data should be used.

regards
Peter


Re: Data redundancy with "ref" tag on ways vs relations

Jo-2
In reply to this post by Apollinaris Schöll

> there are different types of errors and you focus on one only. I am not going to argue with examples or explanations. If you don't want to see it you won't see it.

I'll try to give another example, which may or may not help Pawel to see what you mean:

I'm gathering information about bus routes. When a mapper is in front of a bus stop, they can easily take note of all the lines serving this bus stop and add this information to route_ref.

Now I come along and create a route for the itinerary of one of the routes. It helps me that I can find all the bus stops served by this line with a regular expression.

Should all the route_refs now be removed once the route relation is created? I don't think so, as they can easily be used to help verify (programmatically like you did) that my route relations remain correct. One could argue that this creates redundancy of data in the database, but this redundancy is what makes data validation possible.
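
As a rough sketch of that kind of programmatic verification, assuming route_ref holds a semicolon-separated list of line numbers (the stop ids, values and function names below are invented for the example):

```python
import re


def stops_serving(line, stops):
    """Return ids of stops whose route_ref lists the given line.

    stops maps a stop id to its route_ref value, assumed to be a
    semicolon-separated list such as "2;22;310".
    """
    # Match the line only as a whole list item, so "2" does not match "22".
    pattern = re.compile(r"(?:^|;)\s*" + re.escape(line) + r"\s*(?:;|$)")
    return {stop_id for stop_id, value in stops.items() if pattern.search(value)}


def compare_with_relation(line, stops, relation_stop_ids):
    """Report stops that disagree between route_ref tags and the relation."""
    tagged = stops_serving(line, stops)
    members = set(relation_stop_ids)
    return {
        "tagged_but_not_in_relation": tagged - members,
        "in_relation_but_not_tagged": members - tagged,
    }


# Hypothetical stop ids and route_ref values.
stops = {1: "2;22", 2: "22;310", 3: "2", 4: "310; 2"}
report = compare_with_relation("2", stops, relation_stop_ids=[1, 3, 5])
print(report)
```

Here stop 4 is tagged with line 2 but missing from the relation, and stop 5 is in the relation without a matching route_ref. Either finding could point to an error on either side, so it still needs a human to decide which one is right.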


For giggles I downloaded all ways with ref=N2 with an overpass query (http://overpass.osm.rambler.ru/query_form.html):

(
 way
  ["highway"]
  ["ref"="N2"];
 >;
);
out meta;

I should have added a bbox there to limit it to Belgium, but most of my Overpass queries are more specific.
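
A version limited to a bounding box could be generated along these lines. The sketch only builds the query text; the bounding box is a rough, hand-picked rectangle around Belgium and the function name is made up. Overpass expects the box as (south, west, north, east).

```python
def ways_by_ref_query(ref, bbox):
    """Build an Overpass QL query for highway ways with a given ref in a bbox.

    bbox is (south, west, north, east) in degrees, the order Overpass expects.
    """
    south, west, north, east = bbox
    return (
        "(\n"
        " way\n"
        '  ["highway"]\n'
        f'  ["ref"="{ref}"]\n'
        f"  ({south},{west},{north},{east});\n"
        " >;\n"
        ");\n"
        "out meta;"
    )


# Rough box around Belgium; an assumption, not an official extent.
BELGIUM_BBOX = (49.4, 2.5, 51.6, 6.5)
query = ways_by_ref_query("N2", BELGIUM_BBOX)
print(query)
```

A rectangle will still pick up N2 roads just across the border, so the result would need some filtering afterwards.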

I found that we don't seem to be using relations for N-roads, but we do for A-roads and E-roads (i.e. all the motorways). I also found that here in Belgium those roads seem to be 'interrupted' through the city centers, which would complicate checking them for continuity.

Polyglot




Re: Data redundancy with "ref" tag on ways vs relations

Paweł Paprota
As I wrote before in this thread in response to the hiking trail example
- this is great, I myself love mapping forest tracks for mountain biking
and stuff.

But maybe I should have made it clearer in the first place that I'm
talking about major roads - this stuff really should be in tip top shape
(relations created, no discrepancy with ref on ways/relations etc.) in
order for OSM to be considered for use in navigation. Now if I want to
drive across Europe I will get many different approaches - in Poland
there are these errors I mentioned, in Germany I have seen that they
have hierarchical relations and also separate relations for
backward/forward lanes. There are also for example Euro routes that e.g.
Polish roads/relations are not part of. I cannot imagine any navigation
or location software working with such data.

So my point here is very specific and maybe we misunderstand each other
because I failed to stress it enough. Ideally there would be a QA tool
like OSMonitor running for every country in the world (or one instance
checking the whole world) so that people could sync around the world and
create a unified approach to major road network tagging.

Paweł

On Mon, Jul 30, 2012, at 23:19, Jo wrote:

> > there are different types of errors and you focus on one only. I am not
> > going to argue with examples or explanations. If you don't want to see it
> > you won't see it.
> >
>
> I'll try to give another example, which may or may not help Pawel to see
> what you mean:
>
> I'm gathering information about bus routes. When a mapper is in front of
> a
> bus stop, they can easily take note of all the lines serving this bus
> stop
> and adding this information to route_ref.
>
> Now I come along and create a route for the itinerary of one of the
> routes.
> It helps me that I can find all the bus stops served by this line with a
> regular expression.
>
> Should all the route_refs now be removed once the route relation is
> created? I don't think so, as they can easily be used to help verify
> (programmatically like you did) that my route relations remain correct.
> One
> could argue that this creates redundancy of data in the database, but
> this
> redundancy is what makes data validation possible.
>
>
> For giggles I downloaded all ways with ref=N2 with an overpass query (
> http://overpass.osm.rambler.ru/query_form.html):
>
> (
>  way
>   ["highway"]
>   ["ref"="N2"];
>  >;
> );
> out meta;
>
> I should have added a bbox there to limit it to Belgium, but most of my
> Overpass queries are more specific.
>
> I found that we don't seem to be using relations for N-roads, but we do
> for
> A-roads and E-roads (i.e. all the motorways). I also found that here in
> Belgium those roads seem to be 'interrupted' through the city centers,
> which would complicate checking them for continuity.
>
> Polyglot

Re: Data redundancy with "ref" tag on ways vs relations

Frederik Ramm
Hi,

On 30.07.2012 23:41, Paweł Paprota wrote:
> But maybe I should have made it clearer in the first place that I'm
> talking about major roads - this stuff really should be in tip top shape
> (relations created, no discrepancy with ref on ways/relations etc.)

No. We only create relations when the ref tag is not sufficient. We
don't recommend that relations be created for roads otherwise, and
anyone doing anything with the data should not expect relations to be there.

> in Germany I have seen that they
> have hierarchical relations and also separate relations for
> backward/forward lanes.

These are created by individual mappers who are over-engineering things;
it is not something that "they" have in Germany as a general rule.

> So my point here is very specific and maybe we misunderstand each other
> because I failed to stress it enough. Ideally there would be QA tool
> like OSMonitor running for every country in the world (or one instance
> checking whole world) so that people could sync around the world and
> create unified approach to the major road network tagging.

A world-wide unified approach is difficult and does not have only
advantages; being forced to use the same approach in Chile and in China
might not be the best way to model reality. And since it only rarely
happens that someone travels from China to Chile using the same software
- maybe a unified approach isn't even required.

Bye
Frederik

--
Frederik Ramm  ##  eMail [hidden email]  ##  N49°00'09" E008°23'33"


Re: Data redundancy with "ref" tag on ways vs relations

"Petr Morávek [Xificurk]"
In reply to this post by Peter Wendorff
Peter Wendorff wrote:
> I'm not talking about data duplication in the meaning of "I add my data
> twice in different ways", but about redundant (not duplicate) data in
> the meaning of "Sven added his data there not nowing that it's possible
> here too; I add the data here - and you can check if we both contributed
> data that doesn't show failures."

OK, but this all still rests on the assumption that there are in fact
two independent data sources. I really don't think this is happening in
real life.

There are basically 3 scenarios in which the ref tags can get out of sync:
1) Someone creates a relation with ref=42 and then adds a way with
ref=24. Why would he do that? IMHO there are two possibilities:
   a) A mistake during editing - if the road really does not belong
there, then a QA tool analyzing roads should find it relatively easily
(and such a tool would also find e.g. a building polygon added to the
route relation <- THIS you can't do with a simple ref cross-check).
   b) It is correct and has some meaning that I can't think of right
now. (The simple ref cross-check fails again.)

2) A relation exists with member ways without a ref tag. This means that
the route is essentially mapped and any further editor is correcting
errors that he found. Then someone comes and adds a ref tag to one of
the ways - why?
   a) He wanted to correct a wrong ref tag. Well, then I think that
person would/should look for the source of that wrong value (the
relation) and correct it. I think this scenario is highly unlikely.
   b) Same as 1b). (The cross-check fails again.)

3) Both relation and ways are populated with ref tags and someone who
wanted to correct a wrong value (e.g. because it has changed) edited only
one of them.
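
To illustrate the point in 1a), here is a sketch of a member check that does not look at ref at all. It assumes the member ways' tags are already available as dicts; the ids and tags are invented.

```python
def suspicious_members(members):
    """Return ids of route members that do not look like roads.

    members maps a way id to its tag dict (hypothetical, pre-parsed input).
    """
    return sorted(
        way_id for way_id, tags in members.items() if "highway" not in tags
    )


members = {
    201: {"highway": "primary", "ref": "42"},
    202: {"highway": "primary"},   # no ref: invisible to a ref cross-check
    203: {"building": "yes"},      # stray polygon added by mistake
}
print(suspicious_members(members))
```

Only way 203 is reported. A ref cross-check that skips untagged ways would have passed over both 202 and 203.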


Could somebody provide a scenario where the data duplication and simple
way-relation cross-check of ref tags is really useful? So far, I can't
see one.

> If you create a route relation and add a ref there, that's fine. It's
> correct (as long as you provide correct data of course), and it can be
> used by data consumers.
> If Emil draws his ways and adds a ref tag to it, that's fine too - it's
> correct (...) and can be used by data consumers.
> Neither you nor Emil did wrong stuff, and even if we afterwards have the
> ref on both, that's fine - as explained before.

Oh, OK... Let me clarify my position as well: I do not propose some mass
edit that would wipe out one way of tagging in favor of the other right
now. But I do think that we should reach some consensus about the
desired final state of things and encourage data producers/consumers to
converge on it.
E.g. as Volker Schmidt wrote (wrt hiking routes), it's OK to use the ref
tag on ways, but it doesn't make much sense to keep it there once the
relation is created and maintained.

> You (may) complain that now it's hard to "fix" a bug in it.
> Sure: if the routes ref get's changed, anyone has to fix that both in
> ways and in the relation probably; but if not, we have a contradiction
> that at least can be found in QA tools;

And this contradiction is clearly a negative side effect of data
duplication, because without the duplication this bug would never occur.
Please note that the duplication of ref tags on relation+ways will
never alert you to a ref change in the real world. So, in this use case
the data duplication has only a negative effect on data quality.
Once you've found the no longer valid ref tag, in the case of duplicated
data you must change the relation and all the member ways, which is an
error-prone, boring task. On the other hand, if you keep the ref only on
the relation, it's an easy fix.

Best regards,
Petr Morávek aka Xificurk

