Wiki Edit War on using/avoiding semicolon lists

Re: Wiki Edit War on using/avoiding semicolon lists

Никита
> So the data in OSM is still "color=purple;orange;green"
This data will be untranslated for iD users. It is free text without a select box and is prone to errors. You have to know the tags to use this approach.

The values of http://wiki.openstreetmap.org/wiki/Key:mtb:scale:imba are not displayed directly as mtb:scale:imba=grade2. We have verbose strings for them:
(option #2 in selectbox, RU) "Лёгкая (зелёный круг)" [EN: "Easy (green circle)"]

Instead of "trail_visibility"="good" you have the string
(option #2 in selectbox, EN) "Good: markers visible, sometimes require searching"

You can actually translate keys right now in iD, with no change to the iD code:
color:purple=yes "фиолетовый"
color:orange=yes "оранжевый"
color:green=yes "зелёный"

They are just regular tags.

This easy solution is impossible with "color=purple;orange;green". I have no idea why there are people who advocate the "color=purple;orange;green" approach. There have been 70+ messages, but only one argument: it is easier to enter. Easier to enter for the person who knows everything and uses regexes every time, yes; for anybody else, no.
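
As a rough illustration of the preset argument, here is a minimal Python sketch. It is not how iD is implemented; the `LABELS_RU` table and the `labels_for` function are hypothetical names made up for this example. It only shows that when each colour is its own tag, a label can be found by an exact lookup on the tag, while the semicolon form is one opaque string until something splits it.

```python
# Hypothetical translation table keyed by exact (key, value) tag pairs.
LABELS_RU = {
    ("color:purple", "yes"): "фиолетовый",
    ("color:orange", "yes"): "оранжевый",
    ("color:green", "yes"): "зелёный",
}


def labels_for(tags):
    """Return the translated labels for every tag that has an exact entry."""
    return [LABELS_RU[item] for item in tags.items() if item in LABELS_RU]


per_key = {"color:purple": "yes", "color:orange": "yes", "color:green": "yes"}
semicolon = {"color": "purple;orange;green"}

print(labels_for(per_key))    # three labels found by exact lookup
print(labels_for(semicolon))  # nothing found: the value is one opaque string
```

A preset system could of course split the value first; the sketch only shows what an exact-match lookup sees.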

_______________________________________________
Tagging mailing list
[hidden email]
https://lists.openstreetmap.org/listinfo/tagging

Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
In reply to this post by Charles Basenga Kiyanda-4
On 22/01/2015, Charles Basenga Kiyanda <[hidden email]> wrote:
> I have to add fuel to a heated discussion, but in the whole exchange on
> whether or not semicolon lists should be allowed/used, the most obvious
> example (to me) that requires semicolon lists was not mentioned,
> namely: opening hours.

That's probably because opening_hours is arguably *not* a
multiple-values field, so it's not very interesting to bring it into
this discussion. A bit like seamark colors, providing only part of the
information is barely useful (which indeed makes the idea of splitting
opening_hours into multiple keys silly).

Opening_hours is complex enough that it needs its own specific parser.
You can't treat it as a generic multiple-values field. It wouldn't
make any difference if the opening_hours spec was using '&' instead of
';' for example.


> Substituting
>
> opening_hours = Mo-We 08:00-17:00; Th-Fr 08:00-21:00
>
> to
>
> opening_hours:Mo-We 08:00-17:00 = yes
> opening_hours:Th-Fr 08:00-21:00 = yes
>
> would in my opinion lead to an inordinate number of subkeys.

Yes, that's definitely out. Using the key to convey multiple values is
only advisable if the value is standardised. As was said earlier,
nobody is suggesting "name:Main Street=yes" either.


Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
In reply to this post by dieterdreist
On 22/01/2015, Martin Koppenhoefer <[hidden email]> wrote:
> a minor issue with semicolon separated lists: we don't have yet defined how
> to escape actual semicolons in values.

To me, that is actually a major issue (putting blank fields in the same basket).

Defining how literal semicolons and blank fields should be
represented isn't that hard. But making sure that consumers (let alone
editors) all follow whatever algorithm we'd end up choosing is damn
near impossible. Even if you wave your magic wand and convert all
programs today, tomorrow 10 new programs will be written that just use
split(value,';') because that's the obvious implementation.
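
The risk can be sketched in Python. The escaping rule used here (a doubled `;;` stands for one literal semicolon) is purely an assumption for illustration, as are the function names and the sample value; the thread does not settle on any rule.

```python
def naive_split(value):
    """The 'obvious' implementation: split on every semicolon."""
    return value.split(";")


def escaped_split(value):
    """Split on single semicolons, treating ';;' as one literal semicolon.

    The ';;' convention is an assumption made for this sketch.
    """
    parts, current, i = [], "", 0
    while i < len(value):
        if value[i] == ";":
            if i + 1 < len(value) and value[i + 1] == ";":
                current += ";"  # escaped literal semicolon
                i += 2
                continue
            parts.append(current)  # real separator
            current = ""
        else:
            current += value[i]
        i += 1
    parts.append(current)
    return parts


value = "Fish;;Chips;Pizza"
print(naive_split(value))    # ['Fish', '', 'Chips', 'Pizza']
print(escaped_split(value))  # ['Fish;Chips', 'Pizza']
```

The two consumers disagree on the same data. Note also that under this assumed rule `a;;b` can no longer mean "a, blank, b", which is the blank-field ambiguity mentioned above.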

That impossibility is why I'm convinced that semicolons as the only
way to support multiple values is a very bad idea, despite being often
nicer to look at. They're fine to use for simple cases, but not for
anything complex or wide-ranging.

Contrast semicolons with the key_<number> scheme, which can safely be
implemented universally for all keys by a consumer, or the various
key:subkey schemes which can present more subtle information. And
contrary to semicolons, both those schemes degrade gracefully when
the consumer doesn't handle multiple-value schemes.
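
A minimal Python sketch of the key_<number> idea, assuming the `_` separator; the `values_for` helper and the sample tags are invented for illustration.

```python
import re


def values_for(tags, key):
    """Collect the values of key, key_1, key_2, ... in index order."""
    pattern = re.compile(re.escape(key) + r"(?:_(\d+))?")
    found = []
    for tag_key, value in tags.items():
        match = pattern.fullmatch(tag_key)
        if match:
            index = int(match.group(1)) if match.group(1) else 0
            found.append((index, value))
    return [value for _, value in sorted(found)]


tags = {
    "name": "Main Street",
    "name_2": "Old Road",
    "name_1": "High Street",
    "highway": "residential",
}

print(values_for(tags, "name"))  # all three names, in index order
print(tags["name"])              # a consumer unaware of the scheme still gets one clean value
```

The second `print` is the graceful-degradation point: a consumer that only knows `name` sees an incomplete but well-formed value rather than a string it has to split.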


Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
In reply to this post by Tod Fitch
On 22/01/2015, Tod Fitch <[hidden email]> wrote:
> With respect to objects that have multiple values for a key, the arguments seem
> to come down to either:
>
> 1. key=value1;value2;. . . ;valueN
> 2. key:value1=yes + key:value2=yes + . . . + key:valueN=yes

3. key<indexseparator><index>=value

> As a programmer I can parse either set using any number of different
> methods.
>
> I am not against using a ":" in the key string to create name spaces and for
> grouping related keys. I think that is a very useful construct.
>
> But from a purely logical point of view, I'd say the second way misses the
> concept of "key=value" and is using "key:value" with a noise suffix of
> "=yes". Typically missing keys should be treated as having a value of either
> "no" or "unknown". Unless you can show me where key:value1="is something
> other than yes" then I may suspect you of putting values into the key field
> of the data.

You've given examples yourself where the value isn't "yes". The keys
addr1:housenumber and name_1 obviously don't have "yes" as a value.

Note also that nobody ever tagged "addr=42;Backer Street". It's not
"key:value" but "key:subkey".

> At present we have approached each case on an ad hoc basis. Sometimes using
> a number suffix by itself (addr2), sometimes preceded by a underscore
> (name_1) and sometimes by using a semicolon delimited list in the value
> field. By setting a simple convention for key with an array of values I
> think many of these cases could be handled in a simple, easy to remember
> unified manner.

Yes, I'd actually like to see this discussion happening. Seeing
"addr1" suggested when "name_1" is in use irks me (the separator isn't
the same). Another format that occasionally gets suggested is
"key[index]=value". And it might be a good idea to clarify the
interpretation with subkeys (is it "key_1:subkey" or "key:subkey_1"
?).
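
To make the separator question concrete, here is a hedged Python sketch that normalises the index syntaxes mentioned in this thread (`key_1`, `key#1`, `key[1]`, and a bare `key1`). The `split_index` helper is hypothetical, and no syntax has been agreed on.

```python
import re

# Candidate index syntaxes mentioned in the thread: key_1, key#1, key[1], key1.
INDEX_PATTERNS = [
    re.compile(r"(?P<base>.+)_(?P<index>\d+)"),
    re.compile(r"(?P<base>.+)#(?P<index>\d+)"),
    re.compile(r"(?P<base>.+)\[(?P<index>\d+)\]"),
    re.compile(r"(?P<base>.*\D)(?P<index>\d+)"),
]


def split_index(key):
    """Return (base, index) for an indexed key, or (key, None) otherwise."""
    for pattern in INDEX_PATTERNS:
        match = pattern.fullmatch(key)
        if match:
            return match.group("base"), int(match.group("index"))
    return key, None


for key in ["name", "name_1", "name#2", "key[3]", "addr2",
            "addr:street_1", "addr_1:street"]:
    print(key, "->", split_index(key))
```

The last two inputs show the subkey question: this sketch recognises only a trailing index, so `addr:street_1` is split while `addr_1:street` is returned unchanged. The bare-digit pattern would also misread any ordinary key that happens to end in a digit, which is one reason an explicit separator is attractive.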


Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
In reply to this post by fly high
On 22/01/2015, fly <[hidden email]> wrote:
> Am 22.01.2015 um 21:32 schrieb Tod Fitch:
>> key:1=value1
>> key:2=value2
>> key:3=value3
>
> No not at all, this makes it worse. Numbers are way too general and you
> gain little.
>
> : is usually used for subkeys so key1, key2 would even be better.

Subkeys are not always usable, the classic example being the name key.

Also, I think that the subkey separator (':') should be different from
the index separator (let's say '_' although that isn't fully
standardised yet). Because I could concoct an example where "2" is a
subkey rather than an index.


Re: Wiki Edit War on using/avoiding semicolon lists

althio
In reply to this post by Nadjita
While I am no skilled programmer I agree with the next points:
(any guru is welcome to disregard my following opinion if he wants to)


> Just because one can use a regular expression to grep out a certain meaning doesn't mean it's a good thing to do and will always work.

Regexps are AFAIK quite controversial because they are efficient at
some tasks but also can be hard to maintain -- especially if poorly
documented.

OSM is an open project for open data and we should strive not to
create unnecessary hurdles for access and use of this data.

OSM is not only for developers but also for experts in their fields
(but not computing/programming), students, local communities and any
citizen.

Regexps should not be used, or misused, as a badge of peer recognition or as a
trial to check whether someone is worthy of accessing all levels of the data.
"It is easy for any good enough programmer": not a good argument in my book.


> [key:subkey=*] gives the flexibility to distinguish between equal and distinguished importance

I agree that it is more flexible and gives more freedom to sort and add details.
If I am not mistaken, [key:subkey=*] can do everything that
[key=values;separated;by;semicolon] can, and more. The reverse is not true.
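
The asymmetry can be sketched in Python. The two helper functions and the value "mostly" are invented for illustration only; they are not an existing tool or an established tag value.

```python
def to_subkeys(key, value):
    """Expand key=a;b;c into key:a=yes, key:b=yes, key:c=yes."""
    return {f"{key}:{item.strip()}": "yes"
            for item in value.split(";") if item.strip()}


def to_semicolons(key, tags):
    """Collapse key:*=yes tags into one semicolon list; other values are lost."""
    prefix = key + ":"
    items = [k[len(prefix):] for k, v in tags.items()
             if k.startswith(prefix) and v == "yes"]
    return ";".join(items)


expanded = to_subkeys("color", "purple;orange;green")
print(expanded)
print(to_semicolons("color", expanded))  # round-trips to the original list

# A subkey can carry more than yes/no, which the flat list cannot express.
richer = {"color:purple": "yes", "color:green": "no", "color:orange": "mostly"}
print(to_semicolons("color", richer))    # only 'purple' survives
```

Going from the list to subkeys loses nothing; going back drops every value other than "yes".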


> [I too consider that] Using semicolon-lists for values [is] a crutch until a better tagging-scheme comes along.


Re: Wiki Edit War on using/avoiding semicolon lists

althio
In reply to this post by moltonel 3x Combo
> Also, I think that the subkey separator (':') should be different from
> the index separator (let's say '_' although that isn't fully
> standardised yet). Because I could concoct an example where "2" is a
> subkey rather than an index.

Visually for index I would go for "#" or "-" but I don't know if that
is acceptable regarding special characters status.

name=*
name#2=*
name#3=*

or

cuisine=*
cuisine-2=*
cuisine-3=*


Re: Wiki Edit War on using/avoiding semicolon lists

Никита
> the classic example being the name key.
This is a bad example. We have many tags with their own semantics: http://wiki.openstreetmap.org/wiki/Names#Key_Variations We don't need name_1, name_2 or name#1 or name#2 keys.

> name=*
> name#2=*
> name#3=*
There is no point in using indexes in keys. You need a semantic subkey: color, length, size, visibility. Not meaningless integers. Again, my example from several messages earlier:

name=purple
name#2=orange
name#3=green

How do you query for green in overpass? In JOSM?

And what if another object has a different set of tags in a different order?
name=black
name#2=green
name#3=white

Again, name is a bad example.


Re: Wiki Edit War on using/avoiding semicolon lists

jgpacker
In reply to this post by althio
I don't understand the insistence on using regexes as some kind of argument against semicolon lists.

A semicolon list is an extremely simple pattern.
Such a pattern can be easily parsed even WITHOUT regexes.

Me and other developers in this thread (Imagic, Friedrich, David, Dmitry, Marc) are trying to tell you semicolons are not a problem.


Re: Wiki Edit War on using/avoiding semicolon lists

Richard Welty-2
On 1/23/15 10:13 AM, jgpacker wrote:
> I don't understand the insistence in using regexes as some kind of argument against semicolon lists.
>
> A semicolon list is an extremely simple pattern.
> Such a pattern can be easily parsed even WITHOUT regexes.
>
> Me and other developers in this thread (Imagic, Friedrich, David, Dmitry, Marc) are trying to tell you semicolons are not a problem.

+1

competent languages provide simple mechanisms for splitting
strings on single characters. sometimes the function is even
called "split"

richard
-- 
[hidden email]
 Averill Park Networking - GIS & IT Consulting
 OpenStreetMap - PostgreSQL - Linux
 Java - Web Applications - Search


Re: Wiki Edit War on using/avoiding semicolon lists

Tod Fitch

On Jan 23, 2015, at 7:47 AM, Richard Welty wrote:

> On 1/23/15 10:13 AM, jgpacker wrote:
>> I don't understand the insistence in using regexes as some kind of argument against semicolon lists.
>>
>> A semicolon list is an extremely simple pattern.
>> Such a pattern can be easily parsed even WITHOUT regexes.
>>
>> Me and other developers in this thread (Imagic, Friedrich, David, Dmitry, Marc) are trying to tell you semicolons are not a problem.
>>
> +1
>
> competent languages provide simple mechanisms for splitting
> strings on single characters. sometimes the function is even
> called "split"
>
> richard

Yes, nearly every scripting language I've used has an easy way to split a string on a character or substring.

Is there a value string that contains a semi-colon as part of the actual value rather than as a delimiter between values? I can't think of any, but since for some key names the value field is free form, I suppose it could happen. A semantic solution to that would be to document which keys may have (or maybe a shorter list of exceptions that cannot have) multiple values separated by semi-colons.

However there is the related question of how to deal with things like multiple addresses for one object, the subject of another current thread. In this case you probably don't want to be dealing with:

addr:housenumber=1234;7654
addr:street=Main Street;Elm Avenue

So you will be dealing with something like:

addr:housenumber=1234
addr:street=Main Street
addr:housenumber_1=7654
addr:street_1=Elm Avenue

Coming up with a uniform way of dealing with arrays of values would mean that a simple and consistent solution could be used for both problems.

I don't much care if the syntax of the key is "key:1", "key_1", "key#1" or "key[1]" but I do think that something needs to be picked for sets of keys that have related values. And once you do that the solution could be applied as an alternative to semi-colon delimited values in the case being discussed here.

Having one approach that solves two issues seems better to me than having two solutions. Yes, any robust data consumer software will have to deal with all the existing ways things are done now. But standardizing on one way to go forward should help in the future.
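
As a sketch of what a consumer could do with the suffixed address keys above, here is a short Python example. It assumes the `_<number>` suffix form from the example; `group_addresses` is a hypothetical helper, not an existing library function.

```python
import re
from collections import defaultdict

# A key optionally followed by _<digits>; the lazy group leaves the suffix out of 'key'.
SUFFIX = re.compile(r"(?P<key>.+?)(?:_(?P<index>\d+))?")


def group_addresses(tags):
    """Group addr:* tags into one dict per index (no suffix means index 0)."""
    groups = defaultdict(dict)
    for tag_key, value in tags.items():
        if not tag_key.startswith("addr:"):
            continue
        match = SUFFIX.fullmatch(tag_key)
        index = int(match.group("index") or 0)
        groups[index][match.group("key")] = value
    return [groups[i] for i in sorted(groups)]


tags = {
    "addr:housenumber": "1234",
    "addr:street": "Main Street",
    "addr:housenumber_1": "7654",
    "addr:street_1": "Elm Avenue",
    "building": "yes",
}

for address in group_addresses(tags):
    print(address)
```

The house number and street stay paired by index, which the `1234;7654` / `Main Street;Elm Avenue` form can only guarantee by position.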

Cheers,
Tod Fitch



Re: Wiki Edit War on using/avoiding semicolon lists

Richard Welty-2
i've removed prior discussion so that this can stand on its own.

i admit that the distinction between keys and values is a bit
blurry; it would be a fallacy to claim that data goes only in
values because that's obviously not completely true.

however, i will assert that for key space to be useful it needs
to be managed; pushing too much arbitrary data into the key
space reduces its utility.

from this point of view, having colour in key space makes sense
but having the actual names of colours as subkeys seems to me
to be overloading too much data value into the key side. for
every parsing problem you simplify on the value side by flipping
data into subkeys, you create additional complexity when data
consumers must navigate key space.

what we're doing now is not necessarily ideal, the fact that
we're having this discussion shows this. however, moving
a bunch of data into key space to avoid semicolons
does not strike me as an improvement.

richard

--
[hidden email]
  Averill Park Networking - GIS & IT Consulting
  OpenStreetMap - PostgreSQL - Linux
  Java - Web Applications - Search



Re: Wiki Edit War on using/avoiding semicolon lists

Colin Smale
In reply to this post by Tod Fitch

Tag namespaces already provide a kind of "data structure" facility. IMHO a syntax that is close to the traditional way of representing vectors of structures would be something like this:

addr[1]:housenumber=1234
addr[1]:street=Main Street
addr[2]:housenumber=7654
addr[2]:street=Elm Avenue

All house numbers are called "housenumber", addr[1] and addr[2] are both instances of an address.

In fact, if the ":" is replaced by a ".", it starts to look very familiar....
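
To show why this looks familiar, here is a hedged Python sketch that reads the bracketed keys into a list of records. The bracket syntax is only the proposal above, and `to_records` is a name invented for this example.

```python
import re

# name[index]:field, e.g. addr[2]:street
STRUCT_KEY = re.compile(r"(?P<name>[^\[\]:]+)\[(?P<index>\d+)\]:(?P<field>.+)")


def to_records(tags, name):
    """Turn name[i]:field=value tags into a list of dicts ordered by i."""
    records = {}
    for key, value in tags.items():
        match = STRUCT_KEY.fullmatch(key)
        if match and match.group("name") == name:
            index = int(match.group("index"))
            records.setdefault(index, {})[match.group("field")] = value
    return [records[i] for i in sorted(records)]


tags = {
    "addr[1]:housenumber": "1234",
    "addr[1]:street": "Main Street",
    "addr[2]:housenumber": "7654",
    "addr[2]:street": "Elm Avenue",
}

addresses = to_records(tags, "addr")
print(addresses[1]["street"])  # Python lists are 0-based, so this is addr[2]:street
```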

Is the maximum length of a value still 255 characters (or is it bytes?)? With the ";" syntax we could easily come up against that limit, whereas an array / key-based syntax would allow 255 for each individual value.
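
Why the characters-or-bytes question matters can be shown with a small Python sketch. The figure 255 is the one quoted above and its unit is left open here; the repeated Cyrillic colour value is purely illustrative.

```python
LIMIT = 255  # figure quoted in the thread; its unit is the open question


def measure(value):
    """Return (character count, UTF-8 byte count) of a tag value."""
    return len(value), len(value.encode("utf-8"))


colours = ["оранжевый"] * 20  # twenty identical Cyrillic values, purely illustrative
joined = ";".join(colours)

chars, nbytes = measure(joined)
print(chars <= LIMIT, nbytes <= LIMIT)               # fits as characters, not as bytes
print(all(measure(c)[1] <= LIMIT for c in colours))  # each value alone fits either way
```

With one value per key, each value gets the whole limit to itself, whichever unit applies.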

Obviously (at least IMHO) the data model of OSM would benefit from having a defined method to represent higher-level constructs. Some people are already talking about having an "area" or a "polygon" distinct from a "way" with start=end. Why not have a proper discussion about how to represent lists of values? Of course it helps to have some examples in mind, but let's step back and find a more generic solution which will also address our current problem.

I really don't think the fact that some people don't understand regular expressions is a good reason not to look to the future. Once a standard is defined, the software will soon catch up, provided the standard is well specified. If the standard is poorly specified, poorly documented, or riddled with exceptions, then it will be "ignored".

Colin

 


|

Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
In reply to this post by althio
On 23/01/2015, althio <[hidden email]> wrote:
> Visually for index I would go for "#" or "-" but I don't know if that
> is acceptable regarding special characters status.
>
> name=*
> name#2=*
> name#3=*

I really like using '#' as the index separator. It is sometimes
pronounced "number". It hasn't been used before in osm keys (AFAIK),
which is both a blessing (no clash with an existing definition) and a
curse (need to convince everybody to start using that).

'_', '-', and '' (empty string) have the drawback of being mistakable
for something else.

'[]' may appeal to some (it's the same syntax as many programming
languages), but it feels a bit verbose and it can be a pain for
regexps.


Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
In reply to this post by Никита
On 23/01/2015, Никита <[hidden email]> wrote:
>> the classic example being the name key.
> This is bad example. We have many tags with their own semantic:
> http://wiki.openstreetmap.org/wiki/Names#Key_Variations We don't need
> name_1, name_2 or name#1 or name#2 keys.

Of course when you can figure out names that are semantically
different, you use the specific tag. But it's not rare that a place
has two names that cannot be differentiated by semantics or popularity.
In those cases you have alt_name if you're lucky enough to only need
one extra name, and name_<number> if you need more values.

> There no point in using indexes in key. You need semantic subkey:
> color, length, size, visibility. Not meaningless integers. Again, my
> example several messages earlier:

Indexes and subkeys are two different usecases. Both are useful.

> name=purple
> name#2=orange
> name#3=green
>
> How do you query for green in overpass? In JOSM?

josm: name(#\d+)?=green
overpass: I don't know it enough


Re: Wiki Edit War on using/avoiding semicolon lists

moltonel 3x Combo
On 23/01/2015, moltonel 3x Combo <[hidden email]> wrote:
>> name=purple
>> name#2=orange
>> name#3=green
>>
>> How do you query for green in overpass? In JOSM?
>
> josm: name(#\d+)?=green
> overpass: I don't know it enough

Note that if "key#index=value" becomes commonly used, tools like josm
and overpass (and nominatim, and so on) will eventually integrate
it into their engine, so that searching for "key" will also
automatically find "key(#\d+)?".
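
What such built-in support might amount to can be sketched in Python, using the two colour examples from earlier in the thread. `find_value` is a hypothetical helper; it is not JOSM or Overpass code.

```python
import re


def find_value(tags, key, value):
    """Return the keys among key, key#2, key#3, ... whose value equals value."""
    pattern = re.compile(re.escape(key) + r"(#\d+)?")
    return [k for k, v in tags.items() if pattern.fullmatch(k) and v == value]


first = {"name": "purple", "name#2": "orange", "name#3": "green"}
second = {"name": "black", "name#2": "green", "name#3": "white"}

print(find_value(first, "name", "green"))   # ['name#3']
print(find_value(second, "name", "green"))  # ['name#2']
```

The position of "green" differs between the two objects, but the same query finds it in both.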


Re: Wiki Edit War on using/avoiding semicolon lists

Marc Gemis
In reply to this post by moltonel 3x Combo

On Fri, Jan 23, 2015 at 6:25 PM, moltonel 3x Combo <[hidden email]> wrote:
>> How do you query for green in overpass? In JOSM?
>
> josm: name(#\d+)?=green
> overpass: I don't know it enough

Is node[~"^name#.*$"~"^Green$"]; close enough? I'm not sure which regular expressions JOSM supports.

m


Re: Wiki Edit War on using/avoiding semicolon lists

Jo-2
It supports the 'java' flavour. It took me a while to figure out how to work with those ;-delimited strings, but once I found this regex worked for my purposes:

route_ref="(^|.+;)26(;.+|$)" inview odbl=new

I never looked back. As for Nikita's question about how to find an item which is not in the semi-colon delimited list, the answer is: you don't find it. That object is not part of the result set. Maybe he meant how to find out that an item is missing from the list? Well I don't see how that becomes any easier by moving the values over to the keys. And apparently coming up with regexes that can work with that, is even more 'complex'.
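
For readers who want to see what the pattern above accepts, here is a small Python sketch. It uses Python's `re` module rather than the Java flavour JOSM uses, applies the pattern to the tag value only, and the `contains_26` name is made up for this example.

```python
import re

# The pattern from the JOSM search above, applied to the tag value only.
ROUTE_26 = re.compile(r"(^|.+;)26(;.+|$)")


def contains_26(value):
    """True if '26' is a complete item of a semicolon-delimited list."""
    return ROUTE_26.search(value) is not None


for value in ["26", "12;26;30", "26;30", "12;26", "126", "12;260", "12; 26"]:
    print(repr(value), contains_26(value))
```

The first four values match; "126" and "12;260" do not, which is the point of anchoring on the delimiters. "12; 26" does not match either, so a space after the semicolon would need its own handling.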

Anyway, you don't have to be a programmer to find such solutions with 'complicated' regexes. Just do a Google search and you'll probably stumble upon some wiki page I created where I documented the whole process.

Cheers,

Polyglot

PS: deleted the stuff under the 3 dots, thereby almost losing the message, as backspace seems to instruct Firefox to go back one page.




Re: Wiki Edit War on using/avoiding semicolon lists

Никита
> That object is not part of the result set. Maybe he meant how to find out that an item is missing from the list? Well I don't see how that becomes any easier by moving the values over to the keys.
In overpass, "color:green"!=* should return objects that carry no information about the color green, and "color:green"="no" will return objects explicitly tagged as not green.

> And apparently coming up with regexes that can work with that, is even more 'complex'.
It is not complex. It is impossible to write presets or translations for iD or JOSM using the name#2=green approach.

To all regex advocates: your knowledge of regexes is irrelevant to how OSM functions. I am waiting for your solution for how we should support name=, name#2=, name#3= in presets or translations.


Re: Wiki Edit War on using/avoiding semicolon lists

Marc Gemis

On Sat, Jan 24, 2015 at 6:40 AM, Никита <[hidden email]> wrote:
>> Well I don't see how that becomes any easier by moving the values over to the keys.
> "color:green"!=* in overpass should return values without information about green color or "color:green"="no" will return objects without green color

But how can you find all green, lightgreen, bluegreen, etc. values (aka all "greenish" colors) in your approach?
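
To make the question concrete, here is a hedged Python sketch of a "greenish" lookup under both schemes. The function names are invented, and treating "ends with green" as the test for greenish is an assumption of this sketch.

```python
import re

GREENISH_KEY = re.compile(r"color:(\w*green)")


def greenish_from_keys(tags):
    """Colour names ending in 'green' under the color:<name>=yes scheme."""
    found = []
    for key, value in tags.items():
        match = GREENISH_KEY.fullmatch(key)
        if match and value == "yes":
            found.append(match.group(1))
    return found


def greenish_from_list(tags):
    """Colour names ending in 'green' under the color=a;b;c scheme."""
    return [item for item in tags.get("color", "").split(";")
            if item.endswith("green")]


by_key = {"color:lightgreen": "yes", "color:bluegreen": "yes", "color:white": "yes"}
by_list = {"color": "lightgreen;bluegreen;white"}

print(greenish_from_keys(by_key))
print(greenish_from_list(by_list))
```

Under these assumptions both schemes need some pattern matching for a fuzzy query; the difference is whether the pattern is applied to the keys or to the items of the value.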

m.
