Querying for non-native characters in name field


Jóhannes Birgir Jensson
One side effect of Maps.me is the editing of the name= field (this has
been mostly fixed in the latest versions, though; most entries now go
into name:<phone language>).

I want to be able to run an Overpass query for Iceland that finds objects
whose name= field contains non-Icelandic characters. These could be, for
example, Chinese, Cyrillic or even other European characters (such as âà).
I'm guessing it could be difficult for the Latin characters, but I'm
hopeful it would be easier for non-Latin alphabets.

Is there a magic formula for achieving this?

--Stalfur / Jói


_______________________________________________
dev mailing list
[hidden email]
https://lists.openstreetmap.org/listinfo/dev

Re: Querying for non-native characters in name field

Ilya Zverev-2
Try this one: http://overpass-turbo.eu/s/lAf

I'm not sure why it doesn't catch all the special Icelandic characters,
though. See this post for a better option, although it is still in
development:
http://www.openstreetmap.org/user/mmd/diary/40197

IZ

On 29.01.2017 20:43, Jóhannes Birgir Jensson wrote:

> One side effect of Maps.me is the editing of name= field (this has been
> mostly fixed in the latest versions though, most entries now go into
> name:phone language).
>
> I want to be able to do an overpass query for Iceland where name= field
> contains non-Icelandic characters. These could be for example Chinese,
> Cyrillic or even other European characters (such as âà for example). I'm
> guessing it could be difficult for the latin characters but hopeful it
> would be easier for non-latin alphabets.
>
> Is there a magic formula for achieving this?
>
> --Stalfur / Jói


Re: Querying for non-native characters in name field

Alexander Matheisen
On Sunday, 29.01.2017, at 21:07 +0300, Ilya Zverev wrote:
> Try this one: http://overpass-turbo.eu/s/lAf
>
> Though I'm not sure why it doesn't catch all the weird Icelandic 
> characters. See this post for better option, although in
> development: 
> http://www.openstreetmap.org/user/mmd/diary/40197

Maybe it would be useful to integrate such checks into JOSM. A new
feature of JOSM is support for territory selectors in validator rules
(https://josm.openstreetmap.de/wiki/Help/Styles/MapCSSImplementation#Territoryselector),
which makes it very easy to implement such checks for non-native
characters.

Here in Germany these edits by Maps.me users are also a problem. Not
just name=*, but also name:de=* is used for non-German names, typically
by Asian tourists or Arab migrants. Examples:
http://www.openstreetmap.org/node/4148553590 or
http://www.openstreetmap.org/node/3693531033/history.


Regards
Alex

Re: Querying for non-native characters in name field

Roland Olbricht
In reply to this post by Jóhannes Birgir Jensson
> I want to be able to do an overpass query for Iceland where name= field
> contains non-Icelandic characters. These could be for example Chinese,
> Cyrillic or even other European characters (such as âà for example). I'm
> guessing it could be difficult for the latin characters but hopeful it
> would be easier for non-latin alphabets.
>
> Is there a magic formula for achieving this?

I suggest, as a refinement of Ilya's query, this one:
http://overpass-turbo.eu/s/lCk

Since it may help with other languages, here is how I arrived at it:

1. Start with

area["name:en"="Iceland"];
node(area)[name];
out count;

This is basically all nodes in Iceland that have a name. The important
part is the "out count". This ensures that you are not flooded with
results. For the same reason it is enough to start with nodes: we do not
want a final result yet; we want to develop a sensitive search term.
For this reason, we will narrow this down to just a subset of all nodes
in a moment.

2. Clamp down to

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z]"];
out count;

These are all nodes whose name contains at least one character other
than an unaccented Latin letter (a-z, A-Z). There are still many of
them. Therefore:

3. Get examples with

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z]"];
out 100;

This prints a sample of 100 results (in fact, the 100 matches with the
lowest node ids). Now we can look at the name fields and get an idea of
what else we would like to exclude.

4. Start to narrow down with

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z0-9 ]"];
out 100;

Spaces and digits are OK even before we start to accept all the special
characters from Icelandic.

This process is now repeated until the sample contains no more false
positives. Finally, we expand the query to all three types of OSM
elements, in the expectation that not many false positives will appear.
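
The same refine-until-clean loop can also be tried offline on names that
have already been downloaded. What follows is only a minimal sketch in
Python, not the query behind the links above. The list of Icelandic
letters, the sample names and the function names build_pattern,
flag_names and offending_chars are all assumptions made up for
illustration; none of them come from this thread.

```python
import re
import unicodedata

# Assumption: letters used in Icelandic beyond ASCII a-z (not from the thread).
ICELANDIC_EXTRA = unicodedata.normalize("NFC", "áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ")


def build_pattern(extra_allowed=""):
    """Compile a regex matching any single character outside the allowed set."""
    allowed = "a-zA-Z0-9 " + re.escape(ICELANDIC_EXTRA + extra_allowed)
    return re.compile("[^" + allowed + "]")


def flag_names(names, pattern):
    """Return the names containing at least one character the pattern rejects."""
    # Normalise to NFC so that a letter plus combining accent compares equal
    # to the precomposed letter in the allowed set.
    return [n for n in names if pattern.search(unicodedata.normalize("NFC", n))]


def offending_chars(name, pattern):
    """Return the sorted distinct characters in name that the pattern rejects."""
    return sorted(set(pattern.findall(unicodedata.normalize("NFC", name))))


# Hypothetical sample of name= values, standing in for an "out 100" sample.
SAMPLE = [
    "Reykjavík",
    "Kaffi 10",
    "Þingvellir",
    "Hôtel Nord",
    "Гейзер",
    "冰岛",
    "Bónus",
    "Bensínstöð (N1)",
]

# First pass: letters, digits, spaces and the Icelandic letters are allowed.
first_pass = flag_names(SAMPLE, build_pattern())

# Looking at the sample shows parentheses are a false positive, so allow them.
second_pass = flag_names(SAMPLE, build_pattern("()"))

if __name__ == "__main__":
    print(first_pass)
    print(second_pass)
```

Each round corresponds to one look at the sample: harmless characters
that still show up (here the parentheses) are added to the allowed set,
and the check is run again until only genuinely foreign names remain.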

Cheers,

Roland

