Meaning of option --x-mdr7-excl

Meaning of option --x-mdr7-excl

Gerd Petermann
Hi all,

I have not documented this option yet because I don't think it is useful as currently implemented.
I think it works fine for English-speaking countries with road names like "Abc Street" and "Xyz Road".
Using --x-mdr7-excl=Road,Street in combination with --x-split-name-index will work fine.

The picture is different for a French-speaking country.
Let's look at an example. Assume you use the options --index and --x-split-name-index.
The road name "Chemin de Pierre Froide" is added to the index as
"Chemin de Pierre Froide"
and because of --x-split-name-index the following entries are also added:
"de Pierre Froide"
"Pierre Froide"
"Froide"
Now, would you expect a change if you use the option --x-mdr7-excl=Chemin,Rue,Avenue ?
And what would you expect with --x-mdr7-excl=de,du,la ?

With the current implementation there would be no change in the output, because
the check works like this:
Build the string that should be added to the index.
Check if that string is in the exclude list; if not, add it to the index.

I might change that like this:
Build the string that should be added to the index
Check if the first word in that string is in the exclude list, if not, add it to the index.
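
The two checks described above can be sketched in Python as follows. This is only an illustration of the logic, not mkgmap's actual code; the function names `split_entries`, `index_current` and `index_first_word` are made up for this sketch, and it assumes that names are split on whitespace and compared case-sensitively.

```python
def split_entries(name):
    """Return the full name plus every suffix that starts at a word boundary."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def index_current(name, exclude):
    """Check as currently described: drop an entry only if the whole string is excluded."""
    return [entry for entry in split_entries(name) if entry not in exclude]


def index_first_word(name, exclude):
    """Proposed check: drop an entry if its first word is in the exclude list."""
    return [entry for entry in split_entries(name) if entry.split()[0] not in exclude]


name = "Chemin de Pierre Froide"
# Whole-string check: no entry equals "Chemin", "Rue" or "Avenue", so nothing changes.
print(index_current(name, {"Chemin", "Rue", "Avenue"}))
# First-word check: the entry starting with "Chemin" is dropped.
print(index_first_word(name, {"Chemin", "Rue", "Avenue"}))
# First-word check: the entry starting with "de" is dropped.
print(index_first_word(name, {"de", "du", "la"}))
```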

With this change the option --x-mdr7-excl=Chemin,Rue,Avenue
would exclude the entry
"Chemin de Pierre Froide"
and --x-mdr7-excl=de,du,la would exclude
"de Pierre Froide"

Another option would be to allow regular expressions in the exclude list, but that would require more
effort (an input file instead of a single option) and probably much more run time.

I'd prefer a logic which first analyses all strings added by the --x-split-name-index option so that
only those which do not appear more often than x % are generated.
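
A rough Python sketch of such a frequency-based filter. The thread does not define what "x %" is measured against; this sketch assumes it is the share of road names that produce a given split string. The names `suffixes`, `filter_by_frequency` and `max_share` are hypothetical, and keeping every full name in the index is also an assumption.

```python
from collections import Counter


def suffixes(name):
    """Return the extra entries a split-name option would add (all proper suffixes)."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(1, len(words))]


def filter_by_frequency(names, max_share):
    """Return index entries, skipping split strings that are too common.

    max_share is a fraction of the number of names (0.5 means 50 %).
    """
    # First pass: count how often each split string would be generated.
    counts = Counter()
    for name in names:
        counts.update(suffixes(name))

    # Second pass: always keep the full name, keep a split string only if it is rare enough.
    limit = max_share * len(names)
    entries = []
    for name in names:
        entries.append(name)
        entries.extend(s for s in suffixes(name) if counts[s] <= limit)
    return entries


names = ["Abc Street", "Xyz Street", "Main Street", "Chemin de Pierre Froide"]
# "Street" would be generated for 3 of 4 names, which exceeds 50 %, so it is skipped.
print(filter_by_frequency(names, 0.5))
```

Note that a filter of this kind only catches very common strings like "Street"; an entry such as "de Pierre Froide" occurs once and would still be generated.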

Comments?

Gerd
_______________________________________________
mkgmap-dev mailing list
[hidden email]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Re: Meaning of option --x-mdr7-excl

popej
Hi Gerd,

I think the most useful version would be to drop the last word from the name.

If you would like to process prefixes too, then maybe limit it to the first
word only. For a name like "Chemin de Pierre Froide", you would remove
"Chemin" but not "de". Alternatively you could implement prefixes with
multiple words, for example:

--x-mdr7-excl="chemin,chemin de"
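
One way to read this suggestion, sketched in Python: strip the longest listed prefix from the start of the name, matching whole words only. The helper `strip_prefix` and the case-insensitive comparison are assumptions made for illustration; the thread does not say how matching should work.

```python
def strip_prefix(name, prefixes):
    """Remove the longest matching multi-word prefix from the start of name.

    Matching is done on whole words and ignores case (an assumption).
    The name is returned unchanged if no prefix matches or nothing would remain.
    """
    words = name.split()
    lowered = [w.lower() for w in words]
    # Try longer prefixes first so that "chemin de" wins over "chemin".
    for prefix in sorted(prefixes, key=lambda p: len(p.split()), reverse=True):
        p_words = prefix.lower().split()
        n = len(p_words)
        if 0 < n < len(words) and lowered[:n] == p_words:
            return " ".join(words[n:])
    return name


prefixes = ["chemin", "chemin de"]
print(strip_prefix("Chemin de Pierre Froide", prefixes))  # Pierre Froide
print(strip_prefix("Rue Joseph de Maistre", prefixes))    # Rue Joseph de Maistre
```

Because only the start of the name is compared, a "de" in the middle of a name is never touched.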

--
Best regards,
Andrzej

Re: Meaning of option --x-mdr7-excl

thesurveyor
In reply to this post by Gerd Petermann

Hi Gerd,

first of all: many thanks for your work. I usually just read your posts and have a lot of fun using mkgmap.

On this topic I will try to explain my thoughts, but maybe I haven't understood all of these options.

- the original name should always be in the index, because that is what everybody knows and expects to find
- with --x-mdr7-excl=X we should avoid the entry "X" in the index

For example:

Using --x-mdr7-excl=Road,Street,Chemin,des,de in combination with --x-split-name-index
should insert into the index, for
name="ABC Straße" - in index: "ABC Straße", but not "Straße"
name="Straße des 17. Juni" - in index: "Straße des 17. Juni", "des 17. Juni", "17. Juni", "Juni"
name="Chemin de Pierre Froide" - in index: "Chemin de Pierre Froide", "de Pierre Froide", "Pierre Froide", "Froide"

We should think like the user of the map. He will not necessarily know about the splitting of street names. He just wants to find the name, and he knows the name or at least a part of it. I agree that nobody will search for "des 17. Juni"; everybody will search for the complete name "Straße des 17. Juni" or for "17. Juni". But it is nearly impossible to describe which parts we need and which we don't.

 
> Another option would be to allow regular expressions in the exclude list, but that would require more
> effort (input file instead of single option) and probably much more run time.

I wouldn't do that.

> I'd prefer to have a logic which first analyses all strings added by the --x-split-name-index option so that
> only those are generated which do not appear more than x % .

That idea is good, but I'm not sure it will work. Think of a map of the Alps. Such a map covers many countries with different languages. I haven't checked it, but I assume that most streets will be in German-speaking countries (Germany, Austria, Switzerland). How should we compute a value for how often strings appear in France? Or another example: "Straße" is very common in Germany, but if I have a map of the whole of France and a small part of Germany, then "Straße" would get only a low value and would therefore be included. So if we did something like that, we would need a value for each country.
And think about the street names which nearly every town in Germany has, like "Hauptstraße".
So, I wouldn't do that.
 

Overall: I would prefer an easy to understand rule.

best regards,
Gert

 
 

Re: Meaning of option --x-mdr7-excl

Felix Hartmann-2
This option would also be useful without the split-names option. I append information from the OSM tags, such as path, ft or M4/5, to the names of ways, and there is no reason to show it in the address search. I could easily exclude it with this option. Right now I'm not fully sold on the split-names option for my maps (I might use it, but I have to experiment; until now, with the street type appended to all names, it was not feasible anyhow).

On the other hand, I would actually prefer a different approach: keys like mkgmap:searchname, mkgmap:searchname1 and so on, because I would love all indexes to be searchable by both name and name:en. E.g. I want a map of Europe to return both Bruxelles and Brussels in the POI search, and if possible also in the address search (or Munich and München, and so on). This could be achieved by setting the name to "name - name:en", but separate keys in the style would be even better. Split-names could still be used for splitting up. The main thing would be to be able to search for English and local names in all maps. For POIs this is possible right now by simply duplicating the POI with continue; for the address search it is mainly about the city name, and I'm not sure whether it's possible or not.

--
Felix Hartman - Openmtbmap.org & VeloMap.org
Schusterbergweg 32/8
6020 Innsbruck
Austria - Österreich


Re: Meaning of option --x-mdr7-excl

Felix Hartmann-2
Oh, could there be a test whether name1 = name2 for the split-names option? Just in case you use something like set name = name - name:en: if both happen to be the same, of course only one should be put into the index. Or does this check already exist? And are symbols like "-" already excluded by default?

--
Felix Hartman - Openmtbmap.org & VeloMap.org
Schusterbergweg 32/8
6020 Innsbruck
Austria - Österreich


Re: Meaning of option --x-mdr7-excl

Carlos Dávila-2
In reply to this post by Gerd Petermann
On 04/04/17 at 17:13, Gerd Petermann wrote:

> With this change the option --x-mdr7-excl=Chemin,Rue,Avenue
> would exclude the entry
> "Chemin de Pierre Froide"
Would "Chemin de Pierre Froide" still be added by the --index option?
> and --x-mdr7-excl=de,du,la would exclude
> "de Pierre Froide"
>
> Another option would be to allow regular expressions in the exclude list, but that would require more
> effort (input file instead of single option) and probably much more run time.
I don't think creating an input file for this would be a problem. I
currently use one with more than 1000 lines to manage road names for
several "French style" languages.

Re: Meaning of option --x-mdr7-excl

Carlos Dávila-2
In reply to this post by popej
On 04/04/17 at 20:34, Andrzej Popowski wrote:

> Hi Gerd,
>
> I think the most useful version would be to drop the last word from the name.
>
> If you would like to process prefixes too, then maybe limit it to the
> first word only. For a name like "Chemin de Pierre Froide", you would
> remove "Chemin" but not "de". Alternatively you could implement
> prefixes with multiple words, for example:
>
> --x-mdr7-excl="chemin,chemin de"
>

I don't find much use in removing only the first word. The probability
that someone searches for "Chemin de Pierre Froide" by typing "de Pierre
Froide" is very low. Using multiple words would be much more useful, but
the list may become really large.



Re: Meaning of option --x-mdr7-excl

Gerd Petermann
Hi all,

thanks for the input. I think we have at least two different issues here.
1) Some styles add attributes like "Path", "unpaved" etc. to the road name. Those attributes are not wanted in the index: you don't want to search for them and they should not appear in result lists.
If that is true, we should try to find a method to remove all those attributes from the name before it is written to the index. I can think of various ways to do that:
1a) While the name is not empty: check if the last word is in the exclude list; if yes, remove it, if not, exit the loop.
1b) For each word in the name: check if it appears in the exclude list; if so, remove it together with the rest of the name.
1c) Use a special character sequence (like the highway shield codes) to separate the real name from the rest.
If the name is empty after such processing, it would not be added to the index.
1a) and 1b) may need special code for Unicode maps when the name is written from right to left (Hebrew, for example). No idea if mkgmap works at all for those names.
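
Variants 1a) and 1b) could look roughly like this in Python. This is a sketch of the two loops as worded above, not mkgmap code; the function names and the example road names are invented, words are split on whitespace, and right-to-left names are not handled.

```python
def strip_trailing(name, exclude):
    """Variant 1a: drop the last word repeatedly while it is in the exclude list."""
    words = name.split()
    while words and words[-1] in exclude:
        words.pop()
    return " ".join(words)


def cut_at_first_excluded(name, exclude):
    """Variant 1b: cut the name at the first excluded word, dropping it and the rest."""
    words = name.split()
    for i, word in enumerate(words):
        if word in exclude:
            return " ".join(words[:i])
    return " ".join(words)


exclude = {"Path", "unpaved"}
print(strip_trailing("Forest Trail Path unpaved", exclude))  # Forest Trail
print(strip_trailing("Path Lane unpaved", exclude))          # Path Lane
# 1b cuts at the very first word here, leaving an empty name that would not be indexed.
print(repr(cut_at_first_excluded("Path Lane unpaved", exclude)))  # ''
```

The second and third calls show where the variants differ: 1a) only touches the end of the name, while 1b) also removes a real name part that follows an excluded word.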

2) The other problem is that road names in some languages contain lots of rather meaningless words, either as a prefix (Rua, Rue, Chemin) or as a suffix (Street, Road, Weg).
My understanding is that we don't want to remove those words from the name. The index should at least contain the full name, but we don't want all the variations produced by the current
implementation of the --x-split-name-index option.

3) A road has up to four different labels, the processing happens for each label. In some cases the --housenumber option adds or overwrites a label
to make sure that an address is found. The default style creates quite often three labels like this:
label1: "Ahlhorner Straße [B 213]"
label2: "B 213"
label3: "Ahlhorner Straße"
I am not sure if we really need all the index entries which are now produced for this.

4) @Felix: Your ideas about putting POI names in different languages into the index are a bit off-topic, as POI names don't appear in the mdr7 index, which is only for roads.
I am not sure if other indexes allow those partial names. I guess not, because Steve did not implement the --x-split-name-index option for them.

Gerd


Re: Meaning of option --x-mdr7-excl

popej
In reply to this post by Carlos Dávila-2
Hi Carlos,

I think it is safe to remove prefix but the same word in the middle
could have significant meaning. Consider removing "de" from name like
"Rue Saint-Jean-Baptiste de la Salle" or "Rue Joseph de Maistre".

--
Best regards,
Andrzej

Re: Meaning of option --x-mdr7-excl

Carlos Dávila-2
On 05/04/17 at 10:05, Andrzej Popowski wrote:
> Hi Carlos,
>
> I think it is safe to remove prefix but the same word in the middle
> could have significant meaning. Consider removing "de" from name like
> "Rue Saint-Jean-Baptiste de la Salle" or "Rue Joseph de Maistre".
>
I agree.