Date ranges with Overpass API

Roland Olbricht
Dear all, dear Andy, dear Jochen,

I've completed a public beta for (part of) the next Overpass API
release. It features a large set of new query language elements that
allow "less than" comparisons on dates. This should satisfy the request
Andy made at SotM-US 2015 for Historic OSM. Jochen has brought up the
idea of a unified query language, and I would like to make this as
universal as possible.

The reason for this public beta is that I would like to have some
feedback on the syntax to make it as intuitive as possible. I'm also
interested in all kinds of bugs found. Things are not necessarily fast,
but that can be fixed later on; for now I'm only after syntax changes
and functional bugs.

To test against this beta, please use the API endpoint
https://dev.overpass-api.de/api_deriveds/

You can do so in Overpass Turbo by setting "Settings > General > Server"
to "//dev.overpass-api.de/api_deriveds/".

A sample query is to search around Birmingham for things created in the
19th century:
http://overpass-turbo.eu/s/ks1

way[start_date](if:[start_date]>1800 && [start_date]<1900)({{bbox}});
out center;

The new part is (if:[start_date]>1800 && [start_date]<1900).
This is composed of
- the framing (if:...)
- the logical operator "&&"
- the comparison operators "<" and ">"
- the tag evaluator [...], applied on the tag with key "start_date"

Available logical operators are "||", "&&", and "!".
Available comparison operators are "==", "<", "<=", ">", and ">=".

The comparison is numerical if both values are numerical, and a string
comparison otherwise.
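
A minimal Python sketch of this comparison rule may help when judging the edge cases. It is an illustration only, not the Overpass implementation: the helper names `is_num` and `less_than` are chosen for the example, and Python's `float()` stands in for whatever number parser the server actually uses (an assumption; `float()` is more lenient, e.g. it accepts "1e3").

```python
def is_num(value: str) -> bool:
    """Return True if the string parses as a number (float() is a stand-in parser)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def less_than(left: str, right: str) -> bool:
    """Compare numerically if both sides are numbers, as strings otherwise."""
    if is_num(left) and is_num(right):
        return float(left) < float(right)
    # At least one side is not a number: fall back to plain string ordering.
    return left < right
```

Under these assumptions `less_than("9", "10")` is true, while `less_than("9", "10 m")` falls back to string comparison and is false.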

One thing I'm not sure about, for example, is what should be recognized
as a number and what is out of scope. I wanted to search for peaks
outside Rome with an elevation between 500 and 1000 meters. To
cross-check whether all values are numbers I have used this:
http://overpass-turbo.eu/s/ks0
The query uses a function that returns 0 if the value cannot be parsed
as a number and 1 if it is a number.

It turns out that entries with a comma and entries with an explicit
measurement unit exist. Shall we treat them as first-class numbers or not?

Available functions with one argument are:
- "number" makes a number out of its argument
- "is_num" checks whether its argument can be parsed as a number
- "date" makes a double representing a date out of its argument
- "is_date" checks whether its argument can be parsed as a date

I have already mentioned "[...]" as an operator that takes a key and
returns the value of the tag if it exists or the empty string otherwise.
There is a corresponding function "is_tag(...)" that returns 1 if a tag
with that key exists or 0 otherwise.

Finally, you can also count tags and members. Find all ways with more
than 1000 members around Birmingham:
http://overpass-turbo.eu/s/ks2

And find named objects that do not have any other tag:
http://overpass-turbo.eu/s/ks3
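
The tag evaluator, "is_tag" and the tag count can be modelled on a plain dictionary. The following Python sketch only mirrors the behaviour described above; the function names `tag_value` and `only_named` are made up for the example, and the tag set of an element is assumed to be a simple key/value mapping.

```python
def tag_value(tags: dict, key: str) -> str:
    """Model of [key]: the tag's value, or the empty string if the tag is absent."""
    return tags.get(key, "")


def is_tag(tags: dict, key: str) -> int:
    """Model of is_tag(key): 1 if a tag with that key exists, 0 otherwise."""
    return 1 if key in tags else 0


def only_named(tags: dict) -> bool:
    """True for objects that have a name tag and no other tag."""
    return is_tag(tags, "name") == 1 and len(tags) == 1
```

For instance, `only_named({"name": "Foo"})` holds, while an object tagged with both `name` and `highway` is rejected.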

There is more to discover. But I would like to get feedback to adjust
the syntax where appropriate. For example, if you have suggestions for
extra operators (do we need "!=" or bit shifts?), I will try to
incorporate them. The only thing I will stick to is the C-like syntax -
I'm heading more towards JavaScript to relieve the programmer's mind
from learning yet another query language.

I will now wait about two weeks for feedback, write a follow-up with
further features in this branch, and in the meantime return to the
minor_issues branch to fix open bugs.

Best regards,

Roland

Re: Date ranges with Overpass API

Overpass API Development mailing list
Hi Roland and MMD, great work on both scalability and new features. It might take me a little more than 2 weeks to react, as I am on a mission in Haiti.

But at first look, this looks great.

Pierre



From: Roland Olbricht <[hidden email]>
To: [hidden email]
Cc: Andy Townsend <[hidden email]>; Jochen Topf <[hidden email]>
Sent: Saturday, 3 December 2016, 16:53
Subject: [overpass] Date ranges with Overpass API


Re: Date ranges with Overpass API

Roland Olbricht
Hi Pierre,

we are in no hurry. I'll wait until mid-January for feedback.

 > I am in a mission in Haiti.

Good luck.

Best regards,

Roland


Re: Date ranges with Overpass API

mmd
In reply to this post by Roland Olbricht
Hi Roland,

first impression, looks great!

On 03.12.2016 at 22:53, Roland Olbricht wrote:

>
> The new part is (if:[start_date]>1800 && [start_date]<1900).
> This is combined of
> - the framing (if:...)
> - the logical operator "&&"
> - the comparison operators "<" and ">"
> - the tag evaluator [...], applied on the tag with key "start_date"
>
> Available logical operators are "||", "&&", and "!".

How do you define precedence rules for those operators, i.e. in which
sequence are they evaluated? Is it also possible to use brackets,
like a && (b || c)?

> Available comparison operators are "==", "<", "<=", ">", and ">=".
>
> The comparison is numerical if both values are numerical and as strings
> otherwise.
>
> One thing I'm not sure for example is what should be recognized as
> number and what is outside scope. I wanted to search for peaks outside
> Rome with elevation between 500 and 1000 meters. To cross-check whether
> all values are numbers I have used this:
> http://overpass-turbo.eu/s/ks0
> This is a function that returns 0 if the value cannot be parsed as
> number and 1 if it is a number.

>
> It turns out that entries with comma and entries with explicit
> measurement unit exist. Shall we treat them as first-class numbers or not?

I think it would make sense to at least consider the comma. Regarding
units of measure, someone posted some ideas here:

https://github.com/drolbr/Overpass-API/issues/78#issuecomment-71068992

Not sure if this is feasible or even useful, just wanted to mention it.

>
> Available functions with one argument are:
> - "number" makes a number of its argument
> - "is_num" checks whether its argument can be parsed as number
> - "date" makes a double representing a date out of its argument
> - "is_date" checks whether something can be parsed as date

I noticed that those functions internally return "1" and "0", even for
boolean values. I guess there's a good reason for doing so (didn't look
into the details yet).

>
> I have already mentioned "[...]" as operator that takes a key and
> returns the value of the tag if it exists or the empty string otherwise.

It would be extremely cool to have [tag] also on the right-hand side,
with the value extracted from another input set with just one element.
This would cover use cases like deviating zip codes combined with a
foreach clause, see:

http://wiki.openstreetmap.org/wiki/DE:Overpass_API/Beispielsammlung#Abweichende_addr:postcode.3DXXXXX_Tags_innerhalb_einer_Grenzrelation_mit_postalcode.3C.3EXXXXX

Filtering objects depending on some other object is frequently asked for.

>
> Finally, you can also count tags and members. Find all ways with more
> than 1000 members around Birmingham:
> http://overpass-turbo.eu/s/ks2
>
> And find named objects that do not have any other tag:
> http://overpass-turbo.eu/s/ks3
>

We already have an issue with lots of use cases around counting:

https://github.com/drolbr/Overpass-API/issues/197

Could you please take a look at those details and see what could also be
incorporated, or is already covered by your implementation? E.g. things
like:
- distinct nodes, or
- how often a node is contained in a way/relation


> There is more to discover. But I would like to get feedback to adjust
> the syntax where appropriate. For example, if you have suggestions for
> extra operators (do we need "!=" or bit shifts?),

Right now, I don't see an obvious use case for bit shifts, but != might
come in handy for sure.

> I will try to
> incorporate them.

Great!

> The only thing I will stick to is the C-like syntax -
> I'm heading more towards JavaScript to relieve the programmer's mind
> from learning yet another query language.

This I didn't quite get. :)


best,
mmd



Re: Date ranges with Overpass API

Roland Olbricht
Hi mmd,

>> Available logical operators are "||", "&&", and "!".
>
> How do you define precedence rules for those operators, in which
> sequence are they evaluated? It is also possible to specify brackets,
> like a && (b || c) ?

Yes, brackets are possible. The operator precedence is the same as in C,
C++, Java, and JavaScript:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence
although not all operators are implemented.

> I think it would sense to at least consider the comma. Regarding unit of
> measures, someone posted some ideas here:
>
> https://github.com/drolbr/Overpass-API/issues/78#issuecomment-71068992
>
> Not sure if this is feasible or even useful, just wanted to mention it.

Thank you for the reminder. Honestly, I don't understand what the writer
is after. In particular, unit conversion is cheap, and it makes sense to
have custom units in some cases (e.g. verbatim copies of traffic signs in
"mph" and "ft"). I'm leaning towards a set of functions

- si_length, is_length
- si_speed, is_speed

or so that deliver a number in SI units if the input unit is known. If
no unit is given but the value is numerical, then it is understood as a
number in SI units. A function "kph_speed" might do better, because the
default unit for speed in OSM is "km/h" instead of "m/s".

This lets people use less-than and greater-than comparisons in the
intuitive sense, regardless of units.
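
To make the idea concrete, here is a rough Python sketch of how such functions could behave. It is not the implementation: the unit tables, the accepted number format, and the choice to return `None` for unknown input are all assumptions made for the example; only the names si_length, is_length and kph_speed come from the proposal above.

```python
import re
from typing import Optional

# Conversion factors; which units end up supported is an assumption here.
LENGTH_TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}
SPEED_TO_KPH = {"km/h": 1.0, "kph": 1.0, "mph": 1.609344}

# A non-negative decimal number followed by an optional unit such as "ft" or "km/h".
_VALUE_UNIT = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z/]*)\s*$")


def _convert(value: str, factors: dict) -> Optional[float]:
    match = _VALUE_UNIT.match(value)
    if match is None:
        return None  # not of the form "<number> <unit>"
    number, unit = float(match.group(1)), match.group(2)
    if unit == "":
        return number  # no unit given: taken as already being in the default unit
    factor = factors.get(unit)
    return None if factor is None else number * factor


def si_length(value: str) -> Optional[float]:
    """Length in metres, or None if the value or its unit is not understood."""
    return _convert(value, LENGTH_TO_METRES)


def is_length(value: str) -> bool:
    return si_length(value) is not None


def kph_speed(value: str) -> Optional[float]:
    """Speed in km/h, or None if the value or its unit is not understood."""
    return _convert(value, SPEED_TO_KPH)
```

With these tables, a maxspeed of "30 mph" and one of "50" can be compared directly, since both come out as km/h values.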

> I noticed that those functions internally return "1" and "0", even for
> boolean values. I guess there's a good reason for doing so (didn't look
> into the details yet).

After all, I don't want to introduce a type system at this point. There
is more than enough new stuff for the users to digest. The rule is that
the logical operators treat the empty string, or a string that can be
parsed as the number zero, as false and everything else as true.
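
Expressed as a small Python sketch (illustrative only; the function name `is_true` is made up, and `float()` is assumed as a stand-in for the actual number parser), the rule reads:

```python
def is_true(value: str) -> bool:
    """Truth value of a string as seen by the logical operators."""
    if value == "":
        return False
    try:
        # A string that parses as the number zero counts as false.
        return float(value) != 0.0
    except ValueError:
        # Any other non-empty string counts as true.
        return True
```

So "0" and "0.0" are false, while "1", "yes" and also "no" are true, because "no" is a non-empty string that is not a number.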

> It would be extremely cool to have [tag] also on the right hand side,
> and the value extracted from another inputset with just one element.

Please have a look at
http://overpass-turbo.eu/s/ktW

You can leave out the "._" (or replace it by a different set). This is
the placeholder for the set to take, and "_" is the default choice.

> We already have an issue with lots of use cases around counting:
>
> https://github.com/drolbr/Overpass-API/issues/197

BTW: In total, I'm considering these twelve issues for the design of this
feature:

https://github.com/drolbr/Overpass-API/issues/49
add synthetic tags to parts of the result

https://github.com/drolbr/Overpass-API/issues/78
Numerical comparison

https://github.com/drolbr/Overpass-API/issues/81
Additional sort orders

https://github.com/drolbr/Overpass-API/issues/136
Refactoring of output architecture

https://github.com/drolbr/Overpass-API/issues/171
simplification ?

https://github.com/drolbr/Overpass-API/issues/180
number of tags of a object

https://github.com/drolbr/Overpass-API/issues/197
New filters based on counting functions

https://github.com/drolbr/Overpass-API/issues/206
Overpass result_set in overpass answer

https://github.com/drolbr/Overpass-API/issues/219
Query by object version number

https://github.com/drolbr/Overpass-API/issues/221
Add tags filter

https://github.com/drolbr/Overpass-API/issues/236
Annotate per output step

https://github.com/drolbr/Overpass-API/issues/237
Idea: add function to sum up spatial length of ways

Not all of them will get completely addressed. But solutions to them
should be possible with the enhanced design, or at least the enhanced
design should not impede their implementation.

Cheers,

Roland


Re: Date ranges with Overpass API

mmd
Hi Roland,


thank you for the detailed feedback. I started testing some of the
features you described and found a few minor issues so far. Do you plan
to release a preliminary version of the documentation soon? I think
that would be really helpful for more comprehensive testing.

Here's a short summary of my tests so far:


Test 1: Ways with single node only
----------------------------------

way(if:count(members)==1)({{bbox}});
out center;


-> ok


Test 2: Relations without tags / members
----------------------------------------

Test 2a: (global), without bbox

rel(if:count(tags)==0);
out;

rel(if:count(members)==0);
out;


-> result not conclusive: no result, no error message returned


Test 2b: adding bbox

rel({{bbox}})(if:count(tags)==0);
out;

-> result ok



Test 3: omitting [..] for tags, multiple (if: ...) in one query
---------------------------------------------------------------


way[maxheight][maxwidth](if:maxheight > 2)(if:maxwidth > 2);
out geom;

-> no error message/parsing error returned

Similar: is_num(layer) -> no error message!


Test 4: Comparing tags within same query
----------------------------------------

[bbox:{{bbox}}];

way[maxheight][maxwidth](if:is_num([maxheight]) && is_num([maxwidth]) &&
[maxheight] > [maxwidth]);
out geom;


-> test ok


Test 5: Nodes with at least one tag
-----------------------------------

[bbox:{{bbox}}];

node(if:count(tags)>0);
out geom;

-> test ok



Test 6: Count way's nodes
-------------------------

[bbox:{{bbox}}];

way(if:count(nodes)>1);
out geom;


-> no result, does not seem to work



Test 7: Invalid layer tag
-------------------------

[bbox:{{bbox}}];


node["layer"](if:!is_num([layer]) || ([layer] < -5 || [layer] > 5));
out;

-> test ok
(some layer tags have multiple values "1;0" -> not numeric)


Test 8: unknown function
------------------------

[bbox:{{bbox}}];

node["addr:housenumber"]
  (if:is_odd(["addr:housenumber"]));
out;

-> ok. Error message shown: parse error: Function "is_odd" not known


cheers
mmd




Re: Date ranges with Overpass API

Jochen123
In reply to this post by Roland Olbricht
Hi!

On Sat, Dec 03, 2016 at 10:53:38PM +0100, Roland Olbricht wrote:
> way[start_date](if:[start_date]>1800 && [start_date]<1900)({{bbox}});
> out center;
>
> The new part is (if:[start_date]>1800 && [start_date]<1900).
> This is combined of
> - the framing (if:...)
> - the logical operator "&&"
> - the comparison operators "<" and ">"
> - the tag evaluator [...], applied on the tag with key "start_date"

The overpass language has always been difficult to understand. Adding
this (if:...) stuff makes it even more complex. If I understand this
correctly, "way[start_date]" matches all ways with tag start_date. So
here "[]" is a kind of "condition" operator. The ({{bbox}}) means it has
to be inside that bbox, so the condition operator here is the "()". Now
there is a new condition operator "(if:..)" and inside it "[]" is not
used for conditions but to "evaluate" the tag. There must be a better
way!

> It turns out that entries with comma and entries with explicit measurement
> unit exist. Shall we treat them as first-class numbers or not?

Handling units is difficult and I don't really have a good solution for
that. Maybe you need functions to extract the value and unit: So for tag
"width=10m", the functions "number([width])" and "unit([width])" would
return 10 and "m" respectively. But to use this you would have to write
things like "if (number([width]) > 10 and unit([width]) == "m") or
(number([width]) > 30 and unit([width]) == "ft")" to compare feet and
meters. Maybe something like number_with_unit([width], "meter") which
would give you 10 for "10m" and 3 for "10ft", so always converting into
the unit given as the second parameter (here: meter). But once you go
down this road,
you'll need floating point numbers probably and so on. At some point the
question is whether a query language can handle all this or if the user
has to do post-processing themselves.
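
A sketch in Python of the first variant, splitting a value into its number and its unit, could look as follows. This is only an illustration of the idea: the regular expression, the `None` result for non-numeric values and the helper `is_wide` are assumptions, and the thresholds are simply the ones from the condition above.

```python
import re
from typing import Optional

# Leading decimal number, then whatever follows as the unit.
_NUMBER_UNIT = re.compile(r"^\s*(-?[0-9]+(?:\.[0-9]+)?)\s*(.*?)\s*$")


def number(value: str) -> Optional[float]:
    """Numeric part of a value like "10m", or None if there is none."""
    match = _NUMBER_UNIT.match(value)
    return float(match.group(1)) if match else None


def unit(value: str) -> str:
    """Unit part of a value like "10m"; empty if there is no unit or no number."""
    match = _NUMBER_UNIT.match(value)
    return match.group(2) if match else ""


def is_wide(width: str) -> bool:
    """The combined condition from the text, written out per unit."""
    n, u = number(width), unit(width)
    if n is None:
        return False
    return (n > 10 and u == "m") or (n > 30 and u == "ft")
```

The per-unit branches in `is_wide` show why this gets verbose quickly: every additional unit needs its own threshold.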

> Available functions with one argument are:
> - "number" makes a number of its argument
> - "is_num" checks whether its argument can be parsed as number

Why is that "number", but "is_num"? "is_number" would be more
consistent.

> - "date" makes a double representing a date out of its argument
> - "is_date" checks whether something can be parsed as date
>
> I have already mentioned "[...]" as operator that takes a key and returns
> the value of the tag if it exists or the empty string otherwise.
> There is a corresponding function "is_tag(...)" that returns 1 if there
> exists a tag of the key or 0 otherwise.
>
> Finally, you can also count tags and members. Find all ways with more than
> 1000 members around Birmingham:
> http://overpass-turbo.eu/s/ks2
>
> And find named objects that do not have any other tag:
> http://overpass-turbo.eu/s/ks3
>
> There is more to discover. But I would like to get feedback to adjust the
> syntax where appropriate. For example, if you have suggestions for extra
> operators (do we need "!=" or bit shifts?), I will try to incorporate them.
> The only thing I will stick to is the C-like syntax - I'm heading more
> towards JavaScript to relieve the programmer's mind from learning yet
> another query language.

I don't see the need for bit shifts, but "!=" is definitely useful and
people will expect it to be there.

Jochen
--
Jochen Topf  [hidden email]  https://www.jochentopf.com/  +49-351-31778688

Re: Date ranges with Overpass API

mmd
In reply to this post by Roland Olbricht
Hi Roland,

filtering by date (ranges) is probably a more involved topic, as the
recent discussion on Irish Townlands shows.

A simple date conversion and comparison of a `start_date` tag won't
really fit this tagging schema. And there are use cases such as "find all
objects which had a particular name on a given date".

http://www.openstreetmap.org/user/rorym/diary/40092

https://wiki.openstreetmap.org/wiki/Talk:Proposed_features/Date_namespace

What are your thoughts on this?

best,
mmd


Re: Date ranges with Overpass API

Roland Olbricht
In reply to this post by Roland Olbricht
Hello,

Although late, I would like to provide the promised follow-up.

First of all, thank you to all who have given feedback. The evaluator
_is_num_ is now renamed to _is_number_ to be in line with _number_ and
_date_ and _is_date_.

A still open point is the syntax for tag evaluation.
I agree that pure brackets might confuse users. And that syntax does not
exist in any form in JavaScript or similar languages. I'm wondering
whether an approach with a single letter, like

   t[name]
   v[name]

or so would make the tag evaluator more intuitive. I would like to
stick with the brackets, because people know them as the syntax of
dictionaries. And the tags of an element constitute a dictionary.

That there is a similar-looking syntax with a different meaning in the
XAPI legacy is admittedly a pity. But we should not throw out that syntax
just for the sake of purity, not now and not next year. Giving people a
migration path to the more versatile present and future syntax is more
important.

But now the promised further examples:


The filter for date ranges is actually a full-fledged less-than
comparison, and variants for less-equal, greater-than and greater-equal
exist as well:

Find fast roads:
http://overpass-turbo.eu/s/lcg

Look for roads with non-number values:
http://overpass-turbo.eu/s/lch

Or objects that have only a name tag:
http://overpass-turbo.eu/s/lci

Besides this, one can find all objects with similar (or related) properties:
http://overpass-turbo.eu/s/lcj

Enhanced version, this one also gets nodes with level:
http://overpass-turbo.eu/s/lck

Even more, we could get one level up:
http://overpass-turbo.eu/s/lcl

Besides filtering, the make statement allows creating objects as desired.
The simplest case is to output a marker where useful:
http://overpass-turbo.eu/s/lcm

This also contains a generalisation of the count.
You could list all values of a tag that exist in a result in addition to
counting:
http://overpass-turbo.eu/s/lcn

The full list of aggregators, like _set_ and _count_ used here, can be
found in the specification (TODO).


This version also brings the possibility to rewrite the tag lists of
objects. If you want objects with lots of tags in the result but only
need some of the tags, then you can use an approach like this:
http://overpass-turbo.eu/s/lco

This has already been possible with the special output mode [out:csv].
But now you can feed this into your toolchain regardless of whether you
need XML or JSON.

Or just combine names:
http://overpass-turbo.eu/s/lcp

In some cases, you only want to knock out one or two tags:
http://overpass-turbo.eu/s/lcq


Finally, this allows generating is_in tags on the fly:

   node({{bbox}})[place=village];
   foreach(
     is_in->.a;
     ( convert node ::=::,is_in=a.set([name]); .result; )->.result;
   );
   .result out;

(note: currently no areas on the dev instance, hence no link)

Although it is not fast, it generalizes to other relationships:
http://overpass-turbo.eu/s/lcr

Cheers,

Roland


Area creation

Zecke-2
In reply to this post by mmd
Hi,

I recently set up area creation, as areas were required for one of our applications. That has worked fine so far; I followed the instructions at
http://overpass-api.de/full_installation.html

section "Area creation". I skipped the nice'ing thing. Area creation started, and rules_loop.log looks as follows:

2017-02-11 18:15:57: update started
2017-02-12 18:16:37: update finished
2017-02-12 18:16:40: update started
2017-02-13 18:17:19: update finished
2017-02-13 18:17:22: update started

It seems to me that area creation runs into the 24h timeout. Although area queries give meaningful results after the first 24h period, I doubt whether everything is running fine:
Is it normal that the associated osm3s_query process runs permanently with all available CPU? Could it be that hitting the timeout restarts the creation process each day without it ever completing, or does it work cumulatively so that I can expect an end?
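
The run times implied by the log excerpt can be checked with a few lines of Python. This is a small sketch that assumes the log format shown above (timestamp, colon, event text); the function name `run_durations` is made up for the example.

```python
from datetime import datetime, timedelta

LOG = """\
2017-02-11 18:15:57: update started
2017-02-12 18:16:37: update finished
2017-02-12 18:16:40: update started
2017-02-13 18:17:19: update finished
2017-02-13 18:17:22: update started
"""


def run_durations(log_text: str) -> list:
    """Return the duration of every started-and-finished update run."""
    durations = []
    started = None
    for line in log_text.splitlines():
        # The first ": " separates the timestamp from the event text.
        stamp, _, event = line.partition(": ")
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "update started":
            started = when
        elif event == "update finished" and started is not None:
            durations.append(when - started)
            started = None
    return durations


for duration in run_durations(LOG):
    print(duration)
```

For the excerpt this yields two completed runs of 24 h 0 min 40 s and 24 h 0 min 39 s, which fits the suspicion that each run ends at a 24-hour limit.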

BTW, I found that the rules_loop.sh does not accept relative paths as parameters as the other scripts do, so I extended it by the line:

[[ "${DB_DIR:0:1}" != "/" ]] && DB_DIR="$(pwd)/$DB_DIR"

after the line
DB_DIR=$1

Best regards,
Carsten