Batch queries result in a lot of 504 errors


Batch queries result in a lot of 504 errors

yann Guillerm
Hi,

I need to make a lot of queries, so I created my own Overpass server.
The server answers fine when testing, but then I launch a batch that sends
a lot of requests (more than 1000).

After 50-100 requests, the server returns 504 responses.

What can I do to send a lot of requests without getting 504 responses?

- CPU is at 25% (one full core)
- memory stays flat at 24-30%

It's a batch, so it can wait... nobody is waiting for the result... but I
cannot accept 504 responses.

Please help me.

Yann.

Re: Batch queries result in a lot of 504 errors

Igor Brejc
Hi Yann,

One problem could be that the timeout setting on your web server is too short. I've made some notes on this from running Overpass on Apache: https://wiki.openstreetmap.org/wiki/User:Breki/Overpass_API_Installation#504_Gateway_Timeout_Error

Cheers,
Igor
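If Overpass is served through Apache, as in Igor's notes, the directives usually involved are Apache's core Timeout and mod_proxy's ProxyTimeout. A minimal sketch (the value 3600 is illustrative; which directive actually applies depends on whether the interpreter runs as CGI or behind a proxy):

```apache
# Global request timeout in seconds (Apache core directive).
Timeout 3600

# If the Overpass interpreter is reached via mod_proxy,
# the proxy has its own timeout as well.
ProxyTimeout 3600
```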




Re: Batch queries result in a lot of 504 errors

yann Guillerm
Thanks for the answer. 

I have already changed the Apache timeout setting to 3600, but that does not help.

I think the number of concurrent calls is the problem.
In my script, if I limit concurrent calls to 20, my test program works.
If I try the same script with a limit of 40, I get 504 errors.

I was thinking of starting 4 Overpass installations on my server (one for each core) and load-balancing between them.
Do you think that will do the trick?

Yann.
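Yann's client-side workaround, capping the number of concurrent calls and retrying the requests that still come back as 504, can be sketched like this. Everything here is illustrative: `fetch` stands in for whatever HTTP call the batch script actually makes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(query, fetch, retries=3, backoff=1.0):
    """Run one query, retrying on HTTP 504 with a growing delay.

    `fetch` is any callable taking the query and returning (status, body).
    """
    status, body = fetch(query)
    for attempt in range(retries):
        if status != 504:
            break
        time.sleep(backoff * (attempt + 1))  # back off before retrying
        status, body = fetch(query)
    return status, body


def run_batch(queries, fetch, max_workers=20, retries=3, backoff=1.0):
    """Issue all queries with at most `max_workers` in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_with_retry, q, fetch, retries, backoff)
                   for q in queries]
        return [f.result() for f in futures]
```

Since nobody is waiting for the result, a small `max_workers` plus patient retries trades latency for a zero failure rate, which is exactly what a batch job wants.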





Re: Batch queries result in a lot of 504 errors

Igor Brejc
I don't know; it depends on where the bottleneck is. It could be the disk, the RAM, or the CPU. Mostly it depends on what kind of queries you are running. I'd suggest running a resource monitor to see what's going on.





Re: Batch queries result in a lot of 504 errors

yann Guillerm


@Igor: it's as if you were looking at my screen! I just started a resource monitor and saw a bunch (20 or more) of interpreter processes waiting for CPU...

My query uses is_in() to find an area, then finds specific nodes in this area.

So now:
  - I think I don't need to create more than one server instance on my physical server.
  - I need more CPU, or to send requests at a slower rate.

Yann.








Re: Batch queries result in a lot of 504 errors

Roland Olbricht
In reply to this post by yann Guillerm
Hi Yann,

> I have already changed the Apache timeout setting to 3600, but that does not help.
>
> I think the number of concurrent calls is the problem.
> In my script, if I limit concurrent calls to 20, my test program works.
> If I try the same script with a limit of 40, I get 504 errors.
>
>         - CPU is at 25% (one full core)
>         - memory stays flat at 24-30%

thank you for asking. This helps me understand how people use
non-public instances. The Overpass dispatcher uses two internal limits,
"space" and "time", to determine whether it is under load. Please run

dispatcher --osm-base --status

An example from z.overpass-api.de:

Number of not yet opened connections: 0
Number of connected clients: 19
Rate limit: 2
Total available space: 12884901888
Total claimed space: 4831838208
Average claimed space: 7416928460
Total available time units: 262144
Total claimed time units: 2415
Average claimed time units: 3387
Counter of started requests: 69180046
Counter of finished requests: 69171436
[...] [list of running and pending processes]

If the used space is close to the maximum space then raise the space:
dispatcher --osm-base --space=20000000000

If the used time units are close to the maximum time units then raise the time:
dispatcher --osm-base --time=524288

Neither of the two will help if the disk is the limiting factor. The best
tool I know to check disk load is "iotop". It should suffice to
cross-check whether disk performance matters.

I expect that splitting the server per CPU core will not work. The queries
are distinct processes, and virtually any operating system is good at
distributing processes across CPUs. Opposed to that, I have run into two
other systematic bottlenecks:
- extremely slow swapping: the OS will not warn you if RAM is short
and instead starts to swap if too many processes perform actual work in
parallel. Swap is horribly slow in such a setting.
- disk thrashing: similarly, the OS makes the disk jump so often between
many concurrent processes that most of the time is spent seeking. I have not
systematically investigated whether the problem persists with an SSD.

In addition, given that on the public instance most requests have a very
short runtime (90% of all successful queries take less than a second), I
have designed the system to delay further requests if many more than the
number of CPU cores are already running, represented by the above-mentioned
limits. You can identify this phenomenon quite clearly if there are many
requests with HTTP 504 and 15 seconds of runtime in the log files. The
number of parallel requests is essentially
(space - 1 GiB) / 512 MiB, i.e. for 12 GiB, the default value, it is 22
concurrent requests.

Best regards,
Roland
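Roland's rule of thumb can be sanity-checked with a few lines. The 1 GiB and 512 MiB constants are taken from his message; this only reproduces the arithmetic, not the dispatcher's actual scheduling:

```python
def max_parallel_requests(space_bytes):
    """Roland's estimate: (space - 1 GiB) / 512 MiB parallel requests."""
    GIB = 2 ** 30
    MIB = 2 ** 20
    return int((space_bytes - 1 * GIB) // (512 * MIB))


# The default of 12 GiB allows 22 concurrent requests, matching
# the number Roland quotes; raising the space to 20000000000
# bytes, as in his example command, would allow about 35.
print(max_parallel_requests(12 * 2 ** 30))
```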

Re: Batch queries result in a lot of 504 errors

Igor Brejc
Hi Roland,

On Mon, Apr 9, 2018 at 5:04 PM, Roland Olbricht <[hidden email]> wrote:
> The Overpass dispatcher uses two internal limits, "space" and "time" to determine whether it has load. Please run
>
> dispatcher --osm-base --status

Thank you for the useful tip!
I will post a sample of queries I'm running on my private instance and what I discovered when trying to speed it up.
 
> I expect that separating for CPU cores will not work. The queries are distinct processes, and virtually any operating system is good at distributing processes on CPUs. Opposed to that I have run into two other systematic bottlenecks:
> - super slow swapping: the OS will not warn if the RAM is short and instead starts to swap if too many processes perform actual work in parallel. Swap is horribly slow in such a setting.
> - disk thrashing: similarly, the OS lets the disk jump so often between many concurrent processes that most time is spent seeking. I have not systematically investigated whether the problem persists with an SSD.

In my case, the bottleneck is mostly the CPU: top shows the interpreter process running at 100% CPU most of the time, while the disk is mostly idle or nearly so (I have an SSD, by the way). RAM is not a problem either, at least not until I reach a certain size of the query area.

Regarding the disk: is it possible to split the Overpass DB tables across different disks (nodes on one disk and the rest on another, for example)? Would this speed things up in cases where the disk is the bottleneck?

Igor

Re: Batch queries result in a lot of 504 errors

yann Guillerm
In reply to this post by Roland Olbricht
Hi Roland,

I manage to make a lot more requests now, but I have to slow down my process too much... I will try again to reach the 22 concurrent requests you mention at the end of your answer.

The average request takes 11 seconds to resolve (from 1 to 30 seconds, but mostly around 10-11 seconds).

CPU seems to be the bottleneck of my server.

Here is the result of dispatcher --osm-base --status:

$EXEC_DIR/bin/dispatcher --osm-base --status
Number of not yet opened connections: 0
Number of connected clients: 2
Rate limit: 0
Total available space: 12884901888
Total claimed space: 4294967296
Average claimed space: 7247757312
Total available time units: 262144
Total claimed time units: 0
Average claimed time units: 19800
Counter of started requests: 87276
Counter of finished requests: 87198
202 8399 0 4294967296 0 1523348264



I'm not sure I understand all those numbers ;-)
I ran the command when my script was not running.

I will launch a test and use iotop to check disk usage.

Thanks for your tips.
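For anyone else squinting at that output: the "key: value" lines are easy to pull into a dict for monitoring, e.g. to watch claimed vs. available space while a batch runs. A rough sketch (the field names are copied verbatim from the output above; the trailing process-list lines have no colon-separated number and are skipped):

```python
def parse_dispatcher_status(text):
    """Parse `dispatcher --osm-base --status` output into a dict of ints.

    Only lines of the form 'Some key: <number>' are kept; the process
    list at the end of the output is ignored.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats
```

Comparing "Total claimed space" against "Total available space" over time is a quick way to see whether the dispatcher's space limit, rather than the CPU, is what is throttling the batch.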




Re: Batch queries result in a lot of 504 errors

yann Guillerm
So,

I used iotop to check disk usage...
Nothing / nada / que dalle: disk usage is flat, with occasional peaks near 6%...
At the same time, CPU is at 100% and RAM at about 30%.

I have to find a server with a better CPU.

Best regards,


Yann.




Re: Batch queries result in a lot of 504 errors

marc marc
Hello,

I often find it counterproductive to launch too many simultaneous
processes: having too many processes per core makes the CPU spend more of
its time switching between processes. With CPU-intensive tasks, the ideal
is often one to two simultaneous processes per CPU core.

Regards,
Marc
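Marc's rule of thumb is easy to encode when sizing the client-side worker pool. The factor of 2 is his stated upper bound, not a measured optimum:

```python
import os


def suggested_workers(cores=None, per_core=2):
    """One to two CPU-bound workers per core, per Marc's advice."""
    cores = cores or os.cpu_count() or 1
    return per_core * cores


# On Yann's 4-core server this suggests at most 8 concurrent
# requests, far below the 20-40 his script was attempting.
print(suggested_workers(4))
```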
