apply_osc_to_db not handling dispatcher problems

apply_osc_to_db not handling dispatcher problems

Igor Brejc
Hi,

I've noticed that if the main dispatcher stops working for some reason (file locks etc.), the apply_osc_to_db.sh script does not detect that; instead, its log reports new diffs being applied in rapid succession without any errors (approx. 1 second per diff, while a diff usually takes around 15 seconds).

After that, the DB gets inconsistent (missing nodes, ways etc.) and it's basically useless and needs to be recloned. 
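A liveness check of the kind the script apparently lacks could look roughly like this. This is only a sketch: the process name "dispatcher" appears later in this thread, everything else is illustrative.

```shell
# Sketch of a guard the update loop could run before applying each diff:
# refuse to proceed when no dispatcher process is alive, instead of
# racing through diffs that are silently dropped.
check_dispatcher() {
  if pgrep -x dispatcher >/dev/null 2>&1; then
    echo "dispatcher alive"
  else
    echo "dispatcher down, refusing to apply diffs"
    return 1
  fi
}
```

Calling check_dispatcher at the top of each loop iteration (and aborting on failure) would turn the silent runaway into a visible stop.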

Igor

Re: apply_osc_to_db not handling dispatcher problems

mmd
Hi 

I've created a new issue on GitHub for this, as mailing lists are a pain for status tracking:


Cheers





Re: apply_osc_to_db not handling dispatcher problems

Igor Brejc
Thanks!



Re: apply_osc_to_db not handling dispatcher problems

Roland Olbricht
In reply to this post by Igor Brejc
Hi Igor,

> I notice that if the main dispatcher stops working for some reason (file
> locks etc.), the apply_osc_to_db.sh script does not really detect that,
> instead its log reports new diffs were applied in a rapid fashion
> without reporting any errors (cca. 1 second for a diff, while it usually
> takes around 15 seconds).

Thank you for reporting the issue. Can you please give details?

If the logfile of the dispatcher ($DB_DIR/transactions.log) is of a
decent size then please send that file.

Has the script, in particular update_from_dir called from the script,
produced any output? If so, could you please send that as well?

Could you please verify which script is actually running? Your documentation
https://wiki.openstreetmap.org/wiki/User:Breki/Overpass_API_Installation#Configuring_Diffs
mentions both apply_osc_to_db.sh and fetch_osc_and_apply.sh

Best regards,

Roland

Re: apply_osc_to_db not handling dispatcher problems

Roland Olbricht
In reply to this post by mmd
Hi,

> I've created a new issue on GitHub for this, as mailing lists are a pain
> for status tracking:

Well, that might differ from person to person. I will answer on the list.

Please note that the list has the advantage of keeping a public archive.
And it keeps a synchronized copy on all devices of all recipients of the
mailing list.

Cheers,
Roland

Re: apply_osc_to_db not handling dispatcher problems

mmd
On 06.09.2017 at 06:29, Roland Olbricht wrote:
> Hi,
>
>> I've created a new issue on GitHub for this, as mailing lists are a
>> pain for status tracking:
>

>
> Please note that the list has the advantage of keeping a public archive.

The rails-dev mailing list is set up in such a way that every post on GitHub is automatically replicated there, keeping that public archive. As repository owner, you could easily set up exactly the same scenario.

https://lists.openstreetmap.org/pipermail/rails-dev/

Other than that you can add yourself as "Watcher" to the repository and
automatically receive any updates.

> And it keeps a synchronized copy on all devices of all recipients of the
> mailing list.
>

...and fails on the most important aspect of issue fixing: it does not
communicate any kind of status, e.g. if you mark an email in your email
client as "done", nobody else will know.

With more than a couple of issues or a team size > 1, this quickly becomes unwieldy and difficult to manage.

Cheers


Re: apply_osc_to_db not handling dispatcher problems

Igor Brejc
In reply to this post by Roland Olbricht
Hi Roland,

Answering you below...

On Wed, Sep 6, 2017 at 6:24 AM, Roland Olbricht <[hidden email]> wrote:

> Thank you for reporting the issue. Can you please give details?

Basic details: I run both the main dispatcher and the diff script (as described on my page) as systemd units. Since I don't keep the server running when I don't need Overpass, I pm-suspend it until the next use. Usually, after I bring the machine back from suspension:
  1. the main unit (dispatcher) is down for some reason (I don't really know why, but I do know I have to delete the lock files before I can restart it), 
  2. while the diff unit goes on applying next diffs and does not detect the main service being down, as I described in the initial mail.
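The manual cleanup from step 1 could be scripted roughly as follows. This is a sketch under assumptions: the osm3s_* file names and locations are taken from later messages in this thread, and the guard against a running dispatcher is my addition, not part of any Overpass script.

```shell
# Hypothetical pre-start cleanup for the dispatcher unit: remove leftover
# socket/shared-memory files, but only when no dispatcher is running.
clear_stale_dispatcher_files() {
  db_dir="$1"
  if pgrep -x dispatcher >/dev/null 2>&1; then
    echo "dispatcher still running, not touching anything"
    return 1
  fi
  # Unmatched globs stay literal; rm -f then silently ignores them.
  rm -f "$db_dir"/osm3s_v*_osm_base /dev/shm/osm3s_v*_osm_base
  echo "stale dispatcher files removed"
}
```

Run as an ExecStartPre step (or by hand before restarting the unit), this would replace the manual lock-file deletion.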

> If the logfile of the dispatcher ($DB_DIR/transactions.log) is of a decent size then please send that file.

Unfortunately I don't have it any more; I cleaned everything and recloned the database since I needed a working instance. I have restarted the diff service, and when it happens again I'll send the files.

> Has the script, in particular update_from_dir called from the script, produced any output? If so, could you please send that as well?

I'm running the script as a systemd unit. I can't find any output other than the apply_osc_to_db.log and transactions.log files, unless it saves the output somewhere else?

> Could you please verify which script is actually running? Your documentation
> https://wiki.openstreetmap.org/wiki/User:Breki/Overpass_API_Installation#Configuring_Diffs
> mentions both apply_osc_to_db.sh and fetch_osc_and_apply.sh

No, only one script (.sh) is mentioned. It runs fetch_osc_and_apply.sh, and the script writes into the apply_osc_to_db.log file.

Cheers,
Igor


Re: apply_osc_to_db not handling dispatcher problems

mmd
On 06.09.2017 at 16:03, Igor Brejc wrote:

>
>     Thank you for reporting the issue. Can you please give details?
>
>
> Basic details: I run both the main dispatcher and the diff script (as
> described on my page) as systemd units. Since I don't keep the server
> running when I don't need Overpass, I pm-suspend it until the next use.
> Usually, after I bring the machine back from suspension:
>
>  1. the main unit (dispatcher) is down for some reason (I don't really
>     know why, but I do know I have to delete the lock files before I can
>     restart it),
>  2. while the diff unit goes on applying next diffs and does not detect
>     the main service being down, as I described in the initial mail.
>

I tried to simulate this via vagrant suspend + up and found that
osm3s_query immediately returned with:

runtime error: open64: 32 Broken pipe /osm3s_v0.7.54_osm_base
Dispatcher_Client::request_read_and_idx::socket::1

... while dispatcher crashed with a SIGBUS error right after it received
a command from osm3s_query.

Not sure if this is the same in your setup.

For further analysis, I would turn off diff processing before triggering
the pm-suspend, then after resume:

- find out the dispatcher process id

  $ pidof dispatcher
  -> returns #process_id, e.g. 1234

- attach gdb to the running dispatcher process,

  $ gdb
  attach #process_id     e.g.: attach 1234
  continue

- run some query via osm3s_query and

- see where the dispatcher crashes.

Cheers


Re: apply_osc_to_db not handling dispatcher problems

Roland Olbricht
Hello,

a short remark: It should never happen as part of regular operations
that you need to delete any lock files. Leftover lock files are always
an indicator of an unexpectedly terminated process.

> I tried to simulate this via vagrant suspend + up and found that
> osm3s_query immediately returned with:
>
> runtime error: open64: 32 Broken pipe /osm3s_v0.7.54_osm_base
> Dispatcher_Client::request_read_and_idx::socket::1
>
> ... while dispatcher crashed with a SIGBUS error right after it received
> a command from osm3s_query.

Thank you for the simulation. According to the information I find about
SIGBUS, this happens if a shared memory file has been deleted. I found
no information that it is related to sockets.

Can you check whether both the shared memory and the socket exist before
the incident? I.e. please run the commands

ls -l $DB_DIR/osm*

ls -l /dev/shm/ | grep osm

ps -ef | grep dispatcher

This should give us an idea of the cause of events.

At mmd: what happens if you comment out line template_db/dispatcher.cc:768
       *(uint32*)dispatcher_shm_ptr = 0;

This should give a clue about the order of events (whether the socket or
the shared memory fails first). A broken pipe for a client if the server
crashes would be expected behaviour. But this does not explain why the
dispatcher has crashed.

BTW: thank you for using the mailing list. This makes answering from the
train a lot easier.

Best regards,

Roland

Re: apply_osc_to_db not handling dispatcher problems

mmd
Hi Roland,


On 07.09.2017 at 17:48, Roland Olbricht wrote:

>
> Thank you for the simulation. According to the information I find about
> SIGBUS, this happens if a shared memory file has been deleted. I found
> no information that it is related to sockets.
>
> Can you check whether both the shared memory and the socket exist before
> the incident? I.e. please run the commands
>

Both shared memory and the unix domain socket are still available after
the resume operation, and seem to be working ok.

According to the following "straces" I collected on both dispatcher and
osm3s_query processes:
- osm3s_query can read from shared memory, and send requests to the
correct unix domain socket without errors,
- the dispatcher correctly receives the data.

I tried the same without suspend/resume and the strace looks pretty much
the same.

The crash location template_db/dispatcher.cc:110 didn't make a lot of
sense to me, so I started looking for similar crash messages. One
notable thing were some reports of Java applications triggering SIGBUS,
si_code=BUS_ADRERR, as shown below. They were related to a recent Linux
kernel regression, which is supposed to be fixed in the 4.4.0-83 version
I ran my tests on :/

See: https://usn.ubuntu.com/usn/usn-3344-1/

_USN 3328-1 fixed a vulnerability in the Linux kernel. However, that
fix introduced regressions for some Java applications. This update
addresses the issue. We apologize for the inconvenience._

Tested on: Linux version 4.4.0-83-generic (buildd@lgw01-29) (gcc version
5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #106-Ubuntu SMP Mon Jun
26 17:54:43 UTC 2017

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

Let's see if Igor can reproduce the issue...


--------------------------------------------------------------
dispatcher:
--------------------------------------------------------------
accept(3, 0x7ffcb16cad60, 0x7ffcb16cad14) = -1 EAGAIN (Resource
temporarily unavailable)
select(1024, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 5
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
recvfrom(5, "\6\35\0\0", 4, 0, NULL, NULL) = 4
recvfrom(5, "\311\0\0\0", 4, 0, NULL, NULL) = 4
open("/home/ubuntu/p/transactions.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 6
lseek(6, 0, SEEK_END)                   = 9221
write(6, "2017-09-07 17:04:39 [7279] waite"..., 55) = 55
close(6)                                = 0
recvfrom(5, "\264\0\0\0", 4, 0, NULL, NULL) = 4
recvfrom(5, "\0\0\0 ", 4, 0, NULL, NULL) = 4
recvfrom(5, "\0\0\0\0", 4, 0, NULL, NULL) = 4
recvfrom(5, "\0\0\0\0", 4, 0, NULL, NULL) = 4
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x407fa0} ---
+++ killed by SIGBUS (core dumped) +++



--------------------------------------------------------------
osm3s_query:
--------------------------------------------------------------

rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7efc7dece4b0}, {SIG_DFL, [], 0}, 8) = 0
statfs("/dev/shm/", {f_type="TMPFS_MAGIC", f_bsize=4096,
f_blocks=126996, f_bfree=126994, f_bavail=126994, f_files=126996,
f_ffree=126993, f_fsid={0, 0}, f_namelen=255, f_frsize=4096,
f_flags=38}) = 0
futex(0x7efc7de98310, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/dev/shm/osm3s_v0.7.54_osm_base", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0666, st_size=65, ...}) = 0
mmap(NULL, 65, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7efc7f371000
socket(PF_LOCAL, SOCK_STREAM, 0)        = 4
connect(4, {sa_family=AF_LOCAL,
sun_path="/home/ubuntu/p//osm3s_v0.7.54_osm_base"}, 110) = 0
sendto(4, "\6\35\0\0", 4, 0, NULL, 0)   = 4
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
fstat(5, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
read(5,
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096)
= 127
lseek(5, -71, SEEK_CUR)                 = 56
read(5,
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71
close(5)                                = 0
open("/home/ubuntu/p/transactions.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 5
lseek(5, 0, SEEK_END)                   = 9165
write(5, "2017-09-07 17:04:39 [7430] reque"..., 56) = 56
close(5)                                = 0
sendto(4, "\311\0\0\0", 4, 0, NULL, 0)  = 4
sendto(4, "\264\0\0\0", 4, 0, NULL, 0)  = 4
sendto(4, "\0\0\0 \0\0\0\0", 8, 0, NULL, 0) = 8
sendto(4, "\0\0\0\0", 4, 0, NULL, 0)    = 4
recvfrom(4, "", 4, 0, NULL, NULL)       = 0
select(1024, NULL, NULL, NULL, {0, 300000}) = 0 (Timeout)
sendto(4, "\311\0\0\0", 4, 0, NULL, 0)  = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=7430,
si_uid=1000} ---
futex(0x7efc7e478680, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/home/ubuntu/p/transactions.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 5
lseek(5, 0, SEEK_END)                   = 9276
write(5, "2017-09-07 17:04:40 [7430] Dispa"..., 117) = 117
close(5)                                = 0
write(2, "runtime error: ", 15)         = 15
write(2, "open64: 32 Broken pipe /osm3s_v0"..., 97) = 97
write(2, "\n", 1)                       = 1
exit_group(1)                           = ?
+++ exited with 1 +++


--------------------------------------------------------------
dispatcher stack trace for crash location
--------------------------------------------------------------

#0  Global_Resource_Planner::probe (this=this@entry=0x7ffc56ae5918,
pid=1638, client_token=client_token@entry=0,
time_units=time_units@entry=180, max_space=max_space@entry=536870912)
    at template_db/dispatcher.cc:110
#1  0x000000000040909e in Dispatcher::standby_loop
(this=this@entry=0x7ffc56ae5770, milliseconds=milliseconds@entry=0) at
template_db/dispatcher.cc:659
#2  0x000000000040534f in main (argc=<optimized out>,
argv=0x7ffc56ae5b18) at overpass_api/dispatch/dispatcher_server.cc:472





cheers


Re: apply_osc_to_db not handling dispatcher problems

Roland Olbricht
Hi Daniel, hi Igor,

I'm pretty confident that there is a different way to solve the runaway
problem.

As it turns out, fetch_osc_and_apply.sh employs update_database instead
of update_from_dir. update_database does write error messages, but
fetch_osc_and_apply.sh discards them.

I've ensured in a new release that
- update_database returns an error code on any kind of error
- fetch_osc_and_apply.sh does not suppress error messages any more

On closer inspection, fetch_osc_and_apply.sh has other issues, too. In
particular, it processes every minute diff individually, which makes
catching up with updates much slower (about five times) than batch
processing.
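The error-propagation fix can be illustrated with a stripped-down stand-in for the apply loop. This is not the actual fetch_osc_and_apply.sh; the updater command is passed in as a parameter to keep the sketch self-contained.

```shell
# Stripped-down apply loop: before the fix, the updater's exit code was
# ignored and its messages discarded, so the loop raced ahead even when
# nothing was actually applied. Here the loop stops on the first failure.
apply_diffs() {
  updater="$1"; shift
  for osc in "$@"; do
    if ! "$updater" "$osc"; then
      echo "updater failed on $osc, stopping"
      return 1
    fi
    echo "applied $osc"
  done
}
```

With the real update_database now returning a non-zero exit code on any error, a loop shaped like this halts instead of silently corrupting the database.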

A little bit of background: this script is the one specific to the
French instance, and I added it to the main repo to avoid unnecessary
fragmentation. I never checked it beyond a basic dry run, as I never
expected that anybody else would use a script outside the standard
installation instructions.

However, I agree: if a file is in my repository, then you can assume
that it works.

Please download and try the fixed release v0.7.54.6:
https://dev.overpass-api.de/releases/osm-3s_v0.7.54.tar.gz

Best regards,

Roland

Re: apply_osc_to_db not handling dispatcher problems

Igor Brejc
Thank you Roland, I will try the new release and let you know. 



Re: apply_osc_to_db not handling dispatcher problems

Igor Brejc
Hi Roland,

Latest results: I installed the new versions of the services yesterday and left them running for a few hours. Judging by the two log files produced, they seemed to be running fine (no error entries in either fetch_osc_and_apply.log or transactions.log).

BTW I used --meta=no option for fetch_osc_and_apply.sh (since it's now mandatory to specify the value of this argument).

However, today I decided to check the systemd journal, since it looks like a lot of the logging goes just to stdout rather than the log file, and I found these entries (immediately after the first batch of diff files was downloaded):

sep 10 17:23:48 jazz fetch_osc_and_apply.sh[20388]: 2017-09-10 17:23:48 URL:http://planet.osm.org/replication/minute//002/599/947.state.txt [168/168] -> "/tmp/osm-3s_update_N5f8Ry/002599947.state.txt" [1]
sep 10 17:23:48 jazz fetch_osc_and_apply.sh[20388]: 2017-09-10 17:23:48 URL:http://planet.osm.org/replication/minute//002/599/947.osc.gz [51927/51927] -> "/tmp/osm-3s_update_N5f8Ry/002599947.osc.gz" [1]
sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Reading XML file ... finished reading nodes. Version 0 has a later or equal timestamp (0000-00-00T00:00:00Z) than version 0 (0000-00-00T00:00:00Z) of Node 176084016
sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Version 0 has a later or equal timestamp (0000-00-00T00:00:00Z) than version 0 (0000-00-00T00:00:00Z) of Node 176090219
sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Version 0 has a later or equal timestamp (0000-00-00T00:00:00Z) than version 0 (0000-00-00T00:00:00Z) of Node 176101272
sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Version 0 has a later or equal timestamp (0000-00-00T00:00:00Z) than version 0 (0000-00-00T00:00:00Z) of Node 176103018
sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Version 0 has a later or equal timestamp (0000-00-00T00:00:00Z) than version 0 (0000-00-00T00:00:00Z) of Node 189122042
sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Version 0 has a later or equal timestamp (0000-00-00T00:00:00Z) than version 0 (0000-00-00T00:00:00Z) of Node 269204750

This then results in the following errors:
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1411413259 used in way 7990122 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 3797985553 used in way 7990122 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 5068195841 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 30923267 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 765690030 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 288376716 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 765689879 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 4729032203 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 4729032204 used in way 8091970 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1645753235 used in way 39671787 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1645753228 used in way 39671787 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1327871645 used in way 117968807 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1348966463 used in way 120270858 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1327871645 used in way 127964698 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 5068223552 used in way 182970606 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1933333342 used in way 182970606 not found.
sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1933333328 used in way 182970606 not found.





Re: apply_osc_to_db not handling dispatcher problems

Roland Olbricht
Hi Igor,

thank you for the feedback.

>     sep 10 17:23:50 jazz fetch_osc_and_apply.sh[20388]: Version 0 has a
>     later or equal timestamp (0000-00-00T00:00:00Z) than version 0
>     (0000-00-00T00:00:00Z) of Node 176090219

These messages are completely harmless. They make sense in the context
of attic data, where they then have proper version numbers.

For the next release, I will remove these messages when no meta data is
present. But for the moment I would like to ask you to ignore them.

>     sep 10 17:36:51 jazz fetch_osc_and_apply.sh[20388]: Node 1411413259
>     used in way 7990122 not found.

This kind of message is harmless unless you are importing the complete
planet. Please check on one or two examples if the missing nodes are
indeed outside the zone you have imported, and if the incomplete ways
are partly inside.

I have no plans to remove that kind of message because it is necessary
to identify occasional problems with the planet wide diffs.

Best regards,

Roland

Re: apply_osc_to_db not handling dispatcher problems

Igor Brejc
Hi Roland,

I am importing the whole planet, the diff is executing on an overpass DB clone.


Re: apply_osc_to_db not handling dispatcher problems

Roland Olbricht
Hi Igor,

> I am importing the whole planet, the diff is executing on an overpass DB
> clone.

thank you for the notification. After a careful test under conditions as
similar as possible I have identified some further issues that caused
the messages. In particular, some of them might compromise the database.
I'm sorry for that.

Technical detail: due to an oversight in the sorting of elements, it
could happen that if meta was turned off and the update found multiple
versions of an object, the older one was taken instead of the younger
one. The messages then came from deleted ways where this sorting
erroneously kept the last version, edited some minutes before.
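The sorting pitfall can be shown with a toy example in plain shell (nothing here is from the actual C++ code; the way id is borrowed from the log excerpts above): among multiple versions of the same object, only the newest must survive.

```shell
# Toy illustration of dedup-by-newest-version: sort by object id, then
# by version descending, and keep only the first line per id. Sorting
# the version ascending instead would reproduce the bug (older version
# kept, e.g. a deleted way resurrected). Lexicographic version order is
# good enough for this toy data only.
printf '%s\n' \
  'way 7990122 v3 deleted' \
  'way 7990122 v2 alive' \
  'node 42 v1 alive' |
  sort -k1,2 -k3,3r | awk '!seen[$1 FS $2]++'
```

Here version v3 (the deletion) correctly wins over v2 for way 7990122.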

I suggest to use the now released patch v0.7.54.9:
https://dev.overpass-api.de/releases/osm-3s_v0.7.54.tar.gz

That version had run successfully here with fetch_osc_and_apply.sh and
--meta=no for about 24 hours. I'm confident that there are no further bugs.

I'm sorry for the inconvenience.

Best regards,

Roland

Re: apply_osc_to_db not handling dispatcher problems

Igor Brejc
No problem Roland, I will test it with the new version and let you know how it goes.

Cheers,
Igor



Re: apply_osc_to_db not handling dispatcher problems

marc marc
In reply to this post by Roland Olbricht
On 13.09.17 at 19:52, Roland Olbricht wrote:
> I suggest to use the now released patch v0.7.54.9:
> https://dev.overpass-api.de/releases/osm-3s_v0.7.54.tar.gz

The archive seems corrupt:

$ gunzip osm-3s_v0.7.54.tar.gz
gzip: osm-3s_v0.7.54.tar.gz: invalid compressed data--crc error
gzip: osm-3s_v0.7.54.tar.gz: invalid compressed data--length error

$ du -b osm-3s_v0.7.54.tar.gz
721378 osm-3s_v0.7.54.tar.gz

$ md5sum osm-3s_v0.7.54.tar.gz
76c9a11d17b22a1a0ca44725fdc401ee  osm-3s_v0.7.54.tar.gz

Re: apply_osc_to_db not handling dispatcher problems

rodolphe+osm
In reply to this post by Roland Olbricht
Hello,

On 14/09/2017 at 14:37, marc marc wrote:
> On 13.09.17 at 19:52, Roland Olbricht wrote:
>> I suggest to use the now released patch v0.7.54.9:
>> https://dev.overpass-api.de/releases/osm-3s_v0.7.54.tar.gz
Could it be possible to name the tarball file with full version number,
osm-3s_v0.7.54.9.tar.gz instead of osm-3s_v0.7.54.tar.gz?

Regards,
Rodolphe


Re: apply_osc_to_db not handling dispatcher problems

mmd
In reply to this post by marc marc
On 14.09.2017 at 14:37, marc marc wrote:

> Le 13. 09. 17 à 19:52, Roland Olbricht a écrit :
>> I suggest to use the now released patch v0.7.54.9:
>> https://dev.overpass-api.de/releases/osm-3s_v0.7.54.tar.gz
>
> the archive seems corrupt

confirmed, I get the same errors here.
