v0.7.54.9 breaks down

Igor Brejc
Hi,

I've installed the new version and let the diffs run. It ran fine for a few hours, but then it started reporting errors. The situation:
  • both systemd services (the main dispatcher and the diff) are still (nominally) running
  • both processes are running
  • the diffs are no longer being applied
  • HTTP querying no longer works ("Error: runtime error: The dispatcher (i.e. the database management system) is turned off")

Here's the extract of the logs from that moment:

sep 17 01:36:57 jazz fetch_osc_and_apply.sh[15623]: 2017-09-17 01:36:57 URL:http://planet.osm.org/replication/minute//002/604/991.state.txt [179/179] -> "/tmp/osm-3s_update_E8yees/002604991.state.txt" [1]
sep 17 01:36:58 jazz fetch_osc_and_apply.sh[15623]: 2017-09-17 01:36:58 URL:http://planet.osm.org/replication/minute//002/604/991.osc.gz [86107/86107] -> "/tmp/osm-3s_update_E8yees/002604991.osc.gz" [1]
sep 17 01:36:58 jazz fetch_osc_and_apply.sh[15623]: 2017-09-17 01:36:58 URL:http://planet.osm.org/replication/minute//002/604/992.state.txt [158/158] -> "/tmp/osm-3s_update_E8yees/002604992.state.txt" [1]
sep 17 01:36:59 jazz fetch_osc_and_apply.sh[15623]: 2017-09-17 01:36:59 URL:http://planet.osm.org/replication/minute//002/604/992.osc.gz [39802/39802] -> "/tmp/osm-3s_update_E8yees/002604992.osc.gz" [1]
sep 17 01:36:59 jazz fetch_osc_and_apply.sh[15623]: 2017-09-17 01:36:59 URL:http://planet.osm.org/replication/minute//002/604/993.state.txt [168/168] -> "/tmp/osm-3s_update_E8yees/002604993.state.txt" [1]
sep 17 01:36:59 jazz fetch_osc_and_apply.sh[15623]: 2017-09-17 01:36:59 URL:http://planet.osm.org/replication/minute//002/604/993.osc.gz [39586/39586] -> "/tmp/osm-3s_update_E8yees/002604993.osc.gz" [1]
sep 17 01:36:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:37:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:38:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:39:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:40:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:41:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:42:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1
sep 17 01:43:59 jazz fetch_osc_and_apply.sh[15623]: File error caught: 2 No such file or directory /osm3s_v0.7.54_osm_base Dispatcher_Client::1

I've checked, osm3s_v0.7.54_osm_base exists:

igor@jazz:~/overpass/db$ ls -l osm3s_v0.7.54_osm_base
srw-rw-rw- 1 igor igor 0 sep 16 20:22 osm3s_v0.7.54_osm_base


Re: v0.7.54.9 breaks down

mmd
Hi,

On 17.09.2017 at 07:52, Igor Brejc wrote:


> I've installed the new version and let the diffs run. It ran fine for a
> few hours, but then it started reporting errors. Situation:
>
> Here's the extract of the logs from that moment:
>

>     URL:http://planet.osm.org/replication/minute//002/604/993.osc.gz
>     [39586/39586] -> "/tmp/osm-3s_update_E8yees/002604993.osc.gz" [1]
>     sep 17 01:36:59 jazz fetch_osc_and_apply.sh[15623]: File error
>     caught: 2 No such file or directory /osm3s_v0.7.54_osm_base
>     Dispatcher_Client::1

>
> I've checked, osm3s_v0.7.54_osm_base exists:
>
>     igor@jazz:~/overpass/db$ ls -l osm3s_v0.7.54_osm_base
>     srw-rw-rw- 1 igor igor 0 sep 16 20:22 osm3s_v0.7.54_osm_base
>
>
>


The error message basically says that the shared memory file created by
the dispatcher process is no longer there. This file contains some
information about the database directory and the shadow files. Without
it, an interpreter or update_* process has no idea where the database is
and has to stop further processing.

Can you please post the output of:

ls -l /run/shm/osm3s_*

Usually, you could inspect those files as well, but I guess in your case
they're no longer around?

od -c /run/shm/osm3s_v0.7.54_osm_base

That should return something like:

0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 016  \0  \0  \0
0000020   /   s   r   v   /   o   s   m   3   s   /   d   b   / 035  \0
0000040  \0  \0   /   s   r   v   /   o   s   m   3   s   /   d   b   /
0000060   o   s   m   _   b   a   s   e   _   s   h   a   d   o   w
0000077
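
As a rough illustration of what the dump shows, here is a minimal sketch that prints the two strings stored in such a file. The layout (12 header bytes, then two strings each preceded by a 32-bit length) is only inferred from the `od` output above, not taken from the Overpass source, and the helper names `read_u32`, `read_bytes` and `dump_segment_paths` are made up for this example. It assumes GNU `od` and a little-endian host.

```sh
#!/bin/sh
# Sketch only: layout assumed from the od dump in this thread.
#   bytes 0-11   : header (all zero in the dump)
#   bytes 12-15  : 32-bit length N of the database directory string
#   next N bytes : database directory
#   next 4 bytes : 32-bit length M of the shadow file name
#   next M bytes : shadow file name

# Read one unsigned 32-bit integer (host byte order) at a byte offset.
read_u32() {  # read_u32 FILE OFFSET
  od -An -tu4 -j "$2" -N 4 "$1" | tr -d ' \n'
}

# Copy COUNT raw bytes starting at OFFSET to stdout.
read_bytes() {  # read_bytes FILE OFFSET COUNT
  dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

# Print the database directory and the shadow file name.
dump_segment_paths() {  # dump_segment_paths FILE
  file=$1
  db_len=$(read_u32 "$file" 12)
  printf 'db_dir=%s\n' "$(read_bytes "$file" 16 "$db_len")"
  shadow_off=$((16 + db_len))
  shadow_len=$(read_u32 "$file" "$shadow_off")
  printf 'shadow=%s\n' "$(read_bytes "$file" $((shadow_off + 4)) "$shadow_len")"
}
```

It would be called as `dump_segment_paths /run/shm/osm3s_v0.7.54_osm_base` (or with the `/dev/shm` path), provided the file still exists.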



Thanks!




Re: v0.7.54.9 breaks down

mmd
On 17.09.2017 at 09:13, mmd wrote:

>
> The error message basically says that the shared memory file created by
> the dispatcher process is no longer there. This file contains some
> information about the database directory and the shadow files. Without
> it, an interpreter or update_* process has no idea where the database is
> and has to stop further processing.
>
> Can you please post the output of:
>
> ls -l /run/shm/osm3s_*
>

One more thing: when running lsof -p 20704  (20704 being the dispatcher
process id, as reported by "pidof dispatcher"), you should see what
happened to this file.

I just manually deleted that shared memory file and it now shows in lsof
as "(deleted)".

dispatche 20704  mmd    4u      REG               0,21       91      100 /dev/shm/osm3s_v0.7.54_osm_base (deleted)
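
To automate that check, a small filter along these lines could pick the relevant entries out of `lsof` output. This is a sketch: the function name `deleted_shm_files` is invented here, and it relies on `lsof` printing the path followed by a trailing "(deleted)" marker, as in the line above.

```sh
#!/bin/sh
# Print every shared memory path that lsof marks as "(deleted)".
# Reads lsof output on stdin, so it works on live output or a saved log.
# Only lines whose last field is "(deleted)" and whose path lies under
# /dev/shm or /run/shm are reported.
deleted_shm_files() {
  awk '$NF == "(deleted)" && $(NF-1) ~ /^\/(dev|run)\/shm\// { print $(NF-1) }'
}
```

For example: `lsof -p 20704 | deleted_shm_files`. Empty output means the dispatcher holds no deleted shared memory file.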


Re: v0.7.54.9 breaks down

Igor Brejc
In reply to this post by mmd
On Sun, Sep 17, 2017 at 9:13 AM, mmd <[hidden email]> wrote:
> Can you please post the output of:
>
> ls -l /run/shm/osm3s_*

There are no osm* files in that directory.

> One more thing: when running lsof -p 20704  (20704 being the dispatcher
> process id, as reported by "pidof dispatcher"), you should see what
> happened to this file.

Looks like it was deleted. What could be the cause of this deletion?
 
igor@jazz:~/overpass$ lsof -p 15614
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs
      Output information may be incomplete.
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
dispatche 15614 igor  cwd    DIR               8,37     4096        2 /
dispatche 15614 igor  rtd    DIR               8,37     4096        2 /
dispatche 15614 igor  txt    REG               8,38  3754064 22806936 /home/igor/overpass/osm-3s_v0.7.54/bin/dispatcher
dispatche 15614 igor  mem    REG               8,37  1088952   267062 /lib/x86_64-linux-gnu/libm-2.23.so
dispatche 15614 igor  mem    REG               8,37   138696   266750 /lib/x86_64-linux-gnu/libpthread-2.23.so
dispatche 15614 igor  mem    REG               8,37  1868984   267164 /lib/x86_64-linux-gnu/libc-2.23.so
dispatche 15614 igor  mem    REG               8,37    89696   267054 /lib/x86_64-linux-gnu/libgcc_s.so.1
dispatche 15614 igor  mem    REG               8,37  1566440  1450555 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
dispatche 15614 igor  mem    REG               8,37    31712   266935 /lib/x86_64-linux-gnu/librt-2.23.so
dispatche 15614 igor  mem    REG               8,37   162632   266745 /lib/x86_64-linux-gnu/ld-2.23.so
dispatche 15614 igor  DEL    REG               0,21           5585078 /dev/shm/osm3s_v0.7.54_osm_base
dispatche 15614 igor    0r   CHR                1,3      0t0     1029 /dev/null
dispatche 15614 igor    1u  unix 0x0000000000000000      0t0  5576366 type=STREAM
dispatche 15614 igor    2u  unix 0x0000000000000000      0t0  5576366 type=STREAM
dispatche 15614 igor    3u  unix 0x0000000000000000      0t0  5585077 /home/igor/overpass/db//osm3s_v0.7.54_osm_base type=STREAM
dispatche 15614 igor    4u   REG               0,21       81  5585078 /dev/shm/osm3s_v0.7.54_osm_base (deleted)

Re: v0.7.54.9 breaks down

mmd
On 17.09.2017 at 17:55, Igor Brejc wrote:

> On Sun, Sep 17, 2017 at 9:13 AM, mmd <[hidden email]> wrote:
>
>     Can you please post the output of:
>
>     ls -l /run/shm/osm3s_*
>
>
> There are no osm* files in that directory.
>
>     One more thing: when running lsof -p 20704  (20704 being the dispatcher
>     process id, as reported by "pidof dispatcher"), you should see what
>     happened to this file.
>
>
> Looks like it was deleted. What could be the cause of this deletion? 
>  
>

I found something interesting here; systemd seems to be the culprit:

https://askubuntu.com/questions/884127/16-04-lts-and-dev-shm-files-disappearing/884449
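
The thread does not name the setting involved. It is presumably systemd-logind's `RemoveIPC` option in `/etc/systemd/logind.conf`, which controls whether a non-system user's IPC objects (including POSIX shared memory) are removed once that user has fully logged out. Treat that identification as an assumption. A minimal sketch for checking what a given logind configuration file says follows; the function name `removeipc_setting` is hypothetical.

```sh
#!/bin/sh
# Print the RemoveIPC value found in a logind configuration file, or
# "unset" when no uncommented RemoveIPC= line is present. Only the one
# file is read; drop-in directories are ignored in this sketch.
removeipc_setting() {  # removeipc_setting [FILE]
  awk -F= '
    /^[[:space:]]*RemoveIPC[[:space:]]*=/ {
      value = $2
      gsub(/[[:space:]]/, "", value)   # drop surrounding whitespace
    }
    END { print (value == "" ? "unset" : value) }
  ' "${1:-/etc/systemd/logind.conf}"
}
```

If this prints `unset`, the distribution's compiled-in default applies. A value of `no` would keep the shared memory file in place after logout.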



Re: v0.7.54.9 breaks down

Igor Brejc
Nice catch! It makes sense, because the deletion occurred while I was logged off the machine (and those two systemd units run as that same, non-system account). I have applied the configuration setting mentioned in the solution; I'll restart the service from scratch and let you know.

Thank you,
Igor


Re: v0.7.54.9 breaks down

marc marc
Hello,

It looks like this is the root cause of the crash of osm-fr.
We also use systemd, our daemon also runs under a non-system account,
and we were also not logged in to that account when the /dev/shm file was removed.
We'll move it to a system account, as it should be for a daemon.

PS: can you fix the missing dirname in the error message?

Regards,
Marc


Re: v0.7.54.9 breaks down

mmd
On 17.09.2017 at 20:08, marc marc wrote:

>
> It look like this is the root cause of the crash of osm-fr.
> we also use systemd, daemon also run using a not-system account
> and we are also not connected to this account where /dev/shm was removed
> we 'll move it to system account like it should be for daemon.

Good to hear. Let's see whether any further issues come up.

>
> PS: can you fix the missing dirname in the error message ?

Well, this is really OS specific. On Linux, those shared memory segments
end up as a directory entry in /run/shm or /dev/shm, but e.g. on FreeBSD
there's no equivalent to it.

"/osm3s_v0.7.54_osm_base" in the following error message actually refers
to a name of a shared memory segment, which has been created via
shm_open, rather than some 'normal' file.

"runtime error: open64: 2 No such file or directory
/osm3s_v0.7.54_osm_base Dispatcher_Client::1"
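
On Linux, the relation between the segment name in the error message and the directory entry can be sketched as follows. The helper names `shm_backing_path` and `shm_segment_status` are invented for this example, and the `/dev/shm` default is the Linux convention mentioned above; it does not hold on other systems.

```sh
#!/bin/sh
# Map a POSIX shared memory name, as passed to shm_open, to the file that
# backs it on Linux. The directory defaults to /dev/shm and can be
# overridden for systems that expose it as /run/shm.
shm_backing_path() {  # shm_backing_path NAME [DIR]
  printf '%s/%s\n' "${2:-/dev/shm}" "${1#/}"
}

# Report whether the segment behind NAME is currently present.
shm_segment_status() {  # shm_segment_status NAME [DIR]
  if [ -e "$(shm_backing_path "$@")" ]; then
    echo present
  else
    echo missing
  fi
}
```

So `shm_segment_status /osm3s_v0.7.54_osm_base` reporting `missing` while the dispatcher is still running corresponds to the "No such file or directory" error quoted above.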

Re: v0.7.54.9 breaks down

Roland Olbricht
In reply to this post by mmd
Hi mmd,

> I found something interesting here, systemd seems to be the culprit:
>
> https://askubuntu.com/questions/884127/16-04-lts-and-dev-shm-files-disappearing/884449

Thank you for figuring this out. I have left a warning on the wiki page.

Best regards,

Roland

Re: v0.7.54.9 breaks down

marc marc
In reply to this post by mmd
On 17.09.17 at 20:38, mmd wrote:

>> PS: can you fix the missing dirname in the error message ?
>
> Well, this is really OS specific. On Linux, those shared memory segments
> end up as a directory entry in /run/shm or /dev/shm, but e.g. on FreeBSD
> there's no equivalent to it.
>
> "/osm3s_v0.7.54_osm_base" in the following error message actually refers
> to a name of a shared memory segment, which has been created via
> shm_open, rather than some 'normal' file.
>
> "runtime error: open64: 2 No such file or directory
> /osm3s_v0.7.54_osm_base Dispatcher_Client::1"
>

In this case, maybe add "shared memory" as a prefix to the path,
because a regular file exists with the same name.
It is confusing when you cannot tell whether the error refers to the
regular file or the shared memory pseudo-file.
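
For telling the two entries apart by hand: the `srw-rw-rw-` mode in the earlier `ls -l` listing marks the entry in the database directory as a Unix socket, while the shared memory entry shows up in `lsof` as type REG, i.e. a regular file. A small sketch (the function name `entry_type` is made up here) that reports which kind of entry a path is:

```sh
#!/bin/sh
# Classify a path as socket, regular file, other, or missing.
entry_type() {  # entry_type PATH
  if [ -S "$1" ]; then
    echo "socket"
  elif [ -f "$1" ]; then
    echo "regular file"
  elif [ -e "$1" ]; then
    echo "other"
  else
    echo "missing"
  fi
}
```

Running it on both the database directory entry and the `/dev/shm` entry shows which of the two a "No such file or directory" error is about.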

Regards,
Marc

Re: v0.7.54.9 breaks down

Igor Brejc
In reply to this post by Igor Brejc
Hi,

Just to let you know: the services have been running for 24 hours without problems. I've tried logging off and pm-suspending the machine, and I didn't notice any problems. Nice work!

Cheers,
Igor
