import of Belgium extract failed


marc marc
Hello,

To test an overpass-api install, I selected the Belgium extract
https://download.geofabrik.de/europe/belgium-latest.osm.bz2
file timestamp is sep 16 12:24 (UTC+2)

I populated the database with /usr/local/bin/init_osm3s.sh

It failed with this message:
Reading XML file ...terminate called after throwing an instance of
'std::bad_alloc'
   what():  std::bad_alloc

I reduced the problem to this (with or without --meta)
$ cat belgium-latest.osm | update_database
--db-dir=/data/work/overpass/database

I do not know whether it's a bug in the update_database binary
or an error in the geofabrik file (in which case a better
error message would be useful).

I'm keeping the geofabrik file in case it helps.

Regards,
Marc
Re: import of Belgium extract failed

mmd
On 17.09.2017 at 01:15, marc marc wrote:

>
> To test an overpass-api install, I selected the Belgium extract
> https://download.geofabrik.de/europe/belgium-latest.osm.bz2
> file timestamp is sep 16 12:24 (UTC+2)
>
> I populated the database with /usr/local/bin/init_osm3s.sh
>
> It failed with this message:
> Reading XML file ...terminate called after throwing an instance of
> 'std::bad_alloc'
>    what():  std::bad_alloc
>

Usually that's an indication that update_from_dir or update_database
tried to allocate more main memory than is available on the system. All
our production and dev systems run with 32GB of main memory, which is
sufficient to import a planet. IIRC the French instance runs with only
16GB, which may really not be sufficient after all. Do you have a chance
to assign more memory to this instance?

For an initial import, there's an option to add a --flush-size
parameter, see
http://wiki.openstreetmap.org/wiki/Overpass_API/Installation#Database_population_problem
- although I would use a higher value than --flush-size=1, as the
import will be painfully slow otherwise.
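Assuming the layout from the original report, an initial import with an
explicit flush size might look like this (paths and the value 4 are
illustrative, not a recommendation):

```shell
# Feed the decompressed extract into update_database with a reduced
# flush size. --flush-size is multiplied by 1024*1024 internally, so 4
# means roughly 4 million objects are buffered before each flush.
bunzip2 < belgium-latest.osm.bz2 \
  | /usr/local/bin/update_database \
      --db-dir=/data/work/overpass/database \
      --meta \
      --flush-size=4
```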

--




Re: import of Belgium extract failed

marc marc
Hello,

On 17.09.17 at 09:20, mmd wrote:

> On 17.09.2017 at 01:15, marc marc wrote:
>> it failed with this message
>> Reading XML file ...terminate called after throwing an instance of
>> 'std::bad_alloc'
>>     what():  std::bad_alloc
> Usually that's an indication that update_from_dir or update_database
> tried to allocate more main memory than available on the system.
>
> For an initial import, there's an option to add a --flush-size
> parameter, see
> http://wiki.openstreetmap.org/wiki/Overpass_API/Installation#Database_population_problem
> - although I would use a higher value than --flush-size=1 as
> it will be painfully slow otherwise.

I tried with --flush-size=6 and the import succeeded.

What is the exact meaning of --flush-size=1? A flush after each
modified node/way/relation? After each tag? Each changeset?

Is there a formula relating flush-size to the maximum RAM consumed?
I imagine something like x MB + y MB * flush-size = max allocated RAM

Regards,
Marc

Re: import of Belgium extract failed

mmd
Hi,

On 20.09.2017 at 22:12, marc marc wrote:

>
> I tried with --flush-size=6 and the import succeeded.
>
> What is the exact meaning of --flush-size=1? A flush after each
> modified node/way/relation? After each tag? Each changeset?
>
> Is there a formula relating flush-size to the maximum RAM consumed?
> I imagine something like x MB + y MB * flush-size = max allocated RAM
>

update_database.cc has the following definitions:

Default value: flush_limit = 16*1024*1024

When setting --flush-size=6 as a parameter, flush_limit will be set to
6*1024*1024.

Very roughly speaking, when importing nodes/ways/relations, the number
of objects will be counted, and once the flush-size threshold is
reached, objects will be flushed to disk and the counter reset to zero.
As objects have varying sizes, this is a rough approximation of the
actual memory requirements.

The smaller the flush-size value, the more often you need to flush to
disk, and possibly re-read referenced objects (especially nodes) from
disk that would otherwise still be in memory. Setting it close to the
maximum value your memory permits reduces the overall runtime of the
initial load.


--





Re: import of Belgium extract failed flush-size <> ram

marc marc
On 20.09.17 at 22:28, mmd wrote:

>> I tried with --flush-size=6 and the import succeeded.
>>
>> What is the exact meaning of --flush-size=1? A flush after each
>> modified node/way/relation? After each tag? Each changeset?
>>
>> Is there a formula relating flush-size to the maximum RAM consumed?
>> I imagine something like x MB + y MB * flush-size = max allocated RAM
>
> update_database.cc has the following definitions:
>
> Default value: flush_limit = 16*1024*1024
>
> When setting --flush-size=6 as a parameter, flush_limit will be set to
> 6*1024*1024.
>
> Very roughly speaking, when importing nodes/ways/relations, the number
> of objects will be counted, and once the flush-size threshold is
> reached, objects will be flushed to disk and the counter reset to zero.
> As objects have varying sizes, this is a rough approximation of the
> actual memory requirements.
>
> Setting it close to the maximum value your memory permits reduces
> the overall runtime of the initial load.

Yes, of course, that is what I'm trying to do: find the highest value
that speeds up the initial load while still avoiding an out-of-memory
error.

But because flush-size is based on the number of objects rather than
their size, it is difficult to prevent an out-of-memory condition when
a change touches many larger-than-average objects.
Would it be possible for flush-size to use the actual allocated size
instead of the number of objects?
Or is this option too little used to make that worthwhile?

Regards,
Marc

Re: import of Belgium extract failed flush-size <> ram

mmd
On 20.09.2017 at 23:45, marc marc wrote:

> Would it be possible for flush-size to use the actual allocated size
> instead of the number of objects?

That's not so easy due to the automatic memory management of the C++
STL containers. For obvious performance reasons, memory is usually
allocated in much larger chunks rather than on each insertion into a
container.

Roland did in fact implement a memory consumption check to determine
the minimum memory required for all data in the input sets. That's part
of a health check during query execution. The actual memory allocated
from the OS may still be somewhat larger, depending on the STL memory
management strategy.

I guess it wasn't worth the effort to do the same for the initial
import, as it never turned out to be an issue on 32GB machines. :)

--