[Patch v1] reduce peak memory usage for global index

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[Patch v1] reduce peak memory usage for global index

Gerd Petermann
Hi all,

attached is a patch which seems to help without changing the output of mkgmap.
Compared to r3873 I see ~ 33% less memory usage when used with --x-split-name-index and ~50% less without it.
Runtimes were similar. Actual numbers depend on the style and the used options.

A binary based on 3873 is here:
http://files.mkgmap.org.uk/download/341/mkgmap.jar


It consists of different changes, each of them maybe used independently:
1) always release memory for mdr19, not only if it was filled for the device.
2) check if sort key has zero length, if yes, don't allocate new buffer
3) don't use MultiSortKey for mdr7 if --x-split-name-index is not used
4) count occurences of the generated key strings, cache sort keys for those keys which occur more than 3 times.
5) create smaller copy of byte array if number of allocated bytes was too high

@Steve: Please review, I think 1-4 are okay, 5) may cause problems if my understanding of Sort.fillkey() is wrong.
I was also surprised to see that we require such long sort keys, even with --latin1.
I see many 0 bytes in the created keys, maybe that can be optimized further?

Gerd


_______________________________________________
mkgmap-dev mailing list
[hidden email]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

mdr-sort-v1.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Patch v1] reduce peak memory usage for global index

Steve Ratcliffe

Hi Gerd,

> 1) always release memory for mdr19, not only if it was filled for the device.
> 2) check if sort key has zero length, if yes, don't allocate new buffer
> 3) don't use MultiSortKey for mdr7 if --x-split-name-index is not used
> 4) count occurences of the generated key strings, cache sort keys for those keys which occur more than 3 times.
> 5) create smaller copy of byte array if number of allocated bytes was too high
>
> @Steve: Please review, I think 1-4 are okay, 5) may cause problems if my understanding of Sort.fillkey() is wrong.

Looks OK.

I agree to add cache in Mdr7, don't know why I omitted it
to begin with.

But does the complex counted cache improve memory use over the simple
case?  As the cache is temporary, after the routine returns there will
be a greater memory use from all the keys that are not de-duplicated.
It seems to me that the temporary memory required during
preWriteImpl() would also be larger unless there are a vast number of
single use keys.

> I was also surprised to see that we require such long sort keys, even with --latin1.
> I see many 0 bytes in the created keys, maybe that can be optimized further?

Its possible, see for example:

   http://www.unicode.org/reports/tr10/#Implementation_Notes

As I was working out how the Srt files worked, I gradually came to
realise that it was pretty much the same way that Java does collation.
I changed the syntax of the resource/sort/*.txt files to more closely
match the language that is used by RuleBasedCollator().  There is also
the icu4j project http://site.icu-project.org/

I can't remember how close either of them were to working as needed
for mkgmap. In the end my code was faster, so I stayed with it.
Its possible that the built in java code could be made to work and
it might be faster now, or perhaps I missed a trick to make it so
at the time.

..Steve

_______________________________________________
mkgmap-dev mailing list
[hidden email]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Reply | Threaded
Open this post in threaded view
|

Re: [Patch v1] reduce peak memory usage for global index

Gerd Petermann
Hi Steve,

thanks. Reg. cache:
I'll check again with a larger set of input tiles. Probably you are right, the more tiles the smaller the percentage of single use keys.
I'm now creating a complete map of Europe for further tests.

Gerd
________________________________________
Von: mkgmap-dev <[hidden email]> im Auftrag von Steve Ratcliffe <[hidden email]>
Gesendet: Freitag, 31. März 2017 22:42:14
An: [hidden email]
Betreff: Re: [mkgmap-dev] [Patch v1] reduce peak memory usage for global        index

Hi Gerd,

> 1) always release memory for mdr19, not only if it was filled for the device.
> 2) check if sort key has zero length, if yes, don't allocate new buffer
> 3) don't use MultiSortKey for mdr7 if --x-split-name-index is not used
> 4) count occurences of the generated key strings, cache sort keys for those keys which occur more than 3 times.
> 5) create smaller copy of byte array if number of allocated bytes was too high
>
> @Steve: Please review, I think 1-4 are okay, 5) may cause problems if my understanding of Sort.fillkey() is wrong.

Looks OK.

I agree to add cache in Mdr7, don't know why I omitted it
to begin with.

But does the complex counted cache improve memory use over the simple
case?  As the cache is temporary, after the routine returns there will
be a greater memory use from all the keys that are not de-duplicated.
It seems to me that the temporary memory required during
preWriteImpl() would also be larger unless there are a vast number of
single use keys.

> I was also surprised to see that we require such long sort keys, even with --latin1.
> I see many 0 bytes in the created keys, maybe that can be optimized further?

Its possible, see for example:

   http://www.unicode.org/reports/tr10/#Implementation_Notes

As I was working out how the Srt files worked, I gradually came to
realise that it was pretty much the same way that Java does collation.
I changed the syntax of the resource/sort/*.txt files to more closely
match the language that is used by RuleBasedCollator().  There is also
the icu4j project http://site.icu-project.org/

I can't remember how close either of them were to working as needed
for mkgmap. In the end my code was faster, so I stayed with it.
Its possible that the built in java code could be made to work and
it might be faster now, or perhaps I missed a trick to make it so
at the time.

..Steve

_______________________________________________
mkgmap-dev mailing list
[hidden email]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
_______________________________________________
mkgmap-dev mailing list
[hidden email]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev