Refactoring Calc Area Broadcasters
|
---|
Quick Navigation Team Communication Activities |
About this template |
Because of the unique sorted set requirement and the underlying tree
implementation of ::std::set, inserting area broadcasters with identical top
left addresses to large sets takes a lot of time. Refactoring that to use
a different structure.
The change is in CWS DEV300 calcperf04
Layout of the old broadcaster area implementation:
- A document is divided into slots.
- Each slot represents 128 rows by 16 columns of all sheets.
- Each cell listening to a range registers the range with the broadcaster slots such that
- the range is distributed over slots it intersects with.
- each slot contains pointers to refcounted ranges that intersect with the slot's range.
- Each slot holds the pointers in a ::std::set unique sorted associative container.
- A cell change, additionally to the directly listening cells, is broadcasted by looking up the corresponding slot and broadcasted to all ranges it it is contained in.
Issue 101254 has two test case documents attached:
- broadcast_slots_single.ods with one sheet and 64k ranges to listen to, all different but starting at A1.
- broadcast_slots_double.ods with two sheets and 128k ranges to listen to (64k each sheet), all different but starting at A1 of the containing sheet.
These stress the broadcaster slots implementation such that for broadcast_slots_single.ods
- The first slot for A1:P128 holds 65536 elements.
- The second slot for A129:P256 holds (65536-128)=65408 elements.
- ...
- The slot for A65408:P65536 holds 128 elements.
For broadcast_slots_double.ods
- The first slot for A1:P128 holds 131072 elements.
- The second slot for A129:P256 holds (131072-256)=130816 elements.
- ...
- The slot for A65408:P65536 holds 256 elements.
The new inplementation introduces two differences:
- A ::std::hash_set unique associative container is used instead of ::std::set.
- This greatly improves insertion time and memory allocation.
- The slot arrangement is per sheet instead of document-wide.
- Broadcaster areas on different sheets do not clutter up.
- In the case of broadcast_slots_double.ods two slot arrangements are used, with half the number of elements each, e.g. 65536 elements in each A1:P128 slot instead of 131072 elements.
- When broadcasting a cell change, only the areas listening to that sheet's slot need to be iterated.
- Broadcaster areas on different sheets do not clutter up.
Measurements, document load after having started a fresh soffice -calc
to have the Calc library code already loaded, and recalculation after changing A1 on one sheet, gave the following numbers: CPU time used (not wall clock), virtual memory size (VSZ) and resident memory size (RSS).
Measurement | CPU | CPU% | VSZ | VSZ% | RSS | RSS% |
---|---|---|---|---|---|---|
load single old | 22s | 100% | 620396 | 100% | 493369 | 100% |
load single new | 14s | 63% | 520717 | 84% | 368736 | 75% |
load double old | 43s | 100% | 1074909 | 100% | 911856 | 100% |
load double new | 24s | 55% | 831717 | 77% | 664847 | 73% |
double/single old | 194% | 173% | 185% | |||
double/single new | 171% | 160% | 180% | |||
recalc single old | 554s | 100% | ||||
recalc single new | 274s | 49% | ||||
recalc double old | 1123s | 100% | ||||
recalc double new | 329s | 29% | ||||
double/single old | 203% | |||||
double/single new | 120% |
Additionally, closing the document with the new implementation is much faster, for the double document it needs maybe 1 second, with the old implementation it was ~10 seconds.
Btw, creating the same single document in Excel took 15-20 minutes, saved as MOOXML and loaded again it needed ~30 minutes. Didn't try with the double document..