Refactoring Calc Area Broadcasters

From Apache OpenOffice Wiki
Jump to: navigation, search

Performance 170.png
Performance Project

Quick Navigation




About this template

Because of the unique sorted set requirement and the underlying tree implementation of ::std::set, inserting area broadcasters with identical top left addresses to large sets takes a lot of time. Refactoring that to use a different structure.

The change is in CWS DEV300 calcperf04  

Layout of the old broadcaster area implementation:

  • A document is divided into slots.
    • Each slot represents 128 rows by 16 columns of all sheets.
  • Each cell listening to a range registers the range with the broadcaster slots such that
    • the range is distributed over slots it intersects with.
    • each slot contains pointers to refcounted ranges that intersect with the slot's range.
  • Each slot holds the pointers in a ::std::set unique sorted associative container.
  • A cell change, additionally to the directly listening cells, is broadcasted by looking up the corresponding slot and broadcasted to all ranges it it is contained in.

Issue 101254 has two test case documents attached:

  1. broadcast_slots_single.ods with one sheet and 64k ranges to listen to, all different but starting at A1.
  2. broadcast_slots_double.ods with two sheets and 128k ranges to listen to (64k each sheet), all different but starting at A1 of the containing sheet.

These stress the broadcaster slots implementation such that for broadcast_slots_single.ods

  • The first slot for A1:P128 holds 65536 elements.
  • The second slot for A129:P256 holds (65536-128)=65408 elements.
  • ...
  • The slot for A65408:P65536 holds 128 elements.

For broadcast_slots_double.ods

  • The first slot for A1:P128 holds 131072 elements.
  • The second slot for A129:P256 holds (131072-256)=130816 elements.
  • ...
  • The slot for A65408:P65536 holds 256 elements.

The new inplementation introduces two differences:

  • A ::std::hash_set unique associative container is used instead of ::std::set.
    • This greatly improves insertion time and memory allocation.
  • The slot arrangement is per sheet instead of document-wide.
    • Broadcaster areas on different sheets do not clutter up.
      • In the case of broadcast_slots_double.ods two slot arrangements are used, with half the number of elements each, e.g. 65536 elements in each A1:P128 slot instead of 131072 elements.
    • When broadcasting a cell change, only the areas listening to that sheet's slot need to be iterated.

Measurements, document load after having started a fresh soffice -calc to have the Calc library code already loaded, and recalculation after changing A1 on one sheet, gave the following numbers: CPU time used (not wall clock), virtual memory size (VSZ) and resident memory size (RSS).

Measurement CPU CPU% VSZ VSZ% RSS RSS%
load single old 22s 100% 620396 100% 493369 100%
load single new 14s 63% 520717 84% 368736 75%
load double old 43s 100% 1074909 100% 911856 100%
load double new 24s 55% 831717 77% 664847 73%
double/single old 194% 173% 185%
double/single new 171% 160% 180%
recalc single old 554s 100%
recalc single new 274s 49%
recalc double old 1123s 100%
recalc double new 329s 29%
double/single old 203%
double/single new 120%

Additionally, closing the document with the new implementation is much faster, for the double document it needs maybe 1 second, with the old implementation it was ~10 seconds.

Btw, creating the same single document in Excel took 15-20 minutes, saved as MOOXML and loaded again it needed ~30 minutes. Didn't try with the double document..

Personal tools