Difference between revisions of "Calc/Proposal DataPilot byIBM"
Line 105: | Line 105: | ||
'''Enhance the algorithm of setting border style''' | '''Enhance the algorithm of setting border style''' | ||
+ | |||
+ | |||
+ | [[Category:Calc|Proposal_DataPilot_byIBM]] |
Latest revision as of 10:52, 24 June 2009
Contents
DataPilot Performance Enhancement I
Specification Status | |
Author | Wang Xu Ming, Helen Yue |
Last Change | See wiki history |
Background
DataPilot is a critical function to Spreadsheet users. In Symphony 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed the new UI and some new features for DataPilot based on OpenOffice 1.1 code base and also with some code merged from OpenOffice 2.4.
With our new UI, users can drag and drop to generate the DataPilot table based on the page/column/row/data fields, and change the structure easily. Furthermore, the function like DataPilot usually will process thousands of data records. Performance is one of our major focus.
During the development, test team found that there was serious performance problem when user create or update a DataPilot table. This specification will describe what performance issues we have found, and the enhancement we did on the algorithm that created and output a DataPilot table. Although the code base (2.4 and 3.1) is different, the solution should also apply to the new code base.
We are currently merging the code to OpenOffice.org 3.1 code base and will continue the enhancement work.
Test Result
Low performance when update a datapilot table
We defined 6 scenario, and test it to a sample DataPilot table which have 5000 rows as data source.
Test environment: Hardware: IBM T30 CPU: 2.4 GHz Memory:1.0 GB Operation System: Window XP SP2
Below table is the test result to OpenOffice 3.0.0. Scenario 3 and 4 showed big delays to generate the table, while another office product can complete all the scenarios within 3 seconds.
Test Scenario | Open Office 3.0.0 |
Page: 1 Column: 2 Row: 1 Data: 1
Action: - Add Product to Row |
3.15s |
Page: 1 Column: 2 Row: 1 Data: 1
Action:- Product ID to Data |
3.06s |
Page: 1 Column: 3 Row: 3 Data: 1
Action:- Add Product to Row |
25.28s |
Page: 1 Column: 3 Row: 3 Data: 1
Action:- Add Product ID to Data |
44.21s |
Page: 1 Column: 2 Row: 2 Data: 3
Action:- Add SalesRep to Data |
6.28s |
Page: 1 Column: 2 Row: 1 Data: 1
Action:-Change the function of Revenue from Sum to Max |
6.03s |
Crash
Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash.
Code Analysis
First, create a complex layout DataPilot table, and use rational quantify create a report for its performance.
From above table, we can see the top three "F time (% of Focus)" functions:
- new
- OutputDevice::DrawLine
- SfxItemSet::==
Then we scan and debug the code, get three root causes for the performance issue:
Allocate a lot of abundant data
For a simple datapilot table:
Member A1 in L1 field will create an array for all members {B1,B2,B3}. But only B1 is visible and valid.
Allocate too much memories
Every member's data is stored in a big structure.
Set too many times of border styles for output area
Some borders are set twice or more.
Solution Description
Data Source buffer
A document stored a source buffer array. Every table have a buffer id. The datapilot table can use the same id if they have same data source.
In the buffer, the members of a field can be identified by an id( the sorted index ).
Then in the output table's algorithm the ScDPItemData structure is replaced by an id.
Only allocate visible member
Enhance the algorithm of setting border style