Difference between revisions of "Calc/Proposal DataPilot byIBM"
(→Analyzing result) |
|||
(11 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | = DataPilot Performance Enhancement I = | ||
+ | |||
{| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0; border: 1px #cccccc solid; border-collapse: collapse; width: 50%" | {| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0; border: 1px #cccccc solid; border-collapse: collapse; width: 50%" | ||
Line 4: | Line 6: | ||
| colspan="2" bgcolor="#cccccc" | '''Specification Status''' | | colspan="2" bgcolor="#cccccc" | '''Specification Status''' | ||
|- | |- | ||
− | | width="150" | '''Author''' || width="150" |[[User:wangxum|Wang Xu Ming]] | + | | width="150" | '''Author''' || width="150" |[[User:wangxum|Wang Xu Ming]], [[User:helenyue|Helen Yue]] |
|- | |- | ||
| width="150" | '''Last Change''' || width ="150"|See wiki history | | width="150" | '''Last Change''' || width ="150"|See wiki history | ||
Line 10: | Line 12: | ||
|} | |} | ||
== Background== | == Background== | ||
− | DataPilot is a critical function to Spreadsheet users. | + | DataPilot is a critical function to Spreadsheet users. In Symphony 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed the new UI and some new features for DataPilot based on OpenOffice 1.1 code base and also with some code merged from OpenOffice 2.4. |
− | In 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed | + | |
− | + | With our new UI, users can drag and drop to generate the DataPilot table based on the page/column/row/data fields, and change the structure easily. Furthermore, the function like DataPilot usually will process thousands of data records. Performance is one of our major focus. | |
− | + | During the development, test team found that there was serious performance problem when user create or update a DataPilot table. This specification will describe what performance issues we have found, and the enhancement we did on the algorithm that created and output a DataPilot table. Although the code base (2.4 and 3.1) is different, the solution should also apply to the new code base. | |
− | == | + | We are currently merging the code to OpenOffice.org 3.1 code base and will continue the enhancement work. |
+ | |||
+ | == Test Result == | ||
'''Low performance when update a datapilot table''' | '''Low performance when update a datapilot table''' | ||
− | + | We defined 6 scenario, and test it to a sample DataPilot table which have 5000 rows as data source. | |
Test environment: Hardware: IBM T30 CPU: 2.4 GHz Memory:1.0 GB Operation System: Window XP SP2 | Test environment: Hardware: IBM T30 CPU: 2.4 GHz Memory:1.0 GB Operation System: Window XP SP2 | ||
− | Below table is the test result to OpenOffice 3.0.0 | + | Below table is the test result to OpenOffice 3.0.0. Scenario 3 and 4 showed big delays to generate the table, while another office product can complete all the scenarios within 3 seconds. |
− | {| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0; border: 1px #cccccc solid; border-collapse: collapse; width: | + | {| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0; border: 1px #cccccc solid; border-collapse: collapse; width: 70%" |
|- | |- | ||
| width="150" bgcolor="#dddddd" | '''Test Scenario'''|| width="150" bgcolor="#dddddd"|'''Open Office 3.0.0''' | | width="150" bgcolor="#dddddd" | '''Test Scenario'''|| width="150" bgcolor="#dddddd"|'''Open Office 3.0.0''' | ||
Line 60: | Line 63: | ||
Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash. | Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash. | ||
− | + | == Code Analysis == | |
− | == | + | |
First, create a complex layout DataPilot table, and use rational quantify create a report for its performance. | First, create a complex layout DataPilot table, and use rational quantify create a report for its performance. | ||
[[Image:quantify report.jpg]] | [[Image:quantify report.jpg]] | ||
− | From above table, we can see the top three "F time (% of Focus)" | + | From above table, we can see the top three "F time (% of Focus)" functions: |
* new | * new | ||
Line 72: | Line 74: | ||
* SfxItemSet::== | * SfxItemSet::== | ||
− | Then scan and debug the code, get three | + | Then we scan and debug the code, get three root causes for the performance issue: |
'''Allocate a lot of abundant data''' | '''Allocate a lot of abundant data''' | ||
Line 80: | Line 82: | ||
[[Image:simple dptable.jpg]] | [[Image:simple dptable.jpg]] | ||
− | Member A1 in L1 field will create | + | Member A1 in L1 field will create an array for all members {B1,B2,B3}. But only B1 is visible and valid. |
'''Allocate too much memories''' | '''Allocate too much memories''' | ||
Line 90: | Line 92: | ||
Some borders are set twice or more. | Some borders are set twice or more. | ||
− | == Solution == | + | == Solution Description == |
'''Data Source buffer''' | '''Data Source buffer''' | ||
Line 99: | Line 101: | ||
Then in the output table's algorithm the ScDPItemData structure is replaced by an id. | Then in the output table's algorithm the ScDPItemData structure is replaced by an id. | ||
+ | |||
'''Only allocate visible member''' | '''Only allocate visible member''' | ||
+ | |||
'''Enhance the algorithm of setting border style''' | '''Enhance the algorithm of setting border style''' | ||
+ | |||
+ | |||
+ | [[Category:Calc|Proposal_DataPilot_byIBM]] |
Latest revision as of 10:52, 24 June 2009
Contents
DataPilot Performance Enhancement I
Specification Status | |
Author | Wang Xu Ming, Helen Yue |
Last Change | See wiki history |
Background
DataPilot is a critical function to Spreadsheet users. In Symphony 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed the new UI and some new features for DataPilot based on OpenOffice 1.1 code base and also with some code merged from OpenOffice 2.4.
With our new UI, users can drag and drop to generate the DataPilot table based on the page/column/row/data fields, and change the structure easily. Furthermore, the function like DataPilot usually will process thousands of data records. Performance is one of our major focus.
During the development, test team found that there was serious performance problem when user create or update a DataPilot table. This specification will describe what performance issues we have found, and the enhancement we did on the algorithm that created and output a DataPilot table. Although the code base (2.4 and 3.1) is different, the solution should also apply to the new code base.
We are currently merging the code to OpenOffice.org 3.1 code base and will continue the enhancement work.
Test Result
Low performance when update a datapilot table
We defined 6 scenario, and test it to a sample DataPilot table which have 5000 rows as data source.
Test environment: Hardware: IBM T30 CPU: 2.4 GHz Memory:1.0 GB Operation System: Window XP SP2
Below table is the test result to OpenOffice 3.0.0. Scenario 3 and 4 showed big delays to generate the table, while another office product can complete all the scenarios within 3 seconds.
Test Scenario | Open Office 3.0.0 |
Page: 1 Column: 2 Row: 1 Data: 1
Action: - Add Product to Row |
3.15s |
Page: 1 Column: 2 Row: 1 Data: 1
Action:- Product ID to Data |
3.06s |
Page: 1 Column: 3 Row: 3 Data: 1
Action:- Add Product to Row |
25.28s |
Page: 1 Column: 3 Row: 3 Data: 1
Action:- Add Product ID to Data |
44.21s |
Page: 1 Column: 2 Row: 2 Data: 3
Action:- Add SalesRep to Data |
6.28s |
Page: 1 Column: 2 Row: 1 Data: 1
Action:-Change the function of Revenue from Sum to Max |
6.03s |
Crash
Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash.
Code Analysis
First, create a complex layout DataPilot table, and use rational quantify create a report for its performance.
From above table, we can see the top three "F time (% of Focus)" functions:
- new
- OutputDevice::DrawLine
- SfxItemSet::==
Then we scan and debug the code, get three root causes for the performance issue:
Allocate a lot of abundant data
For a simple datapilot table:
Member A1 in L1 field will create an array for all members {B1,B2,B3}. But only B1 is visible and valid.
Allocate too much memories
Every member's data is stored in a big structure.
Set too many times of border styles for output area
Some borders are set twice or more.
Solution Description
Data Source buffer
A document stored a source buffer array. Every table have a buffer id. The datapilot table can use the same id if they have same data source.
In the buffer, the members of a field can be identified by an id( the sorted index ).
Then in the output table's algorithm the ScDPItemData structure is replaced by an id.
Only allocate visible member
Enhance the algorithm of setting border style