Writer/TOC

From Apache OpenOffice Wiki
Revision as of 13:39, 12 June 2012 by Chengjh (Talk | contribs)


Table of Contents Improvements

Overall Description

The TOC (Table of Contents) is a significant feature of AOO Writer. Although it already provides powerful capabilities that improve end-user productivity, the following areas, especially fidelity with MS Word, still need improvement. I propose them as candidates for the next release.

Descriptions of Main Problems

Loading of MS Word TOC

Binary Format

  • The TOC data of an MS Word document is not parsed completely. Instead, the displayed TOC comes from a silent update performed once the MS Word document is loaded. Fidelity therefore cannot be ensured, especially when document content that affects the TOC was changed after the TOC was created in MS Word.
  • After the TOC has been created in MS Word, the paragraphs with Heading styles may be deleted, or the Heading styles may be removed from paragraphs that were collected into the TOC. When such an MS Word binary document is opened in Apache OpenOffice.org Writer, the TOC disappears.
  • After the TOC has been created in MS Word, Heading styles may be applied to new paragraphs. When such an MS Word binary document is opened in Apache OpenOffice.org Writer, new entries are added to the TOC.
  • The tab between the chapter number and the TOC entry is lost when loading an MS Word document. This changes the gap between the chapter number and the entry text, so the TOC looks different from MS Word.

Status

Ongoing

Function Specification

Abstract

Provide a solution for preserving the TOC contents of DOC files by interpreting the TOC entry data stored inside the MS Office Word 2003 .DOC binary file. This replaces the current implementation, which generates the TOC contents from heading information collected from the main contents of the DOC file.

Motivation

The current Apache OpenOffice.org does not preserve the exact TOC entries by interpreting the cached TOC entry contents stored inside MS Office Word 2003 DOC files. Instead, it generates the TOC entries from the heading paragraphs collected after loading the main contents of the whole document.

This TOC loading strategy in Apache OpenOffice.org leads to three main issues:

  1. Poor fidelity for certain kinds of MS Word DOC files. Consider an MS Word DOC that contains several heading paragraphs and a TOC. If we delete all the main contents except the TOC, save the document, and reopen it in MS Word, the TOC is exactly the same as before. If we open it in Apache OpenOffice.org, the TOC is completely empty.
  2. Manually created or removed TOC entries are lost. Some Word users add or remove TOC entries manually after generating the TOC in MS Word. These manual modifications to the TOC contents are shown correctly in MS Word when the DOC file is reopened, but they are lost when the DOC file is loaded in Apache OpenOffice.org.
  3. The paragraph/text/field attributes assigned in the TOC block are lost. In some TOC generation modes, the paragraph/text/field attributes assigned to a heading paragraph affect how the corresponding TOC entry is displayed. In the current Apache OpenOffice.org, such a TOC from an MS Word document is generated with the standard paragraph/text/field formatting of TOC entries instead.

Detailed Specification

The original TOC loading process and the proposed improvement

The current Word DOC TOC loading process generally consists of the following steps:

  1. Determine the exact position of the TOC block in the document;
  2. Parse the TOC field expression and create the internal TOC model, with the TOC entry pattern and the TOC collecting rules set accordingly;
  3. Skip the cached representation of the TOC field;
  4. Skip the paragraph/text/field attributes that correspond to the range of the cached TOC field representation;
  5. Collect the heading paragraphs according to the collecting rules while loading the main contents of the document;
  6. Generate the TOC entries according to the TOC entry pattern;
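
The six steps above can be sketched roughly in Python. This is a simplified model for illustration, not the actual WW8 filter code: the function names, the string-based document representation, and the `(outline_level, title)` heading pairs are all hypothetical. The sketch assumes the Word binary convention that a field is delimited by the characters 0x13 (begin), 0x14 (separator) and 0x15 (end), with the field instruction before the separator and the cached result after it.

```python
import re

# Field mark characters used in the Word binary text stream.
FIELD_BEGIN, FIELD_SEP, FIELD_END = "\x13", "\x14", "\x15"


def split_toc_field(text):
    """Steps 1, 3, 4: find the first field and return (instruction, cached_result)."""
    begin = text.index(FIELD_BEGIN)
    depth, sep = 0, None
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == FIELD_BEGIN:
            depth += 1
        elif ch == FIELD_SEP and depth == 1 and sep is None:
            sep = i  # separator of the outermost field only
        elif ch == FIELD_END:
            depth -= 1
            if depth == 0:
                if sep is None:  # field without a cached result
                    return text[begin + 1:i], ""
                return text[begin + 1:sep], text[sep + 1:i]
    raise ValueError("unterminated field")


def outline_levels(instruction):
    # Step 2 (simplified): read the range of the \o switch, e.g. \o "1-3".
    # Falling back to levels 1-9 is an assumption of this sketch.
    match = re.search(r'\\o\s+"(\d)-(\d)"', instruction)
    return (int(match.group(1)), int(match.group(2))) if match else (1, 9)


def load_toc_by_regeneration(text, headings):
    """headings: (outline_level, title) pairs collected from the main contents."""
    instruction, _cached = split_toc_field(text)  # cached result is skipped
    first, last = outline_levels(instruction)     # collecting rules
    # Steps 5 and 6: entries come only from headings still in the document.
    return [title for level, title in headings if first <= level <= last]


doc = '\x13 TOC \\o "1-2" \x14Intro\t1\rRemoved heading\t2\r\x15'
print(load_toc_by_regeneration(doc, [(1, "Intro")]))  # "Removed heading" is dropped
print(load_toc_by_regeneration(doc, []))              # no headings left: empty TOC
```

Because the cached result is never read, the loaded TOC depends only on the headings that remain in the main contents, which is the source of the fidelity issues listed under Motivation.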

This MS Word DOC filter improvement focuses on the TOC contents cache and changes the loading strategy as follows:

  • Remove the heading paragraph collecting step (step 5 above);
  • Remove the TOC generating/updating step (step 6 above);
  • Add parsing of the TOC contents cache (expanding step 3 above);
  • Add parsing of the paragraph/text/field attributes that correspond to the range of the TOC contents cache (expanding step 4 above);
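
A minimal sketch of the cache-parsing idea, in Python, is shown below. It is illustrative only and makes several assumptions: the cached TOC result is available as a string, entries are separated by the paragraph mark (0x0D), the last tab of an entry precedes the page number, and nested fields (such as HYPERLINK or PAGEREF) use the field marks 0x13/0x14/0x15. The function names are hypothetical, and attribute parsing (the expanded step 4) is not modeled.

```python
# Field mark characters and paragraph mark in the Word binary text stream.
FIELD_BEGIN, FIELD_SEP, FIELD_END = "\x13", "\x14", "\x15"
PARAGRAPH_MARK = "\r"


def visible_text(cached):
    """Drop the instructions of nested fields and keep their results."""
    out = []
    hidden = []  # one flag per open field: True while inside its instruction
    for ch in cached:
        if ch == FIELD_BEGIN:
            hidden.append(True)
        elif ch == FIELD_SEP:
            if hidden:
                hidden[-1] = False  # the result part of this field starts
        elif ch == FIELD_END:
            if hidden:
                hidden.pop()
        elif not any(hidden):
            out.append(ch)
    return "".join(out)


def entries_from_cache(cached):
    """Return (entry_text, page_number) pairs exactly as stored in the cache."""
    entries = []
    for para in visible_text(cached).split(PARAGRAPH_MARK):
        if not para:
            continue
        text, tab, page = para.rpartition("\t")
        if not tab:  # entry without a page number
            text, page = para, ""
        entries.append((text, page))
    return entries


cached = (
    '\x13 HYPERLINK \\l "_Toc1" \x14'
    "1\tIntroduction\t\x13 PAGEREF _Toc1 \\h \x143\x15\x15\r"
    "Manual entry\t7\r"
)
for text, page in entries_from_cache(cached):
    print(repr(text), page)
```

No heading information is consulted here, so a manually added entry such as the second one survives loading, and the tab between the chapter number and the entry text stays part of the entry.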

Behavioral differences introduced by this improvement

The behavioral differences are described by the following user scenarios.

Scenario 1

Description: In Apache OpenOffice.org with this improvement, open an MS Word DOC document in which:
  • Several heading paragraphs and a corresponding generated TOC existed;
  • All the main contents except the TOC have been deleted;

Result: The TOC contents cache is preserved.

Comment: More generally, the main contents may have been modified without the TOC being updated before saving. The current Apache OpenOffice.org always makes the loaded TOC match the main contents and heading paragraphs exactly. With this feature, the TOC contents cache recorded in the DOC document is preserved as it is.

Scenario 2

Description: In Apache OpenOffice.org with this improvement, open an MS Word DOC document in which:
  • A generated TOC exists;
  • The TOC block was modified manually by the user, for example by inserting new paragraphs and/or deleting paragraphs;

Result: The user's manual modifications to the TOC are preserved.

Comment: In some special cases of manually modified TOCs, the TOC formatting may not be as good as that of a generated one.

Scenario 3

Description: In Apache OpenOffice.org with this improvement, open an MS Word DOC document in which:
  • Several heading paragraphs have special text attributes, such as strikethrough, outline font, and font color, applied to all or part of the paragraph;
  • A corresponding generated TOC exists;

Result: The text attributes are applied to the corresponding TOC entries.


Impact on Import/Export filters

Support of the new paragraph style's List Level attribute in import/export filter for the following file formats:

  • Microsoft Word binary format (WW8)

Design Description


OOXML Format

Same as for the binary file format.


Customized Formats of TOC Entry

Binary Format

Customized character attributes are lost when loading an MS Word TOC that was created with "Use hyperlinks instead of page numbers" unchecked. For this kind of TOC, MS Word can collect the customized character attributes of the target paragraphs into the TOC.

OOXML Format

Same as for the binary file format.


Export TOC to MS Word

Binary Format

  • Saving back to MS Word binary format

The width of the tab between the chapter number and the TOC entry changes.

  • Saving ODT to MS Word binary format

The hyperlink jump information is lost when exporting an ODT TOC to an MS Word binary TOC.

OOXML Format


TOC Jumping with Page Numbers Only

Jump information is lost when loading an MS Word TOC that was created with "Use hyperlinks instead of page numbers" unchecked. For this kind of TOC, end users in MS Word can jump only by Ctrl+clicking the page number of the TOC entry.
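
As a rough illustration, a loader could tell this kind of TOC apart by inspecting the switches of the TOC field instruction. The Python sketch below assumes that the "Use hyperlinks instead of page numbers" option corresponds to the `\h` switch of the TOC field, so that an instruction without `\h` denotes a TOC that jumps through its page numbers only. The function names are hypothetical and the switch handling is deliberately simplified.

```python
import re


def toc_switches(instruction):
    """Return the set of switch letters found in a TOC field instruction."""
    without_args = re.sub(r'"[^"]*"', "", instruction)  # drop quoted arguments
    return set(re.findall(r"\\([A-Za-z])", without_args))


def entries_are_hyperlinks(instruction):
    # Assumption: \h marks a TOC whose entries are inserted as hyperlinks.
    return "h" in toc_switches(instruction)


print(entries_are_hyperlinks('TOC \\o "1-3" \\h \\z \\u'))
print(entries_are_hyperlinks('TOC \\o "1-3" \\z \\u'))
```

For a TOC without `\h`, the jump target would have to be recovered from the page number of each entry rather than from a hyperlink around the entry text.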


Accessibility

The current TOC dialog cannot meet the accessibility requirements.


Usability

The current TOC dialog is difficult for end users to understand and use. Most end users can only create a default TOC and find it confusing to customize the attributes and styles.

Function Specification

Solution

Comments
