SpreadsheetML

From Apache OpenOffice Wiki
Revision as of 17:00, 16 March 2007


SpreadsheetML is the XML format used by Microsoft Excel 2007; it is part of the Office Open XML specification.

SpreadsheetML Basics

workbook

A SpreadsheetML document is described at the top level by a workbook part (/xl/workbook.xml). The relationship type of the workbook part is

 http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument
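To locate the workbook part, a reader opens the package-level relationships part (/_rels/.rels) and picks the relationship that has this type. The sketch below assumes the relationships have already been parsed into a plain struct. PackageRelationship and findWorkbookPart are hypothetical names for illustration and are not part of the oox module.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one <Relationship> element from /_rels/.rels.
struct PackageRelationship {
    std::string id;      // Id attribute, e.g. "rId1"
    std::string type;    // Type attribute (a URI)
    std::string target;  // Target attribute, e.g. "xl/workbook.xml"
};

// Relationship type that marks the main document part, i.e. the workbook.
const std::string kOfficeDocumentType =
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

// Returns the target of the first officeDocument relationship, or nothing if absent.
std::optional<std::string> findWorkbookPart(const std::vector<PackageRelationship>& rels) {
    for (const PackageRelationship& rel : rels) {
        if (rel.type == kOfficeDocumentType)
            return rel.target;
    }
    return std::nullopt;
}
```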

The workbook part contains the document's metadata and one or more sheets. Each sheet can be a worksheet, a chart sheet, or a dialog sheet.

Strings are stored in a shared string table (xl/sharedStrings.xml), and formulas can be shared as well. This avoids redundant storage and speeds up file I/O.
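Shared strings are looked up by index. Each <si> element in xl/sharedStrings.xml is one table entry. A cell such as <c r="A1" t="s"><v>1</v></c> stores the zero-based index of its text rather than the text itself. The following is a minimal sketch of that lookup. The SharedStringTable class is hypothetical and much simpler than the real SharedStringsBuffer; for example, it ignores rich text.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared string table: one entry per <si> element of
// xl/sharedStrings.xml, kept in document order.
class SharedStringTable {
public:
    void append(std::string text) { strings_.push_back(std::move(text)); }

    // Resolves the <v> content of a cell with t="s" (a zero-based index)
    // to the shared text. Returns nothing for non-numeric or out-of-range values.
    std::optional<std::string> resolve(const std::string& cellValue) const {
        std::size_t parsed = 0;
        unsigned long index = 0;
        try {
            index = std::stoul(cellValue, &parsed);
        } catch (const std::logic_error&) {  // std::invalid_argument or std::out_of_range
            return std::nullopt;
        }
        if (parsed != cellValue.size() || index >= strings_.size())
            return std::nullopt;
        return strings_[index];
    }

private:
    std::vector<std::string> strings_;
};
```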

Sample Files

One convenient place to download sample files is the gnumeric repository (http://svn.gnome.org/viewcvs/gnumeric/trunk/samples/excel12/).

Implementation Strategy

Eventually, the code handling SpreadsheetML import will be merged with the existing Excel binary filter, so the new code needs to be designed with this in mind. Before implementing a feature in the XML filter, study how the existing binary filter handles the same feature; this makes the future merge less painful.

Code Organization

Source files for handling the SpreadsheetML format are located in inc/oox/xls and source/xls in the oox module. A good place to start tracing the code is ExcelFilter::Import; follow the calls it makes from there.

A substream in the XML package is called a "fragment", and each fragment has an associated *fragment.hxx header file. For instance, the code for loading the workbook.xml fragment is found in workbookfragment.hxx, and so on.

A nested element is called a "context", and, like fragments, each context has an associated *context.hxx header file. For instance, the code for parsing the <sheetData> element is found in sheetdatacontext.hxx.

The term workbook in this context refers to an entire document which includes worksheets and other document metadata, whereas the term worksheet refers to each individual sheet in the workbook.

The existing binary Excel filter is located in sc/source/filter/(inc|excel)/xi(page|view).(c|h)xx. The XclImpTabViewSettings class handles importing a sheet's view settings, which correspond to the worksheet/sheetViews context in the XML format.

The UNO interface code is found in sc/source/ui/uno.

Global data

Workbook-wide global data are stored in GlobalData (struct) and handled by GlobalDataHelper (class), which holds a reference to the GlobalData instance. All major classes should be derived from GlobalDataHelper so that the globals are available everywhere.

GlobalData holds a reference to the ImportBase instance so that it can create new fragments.

Various buffers hold data imported from the fragments when it is needed later, e.g. the SharedStringsBuffer (sharedstringsbuffer.hxx) and the StylesBuffer (stylesbuffer.hxx). These buffers are always part of the GlobalDataHelper.
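The following is a minimal C++ sketch of this pattern, using heavily simplified stand-ins for the buffers. All member and class names other than GlobalData and GlobalDataHelper are hypothetical, and ImportBase is omitted. In this sketch the buffers live inside GlobalData and are reached through accessors on the helper; the real code may store them differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the real buffers.
struct SharedStringsBuffer { std::vector<std::string> strings; };
struct StylesBuffer { std::vector<std::string> numberFormats; };

// Workbook-wide data shared by all import classes (one instance per import).
struct GlobalData {
    SharedStringsBuffer sharedStrings;
    StylesBuffer styles;
};

// Base class giving every derived class access to the same GlobalData.
class GlobalDataHelper {
public:
    explicit GlobalDataHelper(GlobalData& data) : data_(data) {}

protected:
    GlobalData& getGlobalData() const { return data_; }
    SharedStringsBuffer& getSharedStrings() const { return data_.sharedStrings; }

private:
    GlobalData& data_;  // reference only; GlobalData must outlive all helpers
};

// Hypothetical handler that fills the shared strings buffer.
class SharedStringsHandler : public GlobalDataHelper {
public:
    using GlobalDataHelper::GlobalDataHelper;
    void importString(const std::string& text) { getSharedStrings().strings.push_back(text); }
};

// Hypothetical handler that later reads from the same buffer.
class SheetDataHandler : public GlobalDataHelper {
public:
    using GlobalDataHelper::GlobalDataHelper;
    const std::string& sharedString(std::size_t index) const {
        return getSharedStrings().strings.at(index);
    }
};
```

Because both handlers are constructed from the same GlobalData, a string imported by one is visible to the other without passing buffers around explicitly.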

Handling Fragments

WorkbookFragment (class)

Handles loading of the workbook.xml fragment. Its constructor loads the associated relationship file (xl/_rels/workbook.xml.rels).

Handling Contexts

In most cases the fragment handler handles all nested contexts by itself to improve performance. To support this, some helper classes do all the work needed to deal with nested contexts: ContextHelper (contexthelper.hxx), FragmentBase (excelfragmentbase.hxx), and ContextBase (excelcontextbase.hxx). In general, to implement a new fragment or context handler, implement the interface of the ContextHelper class from contexthelper.hxx. The classes FragmentBase (excelfragmentbase.hxx) and ContextBase (excelcontextbase.hxx) already provide default implementations of all virtual functions, but a derived class is free to override them.
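The split between an interface and base classes with default implementations can be sketched as follows. The method names (onCreateContext, onStartElement, onEndElement) and the Attributes map are assumptions for illustration, not the actual signatures in contexthelper.hxx. The point is that a derived handler overrides only the callbacks it needs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical attribute list: attribute name -> value.
using Attributes = std::map<std::string, std::string>;

// Hypothetical interface in the spirit of ContextHelper: one callback per parser event.
class ContextInterface {
public:
    virtual ~ContextInterface() = default;
    // Returns true if this handler processes the nested element itself.
    virtual bool onCreateContext(const std::string& element) = 0;
    virtual void onStartElement(const std::string& element, const Attributes& attrs) = 0;
    virtual void onEndElement(const std::string& element) = 0;
};

// Hypothetical base in the spirit of FragmentBase/ContextBase: no-op defaults.
class DefaultContextBase : public ContextInterface {
public:
    bool onCreateContext(const std::string&) override { return false; }
    void onStartElement(const std::string&, const Attributes&) override {}
    void onEndElement(const std::string&) override {}
};

// Derived handler collecting sheet names from <sheets><sheet name="..."/></sheets>
// in workbook.xml; onEndElement keeps its default.
class SheetListHandler : public DefaultContextBase {
public:
    bool onCreateContext(const std::string& element) override {
        return element == "sheets" || element == "sheet";
    }
    void onStartElement(const std::string& element, const Attributes& attrs) override {
        if (element != "sheet")
            return;
        auto it = attrs.find("name");
        if (it != attrs.end())
            sheetNames.push_back(it->second);
    }

    std::vector<std::string> sheetNames;
};
```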

Relation (class)

Holds three strings, ID, Type and Target, which correspond to the Id, Type and Target attributes of a <Relationship> element in a .rels part (need more info).
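The sketch below shows how relations can be used once a .rels part has been parsed. A <sheet r:id="rId1"/> entry in workbook.xml is resolved through its Id to a target path. That path is relative to the folder of the source part, which is xl/ for workbook.xml. The Relations container and fragmentPathFromId are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical mirror of the Relation class: the Id, Type and Target
// attributes of one <Relationship> element in a .rels part.
struct Relation {
    std::string id;      // e.g. "rId1", referenced via r:id from the source part
    std::string type;    // relationship type URI
    std::string target;  // path relative to the source part's folder
};

// Hypothetical container for one .rels part, indexed by Id.
class Relations {
public:
    explicit Relations(std::string basePath) : basePath_(std::move(basePath)) {}

    void insert(const Relation& rel) { byId_[rel.id] = rel; }

    // Resolves an r:id such as the one in <sheet r:id="rId1"/> to a part path.
    // Only handles plain relative targets (no "../", no absolute "/..." paths).
    std::optional<std::string> fragmentPathFromId(const std::string& id) const {
        auto it = byId_.find(id);
        if (it == byId_.end())
            return std::nullopt;
        return basePath_ + it->second.target;
    }

private:
    std::string basePath_;                  // folder of the source part, e.g. "xl/"
    std::map<std::string, Relation> byId_;  // Id -> relation
};
```

A complete implementation would also need to normalize "../" segments and handle absolute targets; this sketch leaves both out.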

AddressConverter (class)

Converts strings to addresses and ranges, and tracks invalid addresses (e.g. a cell at address ZZZ1000000, which cannot be imported). Later, this information will be used to display an "Imported document contains data outside of sheet limits" warning box after loading. Header: addressconverter.hxx
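The following is a minimal sketch of the conversion and tracking idea for single A1-style cell references. Ranges are left out. The class name and method names are hypothetical. The sheet limits passed in the usage below (256 columns and 65536 rows) are an assumption meant to match Calc at the time.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct CellAddress {
    int col;  // zero-based column index
    int row;  // zero-based row index
};

// Hypothetical simplified address converter for A1-style cell references.
class AddressConverterSketch {
public:
    // maxCol/maxRow are the largest zero-based indices the target sheet supports.
    AddressConverterSketch(int maxCol, int maxRow) : maxCol_(maxCol), maxRow_(maxRow) {}

    // Returns the address for a valid reference such as "B3". Returns nothing for
    // malformed text, and also for well-formed references beyond the sheet limits;
    // the latter are remembered so that a warning can be shown after loading.
    std::optional<CellAddress> convertAddress(const std::string& ref) {
        const long kCap = 1L << 24;  // stop accumulating here to avoid overflow
        std::size_t i = 0;
        long col = 0;
        long row = 0;
        // Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27, ...
        for (; i < ref.size() && std::isalpha(static_cast<unsigned char>(ref[i])); ++i)
            if (col <= kCap)
                col = col * 26 + (std::toupper(static_cast<unsigned char>(ref[i])) - 'A' + 1);
        const std::size_t digitsStart = i;
        for (; i < ref.size() && std::isdigit(static_cast<unsigned char>(ref[i])); ++i)
            if (row <= kCap)
                row = row * 10 + (ref[i] - '0');
        // Malformed: no letters, no digits, trailing characters, or a leading zero.
        if (digitsStart == 0 || i == digitsStart || i != ref.size() || ref[digitsStart] == '0')
            return std::nullopt;
        if (col - 1 > maxCol_ || row - 1 > maxRow_) {
            hasInvalidAddress_ = true;  // data outside of sheet limits
            return std::nullopt;
        }
        return CellAddress{static_cast<int>(col - 1), static_cast<int>(row - 1)};
    }

    bool hasInvalidAddress() const { return hasInvalidAddress_; }

private:
    int maxCol_;
    int maxRow_;
    bool hasInvalidAddress_ = false;
};
```

With limits of (255, 65535), "IV65536" is the last valid cell. "ZZZ1000000" is well-formed but out of range, so it sets the invalid-address flag instead of being silently dropped.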

UnitConverter (class)

Provides basic unit conversion, including font-dependent conversions such as calculating a column width from a specific number of characters. Header: unitconverter.hxx
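Column width is one example of a font-dependent conversion. ECMA-376, in its description of the width attribute of the <col> element, defines the stored width in terms of the font's maximum digit width in pixels. The sketch below implements those two formulas. Whether UnitConverter uses exactly this arithmetic is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Stored <col width="..."> value for a column meant to show `chars` digits,
// given the font's maximum digit width in pixels (includes 5 px of padding).
double charsToColWidth(double chars, double maxDigitWidthPx) {
    return std::trunc((chars * maxDigitWidthPx + 5.0) / maxDigitWidthPx * 256.0) / 256.0;
}

// Converts a stored width back to a pixel width at runtime.
int colWidthToPixels(double width, double maxDigitWidthPx) {
    return static_cast<int>(std::trunc(
        ((256.0 * width + std::trunc(128.0 / maxDigitWidthPx)) / 256.0) * maxDigitWidthPx));
}
```

The specification gives the maximum digit width of Calibri 11 pt as 7 pixels at 96 dpi. With that font, a column 8 characters wide is stored as 8.7109375 and displayed as 61 pixels (8 * 7 + 5). Calc would still need a further conversion from pixels to its own units.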

Development

Feature | Developer | Status | Comments/Missing
Framework, fragment handling | cl/dr | done |
Workbook, worksheet fragment, sheet names | tbe/dr | in progress | do not insert default sheets (issue 74668)
Simple cell contents (values, strings) | dr | in progress | error cells
Shared strings fragment | dr | done |
Styles fragment | dr | done |
Simple cell formatting (alignment, protection, borders, fill) | dr | done |
Builtin number formats | dr | in progress | missing locales (issue 29949)
Font handling for cells | tbe/dr | in progress | Asian/complex scripts (issue 74754)
Cell styles (names, formatting) | dr | done |
Column settings (format, width, outlines) | tbe/dr/kohei | in progress | outlines, hidden, column width from font (issue 75447)
Row settings (format, height, outlines) | tbe/dr/kohei | in progress | outlines, hidden, row borders (issue 74667)
Rich text in cells | dr | done |
Scheme fragment, scheme colors | dr | done |
Cell formulas, array formulas | jody/kohei | |
Conditional formatting | jody/kohei | |
Link table, external sheets | jody/kohei | |
Defined names | jody/kohei | |
Print ranges, builtin defined names | jody/kohei | |
Page/print settings, column/row breaks | kohei | |
Sheet/document view settings | kohei | in progress |
Cell hyperlinks | | |
Label ranges | | |
Data validation | | |
Web queries | | |
Pivot tables | kohei | |
Drawing objects | sj/dr | |
Charts | dr | |
OLE objects, form controls | sj/dr | |
Auto filter, user filter | | |
Scenarios | | |
Change tracking | | |