Difference between revisions of "SpreadsheetML"

From Apache OpenOffice Wiki
Jump to: navigation, search
(Pivot Table)
(Development)
Line 67: Line 67:
 
| Framework, fragment handling || cl/dr || done ||
 
| Framework, fragment handling || cl/dr || done ||
 
|-
 
|-
| Workbook, worksheet fragment, sheet names || tbe/dr || in progress || do not insert default sheets ([http://www.openoffice.org/issues/show_bug.cgi?id=74668 issue 74668])
+
| Workbook, worksheet fragment, sheet names || tbe/dr || done ||  
 
|-
 
|-
 
| Simple cell contents (values, strings) || dr || in progress || error cells
 
| Simple cell contents (values, strings) || dr || in progress || error cells
Line 77: Line 77:
 
| Simple cell formatting (alignment, protection, borders, fill) || dr || done ||
 
| Simple cell formatting (alignment, protection, borders, fill) || dr || done ||
 
|-
 
|-
| Builtin number formats || dr || in progress || missing locales ([http://www.openoffice.org/issues/show_bug.cgi?id=29949 issue 29949])
+
| Builtin number formats || dr || done ||  
 
|-
 
|-
| Font handling for cells || tbe/dr || in progress || asian/complex scripts ([http://www.openoffice.org/issues/show_bug.cgi?id=74754 issue 74754])
+
| Font handling for cells || tbe/dr || done ||  
 
|-
 
|-
 
| Cell styles (names, formatting) || dr || done ||
 
| Cell styles (names, formatting) || dr || done ||
 
|-
 
|-
| Column settings (format, width, outlines) || tbe/dr/kohei || in progress || outlines, hidden, column width from font ([http://www.openoffice.org/issues/show_bug.cgi?id=75447 issue 75447])
+
| Column settings (format, width, outlines) || tbe/dr/kohei || in progress || outlines
 
|-
 
|-
| Row settings (format, height, outlines) || tbe/dr/kohei || in progress || outlines, hidden, row borders ([http://www.openoffice.org/issues/show_bug.cgi?id=74667 issue 74667])
+
| Row settings (format, height, outlines) || tbe/dr/kohei || in progress || outlines
 
|-
 
|-
 
| Rich text in cells || dr || done ||
 
| Rich text in cells || dr || done ||
Line 91: Line 91:
 
| Scheme fragment, scheme colors || dr || done ||
 
| Scheme fragment, scheme colors || dr || done ||
 
|-
 
|-
| Cell formulas, array formulas || jody || in progress || almost ready to be committed.
+
| Cell formulas, array formulas || dr || in progress || function name mapping
 
|-
 
|-
| Conditional formatting || jody/kohei || formula parser || blocks on formula parser.
+
| Conditional formatting || kohei || formula parser || blocks on formula parser.
 
|-
 
|-
 
| External references/links || kohei || formula parser || parsing of externalLink fragments complete.  The external link information is stored in '''::oox::xls::ExtRefBuffer'''.
 
| External references/links || kohei || formula parser || parsing of externalLink fragments complete.  The external link information is stored in '''::oox::xls::ExtRefBuffer'''.
 
|-
 
|-
| Print ranges, builtin defined names || jody/kohei || on hold || blocks on formula parser implementation because a defined name can be any cell formula.
+
| Print ranges, builtin defined names || dr || in progress ||  
 
|-
 
|-
| Page/print settings || kohei || done || waiting on header/footer task to be completed.
+
| Page/print settings || dr/kohei || done || waiting on header/footer task to be completed.
 
|-
 
|-
 
| Header/footer || kohei/dr || done ||
 
| Header/footer || kohei/dr || done ||
Line 105: Line 105:
 
| Column/row breaks || kohei || done ||
 
| Column/row breaks || kohei || done ||
 
|-
 
|-
| Sheet/document view settings || dr || in progress || not possible to set view settings during import; probably needs to create a buffer class to store imported view settings, and apply them after the import.
+
| Sheet/document view settings || dr || done ||  
 
|-
 
|-
 
| Cell hyperlinks || kohei || done || because a cell hyperlink is a in-cell text field object in Calc, as opposed to a cell property in Excel, a hyperlink will not get imported when the cell is a value cell.
 
| Cell hyperlinks || kohei || done || because a cell hyperlink is a in-cell text field object in Calc, as opposed to a cell property in Excel, a hyperlink will not get imported when the cell is a value cell.
Line 111: Line 111:
 
| Label ranges [[http://www.bettersolutions.com/excel/EFC133/LE824631331.htm reference]] || || skipped || MS Office 2007 seems to have dropped support for this feature.
 
| Label ranges [[http://www.bettersolutions.com/excel/EFC133/LE824631331.htm reference]] || || skipped || MS Office 2007 seems to have dropped support for this feature.
 
|-
 
|-
| Data validation || kohei || formula parser || requires formula parser.
+
| Data validation || kohei || in progress ||  
 
|-
 
|-
 
| Web queries || kohei || formula parser || needs defined names, which requires formula parser.
 
| Web queries || kohei || formula parser || needs defined names, which requires formula parser.
Line 141: Line 141:
  
 
as formula.
 
as formula.
 
== Page Settings ==
 
Refer to class '''XclPageData''' which is the data structure used by the existing binary filter.  Here is the [http://api.openoffice.org/docs/common/ref/com/sun/star/sheet/TablePageStyle.html page] for Calc's sheet properties.
 
  
 
== Data Validations ==
 
== Data Validations ==

Revision as of 15:21, 3 July 2007


SpreadsheetML is the XML format used by Microsoft Excel 2007 and that is part of the Office Open XML specification.

SpreadsheetML Basics

workbook

A SpreadsheetML document is described at the top level by a workbook part ( /xl/workbook.xml ). The type of the work book type is

 http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument

The workbook part contains document's metadata and one or more sheets. Each sheet can be a worksheet, a chart sheet, or a dialog sheet.

Both strings and formulas are stored in shared tables to avoid redundant storage and speed file I/O's.

Sample Files

Convenient locations to download sample files are here

Implementation Strategy

Eventually, the code handling SpreadsheetML import will merge with the existing Excel binary filter. As such, the new code needs to be designed with this in mind. It's always desirable to understand how the existing binary filter works when implementing the XML filter to make the future merging work less painful.

Code Organization

Source files for handling the SpreadsheetML format are located in inc/oox/xls and source/xls under module oox. A good place to start tracing the code would be ExcelFilter::Import and follow the calls it makes.

A substream in the XML package is called "fragment", and each fragment has an associated *fragment.hxx header file. For instance, the code for loading of the workbook.xml fragment is found in workbookfragment.hxx, and so on.

A nested element is called "context", and, like the fragments, each context has an associated *context.hxx. For instance, the code for parsing the <sheetData> element is found in sheetdatacontext.hxx.

The term workbook in this context refers to an entire document which includes worksheets and other document metadata, whereas the term worksheet refers to each individual sheet in the workbook.

The existing binary Excel filter is located in sc/source/filter/(inc|excel)/xi(page|view).(c|h)xx. The XclImpTabViewSettings class handles importing sheet's view settings, which corresponds to worksheet/sheetViews context in the XML format.

The UNO interface code is found in sc/source/ui/uno.

Global data

Workbook-wide global data are stored in GlobalData (struct), and handled by GlobalDataHelper (class) which holds reference to the GlobalData instance. All major classes should be derived from GlobalDataHelper to ensure availability of globals in all places.

GlobalData holds reference to ImportBase instance in order to be able to create new fragment handlers.

Different buffers hold the imported data from the fragments if it is needed later, e.g. the SharedStringsBuffer (sharedstringsbuffer.hxx) and the StylesBuffer (stylesbuffer.hxx). These buffers are always part of the GlobalDataHelper.

GlobalDataHelper (class)

It's important to be familar with the member methods of this class, because they are also available in most of XLSX import classes.

Handling Fragment

WorkbookFragment (class)

Handles loading of workbook.xml fragment. It loads the associated relationship file (xl/_rels/workbook.xml.rels) in the constructor.

Handling Context

In most cases the fragment handler will handle all nested contexts by itself to increase performance. For this, some helper classes have been implemented that do all needed work to deal with nested contexts (ContextHelper, FragmentBase, and ContextBase respectively in contexthelper.hxx, excelfragmentbase.hxx, and excelcontextbase.hxx). In general, for implementing a new fragment or context handler, the interface of the ContextHelper class from contexthelper.hxx has to be implemented. The classes FragmentBase (excelfragmentbase.hxx) and ContextBase (excelcontextbase.hxx) already provide default implementations of all virtual functions, but a derived class is free to implement them as well.

Relation (class)

Holds three string data for ID, Type and Target (need more info).

AddressConverter (class)

converts strings to addresses and ranges, and tracks invalid addresses (e.g. a not-importable cell at address ZZZ1000000). Later, this information will be used to generate a "Imported document contains data outside of sheet limits" warning box after loading. Header: addressconverter.hxx

UnitConverter (class)

provides basic unit conversion, including font dependent stuff such as calculating column width from a specific number of characters. Header: unitconverter.hxx

Development

Feature Developer Status Comments/Missing
Framework, fragment handling cl/dr done
Workbook, worksheet fragment, sheet names tbe/dr done
Simple cell contents (values, strings) dr in progress error cells
Shared strings fragment dr done
Styles fragment dr done
Simple cell formatting (alignment, protection, borders, fill) dr done
Builtin number formats dr done
Font handling for cells tbe/dr done
Cell styles (names, formatting) dr done
Column settings (format, width, outlines) tbe/dr/kohei in progress outlines
Row settings (format, height, outlines) tbe/dr/kohei in progress outlines
Rich text in cells dr done
Scheme fragment, scheme colors dr done
Cell formulas, array formulas dr in progress function name mapping
Conditional formatting kohei formula parser blocks on formula parser.
External references/links kohei formula parser parsing of externalLink fragments complete. The external link information is stored in ::oox::xls::ExtRefBuffer.
Print ranges, builtin defined names dr in progress
Page/print settings dr/kohei done waiting on header/footer task to be completed.
Header/footer kohei/dr done
Column/row breaks kohei done
Sheet/document view settings dr done
Cell hyperlinks kohei done because a cell hyperlink is a in-cell text field object in Calc, as opposed to a cell property in Excel, a hyperlink will not get imported when the cell is a value cell.
Label ranges [reference] skipped MS Office 2007 seems to have dropped support for this feature.
Data validation kohei in progress
Web queries kohei formula parser needs defined names, which requires formula parser.
Pivot tables kohei in progress
Drawing objects sj/dr
Charts dr
OLE objects, form controls sj/dr
Auto filter, user filter kohei done some incompatible bits present, so we have to be creative in importing Excel's autofilter items.
Scenarios
Change tracking

External References

  • xl/workbook.xml contains externalReference tags that references relationship items in xl/_rels/workbook.xml.rels by rID.
  • xl/_rels/workbook.xml.rels contains path(s) to external link fragments, that are usually xl/externalLinks/externalLink*.xml.
  • xl/externalLinks/externalLink*.xml contains summary content information of the external book being referenced. It should provide enough information to display external cell's content. It also contains reference to the path to the external file by rID.
  • xl/externalLinks/_rels/externalLink*.xml.rels contains the path to the external file, stored by rID.

Inside a cell that contains an external cell reference, the format is given

 <f>[1]Sheet1!A1</f>

as formula.

Data Validations

The UNO API for data validation is here.

Web Queries

There are 5 relevant sections to touch.

  • /xl/worksheets/_rels/sheet?.xml.rels - contains the paths to query table properties fragments.
  • /xl/queryTables/queryTable?.xml - contains query table properties, especially their name and associated connection IDs.
  • /xl/_rels/workbook.xml.rels - contains the path to connections properties fragment.
  • /xl/connections.xml - contains connections properties, such as URL.
  • /xl/workbook.xml - defined names are used to associate a cell range to a query table by name.

The import strategy is to:

  1. Store the query table properties into query table container, with name as a key.
  2. Store the connections properties into connection container, with connection ID as a key.
  3. Parse a defined name and see if a query table for the same name exists.
  4. If yes, get the connection ID from the query table list.
  5. Get the connection properties by that ID from the connection list.
  6. Connect to the URL for data retrieval.

Pivot Table

  • /xl/workbook.xml - contains the IDs of pivot table caches and their relation IDs.
  • /xl/_rels/workbook.xml.rels - contains paths to pivot table cache fragments and their IDs.
  • /xl/worksheets/_rels/sheet?.xml.rels - contains path to pivot table xml fragment(s). Note that the source sheet fragment does not contain the relation ID to such pivot table fragment.
  • /xl/pivotTables/pivotTable?.xml - contains the pivot table definitions.
  • /xl/pivotTables/_rels/pivotTable?.xml.rels - contains the path to pivot table cache.
  • /xl/pivotCache/pivotCacheDefinition?.xml - pivot cache definitions.
  • /xl/pivotCache/_rels/pivotCacheDefinition?.xml.rels - contains the path to pivot cache records.
  • /xl/pirovtCache/pivotCacheRecords?.xml - pivot cache records.

See Also

  • sc module page for the implementation of Calc core.
Personal tools