Difference between revisions of "SpreadsheetML"
m (reogranized structure a bit.) |
((checkpoint save)) |
||
(76 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
− | [[Category:Office Open XML]] | + | [[Category:Office Open XML]] [[Category:Calc]] |
SpreadsheetML is the XML format used by Microsoft Excel 2007 and that is part of the [[Office Open XML]] specification. | SpreadsheetML is the XML format used by Microsoft Excel 2007 and that is part of the [[Office Open XML]] specification. | ||
Line 14: | Line 14: | ||
= Sample Files = | = Sample Files = | ||
− | + | Convenient locations to download sample files are here | |
+ | * [http://svn.gnome.org/viewcvs/gnumeric/trunk/samples/excel12/ Gnumeric repository] | ||
+ | * [http://sc.openoffice.org/testdocs/index.html OpenOffice.org repository] | ||
+ | * [http://svn.gnome.org/viewcvs/ooo-build/trunk/test/excel/ ooo-build repository] | ||
+ | |||
+ | = Implementation Strategy = | ||
+ | Eventually, the code handling SpreadsheetML import will merge with the existing Excel binary filter. As such, the new code needs to be designed with this in mind. It's always desirable to understand how the existing binary filter works when implementing the XML filter to make the future merging work less painful. | ||
= Code Organization = | = Code Organization = | ||
Line 25: | Line 31: | ||
The term '''workbook''' in this context refers to an entire document which includes worksheets and other document metadata, whereas the term '''worksheet''' refers to each individual sheet in the workbook. | The term '''workbook''' in this context refers to an entire document which includes worksheets and other document metadata, whereas the term '''worksheet''' refers to each individual sheet in the workbook. | ||
− | + | The existing binary Excel filter is located in '''sc/source/filter/(inc|excel)/xi(page|view).(c|h)xx'''. The '''XclImpTabViewSettings''' class handles importing sheet's view settings, which corresponds to worksheet/sheetViews context in the XML format. | |
− | + | ||
+ | The UNO interface code is found in '''sc/source/ui/uno'''. | ||
− | |||
− | |||
== Handling Fragment == | == Handling Fragment == | ||
Line 47: | Line 52: | ||
== UnitConverter (class) == | == UnitConverter (class) == | ||
provides basic unit conversion, including font dependent stuff such as calculating column width from a specific number of characters. Header: unitconverter.hxx | provides basic unit conversion, including font dependent stuff such as calculating column width from a specific number of characters. Header: unitconverter.hxx | ||
+ | |||
+ | = Development = | ||
+ | {|border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0; background: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" | ||
+ | |- | ||
+ | ! Feature !! Developer !! OOo 3.0 !! OOo 3.2 !! OOo 3.3 !! OOo 3.4 !! Comments/Missing | ||
+ | |- | ||
+ | | Framework, fragment handling || cl/dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Password, decryption || cmc/dr || || style="background:lime" colspan="3" | yes || For all filters: Word, Excel, Powerpoint | ||
+ | |- | ||
+ | | Workbook, worksheet fragment, sheet names || tbe/dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Simple cell contents (values, strings) || dr || style="background:lime" colspan="4" | yes || missing Calc feature: error cells | ||
+ | |- | ||
+ | | Shared strings || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Simple cell formatting (alignment, protection, borders, fill) || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Builtin number formats || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Font handling for cells || tbe/dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Cell styles (names, formatting) || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Column settings (format, width, outlines) || tbe/dr/kohei || style="background:lime" colspan="4" | yes || missing Calc feature: outline symbol position | ||
+ | |- | ||
+ | | Row settings (format, height, outlines) || tbe/dr/kohei || style="background:lime" colspan="4" | yes || missing Calc feature: outline symbol position | ||
+ | |- | ||
+ | | Rich text in cells || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Scheme fragment, scheme colors || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Cell formulas, array formulas, shared formulas || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Conditional formatting || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | External references/links || dr/er || style="background:yellow" | partly || style="background:lime" colspan="3" | yes || | ||
+ | |- | ||
+ | | Print ranges, builtin defined names || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Page/print settings || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Page header/footer || kohei/dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Column/row breaks || kohei || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Sheet/document view settings || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Cell hyperlinks || dr/kohei || style="background:lime" colspan="4" | yes || because a cell hyperlink is a in-cell text field object in Calc, as opposed to a cell property in Excel, a hyperlink will not get imported when the cell is a value cell. | ||
+ | |- | ||
+ | | Label ranges [[http://www.bettersolutions.com/excel/EFC133/LE824631331.htm reference]] || colspan="5" | || MS Office 2007 has dropped support for this feature. | ||
+ | |-| | ||
+ | | Data validation || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Web queries || kohei || style="background:yellow" colspan="3" | partly || style="background:lime" | yes || | ||
+ | |- | ||
+ | | Pivot tables || dr || style="background:yellow" | partly || style="background:lime" colspan="3" | yes || The geometry of Calc's data pilot and Excel's pivot table objects are different, so some overlapping problems are expected. | ||
+ | |- | ||
+ | | Auto filter, user filter || kohei || style="background:yellow" colspan="3" | partly || style="background:lime" | yes || Some incompatible bits present, so we have to be creative in importing Excel's autofilter items. | ||
+ | |- | ||
+ | | Tables || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Table filters || dr || colspan="3" | || style="background:lime" | yes || | ||
+ | |- | ||
+ | | Scenarios || dr || || style="background:lime" colspan="3" | yes || | ||
+ | |- | ||
+ | | Drawing objects || sj/dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Charts - chart types || dr || style="background:lime" colspan="4" | yes || Supported types: bar/column 2D/3D clustered/stacked/percent, line 2D/3D standard/stacked/percent, area 2D/3D, stock (4 types), line radar, pie 2D/3D, doughnut, scatter, bubble (OOo 3.2), filled radar (OOo 3.2). Missing: surface (mapped to 3D bars). | ||
+ | |- | ||
+ | | Charts - titles/legend || dr || style="background:lime" colspan="4" | yes || | ||
+ | |- | ||
+ | | Charts - series settings || dr || style="background:lime" colspan="4" | yes || Including: sheet source links, primary/secondary axes, error bars, trendlines. | ||
+ | |- | ||
+ | | Charts - formatting || dr || style="background:lime" colspan="4" | yes || Including: manual formatting, chart styles. | ||
+ | |- | ||
+ | | Charts - titles/legend/plot area position || iha/dr || colspan="2" | || style="background:lime" colspan="2" | yes || | ||
+ | |- | ||
+ | | Charts - embedded drawing objects || iha/tbe/dr || colspan="2" | || style="background:lime" colspan="2" | yes || | ||
+ | |- | ||
+ | | OLE objects, ActiveX controls || sj/hbrinkm/dr || || style="background:lime" colspan="3" | yes || | ||
+ | |- | ||
+ | | Drawing controls || dr || colspan="3" | || style="background:lime" | yes || | ||
+ | |- | ||
+ | | Cell notes || dr || || style="background:yellow" colspan="3" | partly || Includes import of VML shape formatting usable for all filters. Missing: Rich text formatting. | ||
+ | |- | ||
+ | | Change tracking || dr || colspan="4" | || | ||
+ | |- | ||
+ | | VBA project || dr || colspan="2" | || style="background:lime" colspan="2" | yes || | ||
+ | |} | ||
+ | |||
+ | == External References == | ||
+ | * '''xl/workbook.xml''' contains '''externalReference''' tags that references relationship items in '''xl/_rels/workbook.xml.rels''' by rID. | ||
+ | * '''xl/_rels/workbook.xml.rels''' contains path(s) to external link fragments, that are usually '''xl/externalLinks/externalLink*.xml'''. | ||
+ | * '''xl/externalLinks/externalLink*.xml''' contains summary content information of the external book being referenced. It should provide enough information to display external cell's content. It also contains reference to the path to the external file by rID. | ||
+ | * '''xl/externalLinks/_rels/externalLink*.xml.rels''' contains the path to the external file, stored by rID. | ||
+ | |||
+ | Inside a cell that contains an external cell reference, the format is given | ||
+ | |||
+ | <f>[1]Sheet1!A1</f> | ||
+ | |||
+ | as formula. | ||
+ | |||
+ | == Data Validations == | ||
+ | The UNO API for data validation is [http://api.openoffice.org/docs/common/ref/com/sun/star/sheet/TableValidation.html here]. | ||
+ | |||
+ | '''ScConditionEntry''' needs new methods to allow setting pFormula1 and pFormula2 that are of type '''ScTokenArray'''. '''ScValidationData''' is an immediate child class of '''ScConditionEntry''', and it represents the data validation attribute of a cell. | ||
+ | |||
+ | '''ScTableValidationObj''' implements UNO wrapper for ScValidationData, and this class needs to implement the css::sheet::XFormulaTokens interface so that the client code can pass formula tokens to this class, and that formula token set needs to eventually find its way into ScValidationData as ScTokenArray. | ||
+ | |||
+ | == Web Queries == | ||
+ | There are 5 relevant sections to touch. | ||
+ | |||
+ | * '''/xl/worksheets/_rels/sheet?.xml.rels''' - contains the paths to query table properties fragments. | ||
+ | * '''/xl/queryTables/queryTable?.xml''' - contains query table properties, especially their name and associated connection IDs. | ||
+ | * '''/xl/_rels/workbook.xml.rels''' - contains the path to connections properties fragment. | ||
+ | * '''/xl/connections.xml''' - contains connections properties, such as URL. | ||
+ | * '''/xl/workbook.xml''' - defined names are used to associate a cell range to a query table by name. | ||
+ | |||
+ | The import strategy is to: | ||
+ | |||
+ | # Store the query table properties into query table container, with name as a key. | ||
+ | # Store the connections properties into connection container, with connection ID as a key. | ||
+ | # Parse a defined name and see if a query table for the same name exists. | ||
+ | # If yes, get the connection ID from the query table list. | ||
+ | # Get the connection properties by that ID from the connection list. | ||
+ | # Connect to the URL for data retrieval. | ||
+ | |||
+ | == Pivot Table == | ||
+ | * '''/xl/workbook.xml''' - contains the IDs of pivot table caches and their relation IDs. | ||
+ | * '''/xl/_rels/workbook.xml.rels''' - contains paths to pivot table cache fragments and their IDs. | ||
+ | * '''/xl/worksheets/_rels/sheet?.xml.rels''' - contains path to pivot table xml fragment(s). Note that the source sheet fragment does not contain the relation ID to such pivot table fragment. | ||
+ | * '''/xl/pivotTables/pivotTable?.xml''' - contains the pivot table definitions. | ||
+ | * '''/xl/pivotTables/_rels/pivotTable?.xml.rels''' - contains the path to pivot table cache. | ||
+ | * '''/xl/pivotCache/pivotCacheDefinition?.xml''' - pivot cache definitions. | ||
+ | * '''/xl/pivotCache/_rels/pivotCacheDefinition?.xml.rels''' - contains the path to pivot cache records. | ||
+ | * '''/xl/pirovtCache/pivotCacheRecords?.xml''' - pivot cache records. | ||
+ | |||
+ | === Related === | ||
+ | |||
+ | * [[How to import Pivot Table in [[oox]] module]] | ||
+ | |||
+ | =See Also= | ||
+ | * [[sc]] module page for the implementation of [[Calc]] core. |
Latest revision as of 12:07, 20 July 2012
SpreadsheetML is the XML format used by Microsoft Excel 2007 and that is part of the Office Open XML specification.
SpreadsheetML Basics
workbook
A SpreadsheetML document is described at the top level by a workbook part ( /xl/workbook.xml ). The type of the work book type is
http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument
The workbook part contains document's metadata and one or more sheets. Each sheet can be a worksheet, a chart sheet, or a dialog sheet.
Both strings and formulas are stored in shared tables to avoid redundant storage and speed file I/O's.
Sample Files
Convenient locations to download sample files are here
Implementation Strategy
Eventually, the code handling SpreadsheetML import will merge with the existing Excel binary filter. As such, the new code needs to be designed with this in mind. It's always desirable to understand how the existing binary filter works when implementing the XML filter to make the future merging work less painful.
Code Organization
Source files for handling the SpreadsheetML format are located in inc/oox/xls and source/xls under module oox. A good place to start tracing the code would be ExcelFilter::Import and follow the calls it makes.
A substream in the XML package is called "fragment", and each fragment has an associated *fragment.hxx header file. For instance, the code for loading of the workbook.xml fragment is found in workbookfragment.hxx, and so on.
A nested element is called "context", and, like the fragments, each context has an associated *context.hxx. For instance, the code for parsing the <sheetData> element is found in sheetdatacontext.hxx.
The term workbook in this context refers to an entire document which includes worksheets and other document metadata, whereas the term worksheet refers to each individual sheet in the workbook.
The existing binary Excel filter is located in sc/source/filter/(inc|excel)/xi(page|view).(c|h)xx. The XclImpTabViewSettings class handles importing sheet's view settings, which corresponds to worksheet/sheetViews context in the XML format.
The UNO interface code is found in sc/source/ui/uno.
Handling Fragment
WorkbookFragment (class)
Handles loading of workbook.xml fragment. It loads the associated relationship file (xl/_rels/workbook.xml.rels) in the constructor.
Handling Context
In most cases the fragment handler will handle all nested contexts by itself to increase performance. For this, some helper classes have been implemented that do all needed work to deal with nested contexts (ContextHelper, FragmentBase, and ContextBase respectively in contexthelper.hxx, excelfragmentbase.hxx, and excelcontextbase.hxx). In general, for implementing a new fragment or context handler, the interface of the ContextHelper class from contexthelper.hxx has to be implemented. The classes FragmentBase (excelfragmentbase.hxx) and ContextBase (excelcontextbase.hxx) already provide default implementations of all virtual functions, but a derived class is free to implement them as well.
Relation (class)
Holds three string data for ID, Type and Target (need more info).
AddressConverter (class)
converts strings to addresses and ranges, and tracks invalid addresses (e.g. a not-importable cell at address ZZZ1000000). Later, this information will be used to generate a "Imported document contains data outside of sheet limits" warning box after loading. Header: addressconverter.hxx
UnitConverter (class)
provides basic unit conversion, including font dependent stuff such as calculating column width from a specific number of characters. Header: unitconverter.hxx
Development
Feature | Developer | OOo 3.0 | OOo 3.2 | OOo 3.3 | OOo 3.4 | Comments/Missing |
---|---|---|---|---|---|---|
Framework, fragment handling | cl/dr | yes | ||||
Password, decryption | cmc/dr | yes | For all filters: Word, Excel, Powerpoint | |||
Workbook, worksheet fragment, sheet names | tbe/dr | yes | ||||
Simple cell contents (values, strings) | dr | yes | missing Calc feature: error cells | |||
Shared strings | dr | yes | ||||
Simple cell formatting (alignment, protection, borders, fill) | dr | yes | ||||
Builtin number formats | dr | yes | ||||
Font handling for cells | tbe/dr | yes | ||||
Cell styles (names, formatting) | dr | yes | ||||
Column settings (format, width, outlines) | tbe/dr/kohei | yes | missing Calc feature: outline symbol position | |||
Row settings (format, height, outlines) | tbe/dr/kohei | yes | missing Calc feature: outline symbol position | |||
Rich text in cells | dr | yes | ||||
Scheme fragment, scheme colors | dr | yes | ||||
Cell formulas, array formulas, shared formulas | dr | yes | ||||
Conditional formatting | dr | yes | ||||
External references/links | dr/er | partly | yes | |||
Print ranges, builtin defined names | dr | yes | ||||
Page/print settings | dr | yes | ||||
Page header/footer | kohei/dr | yes | ||||
Column/row breaks | kohei | yes | ||||
Sheet/document view settings | dr | yes | ||||
Cell hyperlinks | dr/kohei | yes | because a cell hyperlink is a in-cell text field object in Calc, as opposed to a cell property in Excel, a hyperlink will not get imported when the cell is a value cell. | |||
Label ranges [reference] | MS Office 2007 has dropped support for this feature. | |||||
Data validation | dr | yes | ||||
Web queries | kohei | partly | yes | |||
Pivot tables | dr | partly | yes | The geometry of Calc's data pilot and Excel's pivot table objects are different, so some overlapping problems are expected. | ||
Auto filter, user filter | kohei | partly | yes | Some incompatible bits present, so we have to be creative in importing Excel's autofilter items. | ||
Tables | dr | yes | ||||
Table filters | dr | yes | ||||
Scenarios | dr | yes | ||||
Drawing objects | sj/dr | yes | ||||
Charts - chart types | dr | yes | Supported types: bar/column 2D/3D clustered/stacked/percent, line 2D/3D standard/stacked/percent, area 2D/3D, stock (4 types), line radar, pie 2D/3D, doughnut, scatter, bubble (OOo 3.2), filled radar (OOo 3.2). Missing: surface (mapped to 3D bars). | |||
Charts - titles/legend | dr | yes | ||||
Charts - series settings | dr | yes | Including: sheet source links, primary/secondary axes, error bars, trendlines. | |||
Charts - formatting | dr | yes | Including: manual formatting, chart styles. | |||
Charts - titles/legend/plot area position | iha/dr | yes | ||||
Charts - embedded drawing objects | iha/tbe/dr | yes | ||||
OLE objects, ActiveX controls | sj/hbrinkm/dr | yes | ||||
Drawing controls | dr | yes | ||||
Cell notes | dr | partly | Includes import of VML shape formatting usable for all filters. Missing: Rich text formatting. | |||
Change tracking | dr | |||||
VBA project | dr | yes |
External References
- xl/workbook.xml contains externalReference tags that references relationship items in xl/_rels/workbook.xml.rels by rID.
- xl/_rels/workbook.xml.rels contains path(s) to external link fragments, that are usually xl/externalLinks/externalLink*.xml.
- xl/externalLinks/externalLink*.xml contains summary content information of the external book being referenced. It should provide enough information to display external cell's content. It also contains reference to the path to the external file by rID.
- xl/externalLinks/_rels/externalLink*.xml.rels contains the path to the external file, stored by rID.
Inside a cell that contains an external cell reference, the format is given
<f>[1]Sheet1!A1</f>
as formula.
Data Validations
The UNO API for data validation is here.
ScConditionEntry needs new methods to allow setting pFormula1 and pFormula2 that are of type ScTokenArray. ScValidationData is an immediate child class of ScConditionEntry, and it represents the data validation attribute of a cell.
ScTableValidationObj implements UNO wrapper for ScValidationData, and this class needs to implement the css::sheet::XFormulaTokens interface so that the client code can pass formula tokens to this class, and that formula token set needs to eventually find its way into ScValidationData as ScTokenArray.
Web Queries
There are 5 relevant sections to touch.
- /xl/worksheets/_rels/sheet?.xml.rels - contains the paths to query table properties fragments.
- /xl/queryTables/queryTable?.xml - contains query table properties, especially their name and associated connection IDs.
- /xl/_rels/workbook.xml.rels - contains the path to connections properties fragment.
- /xl/connections.xml - contains connections properties, such as URL.
- /xl/workbook.xml - defined names are used to associate a cell range to a query table by name.
The import strategy is to:
- Store the query table properties into query table container, with name as a key.
- Store the connections properties into connection container, with connection ID as a key.
- Parse a defined name and see if a query table for the same name exists.
- If yes, get the connection ID from the query table list.
- Get the connection properties by that ID from the connection list.
- Connect to the URL for data retrieval.
Pivot Table
- /xl/workbook.xml - contains the IDs of pivot table caches and their relation IDs.
- /xl/_rels/workbook.xml.rels - contains paths to pivot table cache fragments and their IDs.
- /xl/worksheets/_rels/sheet?.xml.rels - contains path to pivot table xml fragment(s). Note that the source sheet fragment does not contain the relation ID to such pivot table fragment.
- /xl/pivotTables/pivotTable?.xml - contains the pivot table definitions.
- /xl/pivotTables/_rels/pivotTable?.xml.rels - contains the path to pivot table cache.
- /xl/pivotCache/pivotCacheDefinition?.xml - pivot cache definitions.
- /xl/pivotCache/_rels/pivotCacheDefinition?.xml.rels - contains the path to pivot cache records.
- /xl/pirovtCache/pivotCacheRecords?.xml - pivot cache records.
Related
- [[How to import Pivot Table in oox module]]