Office Open XML/Legacy Implementation

"Office Open XML" is an XML-based file format that has been published as ECMA-376. It is used as the default file format by Microsoft Office 2007.

There are plans to support this file format in OpenOffice.org in order to interoperate with Microsoft Office 2007.

There are three major document formats, plus a graphics markup language shared among them:

  • WordprocessingML - For word processor documents (file extensions may be docx, docm)
  • SpreadsheetML - For spreadsheet documents (file extensions may be xlsx, xlsm)
  • PresentationML - For presentation documents (file extensions may be pptx, pptm)
  • DrawingML - Used by the other markup languages to represent graphics data.

OOXML Basics

An Office Open XML document is a package that consists of a flat collection of "parts". Each part has a case-insensitive part name made up of slash-delimited (/) segment names, such as "/pres/slides/slide1.xml".

In most cases the parts are packaged with ZIP compression. The package is then the ZIP archive, the parts are the individual files stored in it, and each part name is the file's path within the archive.

Each part also has a content type, and /[Content_Types].xml provides the content type of each part within the archive.
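The package layout described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not code from the oox module: the function names part_content_types and build_demo_package, the extra part pres/notes.xml, and the placeholder part bodies are all assumptions made for the example. It builds a tiny ZIP package in memory, then resolves each part's content type from [Content_Types].xml, letting a per-part Override entry win over a per-extension Default entry.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the elements of [Content_Types].xml.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
SLIDE_TYPE = "application/vnd.openxmlformats-officedocument.presentationml.slide+xml"


def part_content_types(package):
    """Map every part name in a ZIP-based package to its content type."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        # Default entries apply to every part with a given file extension.
        defaults = {
            d.get("Extension").lower(): d.get("ContentType")
            for d in root.findall(CT_NS + "Default")
        }
        # Override entries apply to one specific part name.
        overrides = {
            o.get("PartName").lower(): o.get("ContentType")
            for o in root.findall(CT_NS + "Override")
        }
        result = {}
        for name in zf.namelist():
            if name == "[Content_Types].xml":
                continue
            part_name = "/" + name  # part names start with a slash
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            # Part names are case-insensitive, so compare in lower case.
            result[part_name] = overrides.get(part_name.lower(), defaults.get(ext))
        return result


def build_demo_package():
    """Create a tiny in-memory package (hypothetical content) for the demo."""
    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/pres/slides/slide1.xml" ContentType="' + SLIDE_TYPE + '"/>'
        "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("pres/slides/slide1.xml", "<slide/>")  # placeholder body
        zf.writestr("pres/notes.xml", "<notes/>")          # placeholder body
    return buf.getvalue()


types = part_content_types(io.BytesIO(build_demo_package()))
for part_name, content_type in sorted(types.items()):
    print(part_name, "->", content_type)
```

The same function should also accept a path to a real .docx, .xlsx or .pptx file, since zipfile.ZipFile takes either a file name or a file-like object.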

OO.o Implementation

Some code already exists in the oox module (OOX) of the XML project. The child workspace (CWS) is xmlfilter02 in SRC680. (view the workspace on EIS)

To fetch the oox code from CVS (assuming CVSROOT is set properly):

cvs co -r cws_src680_xmlfilter02 -d oox xml/oox

Implementation Generalities

The whole OOX filter makes use of the new FastParser service to implement an event-driven SAX parser.
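For readers unfamiliar with event-driven parsing, the general idea can be sketched with Python's standard xml.sax module. This is not the FastParser API and says nothing about how the OOX filter is structured. The handler class ElementCounter and the sample document are hypothetical, chosen only to show how a SAX parser pushes start, end and character events to callbacks instead of building a tree.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts elements and collects character data."""

    def __init__(self):
        super().__init__()
        self.counts = {}    # element name -> number of start events seen
        self.text = []      # character chunks, in document order
        self.depth = 0      # current nesting depth
        self.max_depth = 0  # deepest nesting seen so far

    def startElement(self, name, attrs):
        # Called once for every opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        # Called once for every closing tag.
        self.depth -= 1

    def characters(self, content):
        # May be called several times for one text node, so only append.
        if content.strip():
            self.text.append(content)


# Hypothetical sample input, not an actual OOXML part.
SAMPLE = b"<doc><p>one</p><p>two</p></doc>"

handler = ElementCounter()
xml.sax.parseString(SAMPLE, handler)
print(handler.counts, handler.max_depth, "".join(handler.text))
```

Because the handler keeps only the state it needs, memory use does not grow with the size of the document, which is the usual reason to prefer this style for large files.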

Various Resources
