Difference between revisions of "OOXML/WordProcessingML"

From Apache OpenOffice Wiki

Revision as of 14:04, 31 July 2014

WordProcessingML is the schema used for representing text documents in OOXML. Every .docx file contains a single WordProcessingML document, which plays the same role as content.xml in ODF.

Microsoft Word normally saves this document as word/document.xml; however, you should not rely on this always being the filename. Instead, determine the filename from the OPC relationship structure.
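
As an illustration, the package-level relationships part (_rels/.rels) of a file saved by Word typically looks something like the sketch below. The Id value rId1 is arbitrary and only assumed here; what matters is the relationship whose Type ends in officeDocument, whose Target gives the location of the main document part.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <!-- The officeDocument relationship identifies the main document part.
       Read Target instead of assuming word/document.xml. -->
  <Relationship Id="rId1"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
                Target="word/document.xml"/>
</Relationships>
```

A consumer would open _rels/.rels first, look up this relationship by its Type, and then load whatever part the Target names.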

All elements and attributes in WordProcessingML are in the following namespace, which the examples here associate with the w prefix:

http://schemas.openxmlformats.org/wordprocessingml/2006/main

Basic structure

The three most important elements in WordProcessingML are paragraphs, runs, and tables.

A run is a piece of text or other inline content (such as an image) which has a particular set of formatting properties associated with it. Unlike in HTML and ODF, runs cannot be nested - paragraphs have a "flat" structure. If different formatting is applied to different parts of the paragraph, then each will be placed in a separate run.
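
A minimal sketch of this flat structure, assuming a paragraph reading "Hello World" in which only the second word is bold: the bold text does not nest inside the first run, but sits in a sibling run with its own properties.

```xml
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <!-- First run: no run properties, so default formatting applies.
       xml:space="preserve" keeps the trailing space. -->
  <w:r>
    <w:t xml:space="preserve">Hello </w:t>
  </w:r>
  <!-- Second run: a sibling of the first, not a child, carrying bold -->
  <w:r>
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:t>World</w:t>
  </w:r>
</w:p>
```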

Tables are represented in a row-primary format in a very similar manner to HTML. A table element contains one or more row elements, and a row element contains one or more cell elements. Each cell element contains block-level content, such as paragraphs or other tables.

Paragraphs are represented using <w:p> elements, runs using <w:r> elements, and tables using <w:tbl>, <w:tr>, and <w:tc> elements. At the outer level, all of the content resides inside a nested pair of <w:document> and <w:body> elements.
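
The row-primary table layout can be sketched as follows. This is an illustrative 2x2 table, not output copied from Word; note that the table element is named <w:tbl> in the schema. The cell texts and the column widths (given in twentieths of a point, so 2880 is two inches) are assumed values chosen for the example.

```xml
<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <!-- Table-level properties: let the consumer size the table automatically -->
  <w:tblPr>
    <w:tblW w:w="0" w:type="auto"/>
  </w:tblPr>
  <!-- Column grid: one gridCol per column -->
  <w:tblGrid>
    <w:gridCol w:w="2880"/>
    <w:gridCol w:w="2880"/>
  </w:tblGrid>
  <!-- Each row holds cells; each cell holds block-level content -->
  <w:tr>
    <w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>B1</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>A2</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>B2</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>
```

In a full document this fragment would appear as a child of <w:body>, alongside any paragraphs.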

Here is the simplest possible WordProcessingML document:

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

Formatting and other information about particular elements is stored in "property" elements, which contain a series of child elements, one per property. These are mostly analogous to attributes and CSS properties in HTML, though in some cases a single child element carries several related values as separate attributes. Paragraphs, runs, and tables may each begin with a <w:pPr>, <w:rPr>, or <w:tblPr> element, respectively.

Here is a document with a single center-aligned paragraph containing underlined text:

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr>
        <w:jc w:val="center"/>
        <w:rPr>
          <w:u w:val="single"/>
        </w:rPr>
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:u w:val="single"/>
        </w:rPr>
        <w:t>Hello World</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

Here, the children of the <w:pPr> and <w:rPr> elements both have a w:val attribute, as these properties require parameters to be specified (the type of alignment, and the style of underline). Some property elements have additional attributes, and some require none.
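
To illustrate the range of attribute usage, the following sketch shows a property element with no attributes (<w:b>), one with an attribute in addition to w:val (<w:u> with w:color), and one whose values are spread across several attributes (<w:spacing>). The specific values are assumptions chosen for the example; spacing is expressed in twentieths of a point.

```xml
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <!-- Several attributes on one property element: space before and after -->
    <w:spacing w:before="240" w:after="120"/>
    <w:jc w:val="center"/>
  </w:pPr>
  <w:r>
    <w:rPr>
      <!-- No attributes: the element's presence alone turns bold on -->
      <w:b/>
      <!-- An extra attribute beside w:val: underline color as hex RGB -->
      <w:u w:val="single" w:color="FF0000"/>
    </w:rPr>
    <w:t>Hello World</w:t>
  </w:r>
</w:p>
```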

Note that there are two <w:rPr> elements: one within the <w:pPr> element, and another within the run itself. The first applies not to the text within the paragraph, but to the "paragraph marker" at the end. Apparently, this is supposed to influence the appearance of the paragraph marker (¶ symbol); Word 2011 and 2013 both add this element but then ignore it. The first <w:rPr> is therefore unnecessary in practice; only the second affects the text.
