Odf4j

From Apache OpenOffice Wiki
Revision as of 14:38, 28 March 2010 by B michaelsen (talk | contribs)

OpenDocument for Java - odf4j

Odf4j is a pure Java class library that provides ODF processing capabilities for applications based on the Java platform.

Odf4j is currently in incubator status. We are exploring techniques and interfaces that yield an optimal representation of ODF documents for Java developers. Developers should therefore not expect any of the current interfaces to be fixed; they are likely to change significantly before an official release.

This API has been superseded

Please have a look at the newer project ODFDOM

Overview

The odf4j project's objective is to provide an API for reading, writing and manipulating ODF documents directly in Java applications. Odf4j implements a layered approach through which documents are accessed.

  • The Package Layer - provides direct access to the resources that are stored in the ODF package, such as XML streams, images or embedded objects.
  • The Document Layer - provides a structured interface to the actual document that is represented by the ODF package. This level is concerned with objects such as text sections, paragraphs, styles and so forth.

Odf4j is part of the odftoolkit project. Development is discussed on the dev@odftoolkit.openoffice.org mailing list, and the source is available through CVS. To check out the source code, use :pserver:anoncvs@anoncvs.services.openoffice.org:/cvs as CVSROOT and check out the module odftoolkit/odf4j.

The Package Layer

At this level, a document is represented as a package of named resources. Every resource can be accessed as a stream, and XML resources can additionally be accessed as DOM documents. Modified and newly added resources are stored in the package, which can then be saved. If an XML resource is accessed through DOM and the DOM tree is modified, saving the package automatically writes the modified content; there is no need to stream the DOM back into the package first.

The following example illustrates how to access documents on the package level:

import org.openoffice.odf.OdfPackage;
import org.openoffice.odf.xml.OdfPackageStream;
import org.w3c.dom.Document;
import java.io.FileInputStream;
[...]

// open the document package
OdfPackage odfPackage = OpenDocumentFactory.load("testdocument1.odt");

// get content.xml as a DOM document
Document doc = odfPackage.getDocument(OdfPackage.STREAMNAME_CONTENT);

// process the DOM document
[...]

// add a picture
FileInputStream fin = new FileInputStream("/tmp/mypicture.gif");
odfPackage.store("Pictures/mypicture.gif", fin, "image/gif");

// save the processed document
odfPackage.save("testdocument1_processed.odt");

// free up temporary resources
odfPackage.close();
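
To make the idea of "a package of named resources" more concrete, here is a minimal sketch that uses only the JDK and no odf4j classes. It relies on the fact that an ODF package is a ZIP archive in which the document content is conventionally stored as content.xml. The class PackageSketch and its methods are hypothetical names chosen for this illustration, and the in-memory sample archive is a stand-in rather than a valid ODF document. It is not a description of how odf4j is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper (not part of odf4j): treats a ZIP archive as a map of named resources. */
final class PackageSketch {

    /** Reads every entry of a ZIP-based package into memory, keyed by resource name. */
    static Map<String, byte[]> readResources(byte[] zipBytes) {
        Map<String, byte[]> resources = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes() stops at the end of the current entry
                resources.put(entry.getName(), zin.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resources;
    }

    /** Parses an XML resource into a namespace-aware DOM document. */
    static Document parseXml(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML resource", e);
        }
    }

    /** Builds a tiny in-memory archive so the sketch runs without a real .odt file. */
    static byte[] samplePackage() {
        String content = "<office:document-content"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\"/>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(new ZipEntry("content.xml"));
            zout.write(content.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            // placeholder bytes, not a real image
            zout.putNextEntry(new ZipEntry("Pictures/mypicture.gif"));
            zout.write(new byte[] {'G', 'I', 'F'});
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> resources = readResources(samplePackage());
        System.out.println("resources: " + resources.keySet());
        Document doc = parseXml(resources.get("content.xml"));
        System.out.println("root element: " + doc.getDocumentElement().getLocalName());
    }
}
```

A package layer such as the one described above would presumably add write support, manifest handling and lazy loading on top of this kind of access; the sketch only shows the stream and DOM views of the same resource.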

The Document Layer

At this level, a document is represented as a hierarchical structure of content objects. An application can traverse the structure in order to get fine-grained access to document content objects such as paragraphs, sections, frames, hyper links etc. Users of this layer can manipulate the content objects in order to edit documents. New objects can also be created in order to extend documents or to create new documents from scratch.

The following example shows how content is accessed on the document level.

import org.openoffice.odf.text.TextDocument;
import org.openoffice.odf.text.Body;
import org.openoffice.odf.text.Element;
import org.openoffice.odf.text.Paragraph;
import org.openoffice.odf.text.Portion;
[...]

TextDocument td = (TextDocument) OpenDocumentFactory.load("test/testdocument1.odt");
System.out.println("top level body structure:");
Body body = td.getBody();
// iterate over the body contents
for (Element element : body) {
    // handle sections
    if (element instanceof Section) {
        Section section = (Section) element;
        for (Element se : section) {
            // handle paragraphs
            if (se instanceof Paragraph) {
                Paragraph paragraph = (Paragraph)se;
                for (Element pe : paragraph) {
                    // handle portions
                    if (pe instanceof Portion) {
                        Portion portion = (Portion)pe;
                        System.out.println("    portion: {" + portion.toString() +"}");
                    }
                }
            }
        }
    }
}
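
For comparison, the same kind of traversal can be sketched against the raw XML with the JDK's DOM API. The sketch assumes that a section corresponds to a text:section element, a paragraph to text:p and a portion to text:span; that mapping is an assumption made for this illustration, not something the odf4j sources are known to guarantee. TraversalSketch, its methods and the SAMPLE string are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch (not odf4j): walks sections, paragraphs and spans with plain DOM. */
final class TraversalSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Small stand-in for the body of a content.xml file. */
    static final String SAMPLE =
            "<office:text xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
            + " xmlns:text=\"" + TEXT_NS + "\">"
            + "<text:section text:name=\"S1\">"
            + "<text:p><text:span>Hello</text:span> <text:span>World</text:span></text:p>"
            + "</text:section>"
            + "<text:p><text:span>outside any section</text:span></text:p>"
            + "</office:text>";

    /** Collects the text of every text:span that sits in a text:p directly inside a text:section. */
    static List<String> portionsInSections(String xml) {
        List<String> portions = new ArrayList<>();
        NodeList sections = parse(xml).getElementsByTagNameNS(TEXT_NS, "section");
        for (int i = 0; i < sections.getLength(); i++) {
            for (Element paragraph : childElements(sections.item(i), "p")) {
                for (Element span : childElements(paragraph, "span")) {
                    portions.add(span.getTextContent());
                }
            }
        }
        return portions;
    }

    /** Returns the direct child elements of parent with the given local name in the text namespace. */
    static List<Element> childElements(Node parent, String localName) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && TEXT_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        for (String portion : portionsInSections(SAMPLE)) {
            System.out.println("    portion: {" + portion + "}");
        }
    }
}
```

The namespace checks and sibling loops are the kind of detail that a typed document layer is meant to hide behind classes such as Section and Paragraph.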

The next example shows how content is created on the document level.

import org.openoffice.odf.text.TextDocument;
import org.openoffice.odf.text.Body;
import org.openoffice.odf.text.Element;
import org.openoffice.odf.text.Paragraph;
import org.openoffice.odf.text.Portion;
import org.openoffice.odf.text.List;
import org.openoffice.odf.text.ListItem;
[...]

// create a new text document
TextDocument textdoc = new TextDocument();
Body body = textdoc.getBody();

// add a paragraph
body.add(new Paragraph(textdoc, "Hello World!"));

// create another paragraph
Paragraph p2 = new Paragraph(textdoc);
// add text to the (still empty) paragraph
p2.addText("Bar!");
// add the paragraph to the end of the document
body.add(p2);
// insert text at the beginning of the second paragraph
p2.insertText(0, "Foo, ");

// create a list with list items
List l1 = new List(textdoc,
    new ListItem(textdoc, "list item 1"),
    new ListItem(textdoc, "list item 2"),
    new ListItem(textdoc, "list item 3"));

// append the list to the end of the document           
body.add(l1);

// save the new document to a file
textdoc.save("new_document.odt");

The current API requires a reference to the document in the constructor of every new content object. This is inconvenient and will hopefully go away as the interface evolves.

Furthermore, the interfaces for handling styled text, and for further content objects such as tables and images, are still under development.
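
As a rough picture of what the creation example above amounts to at the XML level, the following sketch builds comparable content with the JDK's DOM API: two paragraphs and a list of three items, using the ODF element names text:p, text:list and text:list-item. CreationSketch and its methods are hypothetical, the result is not a complete or validated content.xml, and nothing here is taken from the odf4j sources. Note that plain DOM also creates every node through its owner document, which is a constraint similar to the constructor argument mentioned above; whether that is the actual reason for the odf4j design is not stated in the source.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch (not odf4j): builds paragraph and list markup with plain DOM. */
final class CreationSketch {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Builds a skeletal content tree with two paragraphs and a three-item list. */
    static Document buildContent() {
        Document doc = newDocument();
        Element root = doc.createElementNS(OFFICE_NS, "office:document-content");
        Element body = doc.createElementNS(OFFICE_NS, "office:body");
        Element text = doc.createElementNS(OFFICE_NS, "office:text");
        doc.appendChild(root);
        root.appendChild(body);
        body.appendChild(text);

        // first paragraph
        text.appendChild(paragraph(doc, "Hello World!"));

        // second paragraph: append it, then insert text at its beginning
        Element p2 = paragraph(doc, "Bar!");
        text.appendChild(p2);
        p2.insertBefore(doc.createTextNode("Foo, "), p2.getFirstChild());

        // a list whose items each wrap one paragraph
        Element list = doc.createElementNS(TEXT_NS, "text:list");
        for (int i = 1; i <= 3; i++) {
            Element item = doc.createElementNS(TEXT_NS, "text:list-item");
            item.appendChild(paragraph(doc, "list item " + i));
            list.appendChild(item);
        }
        text.appendChild(list);
        return doc;
    }

    /** Creates a text:p element owned by doc with the given text content. */
    static Element paragraph(Document doc, String content) {
        Element p = doc.createElementNS(TEXT_NS, "text:p");
        p.setTextContent(content);
        return p;
    }

    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildContent();
        System.out.println("second paragraph: "
                + doc.getElementsByTagNameNS(TEXT_NS, "p").item(1).getTextContent());
        System.out.println("list items: "
                + doc.getElementsByTagNameNS(TEXT_NS, "list-item").getLength());
    }
}
```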

Current and Future Work

The interface is neither final nor complete at this time. More types of content will be supported as we explore the most convenient ways of interacting with documents from a developer's point of view. The interface should make reading and manipulating ODF documents in Java as painless as possible.
