Odf4j
OpenDocument for Java - odf4j
Odf4j is a pure Java class library that provides ODF processing capabilities for applications based on the Java platform.
Odf4j is currently in incubator status. We are exploring techniques and interfaces that yield an optimal representation of ODF documents for Java developers. Developers should therefore not expect any of the current interfaces to be fixed. They are likely to change significantly until we make an official release.
This API has been superseded
Please have a look at the newer project, ODFDOM.
Overview
The odf4j project's objective is to provide an API for reading, writing and manipulating ODF documents directly in Java applications. Odf4j implements a layered approach through which documents are accessed.
- The Package Layer - provides direct access to the resources that are stored in the ODF package, such as XML streams, images or embedded objects.
- The Document Layer - provides a structured interface to the actual document that is represented by the ODF package. This level is concerned with objects such as text sections, paragraphs, styles and so forth.
Odf4j is part of the odftoolkit project. Development is discussed on the dev@odftoolkit.openoffice.org mailing list and the project is available through CVS. To check out the source code from the CVS repository, use :pserver:anoncvs@anoncvs.services.openoffice.org:/cvs as CVSROOT and check out odftoolkit/odf4j.
The Package Layer
At this level, a document is represented as a package of named resources. All resources can be accessed as streams; resources that are XML can additionally be accessed as DOM documents. Modified and newly added resources can be stored in the package, and the modified document can then be saved. If an XML resource has been accessed through DOM and its DOM tree has been modified, saving the package automatically stores the modified content; there is no need to stream it back into the package first.
The following example illustrates how to access documents on the package level:
    import org.openoffice.odf.OdfPackage;
    import org.openoffice.odf.xml.OdfPackageStream;
    import org.w3c.dom.Document;
    import java.io.FileInputStream;
    [...]
    // open the document package
    OdfPackage odfPackage = OpenDocumentFactory.load("testdocument1.odt");

    // get content.xml as a DOM document
    Document doc = odfPackage.getDocument(OdfPackage.STREAMNAME_CONTENT);

    // process the DOM document
    [...]

    // add a picture
    FileInputStream fin = new FileInputStream("/tmp/mypicture.gif");
    odfPackage.store("Pictures/mypicture.gif", fin, "image/gif");

    // save the processed document
    odfPackage.save("testdocument1_processed.odt");

    // free up temporary resources
    odfPackage.close();
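Because an ODF package is a ZIP archive, the package-level view described above can also be approximated with nothing but the JDK. The following is a minimal sketch, not odf4j code: the class PackageSketch and its methods readPackage and writePackage are hypothetical names, and the sketch ignores packaging details such as the manifest and the special handling of the mimetype entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration of the "package of named resources" idea.
public class PackageSketch {

    /** Reads every entry of a ZIP-based package into a name -> bytes map. */
    public static Map<String, byte[]> readPackage(byte[] zipBytes) {
        Map<String, byte[]> resources = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current entry
                resources.put(entry.getName(), zin.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resources;
    }

    /** Writes the named resources back into a ZIP archive. */
    public static byte[] writePackage(Map<String, byte[]> resources) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> resource : resources.entrySet()) {
                zout.putNextEntry(new ZipEntry(resource.getKey()));
                zout.write(resource.getValue());
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // build a tiny stand-in package in memory instead of loading a real file
        Map<String, byte[]> pkg = new LinkedHashMap<>();
        pkg.put("content.xml", "<doc/>".getBytes(StandardCharsets.UTF_8));
        byte[] original = writePackage(pkg);

        // load it, add a (placeholder) picture resource, and save it again
        Map<String, byte[]> loaded = readPackage(original);
        loaded.put("Pictures/mypicture.gif", new byte[] {'G', 'I', 'F'});
        byte[] processed = writePackage(loaded);

        System.out.println(readPackage(processed).keySet());
    }
}
```

The map of resource names to byte arrays plays the role that the package object plays in the example above. A real implementation would stream large resources instead of holding them in memory.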
The Document Layer
At this level, a document is represented as a hierarchical structure of content objects. An application can traverse the structure in order to get fine-grained access to document content objects such as paragraphs, sections, frames, hyper links etc. Users of this layer can manipulate the content objects in order to edit documents. New objects can also be created in order to extend documents or to create new documents from scratch.
The following example shows how content is accessed on the document level.
    import org.openoffice.odf.text.TextDocument;
    import org.openoffice.odf.text.Body;
    import org.openoffice.odf.text.Element;
    import org.openoffice.odf.text.Paragraph;
    import org.openoffice.odf.text.Portion;
    // Section is assumed to live in the same package as the other text classes
    import org.openoffice.odf.text.Section;
    [...]
    TextDocument td = (TextDocument) OpenDocumentFactory.load("test/testdocument1.odt");
    System.out.println("top level body structure:");
    Body body = td.getBody();

    // iterate over the body contents
    for (Element element : body) {
        // handle sections
        if (element instanceof Section) {
            Section section = (Section) element;
            for (Element se : section) {
                // handle paragraphs
                if (se instanceof Paragraph) {
                    Paragraph paragraph = (Paragraph) se;
                    for (Element pe : paragraph) {
                        // handle portions
                        if (pe instanceof Portion) {
                            Portion portion = (Portion) pe;
                            System.out.println(" portion: {" + portion.toString() + "}");
                        }
                    }
                }
            }
        }
    }
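For comparison, the same kind of traversal can be sketched directly against the DOM of content.xml, which is what a document layer has to abstract over. This is a JDK-only illustration and not part of odf4j: the class ParagraphSketch and the method paragraphTexts are hypothetical names, the sample XML is a hand-written fragment, and ODF whitespace elements such as text:s and text:tab are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration: collect paragraph texts from ODF content XML.
public class ParagraphSketch {

    public static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    public static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns the text of every text:p element in document order. */
    public static List<String> paragraphTexts(String contentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookup
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)));

            // finds paragraphs at any depth, including those nested in sections
            NodeList paragraphs = doc.getElementsByTagNameNS(TEXT_NS, "p");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                // getTextContent concatenates the text of nested spans
                texts.add(paragraphs.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse content XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<office:document-content xmlns:office=\"" + OFFICE_NS + "\""
                + " xmlns:text=\"" + TEXT_NS + "\">"
                + "<office:body><office:text>"
                + "<text:section text:name=\"S1\">"
                + "<text:p>Hello <text:span>World</text:span>!</text:p>"
                + "</text:section>"
                + "<text:p>Second paragraph</text:p>"
                + "</office:text></office:body>"
                + "</office:document-content>";

        for (String text : paragraphTexts(xml)) {
            System.out.println(" paragraph: {" + text + "}");
        }
    }
}
```

Compared with the typed traversal above, the DOM version has to know element names and namespace URIs, which is the kind of detail the document layer is meant to hide.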
The next example shows how content is created on the document level.
    import org.openoffice.odf.text.TextDocument;
    import org.openoffice.odf.text.Body;
    import org.openoffice.odf.text.Element;
    import org.openoffice.odf.text.Paragraph;
    import org.openoffice.odf.text.Portion;
    import org.openoffice.odf.text.List;
    import org.openoffice.odf.text.ListItem;
    [...]
    // create a new text document
    TextDocument textdoc = new TextDocument();
    Body body = textdoc.getBody();

    // add a paragraph
    body.add(new Paragraph(textdoc, "Hello World!"));

    // create another paragraph
    Paragraph p2 = new Paragraph(textdoc);

    // add text to the (still empty) paragraph
    p2.addText("Bar!");

    // add the paragraph to the end of the document
    body.add(p2);

    // insert text at the beginning of the second paragraph
    p2.insertText(0, "Foo, ");

    // create a list with list items
    List l1 = new List(textdoc,
                       new ListItem(textdoc, "list item 1"),
                       new ListItem(textdoc, "list item 2"),
                       new ListItem(textdoc, "list item 3"));

    // append the list to the end of the document
    body.add(l1);

    // save the new document to a file
    textdoc.save("new_document.odt");
The current API requires a reference to the document in the constructor of every new content object. This is inconvenient and will hopefully go away as the interface evolves.
Furthermore, interfaces for handling styled text are still under development, as are those for further content objects such as tables, images and so forth.
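One conceivable way to drop the document reference from the constructors is to let a content object exist without an owner and bind it to a document only when it is added to a container. The sketch below illustrates that idea under stated assumptions; it is not odf4j code and not a statement about how the interface will evolve. All names in it (AdoptionSketch, Doc, Node, Container, attach) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: nodes are created standalone and adopted on add().
public class AdoptionSketch {

    /** Stand-in for a document; it only acts as the owner of content nodes. */
    public static class Doc {
        public final Container body = new Container();

        public Doc() {
            body.attach(this);
        }
    }

    /** Content node that may exist without an owning document. */
    public static class Node {
        private Doc owner;

        public Doc getOwner() {
            return owner;
        }

        void attach(Doc doc) {
            owner = doc;
        }
    }

    /** Leaf node: no document reference needed in the constructor. */
    public static class Paragraph extends Node {
        public final String text;

        public Paragraph(String text) {
            this.text = text;
        }
    }

    /** Node with children; adding a child hands the owner down to it. */
    public static class Container extends Node {
        private final List<Node> children = new ArrayList<>();

        public void add(Node child) {
            child.attach(getOwner());
            children.add(child);
        }

        @Override
        void attach(Doc doc) {
            super.attach(doc);
            // propagate the owner to children that were added earlier
            for (Node child : children) {
                child.attach(doc);
            }
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        Paragraph p = new Paragraph("Hello World!");
        System.out.println("owner before add: " + p.getOwner());
        doc.body.add(p);
        System.out.println("owned by doc after add: " + (p.getOwner() == doc));
    }
}
```

The trade-off is that an unattached object cannot use anything that lives in the document, for example shared styles, until it has been added.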
Current and Future Work
The interface is neither final nor complete at this time. More types of content will be supported as we explore the most convenient ways of interacting with documents from a developer's point of view. The interface should make reading and manipulating ODF documents in Java as painless as possible.