Difference between revisions of "Documentation/DevGuide/OfficeDev/RDF metadata"

From Apache OpenOffice Wiki

Revision as of 14:45, 12 October 2010





ODF 1.2 introduces a new metadata mechanism based on RDF. RDF stands for Resource Description Framework and is a W3C standard. For information about RDF, refer to the W3C: http://www.w3.org/RDF/

In particular, the first two sections of the RDF Primer are required reading for understanding the basic concepts of the RDF data model, which are really quite simple.

If you like reading an in-depth specification, please refer to: RDF Concepts and Abstract Syntax

If you are interested in some motivational use cases, and an overview of the basic design from the ODF perspective, have a look at the ODF Metadata examples document.

The OpenOffice.org implementation of the RDF data model lives in the com.sun.star.rdf module.


Nodes

First, there are the basic RDF node types:

RDF node type | Interface | Service
node | com.sun.star.rdf.XNode | (none)
literal | com.sun.star.rdf.XLiteral | com.sun.star.rdf.Literal
resource | com.sun.star.rdf.XResource | (none)
URI | com.sun.star.rdf.XURI | com.sun.star.rdf.URI
blank node | com.sun.star.rdf.XBlankNode | com.sun.star.rdf.BlankNode

interface XNode
{
    [readonly, attribute] string StringValue;
};

The interface XResource is only necessary to separate the literal, which is not a resource, from the blank node and URI.


Literals

Literals are represented by the service com.sun.star.rdf.Literal.

interface XLiteral : XNode
{
    [readonly, attribute] string Value;
    [readonly, attribute] string Language;
    [readonly, attribute] XURI   Datatype;
};

service Literal : XLiteral
{
    create( [in] string Value );

    createWithType( [in] string Value, [in] XURI Type );

    createWithLanguage( [in] string Value, [in] string Language );
};

This service has three constructors, one for every distinct kind of literal in RDF.

The simplest kind of literal is a plain value.

A literal may also have a data type, which is represented by a URI. Such literals are called typed literals. The W3C XMLSchema Part 2 specification contains several widely used data types such as numbers, booleans, dates, and many more.

It is also possible to create a literal with a specific language. This makes it possible to have a multi-lingual RDF graph, where statements are repeated with the same subject and predicate, but different objects that contain the same text content in different languages.
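
The three kinds of literal can be pictured with a small plain-Java model. This is only a sketch that does not use the UNO API: the class SketchLiteral and its methods are hypothetical stand-ins, chosen to mirror the three constructors and to show how one of several language variants of the same text might be selected.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an RDF literal; this is NOT the UNO Literal service.
final class SketchLiteral {
    final String value;
    final String language; // empty when the literal has no language tag
    final String datatype; // datatype URI as text, or null when untyped

    private SketchLiteral(String value, String language, String datatype) {
        this.value = value;
        this.language = language;
        this.datatype = datatype;
    }

    // Plain literal: just a value.
    static SketchLiteral plain(String value) {
        return new SketchLiteral(value, "", null);
    }

    // Typed literal: a value plus a datatype URI.
    static SketchLiteral typed(String value, String datatypeUri) {
        return new SketchLiteral(value, "", datatypeUri);
    }

    // Language-tagged literal: a value plus a language tag such as "en".
    static SketchLiteral withLanguage(String value, String language) {
        return new SketchLiteral(value, language, null);
    }

    // Returns the first literal with the requested language tag, or null if none matches.
    static SketchLiteral pick(List<SketchLiteral> candidates, String language) {
        for (SketchLiteral candidate : candidates) {
            if (candidate.language.equals(language)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SketchLiteral> titles = Arrays.asList(
            withLanguage("Metadata", "en"),
            withLanguage("Metadaten", "de"));
        System.out.println(pick(titles, "de").value);
    }
}
```

In a multi-lingual graph, each of the two title literals would be the object of its own statement with the same subject and predicate.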


URIs

In RDF, URIs are used to denote resources. URIs may be used as subjects or objects of a statement. In contrast with other node types, URIs may also be used as the predicate of a statement.

service URI : XURI
{
    create( [in] string Value )
        raises( lang::IllegalArgumentException );

    createNS( [in] string Namespace, [in] string LocalName )
        raises( lang::IllegalArgumentException );

    createKnown( [in] short Id )
        raises( lang::IllegalArgumentException );
};

The URI service has constructors create and createNS, which allow for creating a URI from a string. These two constructors are very similar, but createNS allows splitting the parameter into two parts, which may be useful when creating several URIs that share a prefix because they belong to the same vocabulary.
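
The idea behind splitting a URI into a namespace and a local name can be sketched in plain Java. The class SketchVocabulary below is a hypothetical helper that only concatenates strings; it does not call the UNO URI service, and it assumes the namespace already ends with its separator character.

```java
// Hypothetical helper; it only joins strings and does not call the UNO URI service.
final class SketchVocabulary {
    private final String namespace;

    SketchVocabulary(String namespace) {
        if (namespace == null || namespace.isEmpty()) {
            throw new IllegalArgumentException("namespace must not be empty");
        }
        this.namespace = namespace;
    }

    // Joins the shared prefix and a local name into the full URI text.
    String uri(String localName) {
        return namespace + localName;
    }

    public static void main(String[] args) {
        SketchVocabulary rdfs =
            new SketchVocabulary("http://www.w3.org/2000/01/rdf-schema#");
        System.out.println(rdfs.uri("label"));
        System.out.println(rdfs.uri("comment"));
    }
}
```

Keeping the prefix in one place means that every URI of the vocabulary is spelled consistently.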

There are many URIs that are well-known because they are specified in various standards, such as XMLSchema datatypes, RDF concepts, OWL, and ODF.

There is a convenient way to construct such URIs: using the createKnown constructor, together with constants from the com.sun.star.rdf.URIs constant group.

    rdf.XURI xContentFile = rdf.URI.createKnown(rdf.URIs.ODF_CONTENTFILE);

Of course, string literals would be easier to use, but unfortunately UNO IDL does not permit them.


Blank nodes

The other kind of RDF node is the blank node, which is a resource, but in contrast to a URI, is not unique. Because blank nodes are not unique, you should only construct them with the createBlankNode method, not with the service constructor, and you should never use a blank node with a different repository than the one that created it.


Statements

Using these nodes, RDF statements can be constructed, which are basically subject-predicate-object triples.

struct Statement
{
    XResource Subject;
    XURI      Predicate;
    XNode     Object;
    /// the named graph that contains this statement, or <NULL/>.
    XURI      Graph;
};

The subject of the statement is the entity that the statement is about. The subject must be either a URI or a blank node; a literal is not allowed.

The predicate denotes what the relationship between the subject and the object is. In order to ensure that statements have machine-readable semantics, only URIs are allowed as predicates.

The object of the statement may be any kind of RDF node.

If you put many statements together, and these statements share subjects and objects, then you will get an RDF graph.
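
The rules for subject, predicate and object can be sketched with a small plain-Java model. The classes below are hypothetical and are not the UNO Statement struct; they only enforce the two constraints described above and build two statements that share a subject, which is what turns a set of triples into a graph. The example URIs are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of RDF triples; NOT the UNO Statement struct.
final class SketchTriples {
    enum Kind { URI, BLANK, LITERAL }

    static final class Node {
        final Kind kind;
        final String text;

        Node(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static final class Triple {
        final Node subject;
        final Node predicate;
        final Node object;

        Triple(Node subject, Node predicate, Node object) {
            // The subject must be a resource: a URI or a blank node.
            if (subject.kind == Kind.LITERAL) {
                throw new IllegalArgumentException("a literal cannot be a subject");
            }
            // Only URIs are allowed as predicates.
            if (predicate.kind != Kind.URI) {
                throw new IllegalArgumentException("the predicate must be a URI");
            }
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Builds two statements that share the subject "doc", forming a small graph.
    static List<Triple> sample() {
        Node doc = new Node(Kind.URI, "http://example.com/doc");
        Node author = new Node(Kind.BLANK, "_:a1");
        List<Triple> graph = new ArrayList<>();
        graph.add(new Triple(doc,
            new Node(Kind.URI, "http://purl.org/dc/elements/1.1/creator"), author));
        graph.add(new Triple(doc,
            new Node(Kind.URI, "http://www.w3.org/2000/01/rdf-schema#label"),
            new Node(Kind.LITERAL, "An example document")));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(sample().size() + " statements");
    }
}
```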


Graphs

Graphs are represented by the interface com.sun.star.rdf.XNamedGraph. As the name implies, a named graph has a name, which is a URI. This is why the XNamedGraph interface inherits from XURI.

interface XNamedGraph : XURI
{
    XURI getName();

    void clear()
        raises( container::NoSuchElementException, RepositoryException );

    void addStatement(
            [in] XResource Subject, [in] XURI Predicate, [in] XNode Object)
        raises( lang::IllegalArgumentException,
                container::NoSuchElementException, RepositoryException );

    void removeStatements(
            [in] XResource Subject, [in] XURI Predicate, [in] XNode Object)
        raises( container::NoSuchElementException, RepositoryException );

    container::XEnumeration/*<Statement>*/ getStatements(
            [in] XResource Subject, [in] XURI Predicate, [in] XNode Object)
        raises( container::NoSuchElementException, RepositoryException );
};

The individual methods will be discussed in subsequent sections. There is no service for a named graph, because named graphs always live in a repository.


Repository

The repository is the centerpiece of the RDF API. It is defined in the interface com.sun.star.rdf.XRepository, and the service com.sun.star.rdf.Repository.

interface XRepository
{
    XBlankNode createBlankNode();

    XNamedGraph importGraph([in] /*FileFormat*/ short Format,
                [in] io::XInputStream InStream,
                [in] XURI GraphName, [in] XURI BaseURI)
        raises( lang::IllegalArgumentException,
                datatransfer::UnsupportedFlavorException,
                container::ElementExistException, ParseException,
                RepositoryException, io::IOException );

    void exportGraph([in] /*FileFormat*/ short Format,
                [in] io::XOutputStream OutStream,
                [in] XURI GraphName, [in] XURI BaseURI)
        raises( lang::IllegalArgumentException,
                datatransfer::UnsupportedFlavorException,
                container::NoSuchElementException, RepositoryException,
                io::IOException );

    sequence<XURI> getGraphNames()
        raises( RepositoryException );

    XNamedGraph getGraph([in] XURI GraphName)
        raises( lang::IllegalArgumentException,
                RepositoryException );

    XNamedGraph createGraph([in] XURI GraphName)
        raises( lang::IllegalArgumentException,
                container::ElementExistException, RepositoryException );

    void destroyGraph([in] XURI GraphName)
        raises( lang::IllegalArgumentException,
                container::NoSuchElementException, RepositoryException );

    container::XEnumeration/*<Statement>*/ getStatements(
            [in] XResource Subject, [in] XURI Predicate, [in] XNode Object)
        raises( RepositoryException );

    /// executes a SPARQL "SELECT" query.
    XQuerySelectResult querySelect([in] string Query)
        raises( QueryException, RepositoryException );

    /// executes a SPARQL "CONSTRUCT" query.
    container::XEnumeration/*<Statement>*/ queryConstruct([in] string Query)
        raises( QueryException, RepositoryException );

    /// executes a SPARQL "ASK" query.
    boolean queryAsk([in] string Query)
        raises( QueryException, RepositoryException );
};

An RDF repository is basically a set of named RDF graphs. The names of the contained graphs can be retrieved via the getGraphNames method. An individual graph can be retrieved via the getGraph method.

A repository may be created as a stand-alone service, or it may be associated with a loaded ODF document. The graphs in a document repository correspond to streams in an ODF package, and thus the graph names consist of the URI of the ODF package and the relative path of the stream within the package.

Caution: For a document repository, you should not call the createGraph, destroyGraph or importGraph methods directly; instead, call the respective methods of com.sun.star.rdf.XDocumentMetadataAccess, as described below.

Services that provide an RDF repository implement the interface com.sun.star.rdf.XRepositorySupplier.

interface XRepositorySupplier
{
    XRepository getRDFRepository();
};


Document integration

The other main part of the RDF API is the integration in document models. This purpose is served by the interface com.sun.star.rdf.XDocumentMetadataAccess, which is implemented by the Model service of documents.

Caution: Do not use methods such as getURL to create a URI for the document. Always use the XURI interface of the model when you need the RDF URI for a document.

Furthermore, there is the interface com.sun.star.rdf.XMetadatable, which allows document content entities to be used as subjects or objects in the methods that manipulate RDF graphs.

interface XMetadatable : XURI
{
    [attribute] beans::StringPair MetadataReference {
        set raises ( lang::IllegalArgumentException );
    };

    void ensureMetadataReference();
};

Caution: The XMetadatable interface has an attribute MetadataReference. This attribute is only meant to be set by import filters, such as the ODF import filter. Extensions should use ensureMetadataReference, or simply use an XMetadatable as a parameter to addStatement, which will automatically call ensureMetadataReference().

Annotated text range

The service com.sun.star.text.InContentMetadata allows adding annotations to a range of text. The range of text must be contained within a single paragraph, and annotations must not overlap (but they are allowed to nest).

    text.XText xDocText = xDoc.getText();
    text.XTextCursor xCursor = ... // position to where you want the annotation
    Object xMeta = xDocFactory.createInstance(
            "com.sun.star.text.InContentMetadata");
    text.XTextContent xContent = (text.XTextContent)
        UnoRuntime.queryInterface(text.XTextContent.class, xMeta);
    try {
        // pass the XTextContent interface, not the raw Object
        xDocText.insertTextContent(xCursor, xContent, true);
    } catch (lang.IllegalArgumentException e) {
        // overlap?
    }

When the InContentMetadata is successfully inserted, you can add metadata by just using it as the subject or object of an RDF statement.

    rdf.XMetadatable xMetadatable = (rdf.XMetadatable)
        UnoRuntime.queryInterface(rdf.XMetadatable.class, xContent);
    rdf.XURI xComment = rdf.URI.createKnown(rdf.URIs.RDFS_COMMENT);
    rdf.XLiteral xObj = rdf.Literal.create("a most interesting description");
    rdf.XNamedGraph xGraph = xDocRepository.getGraph(xMyGraphURI);
    xGraph.addStatement(xMetadatable, xComment, xObj);


Metadata field

There is a new text field that is explicitly designed for being used with RDF metadata: com.sun.star.text.textfield.MetadataField.

In contrast with the InContentMetadata, where an existing range of text is being annotated with additional metadata, the metadata field allows for text content to be generated from RDF metadata.

For example, a bibliography extension could use a metadata field to insert a citation. The user could tell the bibliography extension which citation format should be used, and the extension will generate the content of all the citation metadata fields based on this choice. The extension may use some bibliography database that may also be stored as an RDF graph.

Metadata fields must be contained within a single paragraph, and must not overlap (but they are allowed to nest).
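
The difference between nesting (allowed) and overlapping (not allowed) applies to metadata fields and annotated text ranges alike, and can be sketched as a check on character offsets. The class SketchRanges is hypothetical; it assumes half-open ranges [start, end) within one paragraph and says nothing about how the office itself validates an insertion.

```java
// Hypothetical check on character offsets; the real validation happens inside the office.
final class SketchRanges {
    // Returns true when [s1, e1) and [s2, e2) are disjoint or one contains the other.
    static boolean compatible(int s1, int e1, int s2, int e2) {
        boolean disjoint = e1 <= s2 || e2 <= s1;
        boolean firstContainsSecond = s1 <= s2 && e2 <= e1;
        boolean secondContainsFirst = s2 <= s1 && e1 <= e2;
        return disjoint || firstContainsSecond || secondContainsFirst;
    }

    public static void main(String[] args) {
        System.out.println(compatible(0, 10, 2, 5)); // nested
        System.out.println(compatible(0, 5, 5, 10)); // side by side
        System.out.println(compatible(0, 6, 4, 10)); // partial overlap
    }
}
```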

To enable generating the content, the metadata field implements the com.sun.star.text.XText interface.

    text.XTextContent xMetafield = ... ; // get an inserted metadata field
    text.XText xMetafieldText = (text.XText)
        UnoRuntime.queryInterface(text.XText.class, xMetafield);
    xMetafieldText.setString(""); // clear the field: delete all content
    text.XTextCursor xCursor = xMetafieldText.createCursor();
    xMetafieldText.insertString(xCursor, "field content", true);

Of course, you can insert not just plain text, but anything that you could insert into a paragraph.

    xCursor.gotoEnd(false);
    text.XTextContent xFootnote = ... ; // create and init footnote
    xMetafieldText.insertTextContent(xCursor, xFootnote, false);

Metadata fields have another interesting aspect: they can have a prefix and/or suffix text that is taken from one of the RDF graphs in the document. In this way you can create a metadata field with text content that is automatically displayed (non-editable) based on an RDF statement.

    rdf.XMetadatable xMetafield = ... ; // get an inserted metadata field
    rdf.XNamedGraph xGraph = xDocRepository.getGraph(xMyGraphURI);
    rdf.XURI xPrefix = rdf.URI.createKnown(rdf.URIs.ODF_PREFIX);
    // the object of a statement must be an RDF node, so wrap the text in a literal
    rdf.XLiteral xPrefixText = rdf.Literal.create(
        "this text will be displayed as prefix content of the field");
    xGraph.addStatement(xMetafield, xPrefix, xPrefixText);


Other document entities

In addition to the annotated text range and the metadata field, other document content entities implement the XMetadatable interface as well. These entities can thus be used in RDF statements. The list of entities which can be thus annotated will grow in future releases of OpenOffice.org.


Adding metadata to a document

The metadata support in ODF 1.2 allows for adding RDF graphs to an ODF package. Every RDF graph is stored as an RDF/XML stream in the package.

There is a special RDF graph called the metadata manifest. This RDF graph belongs to a document and enumerates all the files relevant for metadata, such as the individual RDF/XML files.

This manifest graph is maintained by OpenOffice.org itself; extension authors should not modify it directly, but use the interface com.sun.star.rdf.XDocumentMetadataAccess.

In order to isolate different metadata users, every extension that wants to add metadata should create one or more RDF graphs of its own. It is recommended to give the new graphs a type that identifies what kind of information is contained, especially if a well-known RDF vocabulary is used. With the method addMetadataFile, you can specify as many types as you want for a graph.

   rdf.XDocumentMetadataAccess xDMA = (rdf.XDocumentMetadataAccess)
       UnoRuntime.queryInterface(rdf.XDocumentMetadataAccess.class, xModel);
   rdf.XURI xType = rdf.URI.create("http://example.com/myextension/v1.0");
   try {
       rdf.XURI xGraphName = xDMA.addMetadataFile("myextension/mygraph.rdf",
           new rdf.XURI[] { xType } );
       rdf.XNamedGraph xGraph = xDMA.getRDFRepository().getGraph(xGraphName);
   } catch (container.ElementExistException e) { // filename exists?
   }
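
The bookkeeping behind typed graphs can be pictured with an in-memory registry. This is a hypothetical plain-Java sketch, not the real metadata manifest: the class SketchManifest and its methods are invented for illustration, and an IllegalStateException stands in for the "file already exists" case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry; NOT the real metadata manifest.
final class SketchManifest {
    private final Map<String, List<String>> typesByFile = new LinkedHashMap<>();

    // Registers a graph file with any number of type URIs; duplicates are rejected.
    void addFile(String fileName, String... types) {
        if (typesByFile.containsKey(fileName)) {
            throw new IllegalStateException("file already exists: " + fileName);
        }
        typesByFile.put(fileName, Arrays.asList(types));
    }

    // Returns the file names of all graphs that carry the given type, in insertion order.
    List<String> filesWithType(String type) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : typesByFile.entrySet()) {
            if (entry.getValue().contains(type)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SketchManifest manifest = new SketchManifest();
        manifest.addFile("myextension/mygraph.rdf", "http://example.com/myextension/v1.0");
        System.out.println(manifest.filesWithType("http://example.com/myextension/v1.0"));
    }
}
```

Tagging each graph with a type is what later lets an extension find its own graphs without knowing their file names.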


Now you can simply insert RDF statements into the graph:

   rdf.XMetadatable xElement = ...; // some document content entity
   rdf.XURI xLabel = rdf.URI.createKnown(rdf.URIs.RDFS_LABEL);
   rdf.XLiteral xObj = rdf.Literal.create("a most interesting description");
   xGraph.addStatement(xElement, xLabel, xObj);


Reading metadata from a document

In order to read metadata that is stored in a document, query the RDF repository of the document.

There are basically two ways to do this. One way is to use the getter methods of the named graph and repository interfaces. The repository's getStatements method returns all statements in any graph in the repository that match the given parameters. Note that this is not necessarily what you want: usually, an extension is only interested in the metadata that it inserted itself. It is therefore better to first get the graphs that the extension is interested in, and then query only those graphs via getStatements.

   rdf.XDocumentMetadataAccess xDMA = (rdf.XDocumentMetadataAccess)
       UnoRuntime.queryInterface(rdf.XDocumentMetadataAccess.class, xModel);
   rdf.XURI xType = rdf.URI.create("http://example.com/myextension/v1.0");
   rdf.XURI[] GraphNames = xDMA.getMetadataGraphsWithType(xType);
   for (rdf.XURI xGraphName : GraphNames) {
       rdf.XNamedGraph xGraph = xDMA.getRDFRepository().getGraph(xGraphName);
       // replace nulls with interesting URIs
       container.XEnumeration xResult = xGraph.getStatements(null, null, null);
       while (xResult.hasMoreElements()) {
           rdf.Statement stmt = (rdf.Statement) xResult.nextElement();
       }
   }

The other way of getting information out of the repository is via the SPARQL query language.

The same considerations as above apply: by default, the result will contain information from all graphs in the repository. You can restrict the graphs via the GRAPH or FROM clauses.

   rdf.XDocumentMetadataAccess xDMA = (rdf.XDocumentMetadataAccess)
       UnoRuntime.queryInterface(rdf.XDocumentMetadataAccess.class, xModel);
   rdf.XURI xType = rdf.URI.create("http://example.com/myextension/v1.0");
   rdf.XURI[] GraphNames = xDMA.getMetadataGraphsWithType(xType);
   if (GraphNames.length > 0) {
       String graphName = GraphNames[0].getStringValue();
       String query =
           "CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <"
               + graphName + "> { ?s ?p ?o } . }";
       container.XEnumeration xResult =
           xDMA.getRDFRepository().queryConstruct(query);
       while (xResult.hasMoreElements()) {
           rdf.Statement stmt = (rdf.Statement) xResult.nextElement();
       }
   }

Besides the CONSTRUCT query shown above, there are also the methods querySelect, which returns a table of results, and queryAsk, which simply returns a boolean. There are different methods for the different query types because the return types differ.
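
Query strings for the other two query types can be built in the same way as the CONSTRUCT query. The helper below is a hypothetical sketch that only assembles SPARQL text and does not talk to a repository; the method names are assumptions, and the simple character check is no substitute for full IRI validation.

```java
// Hypothetical helper that only builds query text; it does not talk to a repository.
final class SketchQueries {
    // Wraps a URI in angle brackets after a minimal sanity check.
    private static String iri(String uri) {
        if (uri.indexOf('<') >= 0 || uri.indexOf('>') >= 0 || uri.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("not usable as an IRI reference: " + uri);
        }
        return "<" + uri + ">";
    }

    // SELECT query: every subject/object pair for one predicate in one graph.
    static String selectByPredicate(String graphName, String predicate) {
        return "SELECT ?s ?o WHERE { GRAPH " + iri(graphName)
            + " { ?s " + iri(predicate) + " ?o } }";
    }

    // ASK query: true when the graph contains at least one statement.
    static String askNonEmpty(String graphName) {
        return "ASK { GRAPH " + iri(graphName) + " { ?s ?p ?o } }";
    }

    public static void main(String[] args) {
        System.out.println(selectByPredicate("http://example.com/g",
            "http://www.w3.org/2000/01/rdf-schema#label"));
        System.out.println(askNonEmpty("http://example.com/g"));
    }
}
```

The first string would be passed to querySelect and the second to queryAsk.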

Caution: Always use the query method that matches the query type.


Mapping from URIs to document entities

Note that for a document content annotation, the RDF repository will only give you a URI in its results, not the actual document content entity. In order to map the URI to the document entity, use the getElementByURI method.

   rdf.XDocumentMetadataAccess xDMA = (rdf.XDocumentMetadataAccess)
       UnoRuntime.queryInterface(rdf.XDocumentMetadataAccess.class, xModel);
   rdf.XURI xElemURI = ... // query the document repository
   rdf.XMetadatable xElement = xDMA.getElementByURI(xElemURI);
   text.XTextContent xElemContent = (text.XTextContent)
       UnoRuntime.queryInterface(text.XTextContent.class, xElement);


Removing metadata from a document

The named graph supports the removeStatements method, which removes all statements that match the parameters from the graph.

Also, there is the clear method, which is equivalent to removeStatements(null, null, null), and removes all statements from the graph.
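
Pattern-based removal, with null acting as a wildcard, can be sketched with an in-memory list of triples. The class SketchGraph is a hypothetical plain-Java stand-in that uses strings instead of RDF nodes; it is not the XNamedGraph implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory graph of string triples; NOT the XNamedGraph implementation.
final class SketchGraph {
    private final List<String[]> statements = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        statements.add(new String[] { subject, predicate, object });
    }

    int size() {
        return statements.size();
    }

    // A null pattern component matches any value.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Removes every statement matching the pattern and returns how many were removed.
    int remove(String subject, String predicate, String object) {
        int removed = 0;
        for (Iterator<String[]> it = statements.iterator(); it.hasNext();) {
            String[] t = it.next();
            if (matches(subject, t[0]) && matches(predicate, t[1]) && matches(object, t[2])) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Equivalent to removing with three wildcards.
    void clear() {
        remove(null, null, null);
    }

    public static void main(String[] args) {
        SketchGraph graph = new SketchGraph();
        graph.add("doc", "label", "first");
        graph.add("doc", "comment", "second");
        graph.add("other", "label", "third");
        System.out.println(graph.remove("doc", null, null) + " removed");
        graph.clear();
        System.out.println(graph.size() + " left");
    }
}
```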

For removing whole metadata streams from an ODF document, there is the method removeMetadataFile.


Vocabulary

Some notes on RDF vocabularies; of course, a comprehensive discussion of RDF design is beyond the scope of this guide.

Caution: Do not use URIs based on the document base URI as properties. Properties should be independent of the location of the document.

An important question when adding RDF metadata is: which vocabulary do you use? First, it is usually a good idea to re-use an existing vocabulary. This will improve the chances that other software, which also supports the existing vocabulary, is able to do something interesting and useful with the metadata that is added.

For example, for basic datatypes you can use the types specified in W3C XMLSchema Part 2. A well-known vocabulary for expressing social relations is Friend-of-a-Friend. A directory of RDF schemata can be found at SchemaWeb. Another useful resource might be DBPedia.

If you do not find an existing vocabulary that matches your use case, here are some hints for designing your own: The most important thing is that the URIs should really be unique. If you have a DNS domain, then that is quite easy to achieve: use URIs like http://example.com/myvocabulary/v1.0/foo. It is probably a good idea to use a versioned namespace prefix like http://example.com/myvocabulary/v1.0/ for all URIs. The version component allows you to evolve the vocabulary to meet future requirements. Now you can use the versioned namespace prefix to denote the format, i.e., as a type for your metadata stream, making it easy to find. You can also create an actual HTML page to document the vocabulary at the URI.


Status

As of OOo version 3.2, RDF metadata is only supported in Writer documents. The generic RDF parts of the API are all implemented. The document integration parts are not yet completely implemented: most elements do not support the xml:id that is required for use with RDF. In OOo version 3.3, support for more elements was added. The following elements can be annotated:

ODF element | Service | Description | Since
<text:p> | com.sun.star.text.Paragraph | paragraph | 3.2
<text:h> | com.sun.star.text.Paragraph | heading | 3.2
<text:bookmark> | com.sun.star.text.Bookmark | bookmark | 3.2
<text:bookmark-start> | com.sun.star.text.Bookmark | bookmark with range | 3.2
<text:meta> | com.sun.star.text.InContentMetadata | annotated text range | 3.2
<text:meta-field> | com.sun.star.text.textfield.MetadataField | text field whose content is generated from metadata | 3.2
<text:section> | com.sun.star.text.TextSection | text section | 3.3
<text:index-title> | com.sun.star.text.TextSection | index title section | 3.3
<text:alphabetical-index> | com.sun.star.text.DocumentIndex | alphabetical index | 3.3
<text:user-index> | com.sun.star.text.UserDefinedIndex | user defined index | 3.3
<text:table-of-content> | com.sun.star.text.ContentIndex | table of content | 3.3
<text:table-index> | com.sun.star.text.TableIndex | table index | 3.3
<text:object-index> | com.sun.star.text.ObjectIndex | object index | 3.3
<text:illustration-index> | com.sun.star.text.IllustrationsIndex | illustration index | 3.3
<text:bibliography> | com.sun.star.text.Bibliography | bibliography | 3.3


Content on this page is licensed under the Public Documentation License (PDL).