User talk:Dnw

From Apache OpenOffice Wiki
Revision as of 19:03, 1 January 2006 by Bdarcus (Talk | contribs)

Bibliographic Project's Developer Page

Project Overview

The Bibliographic Project (OOoBib)[1] plans to enhance the bibliographic functions of the OpenOffice.org Writer (word processing) application to achieve:

  1. bibliographic formatting support for:
    • complex features required of commonly used citation styles like APA and Chicago
    • automatically switching between potentially radically different citation styles (e.g. footnote to in-text)
  2. a data model that can support a broader range of reference types
  3. integration with remote databases

Our current objective is to design and build OOoBib version 0.1, which will contain the most basic functions for a usable bibliographic facility with the above features.

Terminology

This project deals with the following pieces:

citation
a short description that points to a fuller description elsewhere
reference item
a fuller description; also called a bibliographic entry or item
reference list
a collection of references; also called a bibliography

1st Stage, Bibliographic Facility Redevelopment

Summary

As our first step, we will implement the simplest changes to the OOo core code (the API basic code and UNO mappings, but not yet the user interface code) necessary to provide basic support for:

  1. Saving and reading enhanced citations in OpenDocument, i.e. the design and implementation of the new citation XML.
  2. Inserting and displaying citations in OpenOffice Writer using the new format. (Note that this task does not include the GUI for inserting a citation in the new format, only the UNO interface that provides the basic function.)
  3. Storage of document bibliographic data in the OOo document save package and the code changes necessary to read and save that bibliographic data.

When these basic functions are built into OOo and made accessible via UNO, we can use rapid prototyping methods to design and build prototype GUIs and bibliographic formatting engines. We will be able to use any of the programming languages which have OpenOffice bindings: C++, Java, Python and, of course, OpenOffice Basic. We believe that we will find more developers who can work in these languages than by insisting on C++ code from the start. It is also much easier to build prototypes in Java, Python and OpenOffice Basic than in C++.

NB. When we have designed, built and tested the prototypes and they have been accepted by the OOo community, we intend to rebuild them in C++ and to have them made part of the core OpenOffice application.

Skills required - good C++ programming and some XML skills, with knowledge of, or willingness to learn, the OpenOffice UNO (see the OpenOffice Developer's Guide)

Details

The project needs to modify the Writer document-read and document-save modules to support the new OpenDocument enhanced citation format. This means implementing the citation and bibliography changes to the OOo Writer save file (in OpenDocument format) accepted by the OpenDocument Technical Committee[2].

New Citation Coding

The changes to the document schema are detailed in our OpenDocument XML Citation Proposal.pdf[3]. Here are two examples of the new citation format. The first is a standard author-year style, with additional page number details:

 <cite:citation>
  <cite:citation-source>
   <cite:biblioref cite:key="Veer1996a">
     <cite:detail cite:units="pages" cite:begin="23" cite:end="24"/>
   </cite:biblioref>
  </cite:citation-source>
  <cite:citation-body>
    <text:span text:style-name="Citation">(Veer, 1996:23-24)</text:span>
  </cite:citation-body>
 </cite:citation>

The second is a footnoted example.

 <cite:citation>
  <cite:citation-source>
   <cite:biblioref cite:key="Veer1996a"/>
  </cite:citation-source>
  <cite:citation-body>
    <text:note text:id="ftn0" text:note-class="footnote">
     <text:note-citation>1</text:note-citation>
     <text:note-body>
       <text:p text:style-name="Footnote">Peter van der Veer (1996) 
       Riots and Rituals: The Construction of Violence and Public 
       Space in Hindu Nationalism, In Paul Brass Ed., Riots and 
       Pogroms (New York:NYU Press) 154–76.</text:p>
    </text:note-body>
  </text:note>
  </cite:citation-body>
 </cite:citation>

The design makes it possible to radically change citation formatting without modifying the citation-source element. If a user starts authoring a document in a footnote style, for example, and later must change to an author-year style, only the citation-body needs to be regenerated, so the switch can be seamless. Commercial products like EndNote do not support this sort of (quite useful) functionality.
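The following is a minimal Python sketch of that idea, not part of the proposal itself. It reads the key and page details from citation-source, then replaces only citation-body. The namespace URI bound to the cite: prefix is a placeholder assumption (the proposal's actual URI is not given here), the helper names are hypothetical, and the bracketed output format is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder URI for the proposed cite: prefix.
CITE_NS = "urn:example:cite"
# Standard OpenDocument text namespace.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

AUTHOR_YEAR = f"""
<cite:citation xmlns:cite="{CITE_NS}" xmlns:text="{TEXT_NS}">
  <cite:citation-source>
    <cite:biblioref cite:key="Veer1996a">
      <cite:detail cite:units="pages" cite:begin="23" cite:end="24"/>
    </cite:biblioref>
  </cite:citation-source>
  <cite:citation-body>
    <text:span text:style-name="Citation">(Veer, 1996:23-24)</text:span>
  </cite:citation-body>
</cite:citation>
""".strip()


def q(ns, name):
    """Build the '{namespace}name' form that ElementTree expects."""
    return f"{{{ns}}}{name}"


def source_of(citation):
    return citation.find(q(CITE_NS, "citation-source"))


def read_source(citation):
    """Return (key, (begin, end)) from citation-source; pages may be None."""
    ref = source_of(citation).find(q(CITE_NS, "biblioref"))
    detail = ref.find(q(CITE_NS, "detail"))
    pages = None
    if detail is not None:
        pages = (detail.get(q(CITE_NS, "begin")), detail.get(q(CITE_NS, "end")))
    return ref.get(q(CITE_NS, "key")), pages


def replace_body(citation, rendered):
    """Swap the formatted output; citation-source is never touched."""
    body = citation.find(q(CITE_NS, "citation-body"))
    for child in list(body):
        body.remove(child)
    span = ET.SubElement(body, q(TEXT_NS, "span"),
                         {q(TEXT_NS, "style-name"): "Citation"})
    span.text = rendered


citation = ET.fromstring(AUTHOR_YEAR)
before = ET.tostring(source_of(citation))
key, pages = read_source(citation)
replace_body(citation, f"[{key}, pp. {pages[0]}-{pages[1]}]")
after = ET.tostring(source_of(citation))
body_text = citation.find(q(CITE_NS, "citation-body"))[0].text
```

Because `before` and `after` are identical, a formatter can re-run in any style using only what citation-source records.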

To compare this to the current format, see implementation[4]. The changes to the document schema need to be supported by the document save and load modules. These are detailed in the Further References section below.

The bibliographic modules in OOo Writer need to be modified to support the new schema. The modules that need to be modified are:

  • Bibliography[5]
  • textfield/Bibliography[6]
  • FieldMaster/Bibliography[7]
  • BibliographyDataField[8]

Note: Sun developer Florian Reuter has posted an outline[9] of how to store the new citation data.

Bibliographic Data

Currently Writer saves a complete copy of the bibliographic data associated with a citation, with each citation. We propose to separate the citation and the bibliographic data, by leaving just the citation details in the document save file and placing the detailed bibliographic data in a separate bibliographic data file in the OOo save file package. The task is to complete the design of the bibliographic data file and add support for it in the OOo save file package.

The relevant interfaces are "interface XComponentLoader"[10], which provides loadComponentFromURL, and XStorable, which provides storeAsURL.

Note: the OASIS OpenDocument TC is currently discussing plans to enhance metadata support in the file format by using an extensible RDF approach. It is our hope that this will offer support sufficient for this project's needs, so that the process of designing the bibliographic data representation noted above will largely consist of simply using standardized OpenDocument metadata.
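Since an OpenDocument save package is a zip archive, the proposed separation can be sketched with Python's standard zipfile module. This is only an illustration under stated assumptions: the member name bibliography.xml is hypothetical (the project has not fixed a name or format here), and a real package would also need a META-INF/manifest.xml entry, which this sketch omits.

```python
import io
import zipfile

# ASSUMPTION: hypothetical member name for the separate bibliographic data file.
BIB_PATH = "bibliography.xml"


def write_package(content_xml, bib_xml):
    """Build an in-memory package holding the document and the bibliographic data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        # OpenDocument expects 'mimetype' first and stored uncompressed.
        pkg.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                     compress_type=zipfile.ZIP_STORED)
        pkg.writestr("content.xml", content_xml)
        pkg.writestr(BIB_PATH, bib_xml)
    return buf.getvalue()


def read_bibliography(package_bytes):
    """Return the bibliographic data file as text, or None if the package has none."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        if BIB_PATH not in pkg.namelist():
            return None
        return pkg.read(BIB_PATH).decode("utf-8")


package = write_package("<content/>", "<bibliography/>")
members = zipfile.ZipFile(io.BytesIO(package)).namelist()
```

A reader that does not know about the extra member can simply open content.xml and never look at the bibliographic data file.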

Formatting Engine

CiteProc[11] is a working proof-of-concept for the formatting functionality we propose to offer in OOo, and has already been used to format demanding citations and references for a published book. It is authored in XSLT 2.0. Because support for this language is still evolving, CiteProc needs to be ported to a language more suitable for integration with OOo. This probably ultimately suggests C++, though it might be valuable to consider doing a version in Python first.

2nd Stage, Bibliographic Facility Redevelopment

Summary

The second stage is focused on adding backward and forward compatibility support, integration with remote servers, and user interface improvements.

Details

Backwards and Forwards Compatibility

An important objective of the Bibliographic Enhancement project is to maintain document file backwards compatibility with older versions of OpenOffice. To achieve this, when bibliographic entries are inserted into a document they are stored in the same format as is currently the case. A new bibliographic entry tag will be added with the enhanced citation functions; each citation will contain a key that points to the bibliographic data saved in the document save package. To preserve backwards compatibility we will also need to maintain the old bibliographic citation and data storage in the document. Older versions of OpenOffice without the bibliographic enhancements, using the OOo 2.x .odt format, will read the old format of the bibliographic citations and ignore the bibliographic data file in the save package. A suggested approach is illustrated in a flowchart; see[12].

When a major revision of the save package format is introduced, support for the older bibliographic representations can be dropped from the document save file.
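The dual-storage idea can be modelled in a few lines of Python. This is a data-flow sketch only: the dictionary layout and every function name are hypothetical stand-ins for the real document structures, chosen to show which reader sees which copy of the data.

```python
def save_entry(doc, entry):
    """Dual-write one entry: a legacy inline copy plus the key-based enhanced form."""
    # Legacy form: the full data is stored with the citation, as Writer does today.
    doc["legacy_marks"].append(dict(entry))
    # Enhanced form: the citation holds only a key ...
    doc["citations"].append({"key": entry["key"]})
    # ... and the data is stored once in the package-level bibliography.
    doc["package_bibliography"][entry["key"]] = dict(entry)


def read_as_old_version(doc):
    """An older reader sees only the legacy marks and ignores everything else."""
    return [mark["title"] for mark in doc["legacy_marks"]]


def read_as_new_version(doc):
    """An enhanced reader follows each key into the package bibliography."""
    titles = []
    for citation in doc["citations"]:
        record = doc["package_bibliography"].get(citation["key"])
        titles.append(record["title"] if record else None)
    return titles


def drop_legacy(doc):
    """What a major format revision could do: remove the duplicated legacy data."""
    doc["legacy_marks"].clear()


doc = {"legacy_marks": [], "citations": [], "package_bibliography": {}}
save_entry(doc, {"key": "Veer1996a", "author": "Peter van der Veer",
                 "year": "1996", "title": "Riots and Rituals"})
old_view = read_as_old_version(doc)
new_view = read_as_new_version(doc)
```

Calling `drop_legacy(doc)` afterwards empties the old reader's view while the enhanced reader's view is unchanged, which is why the legacy copy can only be removed at a major format revision.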

Remote Server Integration

Build a Z39.50- and SRU/W-based internet searching facility using the YAZ toolkit. This would enable searching for and retrieving bibliographic data from internet sources and storing it in a document or bibliographic database.

We would like to use SRU/W as the standard method for OOo to retrieve bibliographic data from any source. In that case, even a local bibliographic database would be accessed through SRU/W methods. The user would just select a local or remote source and the same access mechanism would be used. SRU[13] is particularly promising because, while it shares the same model as the SOAP-based SRW, it is expressed in an easier-to-implement RESTful protocol.

This would mean adopting a standard API, which ZOOM[14] provides, and then wrapping the YAZ client code in a UNO interface.
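To illustrate why SRU is easy to implement, here is a small Python sketch that builds a searchRetrieve request as a plain URL. The parameter names are those of SRU 1.1; the two base URLs and the function name are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url, cql_query, start_record=1, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


query = 'dc.title = "Riots and Pogroms"'

# ASSUMPTION: both endpoints are invented for illustration.
local_url = sru_search_url("http://localhost:8080/sru", query)
remote_url = sru_search_url("http://example.org/sru", query)

# The request is the same for both sources; only the base URL differs.
local_params = parse_qs(urlsplit(local_url).query)
remote_params = parse_qs(urlsplit(remote_url).query)
```

The local and remote requests carry identical parameters, which matches the goal of one access mechanism for every source.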

Also build Z39.50 and SRU/W server capability into OOo to enable users to share their bibliographic (and other) databases over the internet. One of the Index Data toolkits could be used as a basis. [This may need more thought; sharing is good, but there are different ways to do this.]

The modules that may need to be modified are:

  • Bibliography
  • textfield/Bibliography
  • FieldMaster/Bibliography
  • BibliographyDataField

Graphical User Interface (GUI)

This stage will involve designing and building a GUI to offer:

  • Basic citation insertion
  • Basic bibliographic data entry
  • Citation and bibliographic table formatting using CiteProc
  • Basic Bibliographic database access
  • Basic bibliographic internet search and database storage.

Further References

For an overview of the Bibliographic project's major components and a context diagram see components.html[15]. There is information about the current OpenOffice Bibliographic implementation[16].

A start has been made on the Specification for this work (see the Projects Specifications folder[17] on the Documents and Files page). Also see an attempt at an analysis[18] of the proposed Bibliographic enhancement components and their relationships. The best place to start for finding out about development in OpenOffice is the OpenOffice.org For Developers page[19]. An important resource is the Developer's Guide, which is part of the SDK (software development kit) and is also available online[20].

The OOo API is based on UNO (Universal Network Objects)[21], the interface-based component model of OpenOffice.org. UNO offers interoperability between different programming languages, different object models, different machine architectures and different processes, either in a local network or even via the Internet. UNO components can be implemented in and accessed from any programming language for which a UNO language binding exists. Language bindings are currently provided for Java, C++, OpenOffice.org Basic, Python and the Common Language Infrastructure (CLI).

Implementing the new citation element in xmloff (the XmlOffice module) is a routine task. The Sun developers want to do it together with our programmer, so that he/she can learn how xmloff works. Florian Reuter, from the Sun OOo team, has written in his blog an explanation of how the citation changes could be implemented.[22]

The Writer save-file read and save modules need to be modified to support the new bibliographic data file in the document save package, and to support the backwards and forwards compatibility logic[23]. This means enhancing the document load and store interfaces: the "interface XComponentLoader"[24], which provides loadComponentFromURL, and XStorable, which provides storeAsURL. See the Developer's Guide section 6.1.5 Handling Documents[25].

There is also a demonstration client program for the YAZ toolkit (C and C++), IRTCL[26], that can perform the reference searches (it requires the YAZ and Tcl/Tk libraries to be installed). It does everything but save or export the results! However, it is a good model of how to use the toolkit and could be used as the basis for, or a model of, a prototype internet searching facility. Screen pic[27], screen pic2[28].

A demonstration internet searching facility that writes selected bibliographic records back to the OOo bibliographic database has been written in Python: PyOOBib[29]; instructions[30] are available. Various problems with OOo Python have led us to conclude that YAZ in C++ would be a better foundation than the Python code.

There is a description of the OOo save-file XML Package[31], and a FAQ[32] about it.

For details about GUI design, please look at our Project Documentation, GUI Design Documents folder[33].

How to get started

Access to the source code for this project is available for download via CVS. A child workspace (CWS) called "metabib" has been created for us; it contains a copy of the xmloff[34] (OpenOffice.org XML File Format Definition) and sw (the word processor application component and the WYSIWYG HTML editor component) code. The download size will be about 1GB(?), and you will need about 2GB of disk space to compile the metabib CWS[35]. (Web access to CWS.) If you cannot handle a download of that size, ask us about sending it to you on CD-ROMs.

Administration process: you first need to sign the JCA and then obtain the ssh key. After that we will show you how to access the CWS, which is basically a CVS branch. The most complicated part is setting up your tools so that you can participate in OOo development, but once you have the ssh key we will show you how. See OpenOffice.org For Developers[36] for general development information.

Sample Code

  • Sample Python code that reads and outputs some of the fields of the records in the bibliographic database: biblioacess.py[]
  • Sample OpenOffice Basic program to write records to the bibliographic database bibwrite.html[37]
  • Henrik Just's LaTeX and BibTeX export filter http://www.hj-gym.dk/~hj/writer2latex/[38]

Applications which interact with OpenOffice: Bibus (wxPython) and B3 (Java).

  • A Perl module OpenOffice::OODoc[39] provides a simple way to access document elements in the (closed i.e. not interactive with OOo) document save file. An example[40] which retrieves bibliographic details is provided.

Contacts

Questions or comments can be put to the Bibliographic Project development list dev@bibliographic.openoffice.org or to the project co-leader David Wilson at dnw@openoffice.org.
