Difference between revisions of "Bibliographic/Developer Page"

From Apache OpenOffice Wiki
(removed first/subsequent to move to stage 1)
 
(66 intermediate revisions by 5 users not shown)
Line 1: Line 1:
[[Bibliographic_Project | Back to Bibliographic Project index]]
 
 
== Bibliographic Project's Developer Page ==
 
 +
 +
 +
=== News  November 4 2008 ===
 +
 +
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer has told us that:
 +
 +
* We have planned to get meta data support for text objects in Writer in 3.1 (due April 2009).
 +
* The RDF repository is implemented and works, the API to work with meta data is defined.
 +
* We also have code in our ODF import/export filters that is able to import and export meta data at any object implementing the API.
 +
* So far the API is implemented for paragraphs only, because all other text objects needed a new bookmark implementation, which was finished just last week. Here we have refactored our bookmark code in Writer to support different kinds of bookmarks, one being bookmarks with meta data(*), another one the new field bookmarks (see below).
 +
** * Remark: we decided to implement "text meta", "text meta field" and "meta bookmark start/end" internally as bookmark pairs, though in the API only the latter would appear as such.
 +
 +
See the [[Writer/Metadata_Support|Metadata Support page]].
 +
 +
[http://www.openoffice.org/issues/show_bug.cgi?id=4260 Bibliographic Improvement Issue #4260]
 +
 +
=== News  July 12 2007 ===
 +
 +
The OASIS OpenDocument Technical Committee approved enhanced metadata support [[http://www.oasis-open.org/committees/download.php/24327/ODF-Metadata-Proposal.pdf pdf]] for inclusion in ODF 1.2. See Bruce's [[Bibliographic/Developer Page/Metadata Implementation Proposal|Metadata Implementation Proposal]].
 +
 +
 +
===[[Old Bibliographic Project News]]===
 +
  
 
=== Project Overview ===
 
Line 12: Line 34:
 
# integration with remote databases
 
  
The vision here is to transform the bibliographic and citation support in OOo into one that is more feature-rich, and fully dynamic. A user should be able to drag-and-drop a citation on a document and have its reference entry automatically rendered and added to the reference list. Likewise, a user should be able to change styles without any manual editing.
+
The vision here is to transform the bibliographic and citation support in OOo into one that is more feature-rich, and fully dynamic. A user should be able to drag-and-drop a citation into a document and have its reference entry automatically rendered and added to the reference list. Likewise, a user should be able to change styles without any manual editing.
  
Just as importantly, this approach recognizes that refernce management is increasingly network-based. Commercial web-based services like RefWorks [http://www.refworks.com] are taking significant market share from more traditional desktop plug-in applications like Endnote [http://www.endnote.com]. Likewise, there is increasing focus on free software and service alternatives, such as CiteULike [http://www.citeulike.org], Connotea [http://www.connotea.com], RefBase [http://www.refbase.net], and the forthcoming FireFox Scholar [http://echo.gmu.edu/toolcenter-wiki/index.php?title=Firefox_Scholar_(aka_SmartFox)] browser extension. To take advantage of these new opportunities, it must be easy for applications to serve as data sources for OpenOffice, requiring minimal coding.
+
Just as importantly, this approach recognizes that reference management is increasingly network-based. Commercial web-based services like RefWorks [http://www.refworks.com] are taking significant market share from more traditional desktop plug-in applications like Endnote [http://www.endnote.com]. Likewise, there is increasing focus on free software and service alternatives, such as CiteULike [http://www.citeulike.org], Connotea [http://www.connotea.com], RefBase [http://www.refbase.net], and the Zotero [http://zotero.org] Firefox browser extension. To take advantage of these new opportunities, it must be easy for applications to serve as data sources for OpenOffice, requiring minimal coding.
  
Our current objective is to design and build OOoBib version 0.1, which will contain the most basic functions for an usable bibliographic facility with the above features.  
+
Our current objective is to design and build OOoBib version 0.1, which will contain the most basic functions for a usable bibliographic facility with the above features.  
  
 
See [http://qa.openoffice.org/issues/show_bug.cgi?id=4260 issue 4260]. Please consider [http://www.openoffice.org/scdocs/ddIssues_EnterModify#vote voting] for this issue.
 
Line 34: Line 56:
 
=== Summary ===
 
  
As our first step, we will implement the most simple changes to the OOo Writer core code (the API basic code, and UNO mappings, but not yet the user interface code) necessary to implement basic support for:
+
As our first step, we will implement the internal metadata and API enhancements that enable bibliographic extensions to OpenOffice, such as [http://zotero.org Zotero], to interact better with OpenOffice. Currently, when you insert a citation into Writer with Zotero, only the text of the formatted citation is stored in the Writer document; all the citation metadata resides in the Zotero database. If you wish to share your document with the full bibliographic data, you need to package it with a copy of the Zotero database. The aim is to store both the formatted citation and its metadata in the Writer document. Zotero would then be able to retrieve the bibliographic data from the document and, when necessary, reformat the citations and reference tables from the original metadata.
  
# Saving and reading the new citation field in OpenDocument i.e. the [[New Citation XML info design and implementation]]
+
To achieve this we will implement the most simple changes to the OOo Writer core code (the API basic code, and UNO mappings, but not yet the user interface code) necessary to implement basic support for:
# Inserting and displaying citations in OpenOffice Writer using the new field. '''Note:''' this task does not include the GUI interface to insert the citation in the new format, only the UNO interface to provide the basic function. (See some[[Citeproc Writer Interaction | implementation discussion]]).
+
# Storage of bibliographic reference metadata in the OOo document save package and the code changes necessary to read and save that metadata.
+
  
When these basic functions are built into OOo Writer and are made assessable via UNO, we can then use rapid prototyping development methods to design and build prototype GUI interfaces and bibliographic formatting engines. We will be able to use any of the programming languages which have OpenOffice bindings: C++, Java, Python and, of course, OpenOffice Basic. We believe that we will find more developers who can work in these languages than by insisting on C++ code from the start. Also it is much easier to build prototypes using Java, Python and OpenOffice Basic than in C++.
+
#Saving and reading the new citation field in OpenDocument i.e. the [[New_Citation_XML_info_design_and_implementation |New Citation XML info design and implementation]] using a new API service we have called the "Metadata & User Defined Data Access". This service will provide generalised access to the new metadata features accepted for inclusion in the release of ODF 1.2 [http://www.oasis-open.org/committees/download.php/24327/ODF-Metadata-Proposal.pdf pdf] as well as support for ‘User Defined Data Access’ to the XML data file storage.
+
#Storage of bibliographic reference metadata in the OOo document save package using the new "Metadata & User Defined Data Access" service.
'''NB'''. When we have designed, built and tested the prototypes and they have been accepted by the OOo community we intend to rebuild them in C++ and to have them made part of the core OpenOffice application.
+
#Inserting and displaying citations in OpenOffice Writer using a new API service [[Bibliographic/Developer Page/API Enhancements#Service_BibliographicCitation |BibliographicCitation]] (replacing the current [http://api.openoffice.org/docs/common/ref/com/sun/star/text/BibliographyDataField.html# BibliographyDataField]). Note: this task does not include the GUI interface to insert the citation in the new format, only the UNO interface to provide the basic function. (See some [[Bibliographic/Developer Page/Current Implementation of the Bibliographic Component |implementation discussion]]).
Skills required - good C++ programming and some XML skills with knowledge of, or willingness to learn, the OpenOffice UNO (see the Openoffice Developer's Guide)
+
 
 +
When these basic functions are built into OOo Writer and are made accessible via UNO, we can then use rapid prototyping development methods to design and build prototype GUI interfaces and bibliographic formatting engines using the [http://extensions.openoffice.org/ Extension Development Toolkit] and [http://blogs.sun.com/GullFOSS/entry/successful_community_project_smart_tags Smart Tags] {now implemented, issue [http://www.openoffice.org/issues/show_bug.cgi?id=75130 #75130], [http://specs.openoffice.org/appwide/SmartTags/Smart_Tags_Specification.odt Smart Tags Specification.odt]}. We will be able to use any of the programming languages which have OpenOffice bindings and are supported by the toolkit: C++, Java, Python and, of course, OpenOffice Basic. We believe that we will find more developers who can work in these languages than by insisting on C++ code from the start. Also it is much easier to build prototypes using Java, Python and OpenOffice Basic than in C++.
  
 
'''Skills required''' - some XML skills with knowledge of, or willingness to learn, the OpenOffice UNO (see the OpenOffice [http://api.openoffice.org/DevelopersGuide/DevelopersGuide.html Developer's Guide] and [[Using_Cpp_with_the_OOo_SDK|Using C++ with SDK]])
Line 49: Line 70:
 
=== Details ===
 
=== Details ===
  
The project needs to modify the Writer document-read and document-save modules to support the new OpenDocument enhanced citation field. This involves implementing the citation and bibliography changes to the OOo Writer save file (in Open Document format) accepted by the [http://lists.oasis-open.org/archives/office/200409/msg00023.html OpenDocument Technical Committee], as well as current metadata work. Both of these are scheduled for inclusion in ODF 1.2, scheduled for delivery in 2007.
+
==== New Citation Field API ====
 +
The changes to the document schema are detailed in our [http://www.oasis-open.org/committees/download.php/24327/ODF-Metadata-Proposal.pdf OpenDocument Metadata model document pdf].  
  
==== New Citation Field ====
+
'''NOTE: The OpenDocument Metadata model has now been accepted [[http://www.oasis-open.org/committees/download.php/24327/ODF-Metadata-Proposal.pdf pdf]]. The examples in this section have not yet been adjusted to the format in that document.''' [[User:Dnw|David Wilson]] 02:42, 17 July 2007 (CEST)
  
The changes to the document schema are detailed in our [http://bibliographic.openoffice.org/XML-bibliography-proposal.pdf OpenDocument XML Citation Proposal pdf]. The new field consists of:
+
The new field consists of:
 
# one or more references to bibliographic metadata records
 
 
# the formatted citation content
 
Line 96: Line 118:
 
Note: The cite-key <pre><cite:biblioref cite:key="urn:isbn:0814712827#154"/></pre> in the above example provides a link to the reference data stored in the [[Bibliographic_Document_XML_Format#biblo-data.xml|biblio-data.xml]] file in the save package. There is a [http://bibliographic.openoffice.org/enhanced-save-package-description.html document] comparing the current save-package format to the proposed format, and [[Bibliographic_Document_XML_Format|a complete document example]]. The changes to the document schema need to be supported by the document save and load modules. These are detailed in the [[#Further References|Further References]] below.
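The cite-key lookup described in the note can be sketched in Python. This is only an illustration, not part of the proposal: the function names, the <code>records</code> dictionary standing in for the contents of biblio-data.xml, and the decision to split the key at the first <code>#</code> are all assumptions; the proposal itself does not say how the part after <code>#</code> is to be interpreted.

```python
from typing import NamedTuple, Optional


class CiteKey(NamedTuple):
    resource: str  # identifier of the reference record, e.g. a URN
    fragment: str  # text after '#'; empty if absent (its meaning is an assumption)


def split_cite_key(key: str) -> CiteKey:
    """Split a cite:key value at the first '#'."""
    resource, _, fragment = key.partition("#")
    return CiteKey(resource, fragment)


def lookup_reference(key: str, records: dict) -> Optional[dict]:
    """Return the record for a cite key, or None if the key is unknown.

    `records` is a hypothetical in-memory stand-in for biblio-data.xml,
    keyed by the resource part of the cite key.
    """
    return records.get(split_cite_key(key).resource)


if __name__ == "__main__":
    # Placeholder record; real data would come from the save package.
    records = {"urn:isbn:0814712827": {"type": "book"}}
    print(split_cite_key("urn:isbn:0814712827#154"))
    print(lookup_reference("urn:isbn:0814712827#154", records))
```

Keeping the key parsing separate from the lookup means the storage side can change (for example to the RDF metadata model) without touching the code that reads citation fields.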
  
The bibliographic code modules in OOo Writer need to be modified to support the new schema. The modules that need to be modified are:
+
'''Details of the proposed Bibliographic API Enhancements are on a separate wiki page [[Bibliographic/Developer Page/API Enhancements | Bibliographic_API_Enhancements]].'''
  
* [http://bibliographic.openoffice.org/implementation.html#text ''Bibliography'']
+
===== First Citation and Ibid. Flags =====
 +
The new citation API must be able to set a [[Writer_enhancements_for_OOBib#Different_treatment_for_first_and_subsequent_uses_of_the_citation  | first and subsequent occurrence]] flag that marks the first occurrence of a reference, as well as a 'repeated, adjacent citations' flag (ibid). These flags would be passed on to the formatting engine so that, if the style selected by the user requires them, first and subsequent formatted citations, or 'repeated, adjacent citations' options such as Ibid, could be returned to Writer.
  
* [http://bibliographic.openoffice.org/implementation.html#text1 ''textfield/Bibliography'']
+
'''Note:''' Setting the first and subsequent occurrence flag needs to work with the footnote counting options set in Tools-->Footnotes-->TAB=Footnotes:'Footnote Counting', which are: Per Document, Per Page and Per Chapter. (Consideration needs to be given to the range permitted in the 'repeated, adjacent citations' test. You probably should not start a new chapter with an Ibid. citation, and it can even be annoying across page boundaries, when you have to turn the page back to check the citation. So perhaps the range should always be 'Per Page'.)
  
* [http://bibliographic.openoffice.org/implementation.html#text2 ''FieldMaster/Bibliography'']
+
'''Method''': The current API of the Writer provides an interface [http://api.openoffice.org/docs/common/ref/com/sun/star/text/XTextRangeCompare.html XTextRangeCompare].
 +
With this interface two [http://api.openoffice.org/docs/common/ref/com/sun/star/text/TextRange.html text ranges] (citations are also text ranges) can
 +
be sorted. This interface is implemented at each [http://api.openoffice.org/docs/common/ref/com/sun/star/text/Text.html text part] (body text,
 +
header, footer, footnote areas, text frames, table cells, draw text).
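As a rough illustration of the flag logic, the following Python sketch assumes the citations have already been put into document order (for example by comparing their text ranges) and that each one knows its page and chapter. The class names, field names and scope strings are hypothetical and are not part of the Writer API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Citation:
    ref_key: str  # key of the cited reference
    page: int     # page on which the citation appears
    chapter: int  # chapter in which the citation appears


@dataclass
class FlaggedCitation:
    citation: Citation
    first: bool  # first occurrence of this reference within the scope
    ibid: bool   # same reference as the immediately preceding citation


def _scope_id(c: Citation, scope: str) -> int:
    """Map a citation to the counting unit it belongs to."""
    if scope == "document":
        return 0
    if scope == "page":
        return c.page
    if scope == "chapter":
        return c.chapter
    raise ValueError(f"unknown scope: {scope}")


def flag_citations(citations: Iterable[Citation],
                   first_scope: str = "document",
                   ibid_scope: str = "page") -> List[FlaggedCitation]:
    """Compute first/ibid flags for citations given in document order."""
    seen = set()
    prev: Optional[Citation] = None
    flagged = []
    for c in citations:
        first_key = (_scope_id(c, first_scope), c.ref_key)
        is_first = first_key not in seen
        seen.add(first_key)
        # Ibid only when the previous citation is the same reference
        # and lies in the same ibid counting unit (page by default).
        is_ibid = (prev is not None
                   and prev.ref_key == c.ref_key
                   and _scope_id(prev, ibid_scope) == _scope_id(c, ibid_scope))
        flagged.append(FlaggedCitation(c, is_first, is_ibid))
        prev = c
    return flagged
```

The default of <code>ibid_scope="page"</code> follows the suggestion in the note above that the 'repeated, adjacent citations' range should perhaps always be 'Per Page'; the formatting engine would decide whether the selected style actually uses either flag.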
  
* [http://bibliographic.openoffice.org/implementation.html#text3 ''BibliographyDataField'']
+
===== Footnote Style Citations =====
 
+
The content of the citation field body should allow foot/endnotes in order to support [[Bibliographic/Writer_enhancements_for_OOBib#Support_for_the_footnote_citation_style | footnote style citations]]. This should not be difficult to build as it would be utilising the existing footnote machinery.
<strong>note:</strong> In CP Hennessy's award winning a [[Current Implementation of the OpenOffice.org Bibliographic Component | article]] he explains the citation facilities of OpenOffice.org, examining the APIs available to the programmer to manipulate the citation data, and how these API calls actually map to real C++ classes in the OpenOffice.org source code. Also, former Sun developer Florian Reuter has posted an [http://blogs.sun.com/roller/page/flo?entry=the_community_and_me_the outline] of how to store the new citation data.
+
  
 
====  Bibliographic Reference Data ====  
 
Line 114: Line 139:
  
 
Note: the OASIS OpenDocument TC is currently discussing plans to enhance metadata support in the file format by using an extensible RDF approach. It is our hope that this will offer support sufficient for this project's needs, so that the process of designing the bibliographic data representation noted above will largely consist of simply using standardized OpenDocument metadata.
 
 +
 +
==== Data Source API ====
 +
 +
Given the importance of integrating with remote databases such as library catalogs and third-party bibliographic database applications and services, it is essential that OOo provide a standard API accessible via UNO. [http://zoom.z3950.org/index.html ZOOM] provides just this, and the YAZ client code from Index Data is an excellent and liberally licensed open source toolkit [http://www.indexdata.dk/yaz/].
 +
 +
ZOOM provides an easy way to support [http://www.loc.gov/z3950/agency/zing/srw/background.html SRU/W] as the standard method for OOo to retrieve bibliographic data from any source. The user would just select a local or remote source and the same access mechanism would be used. [http://www.loc.gov/standards/sru/ SRU] is particularly promising because, while it shares the same model as the SOAP-based SRW, it is expressed in an easier-to-implement RESTful protocol.
 +
 +
The task, then, is to wrap the YAZ client code in a UNO interface.
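To show why the RESTful form of SRU is easy to implement, here is a minimal Python sketch that builds an SRU 1.1 searchRetrieve request URL. It does not use YAZ or UNO; the server address in the usage example is a made-up placeholder, and the function name is hypothetical. The parameter names (<code>operation</code>, <code>version</code>, <code>query</code>, <code>startRecord</code>, <code>maximumRecords</code>) are those of SRU 1.1.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str,
                  start_record: int = 1, max_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # CQL query string, URL-encoded below
        "startRecord": start_record,   # 1-based index of the first record
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # sru.example.org is a placeholder, not a real catalogue.
    print(build_sru_url("http://sru.example.org/catalog",
                        'dc.title = "open source"'))
```

Because a local bibliographic database and a remote catalogue would both be addressed this way, only the base URL changes when the user selects a different source.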
 +
 +
The modules that may need to be modified are:
 +
[http://bibliographic.openoffice.org/implementation.html#text ''Bibliography''],[http://bibliographic.openoffice.org/implementation.html#text1 ''textfield/Bibliography''],[http://bibliographic.openoffice.org/implementation.html#text2 ''FieldMaster/Bibliography''],[http://bibliographic.openoffice.org/implementation.html#text3 ''BibliographyDataField''].
  
 
====  Formatting Engine ====
Line 123: Line 159:
 
There are two options, then. One option discussed on the XML dev list is to bundle Saxon as the default XSLT processor in OOo, which would allow the existing CiteProc implementation to be used more-or-less as is, or for other solutions to be easily swapped in (as Microsoft allows).
 
  
The other option is to simple port CiteProc to another language more suitable for integration with OOo. Because formatting is configured with a simple dedicated XML language, this is not hard to do, and there are already versions under way in Ruby, Python and Javascript, the first two of which are currently on a Subversion server:
+
The other option is to simply port CiteProc to another language more suitable for integration with OOo. Because formatting is configured with a simple dedicated XML language, this is not hard to do, and there are already versions under way in Ruby, Python and Javascript, the first two of which are currently on a Subversion server:
  
  svn list https://svn.sourceforge.net/svnroot/xbiblio
+
  svn list https://xbiblio.svn.sourceforge.net/svnroot/xbiblio
 
  citeproc-py/
 
 
  citeproc-rb/
 
Line 133: Line 169:
 
  playing with that (there's not much there!), you can just do:
 
 
  .
 
  svn co https://svn.sourceforge.net/svnroot/xbiblio/citeproc-py
+
  svn co https://xbiblio.svn.sourceforge.net/svnroot/xbiblio/citeproc-py
  
 
== 2nd Stage Bibliographic Facility Redevelopment ==
 
Line 142: Line 178:
  
 
=== Details ===
 
 
====Footnote Style Citations====
 
Build in support for [[Writer_enhancements_for_OOBib#Support_for_the_footnote_citation_style | footnote style citations]]. This should not be difficult to build as it would be utilising the exiting footnote machinery.
 
 
==== Backwards and Forwards Compatibility ====
 
An important object of Bibliographic Enhancement project is to maintain document file backwards compatibility with older versions of OpenOffice. To achieve this when Bibliographic Entries are inserted into a Document they are stored with the same format as is currently the case. A new bibliographic entry tag will be added with the enhanced citation functions, each citation will contain a key that will point to the bibliographic data which will be saved in the document save package. To preserve backwards compatability we will need to also maintain the old bibliographic citation and data storage in the document. Older version of OpenOffice, without the bibliographic enhancements, in the OOo 2.X .ods format, will read the old format of the bibliographic citations and ignore the bibliographic data file in the save package. A suggested approach is illustrated in a [http://bibliographic.openoffice.org/backwards.png flowchart].
 
 
When a major revision of the save package format is introduced the support of the older bibliographic representations can be dropped form the document save file.
 
 
==== Remote Server Integration ====
 
Build [http://bibliographic.openoffice.org/biblio-sw.html#ZING Z39.50] and [http://www.loc.gov/z3950/agency/zing/srw/background.html SRU/W] based internet searching facility using the [http://www.indexdata.dk/yaz/ YAZ] toolkit. This would enable searching for and retrieving bibliographic data from internet sources and storing them in a document or bibliographic database.
 
 
We would like to use SRU/W as the standard method for OOo retrieving bibliographic data from any source. In that case, even a local Bibliographic database would also be accessed through SRU/W methods. The user would just select a local or remote source and the same access mechanism would be used. [http://www.loc.gov/standards/sru/ SRU] is particularly promising because while it shares the same model as the SOAP-based SRW, it is expressed in an easier to implement RESTful protocol.
 
 
This would mean adopting a standard API, which [http://zoom.z3950.org/index.html ZOOM] provides, and then wrapping the YAZ client code in a UNO interface.
 
 
Also build Z39.50 and SRU/W server capability into OOo to enable users to share their bibliographic (and other) databases over the internet. One of the Indexdata toolkits could used as a basis. [this may need more thought; sharing is good, but there are different ways to do this]
 
 
The modules that may need to be modified are:
 
[http://bibliographic.openoffice.org/implementation.html#text ''Bibliography''],[http://bibliographic.openoffice.org/implementation.html#text1 ''textfield/Bibliography''],[http://bibliographic.openoffice.org/implementation.html#text2 ''FieldMaster/Bibliography''],[http://bibliographic.openoffice.org/implementation.html#text3 ''BibliographyDataField''].
 
  
 
==== Graphical User Interface (GUI) ====
 
Line 171: Line 187:
 
* Basic bibliographic internet search and database storage.
 
 
* Automatic generation of references from an OOo-document or parts thereof, similar to enhanced bookmarking by using the document properties. --[[User:MadBoP|MadBoP]] 10:56, 15 April 2006 (CEST)
 
 +
 +
==== Backwards and Forwards Compatibility ====
 +
An important objective of the Bibliographic Enhancement project is to maintain document file backwards compatibility with older versions of OpenOffice. To achieve this, when bibliographic entries are inserted into a document they are stored in the same format as is currently the case. A new bibliographic entry tag will be added with the enhanced citation functions; each citation will contain a key that points to the bibliographic data, which will be saved in the document save package. To preserve backwards compatibility we will also need to maintain the old bibliographic citation and data storage in the document. Older versions of OpenOffice without the bibliographic enhancements, using the OOo 2.X .odt format, will read the old format of the bibliographic citations and ignore the bibliographic data file in the save package. A suggested approach is illustrated in a [http://bibliographic.openoffice.org/backwards.png flowchart].
 +
 +
When a major revision of the save package format is introduced, support for the older bibliographic representations can be dropped from the document save file.
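The dual-storage idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual save or load code: the package is represented as a plain dictionary rather than a zipped ODF file, the entry field names are hypothetical, and the three functions merely imitate what a writer and an old or new reader would do.

```python
from typing import List, Optional, Tuple


def save_citation(package: dict, key: str, formatted: str, metadata: dict) -> None:
    """Store a citation in both the old and the enhanced representation."""
    # Old representation: formatted entry kept in the document content,
    # so versions without the enhancements still display the citation.
    entries = package.setdefault("content.xml", [])
    entries.append({"legacy_text": formatted, "cite_key": key})
    # Enhanced representation: full data in a separate file of the package.
    package.setdefault("biblio-data.xml", {})[key] = metadata


def read_as_old_version(package: dict) -> List[str]:
    """An older reader sees only the legacy text and ignores the data file."""
    return [e["legacy_text"] for e in package.get("content.xml", [])]


def read_as_new_version(package: dict) -> List[Tuple[str, Optional[dict]]]:
    """A newer reader follows each key into the data file when it exists."""
    data = package.get("biblio-data.xml", {})
    return [(e["legacy_text"], data.get(e.get("cite_key")))
            for e in package.get("content.xml", [])]
```

A document written by an older version simply has no data file and no keys, so the newer reader falls back to the legacy text; this is the forwards-compatible half of the scheme.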
  
 
== Further References ==
 
  
First, see a list of the [[OOoBib_Functional_Requirements | Functional Requirements of the OpenOffice Bibliographic Module]] and the [[Bibliographic_Database | Bibliographic Database enhancement proposals]] which provides details of our development plans and basic information for potential developers.
+
First, see a list of the [[Bibliographic/OOoBib_Functional_Requirements | Functional Requirements of the OpenOffice Bibliographic Module]] and the draft [[Bibliographic/API_Enhancements |Bibliographic API Enhancements]] which provide details of our development plans and basic information for potential developers.
  
 
For an overview of the Bibliographic project's major components see the [http://bibliographic.openoffice.org/components.html context diagram]. There is information about the current OpenOffice Bibliographic [http://bibliographic.openoffice.org/implementation.html implementation]. There is an example of bibliographic data in a [http://bibliographic.openoffice.org/xml_contents.html Writer XML save file].
 +
 +
For a description of the Citation Style language (CSL) schema see [http://bibliographic.openoffice.org/files/documents/124/3897/csl.odt csl.odt] or [http://bibliographic.openoffice.org/files/documents/124/3898/csl-schema.pdf csl-schema.pdf].
  
 
A start has been made on the Specification for this work (see the [http://bibliographic.openoffice.org/servlets/ProjectDocumentList?folderID=266 Projects Specifications folder] on the Documents and Files page). Also see an attempt at an [http://bibliographic.openoffice.org/mindmap/content-analysis.html analysis] of the proposed Bibliographic enhancement components and their relationships.
Line 187: Line 210:
 
SW_SERVICE_INDEX_BIBLIOGRAPHY is a bibliography related service. The Bibliography table (reference table) is processed like other indexes such as 'Table of Contents', 'Table of Illustrations' etc. These index functions are handled by the code module [http://sw.openoffice.org/source/browse/sw/sw/source/core/unocore/unoidx.cxx?rev=1.57.94.2&content-type=text/vnd.viewcvs-markup unoidx.cxx]. Basic text functions are handled in the code module [http://sw.openoffice.org/source/browse/sw/sw/source/core/unocore/unotext.cxx?rev=1.27.326.5&content-type=text/vnd.viewcvs-markup unotext.cxx].
 
Implementing the new citation element in ''xmloff'' (the XmlOffice module) is a routine task. The Sun developers want to do it together with our programmer, so that he/she can learn how xmloff works. Florian Reuter, from the Sun OOo team, has written in his [http://blogs.sun.com/roller/page/flo?entry=the_community_and_me_the blog] an explanation of how the citation changes could be implemented. '''Note: [[mailto:cphennessy@openoffice.org CPHennessy]] has been working on this task and has implemented almost everything necessary for parsing the new cite: elements and attributes. It compiles and parses the supplied example file.''' See his [[Current_Implementation_of_the_OpenOffice.org_Bibliographic_Component | article]] explaining the citation facilities of OpenOffice.org, examining the APIs available to the programmer to manipulate the citation data, and how these API calls actually map to real C++ classes in the OpenOffice.org source code.
 
 
To modify the Writer save-file read and save modules to support the new the bibliographic data file in the document save package, and to support backwards and forwards [http://bibliographic.openoffice.org/backwards.png compatability logic] to Writer the [http://api.openoffice.org/docs/common/ref/com/sun/star/frame/XComponentLoader.html "''interface XComponentLoader''"], which supports loadComponentFromURL and storeAsURL, needs to be enhanced. See the Development Guide explanation for - [http://api.openoffice.org/docs/DevelopersGuide/OfficeDev/OfficeDev.xhtml#1_1_5_Handling_Documents 6.1.5 Handling Documents]. See [[Bibliographic Document XML Format| sample Writer save file contents]] for a (a .odt file) with the proposed bibliographic enhancements.
 
  
 
There is also a demonstration client program for the [http://www.indexdata.dk/ YAZ toolkit] (C & C++), [http://www.indexdata.dk/irtcl/ IRTCL], that can perform the reference searches (it requires the YAZ and Tcl/Tk libraries to be installed). It does everything but save or export the results! However, it is a good model of how to use the toolkit and could be used as the basis for, or model of, a prototype internet searching facility. [http://bibliographic.openoffice.org/irclient.jpeg Screen pic], [http://bibliographic.openoffice.org/irclient-setup.png screen pic2].
Line 196: Line 215:
 
A demonstration internet searching facility that writes selected bibliographic records back to the OOo bibliographic database has been written in Python - [http://bibliographic.openoffice.org/files/documents/124/1675/PyOOBib-02.zip PyOOBib], [http://bibliographic.openoffice.org/files/documents/124/2446/file_2446.dat?filename=PyOOBib%20Instructions%2esxw instructions] are available. Various problems with OOo Python have led us to conclude that YAZ in C++ would be a better foundation than the Python code.
 
  
There is [http://xml.openoffice.org/package.html description] of the OOo save-file XML Package, and is a [http://xml.openoffice.org/faq.html#4 FAQ] about it. Also an [[New_Citation_XML_info_design_and_implementation | example]] showing the proposed bibliographic enhancements.
+
There is [http://xml.openoffice.org/package.html description] of the OOo save-file XML Package, and is a [http://xml.openoffice.org/faq.html#4 FAQ] about it. Also an [[Bibliographic/Developer Page/New Citation XML info design and implementation | example]] showing the proposed bibliographic enhancements.
  
 
For details about GUI interface design please look at our Project Documentation; [http://bibliographic.openoffice.org/servlets/ProjectDocumentList?folderID=451&expandFolder=451&folderID=0 GUI Design Documents' Folder]
 
Line 218: Line 237:
 
Applications for importing/exporting different bibliographic formats.
 
 
* A Python script for importing RIS format reference(s), [http://bibliographic.openoffice.org/files/documents/124/3078/RISImport.py RISImport.py]. Possibly of some value as it hashes out some RIS details on mapping between fields, and suggests "sensitive" mapping for different reference types.
 
 +
* An OpenOffice bibliographic database RIS export program in python - [[Bibliographic/Hints and Tips/OOoRISExport.py| OOoRISExport.py]]
  
 
== Contacts ==
 
Line 223: Line 243:
 
Questions or comments can be put to the Bibliographic Project development list [mailto:dev@bibliographic.openoffice.org dev@bibliographic.openoffice.org] or to the project co-leader [mailto:dnw@openoffice.org David Wilson].
 
  
[[Category:Development]] [[Category:Bibliographic]]
+
[[Category:Bibliographic]]

Latest revision as of 11:25, 28 March 2010

Bibliographic Project's Developer Page

News November 4 2008

Mathias Bauer (mba) - Project Lead OpenOffice.org Writer has told us that:

  • We have planned to get meta data support for text objects in Writer in 3.1 (due April 2009).
  • The RDF repository is implemented and works, the API to work with meta data is defined.
  • We also have code in our ODF import/export filters that is able to import and export meta data at any object implementing the API.
  • The API is implemented for paragraphs only until now as for all other text objects we needed a new bookmark implementation that was finished just last week. Here we have refactored our bookmark code in Writer to support different kinds of bookmarks, one being bookmarks with meta data(*), another one the new field bookmarks (see below).
    • * Remark: we decided to implement "text meta", "text meta field" and "meta bookmark start/end" internally as bookmark pairs, though in the API only the latter would appear as such.

See the Metadata Support page.

Bibliographic Improvement Issue #4260

News July 12 2007

The OASIS OpenDocument Technical Committee approved enhanced metadata support [pdf] for inclusion in ODF 1.2. See Bruce's Metadata Implementation Proposal.


Old Bibliographic Project News

Project Overview

The Bibliographic Project (OOoBib) plans to enhance the bibliographic functions of the OpenOffice.org Writer (word processing) application to achieve:

  1. citation and reference formatting support for:
    • full support of commonly used citation styles like APA and Chicago
    • automatically switching between potentially radically different citation styles (e.g. footnote to in-text)
  2. a data model that can support a broader range of reference types
  3. integration with remote databases

The vision here is to transform the bibliographic and citation support in OOo into one that is more feature-rich, and fully dynamic. A user should be able to drag-and-drop a citation into a document and have its reference entry automatically rendered and added to the reference list. Likewise, a user should be able to change styles without any manual editing.

Just as importantly, this approach recognizes that reference management is increasingly network-based. Commercial web-based services like RefWorks [1] are taking significant market share from more traditional desktop plug-in applications like Endnote [2]. Likewise, there is increasing focus on free software and service alternatives, such as CiteULike [3], Connotea [4], RefBase [5], and the Zotero [6] Firefox browser extension. To take advantage of these new opportunities, it must be easy for applications to serve as data sources for OpenOffice, requiring minimal coding.

Our current objective is to design and build OOoBib version 0.1, which will contain the most basic functions for a usable bibliographic facility with the above features.

See issue 4260. Please consider voting for this issue.

Terminology

For clarity, this project deals with the following pieces:

citation
a short description that points to a fuller description elsewhere, either in a note or a reference list
reference item
a fuller description; also called a bibliographic entry or item
reference list
a collection of references; also called a bibliography

1st Stage, Bibliographic Facility Redevelopment

Summary

As our first step, we will implement the internal metadata and API enhancements that enable bibliographic extensions to OpenOffice, such as Zotero, to interact better with OpenOffice. Currently, when you insert a citation into Writer with Zotero, only the text of the formatted citation is stored in the Writer document; all the citation metadata resides in the Zotero database. If you wish to share your document with the full bibliographic data, you need to package it with a copy of the Zotero database. The aim is to store both the formatted citation and its metadata in the Writer document. Zotero would then be able to retrieve the bibliographic data from the document and, when necessary, reformat the citations and reference tables from the original metadata.

To achieve this we will implement the simplest changes to the OOo Writer core code (the API basic code and UNO mappings, but not yet the user interface code) necessary to implement basic support for:

  1. Saving and reading the new citation field in OpenDocument, i.e. the New Citation XML info design and implementation, using a new API service we have called the "Metadata & User Defined Data Access". This service will provide generalised access to the new metadata features accepted for inclusion in the release of ODF 1.2 (pdf) as well as support for ‘User Defined Data Access’ to the XML data file storage.
  2. Storage of bibliographic reference metadata in the OOo document save package using the new "Metadata & User Defined Data Access" service.
  3. Inserting and displaying citations in OpenOffice Writer using a new API service BibliographicCitation (replacing the current BibliographyDataField). Note: this task does not include the GUI interface to insert the citation in the new format, only the UNO interface to provide the basic function. (See some implementation discussion).

When these basic functions are built into OOo Writer and are made accessible via UNO, we can then use rapid prototyping development methods to design and build prototype GUI interfaces and bibliographic formatting engines using the Extension Development Toolkit and Smart Tags {now implemented, issue #75130, Smart Tags Specification.odt}. We will be able to use any of the programming languages which have OpenOffice bindings and are supported by the toolkit: C++, Java, Python and, of course, OpenOffice Basic. We believe that we will find more developers who can work in these languages than by insisting on C++ code from the start. It is also much easier to build prototypes using Java, Python and OpenOffice Basic than in C++.

Skills required - some XML skills with knowledge of, or willingness to learn, OpenOffice UNO (see the OpenOffice Developer's Guide and Using C++ with SDK).

Details

New Citation Field API

The changes to the document schema are detailed in our OpenDocument Metadata model document pdf.

NOTE: The OpenDocument Metadata model has now been accepted [pdf]. The examples in this section have not yet been adjusted to the format in that document. David Wilson 02:42, 17 July 2007 (CEST)

The new field consists of:

  1. one or more references to bibliographic metadata records
  2. the formatted citation content

Here are two examples of the new citation field. The first is a standard author-year style, with additional page number details:

 <text:p>Here is a paragraph with a citation <cite:citation>
  <cite:citation-source>
   <cite:biblioref cite:key="urn:isbn:0814712827#154">
     <cite:detail cite:units="pages" cite:begin="23" cite:end="24"/>
   </cite:biblioref>
  </cite:citation-source>
  <cite:citation-body>
    <text:span text:style-name="Citation">(Veer, 1996:23-24)</text:span>
  </cite:citation-body>
 </cite:citation>
 </text:p>

The second is a footnoted example.

 <text:p>Here is a paragraph with a citation <cite:citation>
  <cite:citation-source>
   <cite:biblioref cite:key="urn:isbn:0814712827#154"/>
  </cite:citation-source>
  <cite:citation-body>
    <text:note text:id="ftn0" text:note-class="footnote">
     <text:note-citation>1</text:note-citation>
     <text:note-body>
       <text:p text:style-name="Footnote">Peter van der Veer (1996) 
       Riots and Rituals: The Construction of Violence and Public 
       Space in Hindu Nationalism, In Paul Brass Ed., Riots and 
       Pogroms (New York:NYU Press) 154–76.</text:p>
    </text:note-body>
  </text:note>
  </cite:citation-body>
 </cite:citation>
 </text:p>

The design is such that it is possible to radically change citation formatting without modifying the citation-source element. If a user starts authoring their document in a footnote style, for example, and later must change to an author-year style, the logic is in place to make this a totally seamless switch. Commercial products like Endnote do not support this sort of (quite useful) functionality.
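
As a rough illustration of that separation, the following Python sketch reads the citation-source part of the first example and then swaps the formatted body without touching the source. It is a sketch under stated assumptions, not the Writer implementation: the `cite` namespace URI (`urn:example:cite`) is a placeholder invented here, and the function names are hypothetical. Only the `text` namespace URI is the real OpenDocument one.

```python
import xml.etree.ElementTree as ET

# The text namespace is the OpenDocument one; the cite namespace URI is a
# placeholder invented for this sketch.
NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "cite": "urn:example:cite",
}

SAMPLE = """<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
        xmlns:cite="urn:example:cite">Here is a paragraph with a citation <cite:citation>
  <cite:citation-source>
   <cite:biblioref cite:key="urn:isbn:0814712827#154">
     <cite:detail cite:units="pages" cite:begin="23" cite:end="24"/>
   </cite:biblioref>
  </cite:citation-source>
  <cite:citation-body>
    <text:span text:style-name="Citation">(Veer, 1996:23-24)</text:span>
  </cite:citation-body>
 </cite:citation>
</text:p>"""


def read_citation_sources(paragraph_xml):
    """Return one dict per cite:biblioref, ignoring the formatted body."""
    root = ET.fromstring(paragraph_xml)
    cite_ns = "{%s}" % NS["cite"]
    sources = []
    for ref in root.iterfind(".//cite:citation-source/cite:biblioref", NS):
        entry = {"key": ref.get(cite_ns + "key"), "details": []}
        for detail in ref.iterfind("cite:detail", NS):
            entry["details"].append({
                "units": detail.get(cite_ns + "units"),
                "begin": detail.get(cite_ns + "begin"),
                "end": detail.get(cite_ns + "end"),
            })
        sources.append(entry)
    return sources


def replace_citation_body(paragraph_xml, new_text):
    """Swap the formatted text; the citation-source element is left alone."""
    root = ET.fromstring(paragraph_xml)
    for body in root.iterfind(".//cite:citation-body", NS):
        for child in list(body):
            body.remove(child)
        span = ET.SubElement(body, "{%s}span" % NS["text"])
        span.text = new_text
    return ET.tostring(root, encoding="unicode")
```

A formatting engine would call something like `replace_citation_body` with the newly rendered text; reading the sources again afterwards should give the same keys and page details as before the change.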

Note: The citation key,
<cite:biblioref cite:key="urn:isbn:0814712827#154"/>
in the above example, provides a link to the reference data stored in the biblio-data.xml file in the save package. There is a document comparing the current save-package format to the proposed format, and a complete document example. The changes to the document schema need to be supported by the document save and load modules. These are detailed in the Further References below.

Details of proposed Bibliographic API Enhancements are on a separate wiki page, Bibliographic_API_Enhancements.

First Citation and Ibid. Flags

The new citation API must be able to set a first and subsequent occurrence flag, marking the first occurrence of a reference, as well as a 'repeated, adjacent citations' flag (ibid). These flags would be passed on to the formatting engine so that, if they are required by the style selected by the user, first and subsequent formatted citations, or 'repeated, adjacent citations' options such as Ibid, could be returned to Writer.

Note: Setting the first and subsequent occurrence flag needs to work with the footnote counting options set in Tools-->Footnotes-->TAB=Footnotes:'Footnote Counting', which are: Per Document, Per Page and Per Chapter. (Consideration needs to be given to the range permitted in the 'repeated, adjacent citations' test. You probably should not start a new chapter with an Ibid. citation, and it can even be annoying across page boundaries, when you have to turn the page back to check the citation. So perhaps the range should always be 'Per Page'.)

Method: The current API of the Writer provides an interface XTextRangeCompare. With this interface two text ranges (citations are also text ranges) can be sorted. This interface is implemented at each text part (body text, header, footer, footnote areas, text frames, table cells, draw text).
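
Once the citations have been put into document order (for example with XTextRangeCompare), deriving the flags is a simple pass over the list. The Python sketch below shows one possible way to do it; the function name, the `(scope, key)` input shape and the decision to use a single scope for both flags are assumptions made for illustration, not part of any existing API.

```python
def flag_citations(citations):
    """Assign first/subsequent and ibid flags to citations in document order.

    `citations` is a list of (scope, key) pairs already sorted into document
    order; `scope` is the page, chapter or document the counting restarts at.
    """
    seen = set()
    flagged = []
    previous = None  # (scope, key) of the citation immediately before
    for scope, key in citations:
        # First occurrence of this reference within the current scope.
        first = (scope, key) not in seen
        # Ibid applies only when the adjacent citation is the same reference
        # in the same scope, so it never crosses a page or chapter boundary.
        ibid = previous == (scope, key)
        seen.add((scope, key))
        flagged.append({"scope": scope, "key": key, "first": first, "ibid": ibid})
        previous = (scope, key)
    return flagged
```

With 'Per Page' counting the scope would be the page number, so a reference cited again at the top of the next page is treated as a first occurrence and is never rendered as Ibid.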

Footnote Style Citations

The content of the citation field body should allow foot/endnotes in order to support footnote style citations. This should not be difficult to build as it would be utilising the existing footnote machinery.

Bibliographic Reference Data

Currently Writer saves a complete copy of the bibliographic reference metadata associated with a citation with each citation. We propose to separate the citation and the reference data by leaving just the citation details in the document save file and placing the detailed reference metadata in a separate bibliographic data file in the OOo save file package. The task is to complete the design of the reference metadata file and add support for it in the OOo save file package.

The relevant components are the interface XComponentLoader, which supports loadComponentFromURL, and the interface XStorable, which supports storeAsURL.

Note: the OASIS OpenDocument TC is currently discussing plans to enhance metadata support in the file format by using an extensible RDF approach. It is our hope that this will offer support sufficient for this project's needs, so that the process of designing the bibliographic data representation noted above will largely consist of simply using standardized OpenDocument metadata.
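
Because the save package is a zip archive, the proposed separation can be tried out without touching Writer at all. The Python sketch below writes and reads a package with a separate reference data file, using the `biblio-data.xml` name from the note on the citation key. It is a simplification under stated assumptions: the function names are hypothetical, and a real OpenDocument package also needs a `mimetype` entry and a manifest, which are omitted here.

```python
import io
import zipfile

BIBLIO_FILE = "biblio-data.xml"  # name taken from the note on the citation key


def write_package(content_xml, biblio_xml):
    """Build an in-memory zip holding the document and its reference data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", content_xml)
        package.writestr(BIBLIO_FILE, biblio_xml)
    return buffer.getvalue()


def read_biblio_data(package_bytes):
    """Return the reference data, or None for a package without it."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if BIBLIO_FILE not in package.namelist():
            return None
        return package.read(BIBLIO_FILE).decode("utf-8")
```

Returning `None` for a package that lacks the file mirrors the situation where a document saved by an older version is opened by a newer one.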

Data Source API

Given the importance of integrating with remote databases such as library catalogs and third-party bibliographic database applications and services, it is essential that OOo provide a standard API accessible via UNO. ZOOM provides just this, and the YAZ client code from Index Data is an excellent and liberally licensed open source toolkit [7].

ZOOM provides an easy way to support SRU/W as the standard method for OOo retrieving bibliographic data from any source. The user would just select a local or remote source and the same access mechanism would be used. SRU is particularly promising because while it shares the same model as the SOAP-based SRW, it is expressed in an easier to implement RESTful protocol.
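
To show why the RESTful form is easy to implement, here is a small Python sketch that builds an SRU searchRetrieve request URL from a CQL query. The parameter names are those of SRU 1.1; the base URL in the usage note, the `mods` schema short name and the function name are assumptions for illustration, since each server advertises its own endpoint and supported schemas.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, maximum_records=10, record_schema="mods"):
    """Return a searchRetrieve URL for an SRU 1.1 server."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # CQL, e.g. dc.title = "..."
        "maximumRecords": str(maximum_records),
        "recordSchema": record_schema,  # assumed; depends on the server
    }
    # urlencode percent-escapes the query so quotes and spaces survive.
    return base_url + "?" + urlencode(params)
```

For example, `build_sru_url("http://sru.example.org/catalog", 'dc.title = "Riots and Pogroms"')` gives a URL that can be fetched with any HTTP client; the response is an XML document containing the matching records.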

The task, then, is to wrap the YAZ client code in a UNO interface.

The modules that may need to be modified are: Bibliography, textfield/Bibliography, FieldMaster/Bibliography, BibliographyDataField.

Formatting Engine

To enhance flexibility and make enhancement easier, it is important that formatting be modular. In Word 2007, Microsoft is adding citation support very similar in structure to what we propose here, and using XSLT to do the formatting. The formatting process just transforms the embedded XML source data, and then passes it to the citation and bibliographic fields.

CiteProc is a working proof-of-concept for the formatting functionality we propose to offer in OOo, and has already been used to format demanding citations and references for a published book. It was originally authored in XSLT 2.0.

There are two options, then. One option discussed on the XML dev list is to bundle Saxon as the default XSLT processor in OOo, which would allow the existing CiteProc implementation to be used more-or-less as is, or for other solutions to be easily swapped in (as Microsoft allows).

The other option is to simply port CiteProc to another language more suitable for integration with OOo. Because formatting is configured with a simple dedicated XML language, this is not hard to do, and there are already versions under way in Ruby, Python and JavaScript, the first two of which are currently on a Subversion server:

svn list https://xbiblio.svn.sourceforge.net/svnroot/xbiblio
citeproc-py/
citeproc-rb/
csl-schema/
.
So if, for example, you want to checkout the python directory and start 
playing with that (there's not much there!), you can just do:
.
svn co https://xbiblio.svn.sourceforge.net/svnroot/xbiblio/citeproc-py
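
To make the idea of a modular formatting engine concrete, here is a toy Python sketch in which the same reference metadata is rendered by interchangeable style functions. It is not CiteProc and does not read CSL; the field names, the two style functions and the `STYLES` registry are all invented for illustration.

```python
# Reference metadata as a plain dict; the field names are hypothetical.
REFERENCE = {
    "author_family": "Veer",
    "year": "1996",
    "title": "Riots and Rituals",
}


def format_author_year(ref, pages=None):
    """Render an in-text author-year citation."""
    text = "%s, %s" % (ref["author_family"], ref["year"])
    if pages:
        text += ":" + pages
    return "(" + text + ")"


def format_note(ref, pages=None):
    """Render a short footnote-style citation."""
    text = "%s, %s (%s)" % (ref["author_family"], ref["title"], ref["year"])
    if pages:
        text += ", " + pages
    return text + "."


# Swapping the style is a lookup; the metadata itself never changes.
STYLES = {"author-year": format_author_year, "note": format_note}


def format_citation(style, ref, pages=None):
    return STYLES[style](ref, pages)
```

Calling `format_citation("author-year", REFERENCE, "23-24")` yields the in-text form used in the first citation example, while `format_citation("note", REFERENCE, "23-24")` produces a footnote-style string from the same data.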

2nd Stage Bibliographic Facility Redevelopment

Summary

The second stage is focused on adding backward and forward compatibility support, integration with remote servers, and user interface improvements.

Details

Graphical User Interface (GUI)

This stage will involve designing and building a GUI to offer:

  • Basic citation insertion
  • Basic bibliographic data entry
  • Citation and bibliographic table formatting using CiteProc.
  • Basic Bibliographic database access
  • Basic bibliographic internet search and database storage.
  • Automatic generation of references from an OOo-document or parts thereof, similar to enhanced bookmarking by using the document properties. --MadBoP 10:56, 15 April 2006 (CEST)

Backwards and Forwards Compatibility

An important objective of the Bibliographic Enhancement project is to maintain document file backwards compatibility with older versions of OpenOffice. To achieve this, when bibliographic entries are inserted into a document they are stored in the same format as is currently the case. A new bibliographic entry tag will be added with the enhanced citation functions; each citation will contain a key that points to the bibliographic data, which will be saved in the document save package. To preserve backwards compatibility we will also need to maintain the old bibliographic citation and data storage in the document. Older versions of OpenOffice without the bibliographic enhancements, using the OOo 2.X .odt format, will read the old format of the bibliographic citations and ignore the bibliographic data file in the save package. A suggested approach is illustrated in a flowchart.

When a major revision of the save package format is introduced, support for the older bibliographic representations can be dropped from the document save file.

Further References

First, see a list of the Functional Requirements of the OpenOffice Bibliographic Module and the draft Bibliographic API Enhancements which provide details of our development plans and basic information for potential developers.

For an overview of the Bibliographic project's major components see the context diagram. There is information about the current OpenOffice Bibliographic implementation. There is an example of bibliographic data in a Writer XML save file.

For a description of the Citation Style Language (CSL) schema see csl.odt or csl-schema.pdf.

A start has been made on the Specification for this work (see the Projects Specifications folder on the Documents and Files page). Also see an attempt at an analysis of the proposed Bibliographic enhancement components and their relationships. The best place to start for finding out about development in OpenOffice is the OpenOffice.org For Developers page. An important resource is the Developer's Guide, which is part of the SDK (software development kit). Also see advice on writing specification documents.

The OOo API is based on UNO (Universal Network Objects), the interface-based component model of OpenOffice.org. UNO offers interoperability between different programming languages, different object models, different machine architectures and different processes, either in a local network or even via the Internet. UNO components can be implemented in and accessed from any programming language for which a UNO language binding exists. We currently provide several language bindings for UNO, which allow the API to be used from Java, C++, OpenOffice.org Basic, Python and Common Language Infrastructure (CLI).

The Writer UNO interface is typically implemented in sw/source/core/unocore. That is a useful directory for seeing how UNO services get mapped to their core Writer implementation, e.g. SW_SERVICE_INDEX_BIBLIOGRAPHY is a bibliography-related service. The Bibliography table (reference table) is processed like other indexes such as 'Table of Contents', 'Table of Illustrations' etc. These index functions are handled by the code module unoidx.cxx. Basic text functions are handled in the code module unotext.cxx.

There is also a demonstration client program for the YAZ toolkit (C & C++), IRTCL, that can perform the reference searches (it requires the YAZ and Tcl/Tk libraries to be installed). It does everything but save or export the results. However, it is a good model of how to use the toolkit and could be used as the basis for, or model of, a prototype internet searching facility. Screen pic, screen pic2.

A demonstration internet searching facility that writes selected bibliographic records back to the OOo bibliographic database has been written in Python - PyOOBib, instructions are available. Various problems with OOo Python have led us to conclude that YAZ in C++ would be a better foundation than the Python code.

There is a description of the OOo save-file XML Package, and a FAQ about it. There is also an example showing the proposed bibliographic enhancements.

For details about GUI interface design please look at our Project Documentation; GUI Design Documents' Folder

How to get started

Access to the source code for this project is available for download via CVS. A child workspace called "metabib" has been created for us; it contains a copy of the xmloff (OpenOffice.org XML File Format Definition) and sw (the word processor application component and the WYSIWYG HTML editor component) code. The download size will be about 1GB(?), and you will need about 2GB of disk space to compile the metabib CWS (Child Work Space). (Web access to CWS.) If you cannot handle a download of that size, ask us about sending it to you on CD-ROMs. Administration process: you first need to sign the JCA and then obtain the ssh key. After that we will show you how you can access the CWS, which is basically a CVS branch. The most complicated thing is the setup of your tools so that you can participate in OOo development, but when you have got the ssh key we will show you. See OpenOffice.org For Developers for general development information.

Sample Code

  • Sample Python code that reads and outputs some of the fields of the records in the bibliographic database: biblioacess.py
  • Sample OpenOffice Basic program to write records to the bibliographic database: bibwrite.html
  • Henrik Just's LaTeX and BibTeX export filter
  • A Perl script which can import bibliographic database records from PubMed by a list of PMIDs. It writes directly to the biblio database file (not through OpenOffice): pickupfrompubmed.pl

Applications which interact with OpenOffice - Bibus (WxPython) and B3 (Java).

  • A Perl module, OpenOffice::OODoc, provides a simple way to access document elements in the document save file (closed, i.e. not interactive with OOo). It provides a variety of accessors, including a few that allow bibliographic field creation, retrieval, or update. Look at bibliographyEntryContent(), getBibliographyElements(), setBibliographyMark() in the OpenOffice::OODoc::Text manual page.
  • A Perl script, CiteProxy, acts as an SRU gateway that CiteProc can use to fetch MODS data from defined sources using known identifiers. It's intended to run as a service on a web server: at the moment it can handle info:pmid, urn:isbn, urn:issn and unalog identifiers.

Applications for importing/exporting different bibliographic formats.

  • A Python script for importing RIS format reference(s), RISImport.py. Possibly of some value as it hashes out some RIS details on mapping between fields, and suggests "sensitive" mapping for different reference types.
  • An OpenOffice bibliographic database RIS export program in Python - OOoRISExport.py
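
For readers unfamiliar with RIS, the following Python sketch shows the general shape of an import: each record runs from a `TY` line to an `ER` line, and every line carries a two-character tag. This is not the RISImport.py script listed above; the function name and the `FIELD_MAP` column names are assumptions chosen for illustration, and a real mapping would need to vary by reference type.

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Illustrative mapping only; the target column names are assumptions.
FIELD_MAP = {
    "TI": "Title",
    "T1": "Title",
    "PY": "Year",
    "JO": "Journal",
    "PB": "Publisher",
    "VL": "Volume",
}


def parse_ris(text):
    """Return a list of dicts, one per TY ... ER record."""
    records = []
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            current = {"Type": value, "Author": []}
        elif current is None:
            continue  # ignore tags that appear outside a record
        elif tag == "ER":
            # Authors are collected one per line and joined at the end.
            current["Author"] = "; ".join(current["Author"])
            records.append(current)
            current = None
        elif tag in ("AU", "A1"):
            current["Author"].append(value)
        elif tag in FIELD_MAP:
            current[FIELD_MAP[tag]] = value
    return records
```

Tags that are not in the map are silently dropped in this sketch; a production importer would more likely keep them in a notes or custom field so that no data is lost.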

Contacts

Questions or comments can be put to the Bibliographic Project development list dev@bibliographic.openoffice.org or to the project co-leader David Wilson.
