Bibliographic/Citeproc Writer Interaction

From Apache OpenOffice Wiki
Revision as of 23:32, 9 December 2007 by Dnw (Talk | contribs)


Proposal

Citeproc will interact with the new Writer API service BibliographicCitation each time a user adds a citation. When a citation is added, Citeproc will be called to generate the following formatted citation text strings for a given style and language:

  • initial full citation (in-text or footnote depending upon the style selected)
  • subsequent shortened citation
  • Ibid text
  • Ibid & location text
  • author name
  • year

These will be stored in the constants group BibliographyDataField.

When a user moves, adds or deletes a citation, Writer only needs to pick the appropriate stored citation string (initial, subsequent or Ibid) from the constants group BibliographyDataField to display the appropriate citation text, without calling CiteProc again. When the user changes the document style, Citeproc would be called to regenerate all the BibliographyDataField citation strings in one pass.
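The selection step described above can be sketched as follows. This is only an illustration: `StoredCitation`, `render_citations` and the sample strings are hypothetical names and data, not part of any existing Writer or Citeproc API, and the rule used here (Ibid when the immediately preceding citation is to the same reference) is a simplifying assumption that real styles may refine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCitation:
    """Strings pre-generated by the formatter for one reference (hypothetical layout)."""
    initial: str
    subsequent: str
    ibid: str


def render_citations(ref_ids, stored):
    """Pick a stored string for each citation, in document order.

    ref_ids: reference IDs in the order their citations appear in the text.
    stored:  mapping of reference ID -> StoredCitation.
    No formatting engine is called here; strings are only looked up.
    """
    seen = set()
    previous = None
    rendered = []
    for ref_id in ref_ids:
        entry = stored[ref_id]
        if ref_id == previous:
            rendered.append(entry.ibid)        # same source as the citation just before
        elif ref_id in seen:
            rendered.append(entry.subsequent)  # cited earlier, but not immediately before
        else:
            rendered.append(entry.initial)     # first citation of this reference
        seen.add(ref_id)
        previous = ref_id
    return rendered


# Illustrative sample data only.
stored = {
    "doe": StoredCitation("Doe, 1999", "1999", "Ibid."),
    "gao": StoredCitation("Gao, 1990", "1990", "Ibid."),
}
print(render_citations(["doe", "doe", "gao", "doe"], stored))
```

Moving or deleting a citation changes only the order of `ref_ids`, so re-running the lookup is enough to refresh the displayed text.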

Author name and year are included as part of the mechanism to implement the 'Exclude Author's Name' and 'Exclude Year' options in the Insert Citation dialogue. These options are needed when you have referred to the author's name in the text, as in

Gao Xingjian in his novel 'Soul Mountain' (1990) explores identity and myth in China.

and you do not want the author's name repeated in the citation field.


Bibliography / reference table generation would be done by passing the list of citation IDs to Citeproc, which would return the pre-formatted reference table entry text strings.
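A minimal sketch of that hand-off, assuming the formatting engine can be represented as a callable that takes a list of reference IDs and returns one formatted string per ID. The names `build_reference_table` and `format_entries` are hypothetical, and ordering of the final table is left to the formatter or the style.

```python
def build_reference_table(citation_ids, format_entries):
    """Return a mapping of reference ID -> pre-formatted reference table entry.

    citation_ids:   reference IDs in citation order (repeats allowed).
    format_entries: stand-in for the formatting engine (an assumption); it
                    receives the unique IDs and returns one string per ID.
    """
    # De-duplicate while keeping first-seen order, so each reference is sent once.
    unique_ids = list(dict.fromkeys(citation_ids))
    entries = format_entries(unique_ids)
    if len(entries) != len(unique_ids):
        raise ValueError("formatter must return one entry per reference ID")
    return dict(zip(unique_ids, entries))
```

The formatter is called once per table generation, regardless of how many times each reference is cited.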

Diagram

wp-bib-functions3.png

Benefits

It would remove Citeproc from the basic processing flow of text rendering, so there would be a saving in processing time and hence in screen update time after a change, though I do not know how significant this would be. The worst case would be a large document where you move a block of text from the back to the front of the document and several hundred subsequent and initial references would need to be regenerated.

It would isolate the Writer text rendering from Citeproc, which would make it easier and much safer to experiment with Citeproc prototypes and to plug in alternative formatting engines. The interaction with Writer would be simpler than the dynamic situation of needing to call the formatting engine to regenerate a citation in order to complete page rendering. Under the proposed scheme Citeproc is called:

  • on the loading of a reference, probably on the insertion of the first citation to that reference.
  • when the style is changed.
  • to generate or update the Reference Table

Issues

Locator Formatting

Locator formatting covers things like page numbers in documents, track or time in music.

If CiteProc handles the locator formatting, I would guess the intent is that the RDF in the package would look like:

<b:Book rdf:about="urn:isbn:34982376:123-128">  <!-- With an entry for each ref/location -->
 <b:citation>Doe, 1999:123-128</b:citation>
 <b:shortCitation>1999:123-128</b:shortCitation>
   ...
</b:Book>

If the Word Processing interface handles locator formatting, then the WP interface needs to deal with the different locator formats, which are style dependent: for example, labels such as P, page, pp or pages, and ranges written as 123-128 or 123-8.
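To give a sense of what style-dependent locator formatting involves, here is a minimal sketch for page ranges only. The function name and its `label` and `abbreviate` parameters are hypothetical; real styles have more detailed abbreviation rules than the simple shared-prefix trimming shown here.

```python
def format_page_locator(first, last=None, label="", abbreviate=False):
    """Format a page locator such as 'pp 123-128' or '123-8'.

    first, last: page numbers; last may be omitted for a single page.
    label:       style-dependent prefix (e.g. 'p', 'pp', 'pages'); empty for none.
    abbreviate:  if True, drop the leading digits the two numbers share.
    """
    start = str(first)
    if last is None or last == first:
        text = start
    else:
        end = str(last)
        if abbreviate and len(start) == len(end):
            # Count shared leading digits, always keeping at least one digit of `end`.
            common = 0
            while common < len(end) - 1 and start[common] == end[common]:
                common += 1
            end = end[common:]
        text = f"{start}-{end}"
    return f"{label} {text}" if label else text
```

For the range used in this page, the full form is `123-128` and the abbreviated form is `123-8`; other locator types (track or time in music) would each need their own rules.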

For the Word Processing interface to handle field suppression (Suppress Author, Suppress Date, etc.) based on the above input, it would need to rely on a set of assumptions about the field structure: for example, that the author text string consists of the characters up to the first numeric character or the end of the input, or that the date is the first four numeric digits following a comma. This could get complex, as we need to cope with all possible styles, multiple authors and different data formats.

It was issues like these that led me to think that the Bibliography Service API would need to be supplied with:

<b:Book rdf:about="urn:isbn:34982376:123-128">   <!-- An entry for each ref/location -->
  <b:citation>Doe, 1999:123-128</b:citation>      <!-- Also Initial Citation -->
  <b:shortCitation>1999:123-128</b:shortCitation>  <!-- Also Subsequent Citation -->
  <b:CitationLocation>:123-128</b:CitationLocation>
  <b:CitationAuthorName>Doe</b:CitationAuthorName>
  <b:CitationDate>1999</b:CitationDate>
  <b:IbidText>Ibid.</b:IbidText>
   ...
</b:Book>

Thus suppressing the date, author or locator becomes a simple string matching action, rather than working from assumptions about the field structure. We can also construct 'Ibid. with location' from IbidText + CitationLocation, as in

Ibid., 123-128
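The string matching idea can be sketched as below, assuming the component strings (author, date, location, Ibid text) are supplied separately as in the RDF above. The function names are hypothetical, and the set of separators trimmed after a removal, as well as the ", " joiner for the Ibid form, are assumptions that a real implementation would take from the style.

```python
def suppress_component(citation, component):
    """Remove one stored component (e.g. the author name) by exact string match."""
    if not component or component not in citation:
        return citation
    remainder = citation.replace(component, "", 1)
    # Trim separators left dangling at either end (assumed separator set).
    return remainder.strip(" ,;:")


def ibid_with_location(ibid_text, location):
    """Join the stored Ibid text and the stored location string."""
    pages = location.lstrip(": ")  # drop the location's own leading separator
    return f"{ibid_text}, {pages}" if pages else ibid_text


citation = "Doe, 1999:123-128"
print(suppress_component(citation, "Doe"))       # author excluded
print(suppress_component(citation, ":123-128"))  # locator excluded
print(ibid_with_location("Ibid.", ":123-128"))
```

Removing a component from the middle of a citation (the date in this sample) leaves separators whose cleanup depends on the style, so this sketch only tidies the ends of the string.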

Shifting locator formatting to the WP interface simplifies the CiteProc requirements to providing

<b:Book rdf:about="urn:isbn:34982376">
   <b:citation>Doe, 1999</b:citation>          <!-- Also Initial Citation -->
   <b:shortCitation>1999</b:shortCitation> <!-- Also Subsequent Citation -->
   <b:CitationAuthorName>Doe</b:CitationAuthorName>
   <b:CitationDate>1999</b:CitationDate>
   <b:IbidText>Ibid.</b:IbidText>
   ...
</b:Book>

at the cost of either building a mini CiteProc for locator formatting into the WP interface, or adding a separate (new) requestLocatorFormatting service to the Bibliography Service API, which CiteProc would perform whenever a new or changed location is added. I think the latter may be the best option.
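One way the latter option could look from the WP interface side is sketched here. Only the service name requestLocatorFormatting comes from the proposal; its signature (reference ID and raw locator in, formatted string out), the `LocatorCache` class and its methods are assumptions made for illustration.

```python
class LocatorCache:
    """Calls the locator formatting service only for new or changed locations."""

    def __init__(self, request_locator_formatting):
        # Stand-in for the proposed requestLocatorFormatting service (assumed signature).
        self._request = request_locator_formatting
        self._formatted = {}

    def get(self, ref_id, locator):
        """Return the formatted locator, asking the service only on a cache miss."""
        key = (ref_id, locator)
        if key not in self._formatted:
            self._formatted[key] = self._request(ref_id, locator)
        return self._formatted[key]

    def clear(self):
        """Forget all formatted locators, e.g. after the document style changes."""
        self._formatted.clear()
```

With this shape, page rendering reads from the cache, and the formatting engine is involved only when a location is added or edited, or when the style changes and the cache is cleared.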
