Difference between revisions of "Bibliographic/Citeproc Writer Interaction"

From Apache OpenOffice Wiki
(Proposal)
m (Locator Formatting)
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Background==
 
  
I, [mailto:dnw@openoffice.org David Wilson], initially envisaged Citeproc interacting with Writer each time a user added a citation. That is, when a user adds a new reference, Writer requests Citeproc to return the Initial Citation String for that reference; when the user adds a second citation to the document, Writer requests Citeproc to return the Subsequent Citation String for that reference, and so on.
 
  
 
==Proposal==
 
  
However, CiteProc interaction with Writer can be greatly simplified if Citeproc is called only when a new reference is added to the document, the Bibliography is generated, or the style is changed.
+
Citeproc will interact with the new Writer API service [[Bibliographic_API_Enhancements#Service_BibliographicCitation | BibliographicCitation]] each time a user adds a citation. When a citation is added, Citeproc will be called to generate the following formatted citation text strings for a given style and language:
 
+
Each reference will have its data stored in the biblio-data.xml file in the save package. I propose that when a reference is added, Citeproc be called to generate the full range of formatted Citation text strings for a given style, with possibilities such as:
+
  
 
* initial full citation (in-text or footnote depending upon the style selected)
 
 
* subsequent shortened citation  
 
* reference table entry
 
 
* Ibid text
 
 +
* Ibid & location text
 
* author name
 
 
* year
 
  
The last two are suggested as part of the mechanism to implement the 'Exclude Author's Name' and 'Exclude Year' options in the Insert Citation dialogue. These options are needed when you have referred to the author's name in the text, as in  
+
These will be stored in the  [http://wiki.services.openoffice.org/wiki/Bibliographic_API_Enhancements#constants_group_BibliographyDataField constants group BibliographyDataField].
 +
 
 +
When a user moves, adds or deletes a citation, Writer only needs to pick the appropriate stored citation string (initial, subsequent or Ibid) from the [http://wiki.services.openoffice.org/wiki/Bibliographic_API_Enhancements#constants_group_BibliographyDataField constants group BibliographyDataField] to display the appropriate citation text, without calling CiteProc again. When the user changes the document style, Citeproc would be called to regenerate all the BibliographyDataField citation strings in one pass.
 +
 
 +
Author name and year are included as part of the mechanism to implement the 'Exclude Author's Name' and 'Exclude Year' options in the Insert Citation dialogue. These options are needed when you have referred to the author's name in the text, as in  
  
 
  Gao Xingjian in his novel 'Soul Mountain' (1990) explores identity and myth in China.
 
Line 22: Line 22:
 
and you do not want the author's name repeated in the citation field.
 
  
The formatted citation strings could be contained in an internal reference list object. When a user adds a citation, Writer only needs to pick the appropriate stored citation string from the reference list. When the user changes the document style, Citeproc would be called to regenerate all the citation strings in one pass, then the standard update fields function would update the citation text strings in the document by picking them up again from the biblio-data.xml file. The repeated citation to the same reference, needing the 'Ibid' text would be detected by Writer display citation API.
 
  
Bibliography / reference table generation would be done by passing the list of citation IDs to Citeproc, which returns the pre-formatted reference table entry text strings. http://bibliographic.openoffice.org/wp-bib-functions2.png
+
Bibliography / reference table generation would be done by passing the list of citation IDs to Citeproc, which returns the pre-formatted reference table entry text strings.
 +
 
 +
===Diagram===
 +
 
 +
http://bibliographic.openoffice.org/wp-bib-functions3.png
  
 
==Benefits==
 
I see two possible benefits -
 
  
 
It would remove Citeproc from the basic processing flow of text rendering, so there would be a saving in processing time and hence in screen update time after a change, although I do not know how significant this would be. The worst case would be a large document in which you move a block of text from the back to the front of the document and several hundred subsequent and initial references need to be regenerated.
 
  
It would isolate the Writer text rendering from Citeproc, which would make it easier and much safer to prototype with Citeproc and to plug in alternative formatting engines, because the interaction with Writer would be simpler than a dynamic situation in which the formatting engine must be called to regenerate a citation in order to complete page rendering.
+
It would isolate the Writer text rendering from Citeproc, which would make it easier and much safer to prototype with Citeproc and to plug in alternative formatting engines, because the interaction with Writer would be simpler than a dynamic situation in which the formatting engine must be called to regenerate a citation in order to complete page rendering. Under the proposed scheme Citeproc is called:
 
+
The main difference is that under the initial scheme Citeproc is called  -
+
 
+
* on the insertion of each citation
+
* when a citation is moved and -
+
** the move changes it between the initial and subsequent positions
+
** the reference is now repeated, requiring an 'Ibid' tag.
+
* when the style is changed.
+
* to generate the Reference Table
+
 
+
Under the proposed scheme Citeproc is called -
+
  
 
* on the loading of a reference, probably on the insertion of the first  citation to that reference.
 
 
* when the style is changed.
 
 
* to generate or update the Reference Table
 
 +
 +
==Issues==
 +
 +
===Locator Formatting===
 +
Locator formatting covers things like page numbers in documents, track or time in music.
 +
 +
If CiteProc handles the locator formatting, I would guess that the intent is for the RDF in the package to look like:
 +
<code>
<b:Book rdf:about="urn:isbn:34982376:123-128">  # With an entry for each ref/location
  <b:citation>Doe, 1999:123-128</b:citation>
  <b:shortCitation>1999:123-128</b:shortCitation>
  ...
</b:Book>
</code>
 +
 +
If the Word Processing (WP) interface handles locator formatting, then it needs to deal with the different locator formats, which are style dependent: for example p, page, pp or pages as the label, and 123-128 or 123-8 as the range.
 +
 +
For the Word Processing interface to handle field suppression (Suppress Author, Suppress Date, etc.) based on the above input, it would need to rely on a set of assumptions about the field structure: for example, that the author text string consists of the characters up to the first numeric character or the end of the input, or that the date is the first four numeric digits following a comma. This could get complex, as we need to cope with all possible styles, multiple authors and different data formats.
 +
 +
It was issues like these that led me to think that the [[Bibliographic/Developer Page/Services API | Bibliography Service API]] would need to be supplied with:
 +
 +
<code>
<b:Book rdf:about="urn:isbn:34982376:123-128">  # An entry for each ref/location
  <b:citation>Doe, 1999:123-128</b:citation>  # Also Initial Citation
  <b:shortCitation>1999:123-128</b:shortCitation>  # Also Subsequent Citation
  <b:CitationLocation>:123-128</b:CitationLocation>
  <b:CitationAuthorName>Doe</b:CitationAuthorName>
  <b:CitationDate>1999</b:CitationDate>
  <b:IbidText>Ibid.</b:IbidText>
  ...
</b:Book>
</code>
 +
 +
Thus suppressing the date, author or locator becomes a simple string matching action, rather than relying on assumptions about the field structure. We can also construct 'Ibid. with location' as IbidText + CitationLocation, as in
 +
 +
Ibid., 123-128
 +
 +
Shifting locator formatting to the WP interface simplifies the CiteProc requirements to providing
 +
 +
<code>
<b:Book rdf:about="urn:isbn:34982376">
  <b:citation>Doe, 1999</b:citation>  # Also Initial Citation
  <b:shortCitation>1999</b:shortCitation>  # Also Subsequent Citation
  <b:CitationAuthorName>Doe</b:CitationAuthorName>
  <b:CitationDate>1999</b:CitationDate>
  <b:IbidText>Ibid.</b:IbidText>
  ...
</b:Book>
</code>
 +
 +
at the cost of either building a mini CiteProc for locator formatting into the WP interface, or having a separate (new) requestLocatorFormatting service in the [[Bibliographic/Developer Page/Services API | Bibliography Service API]] that CiteProc performs whenever a location is added or changed. I think the latter may be the best option.
  
 
[[Category:Bibliographic]]
 

Latest revision as of 23:33, 9 December 2007


Proposal

Citeproc will interact with the new Writer API service BibliographicCitation each time a user adds a citation. When a citation is added, Citeproc will be called to generate the following formatted citation text strings for a given style and language:

  • initial full citation (in-text or footnote depending upon the style selected)
  • subsequent shortened citation
  • Ibid text
  • Ibid & location text
  • author name
  • year

These will be stored in the constants group BibliographyDataField.

When a user moves, adds or deletes a citation, Writer only needs to pick the appropriate stored citation string (initial, subsequent or Ibid) from the constants group BibliographyDataField to display the appropriate citation text, without calling CiteProc again. When the user changes the document style, Citeproc would be called to regenerate all the BibliographyDataField citation strings in one pass.
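A minimal Python sketch of that selection step may help. It assumes the stored strings for each reference are available as a small record, and that 'Ibid' applies only when the immediately preceding citation is to the same reference. The names StoredCitation and pick_citation_texts, the reference keys and the "Roe, 2001" entry are hypothetical and are not part of the Writer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCitation:
    """Pre-formatted strings for one reference (hypothetical field names)."""
    initial: str
    subsequent: str
    ibid: str


def pick_citation_texts(cited_ids, store):
    """Return the display text for each citation, in document order."""
    seen = set()
    previous = None
    texts = []
    for ref_id in cited_ids:
        entry = store[ref_id]
        if ref_id == previous:
            texts.append(entry.ibid)        # immediate repeat of the same reference
        elif ref_id in seen:
            texts.append(entry.subsequent)  # cited earlier, but not just before
        else:
            texts.append(entry.initial)     # first citation of this reference
        seen.add(ref_id)
        previous = ref_id
    return texts


# Illustrative data only; "roe2001" is a made-up second reference.
store = {
    "doe1999": StoredCitation("Doe, 1999", "1999", "Ibid."),
    "roe2001": StoredCitation("Roe, 2001", "2001", "Ibid."),
}
order = ["doe1999", "doe1999", "roe2001", "doe1999"]
print(pick_citation_texts(order, store))
```

Moving a citation only changes the order passed in; the stored strings are reused and the formatting engine is not involved.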

Author name and year are included as part of the mechanism to implement the 'Exclude Author's Name' and 'Exclude Year' options in the Insert Citation dialogue. These options are needed when you have referred to the author's name in the text, as in

Gao Xingjian in his novel 'Soul Mountain' (1990) explores identity and myth in China.

and you do not want the author's name repeated in the citation field.


Bibliography / reference table generation would be done by passing the list of citation IDs to Citeproc, which returns the pre-formatted reference table entry text strings.

Diagram

wp-bib-functions3.png

Benefits

It would remove Citeproc from the basic processing flow of text rendering, so there would be a saving in processing time and hence in screen update time after a change, although I do not know how significant this would be. The worst case would be a large document in which you move a block of text from the back to the front of the document and several hundred subsequent and initial references need to be regenerated.

It would isolate the Writer text rendering from Citeproc, which would make it easier and much safer to prototype with Citeproc and to plug in alternative formatting engines, because the interaction with Writer would be simpler than a dynamic situation in which the formatting engine must be called to regenerate a citation in order to complete page rendering. Under the proposed scheme Citeproc is called:

  • on the loading of a reference, probably on the insertion of the first citation to that reference.
  • when the style is changed.
  • to generate or update the Reference Table
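As a rough illustration of the call pattern, the Python sketch below counts how often the formatting engine is invoked when strings are cached per reference. CitationCache and fake_engine are hypothetical stand-ins; the real call into Citeproc, and what it returns, are not specified here.

```python
class CitationCache:
    """Caches formatted strings per reference; calls the engine only when needed."""

    def __init__(self, format_reference, style):
        self._format = format_reference  # stand-in for the call into Citeproc
        self._style = style
        self._strings = {}
        self.engine_calls = 0

    def _generate(self, ref_id):
        self.engine_calls += 1
        self._strings[ref_id] = self._format(ref_id, self._style)

    def get(self, ref_id):
        """Load a reference on first use; later citations reuse the cached strings."""
        if ref_id not in self._strings:
            self._generate(ref_id)
        return self._strings[ref_id]

    def set_style(self, style):
        """Regenerate every cached reference in one pass when the style changes."""
        self._style = style
        for ref_id in list(self._strings):
            self._generate(ref_id)


def fake_engine(ref_id, style):
    # Placeholder formatter, used only to make the sketch runnable.
    return f"{ref_id} formatted as {style}"


cache = CitationCache(fake_engine, "author-date")
for ref_id in ["doe1999", "doe1999", "roe2001", "doe1999"]:
    cache.get(ref_id)
print(cache.engine_calls)  # 2: one call per distinct reference, not per citation
cache.set_style("footnote")
print(cache.engine_calls)  # 4: both references regenerated once
```

Because the engine sits behind a single callable, an alternative formatting engine could be swapped in without touching the code that picks strings for display.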

Issues

Locator Formatting

Locator formatting covers things like page numbers in documents, track or time in music.

If CiteProc handles the locator formatting, I would guess that the intent is for the RDF in the package to look like:

<b:Book rdf:about="urn:isbn:34982376:123-128">  # With an entry for each ref/location
  <b:citation>Doe, 1999:123-128</b:citation>
  <b:shortCitation>1999:123-128</b:shortCitation>
  ...
</b:Book>

If the Word Processing (WP) interface handles locator formatting, then it needs to deal with the different locator formats, which are style dependent: for example p, page, pp or pages as the label, and 123-128 or 123-8 as the range.

For the Word Processing interface to handle field suppression (Suppress Author, Suppress Date, etc.) based on the above input, it would need to rely on a set of assumptions about the field structure: for example, that the author text string consists of the characters up to the first numeric character or the end of the input, or that the date is the first four numeric digits following a comma. This could get complex, as we need to cope with all possible styles, multiple authors and different data formats.

It was issues like these that led me to think that the Bibliography Service API would need to be supplied with:

<b:Book rdf:about="urn:isbn:34982376:123-128">  # An entry for each ref/location
  <b:citation>Doe, 1999:123-128</b:citation>  # Also Initial Citation
  <b:shortCitation>1999:123-128</b:shortCitation>  # Also Subsequent Citation
  <b:CitationLocation>:123-128</b:CitationLocation>
  <b:CitationAuthorName>Doe</b:CitationAuthorName>
  <b:CitationDate>1999</b:CitationDate>
  <b:IbidText>Ibid.</b:IbidText>
  ...
</b:Book>

Thus suppressing the date, author or locator becomes a simple string matching action, rather than relying on assumptions about the field structure. We can also construct 'Ibid. with location' as IbidText + CitationLocation, as in

Ibid., 123-128
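The string matching could look roughly like the Python sketch below, which uses the example values from the entry above. CitationParts, suppress and ibid_with_location are hypothetical names, and the separator cleanup is an assumption of this sketch: it only trims leftover punctuation at the ends of the string, so a real implementation would need style-aware handling of separators left in the middle (for example after removing the date).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationParts:
    """The separately supplied strings for one ref/location (hypothetical names)."""
    citation: str
    location: str
    author: str
    date: str
    ibid: str


def strip_separators(text):
    # Naive cleanup: trim separator characters left at either end.
    return text.strip(" ,;:")


def suppress(parts, *, author=False, date=False, locator=False):
    """Remove the requested parts from the full citation by plain string matching."""
    text = parts.citation
    if author:
        text = text.replace(parts.author, "", 1)
    if date:
        text = text.replace(parts.date, "", 1)
    if locator:
        text = text.replace(parts.location, "", 1)
    return strip_separators(text)


def ibid_with_location(parts):
    """Build 'Ibid. with location' as IbidText + CitationLocation."""
    return f"{parts.ibid}, {strip_separators(parts.location)}"


doe = CitationParts(
    citation="Doe, 1999:123-128",
    location=":123-128",
    author="Doe",
    date="1999",
    ibid="Ibid.",
)
print(suppress(doe, author=True))   # 1999:123-128
print(suppress(doe, locator=True))  # Doe, 1999
print(ibid_with_location(doe))      # Ibid., 123-128
```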

Shifting locator formatting to the WP interface simplifies the CiteProc requirements to providing

<b:Book rdf:about="urn:isbn:34982376">
  <b:citation>Doe, 1999</b:citation>  # Also Initial Citation
  <b:shortCitation>1999</b:shortCitation>  # Also Subsequent Citation
  <b:CitationAuthorName>Doe</b:CitationAuthorName>
  <b:CitationDate>1999</b:CitationDate>
  <b:IbidText>Ibid.</b:IbidText>
  ...
</b:Book>

at the cost of either building a mini CiteProc for locator formatting into the WP interface, or having a separate (new) requestLocatorFormatting service in the Bibliography Service API that CiteProc performs whenever a location is added or changed. I think the latter may be the best option.
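To give a feel for what such a locator formatting request would have to do, here is a small Python sketch limited to page locators. The function names and the label_single, label_range and abbreviate options are assumptions made for illustration; in practice these choices would come from the selected style, and other locator types (track, time) are not covered.

```python
def abbreviate_range(first, last):
    """Drop the leading digits the end page shares with the start page (123-128 -> 123-8)."""
    if len(first) != len(last):
        return f"{first}-{last}"
    i = 0
    # Always keep at least the final digit of the end page.
    while i < len(last) - 1 and first[i] == last[i]:
        i += 1
    return f"{first}-{last[i:]}"


def format_locator(locator, *, label_single="", label_range="", abbreviate=False):
    """Format a page locator such as '45' or '123-128' using style-supplied options."""
    first, sep, last = locator.partition("-")
    if not sep:
        return f"{label_single}{first}"
    pages = abbreviate_range(first, last) if abbreviate else f"{first}-{last}"
    return f"{label_range}{pages}"


print(format_locator("123-128"))                                      # 123-128
print(format_locator("123-128", label_range="pp. ", abbreviate=True))  # pp. 123-8
print(format_locator("45", label_single="p. "))                       # p. 45
```

Keeping this logic behind one request means the WP interface would only store and concatenate the returned string, in the same way as the other pre-formatted citation parts.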
