Difference between revisions of "Grammar Checking API"

From Apache OpenOffice Wiki
= Grammar checking process and API =
{{Writer Project|Category=Writer/API}}
Grammar checking is treated here as one particular implementation of a text iteration and markup process. Other iteration and markup processes, such as spell checking or smart tagging, can basically work the same way (though they are currently not implemented like this). Where the following documentation mentions grammar checking, it can be read as a placeholder for the more general task of text markup.
 
===== Involved objects =====
* one or more documents to be checked
* one or more grammar checker implementations, each supporting at least one language
* one or more grammar check dialogs (at most one per document)
* one context menu when clicking on text marked as incorrect
* a global grammar checking iterator (common to all documents), implemented as a singleton, checking one sentence (of an arbitrary document) at a time
* one thread object per grammar checker that is used to perform checking in the background
* objects iterating through the text of a document, one object representing a single grammar checking task that was requested
* objects representing text blocks in a text document (“flat paragraphs”) that abstract from the concrete structure of the document and provide access to the text through simple strings and integer values describing the positions and lengths of substrings
 
===== Required tasks: =====
* Automatic grammar checking
* Interactive grammar checking via dialog
* Interactive grammar checking via context menu
 
===== Overview of the basic interfaces required =====
 
# ''XFlatParagraph implemented by a FlatParagraph object (FP)'' <br/> FP objects should be small wrappers, each created individually for a single iteration. If the same “real” paragraph is part of two parallel text markup processes, there can be two different FP objects. The interface gives access to the "flat" text of a paragraph (meaning, for example, that the content of fields is included) by providing it as a simple string. All operations that need to specify substrings use position and length parameters. <br/> Besides giving access to the string and allowing some simple manipulations of the text or its language attributes, this object specifically has two methods:
#* ''isChecked(css.text.TextMarkupType)'', which yields true if the FP object has been marked as checked by the grammar checker (in case the TextMarkupType is GRAMMAR);
#* ''isModified()'', which indicates that its content has been changed or deleted since the creation of the object. If an FP is modified, the results of grammar checking for this specific paragraph have to be discarded and the paragraph needs to be processed again. If an FP is marked as “checked”, it shall be skipped in further checks. <br/> Finally, this interface allows placing (and removing) visual markings. In the case of grammar checking these mark incorrect text parts; other markups can of course have different meanings. This text markup is based on indexing the string belonging to the FP. <br/> Please note that whenever the following text talks about paragraphs, an FP is meant unless otherwise stated. An FP is not necessarily a paragraph in the document's sense: it can be a collection of them (e.g. a list), and it contains not only the flow text but also other text content such as text frames, headers and footers. As only the document core can handle such FP objects efficiently, this is a document-specific implementation.
# ''XFlatParagraphIterator implemented by the model of the document to check'' <br/> As this is a document-type-specific implementation (only the document core can know how to create and access FP objects in the most efficient way), objects implementing this interface have to be retrieved through a provider interface of the document model. Its implementation will usually also be bound to a specific implementation of the FP object. <br/> The FPIterator knows where to start the iteration: interactive checking starts at the current cursor position, background processing starts at the beginning of the document (with special consideration of the visible area, see below). So the iterator needs to know whether it is used for interactive or background checking. It also needs to know whether it is used for grammar checking or another text markup iteration, because it needs that information to detect the next (non-checked) paragraph (see below). <br/> The most important method is ''getNextPara'', which returns an ''XFlatParagraph'' interface to the next paragraph to be checked. Returning an empty reference means there is nothing left to be checked for now. An FPIterator object keeps track of the “current” FP object internally so that it knows how to create the “next paragraph”. The order of the iteration should probably be reading order but is left entirely to the implementation. Thus the following in particular is allowed:
#* The iteration shall skip paragraphs that have already been checked.
#* The iteration may end prematurely, for example if automatic grammar checking was disabled in the meantime. Besides that, the client of the FPIterator may want to terminate the iteration by releasing the FPIterator object.
#* A full iteration automatically wraps around at the end of the document and continues from the beginning until no more invalid FP objects are found.
#* Theoretically, for automatic grammar checking, it would also be OK to iterate more than once over the same paragraph, e.g. if it was modified while being checked.
#* As users generally expect the visible paragraphs to be checked first, a possible implementation of ''getNextPara'' is as follows:
#** get the first visible FP; if it is not already checked, return it
#** proceed accordingly with all other visible FPs
#** check the FP following the current one and return it if not checked
#** proceed accordingly with all other FPs until the starting FP is reached
# ''XGrammarChecker implemented by all grammar checker components''<br/> The grammar checker is always provided with the text of the whole paragraph. If it needs to, it may examine all the (previous) text in the paragraph, but it must only report errors within the bounds of the current sentence. Whether it returns all errors in that sentence at once (which is considered best for the user) or only the first one is left to the implementation for the time being. Keeping this interface as simple as possible makes it possible to wrap a C interface behind this API. This allows easier integration of grammar checkers without using UNO.
# ''XGrammarCheckingIterator implemented by the service css.linguistic2.GrammarCheckingIterator''<br/> The object implementing this interface is the mediator between the grammar checkers and the document (which should not know about each other). In particular, it provides the grammar checking dialog and the context menu with the required data and with interfaces to change the text. Decoupling text block access from checking avoids direct access to the document from the grammar checker and thus makes the implementation of multi-threaded access much easier.
# ''XGrammarCheckingResultListener''<br/> This interface provides a call-back function that is used by the ''GrammarCheckingIterator'' to provide the specific client with the result of the grammar checking and have it act accordingly (mark wrong parts, fill the context menu, or have the dialog show the new sentence with its errors and corrections).
# ''XGrammarCheckingResultBroadcaster''<br/> This interface allows an interested client to register as a listener and thus be informed about the grammar checking results.
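The visible-first strategy for ''getNextPara'' described in the list above can be sketched as follows. This is a minimal Python sketch, not the UNO implementation: the names FlatPara, FlatParaIterator, get_next_para and check_all, as well as the checked and visible attributes, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlatPara:
    """Hypothetical stand-in for an object implementing XFlatParagraph."""
    text: str
    checked: bool = False
    visible: bool = False


class FlatParaIterator:
    """Hypothetical iterator: visible paragraphs first, then wrap around."""

    def __init__(self, paras: List[FlatPara], start: int = 0):
        self.paras = paras
        self.current = start  # index of the "current" FP

    def get_next_para(self) -> Optional[FlatPara]:
        # Step 1: unchecked visible paragraphs take precedence.
        for i, fp in enumerate(self.paras):
            if fp.visible and not fp.checked:
                self.current = i
                return fp
        # Step 2: continue after the current FP, wrapping around at the
        # end of the document until the starting FP is reached again.
        n = len(self.paras)
        for step in range(1, n + 1):
            i = (self.current + step) % n
            if not self.paras[i].checked:
                self.current = i
                return self.paras[i]
        return None  # empty reference: nothing left to check for now


def check_all(it: FlatParaIterator) -> List[str]:
    """Drain the iterator, marking each FP as checked; return the visit order."""
    order = []
    fp = it.get_next_para()
    while fp is not None:
        order.append(fp.text)
        fp.checked = True
        fp = it.get_next_para()
    return order
```

With four paragraphs of which the second and third are visible, this sketch visits the two visible ones first and then the remaining ones in wrap-around order.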
 
===== Sample process of automatic grammar checking =====
The document obtains access to the ''GrammarCheckingIterator'' and requests checking of the document by providing:
* a unique interface to the document (used to identify this document). As this interface is for identification purposes only, css.uno.XInterface is perhaps the appropriate type. If any other type is used, it should be kept in mind that this type sets a precondition that “documents” wanting to use the grammar checking API must fulfill. Besides that, the only precondition is that the “document” must implement the css.text.XFlatParagraphIteratorProvider interface and the objects returned from it.
* an FPIterator object that has been initialized (internally) with the FP object where checking should start (usually the first FP of the document)
* the FP object should also contain the starting position of the first sentence (for automatic checking this should always be 0)
* a flag indicating whether this request is for automatic checking only, so that no suggestions are required and no dialog must be displayed
* a reference to an ''XGrammarCheckingResultListener'' interface that stores the returned markup information until it can be handed over to the FPIterator
 
The ''GrammarCheckingIterator'' maintains a queue of sentences to be processed. When called with the above arguments, it creates an entry consisting of those values and adds it at the end of the queue.
 
For the sake of simplicity, let's assume for now that there is only one document to be processed. In reality the queue may contain elements for several documents, and the GCIterator will process the entries belonging to the same document one after another, always halting the thread executing them after an entry has been checked and restarting it with the new entry once the checked one has been processed by the FPIterator.

Thus (since there are no further API calls) the ''GrammarCheckingIterator'' dequeues the first element from the queue (which is the one just added). It retrieves the text of the paragraph, asks the BreakIterator for a suggested end-of-sentence position (for the sentence indicated by its starting position) and, after identifying the languages to use, calls the respective grammar checker(s) one by one to check that single sentence, either asynchronously (background checking) or synchronously (interactive checking, but that can be discussed). Asynchronicity is implemented by creating a thread object for each grammar checker component in use and executing all grammar checking steps in this thread. The thread is provided with the current queue entry only; it does not access the queue itself.

The GCIterator returns immediately after creating the thread object. Results are received in the call-back method of an ''XGrammarCheckingResultListener'' interface provided by the GCIterator (preferably implemented as an individual object). The GCIterator makes sure that dispatching of received results or text markup happens in the “main” thread of the document.

Please note that all the asynchronicity required for background grammar checking is implemented in the ''GrammarCheckingIterator'' only, and each grammar checker implementation should always run in the same thread. This makes life easier for the grammar checker component while still providing a sufficient amount of parallelism.
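The threading arrangement described above (one thread per checker, which only ever sees the entries handed to it, with results applied back in the main thread) might look roughly like this. It is a sketch under assumptions: checker_worker, toy_check and run_demo are hypothetical names, entries are simplified to (text, start, end) tuples, and a real implementation would sit behind the UNO interfaces rather than plain Python queues.

```python
import queue
import threading


def checker_worker(check, inbox, results):
    """Runs in the checker's own thread; it sees only entries handed to it."""
    while True:
        entry = inbox.get()
        if entry is None:  # sentinel: shut the worker down
            break
        text, start, end = entry
        results.put((entry, check(text, start, end)))


def toy_check(text, start, end):
    """Hypothetical checker: flags a doubled space inside the sentence."""
    pos = text.find("  ", start, end)
    return [] if pos < 0 else [(pos, 2)]


def run_demo():
    inbox, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=checker_worker, args=(toy_check, inbox, results), daemon=True
    )
    worker.start()
    inbox.put(("One  two. Three.", 0, 9))  # returns immediately
    inbox.put(None)
    worker.join()
    # Back in the "main" thread: this is where markup would be applied.
    _entry, errors = results.get()
    return errors
```

Keeping the shared queue out of the worker's reach means only the main thread touches it, which is the property the text asks for.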
 
For the results returned by a grammar checker, we first check that the ''XFlatParagraph'' is not modified (this flag is set if the FP has been changed or deleted since it was returned by the iterator). If it is unmodified, we remove all previous, outdated markings for this sentence and then mark all the incorrect text parts. Otherwise we discard the results silently. (Remark: is it really necessary to remove old markings explicitly?)

When the last grammar checker result for this sentence has been processed and there is still unprocessed text left in the paragraph, the ''GrammarCheckingIterator'' continues with the new starting position by updating the queue entry and proceeding with it.

If the paragraph has been checked completely this way, the ''getNextPara'' function of the ''XFlatParagraphIterator'' interface is called to retrieve the next paragraph to be checked. If one is found, we start anew with the new paragraph as described above. The whole iteration continues until all paragraphs have been marked as checked.

Each time a queue entry has been processed, the GCIterator checks whether there is another entry for the same grammar checking component and continues with it by putting it up for processing in the thread assigned to that particular grammar checker. So not only the document but also the queue is accessed in the main thread.
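The per-paragraph loop described above (check a sentence, discard the results if the FP was modified, otherwise replace the old markings and advance to the next sentence) can be sketched as follows. All names here (FlatPara, sentence_end, check_paragraph, find_teh) are hypothetical, and sentence_end is only a crude stand-in for the BreakIterator.

```python
from dataclasses import dataclass, field


@dataclass
class FlatPara:
    """Hypothetical stand-in for an XFlatParagraph implementation."""
    text: str
    modified: bool = False
    checked: bool = False
    markups: list = field(default_factory=list)  # (start, length) pairs


def sentence_end(text, start):
    """Crude BreakIterator stand-in: a sentence ends after '.', '!' or '?'."""
    for i in range(start, len(text)):
        if text[i] in ".!?":
            return i + 1
    return len(text)


def check_paragraph(fp, check):
    """Check one FP sentence by sentence; False means results were discarded."""
    start = 0
    while start < len(fp.text):
        end = sentence_end(fp.text, start)
        errors = check(fp.text, start, end)
        if fp.modified:
            return False  # stale results are dropped; the FP is re-checked later
        # Remove outdated markings for this sentence, then add the new ones.
        fp.markups = [m for m in fp.markups if not (start <= m[0] < end)]
        fp.markups.extend(errors)
        start = end  # continue with the next sentence's starting position
    fp.checked = True
    return True


def find_teh(text, start, end):
    """Hypothetical checker: reports the typo 'teh' within the sentence."""
    pos = text.find("teh", start, end)
    return [] if pos < 0 else [(pos, 3)]
```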
 
===== Sample process of interactive grammar checking =====
Interactive grammar checking differs from automatic checking in three basic ways:
* The results of grammar checking a sentence need to be interactively post-processed by the user.
* Each grammar checker is allowed to use its own implementation of a grammar checking dialog, and another dialog to view and modify implementation-specific options as well. The options dialog should have two entry points: one accessible from a toolbar, and one as a button in the grammar checking dialog. If the grammar checker features only an options dialog but no grammar checking dialog, the office-internal dialog must be able to start that options dialog. (See the questions and problems section as well!)
* Some grammar checkers require the text of previous sentences in the paragraph to be known in order to determine whether the current one is correct, so one cannot simply check one sentence after another once a change is applied. <br/> If, for example, the first two sentences are without error and the third sentence was corrected by the user, we can't simply proceed to the fourth sentence. Because it cannot be determined what the specific grammar checker implementation keeps track of, there is no choice but to throw everything away and tell that grammar checker that a new paragraph is to be started. Thus we need to have the grammar checker check the first three sentences (without reporting any errors for them) in order to build up the internal data needed to check the fourth sentence. Only then can we pass the fourth sentence on to the grammar checker and expect the results to be correct. And for all the following sentences of that paragraph we have to do it all over again. <br/> A slightly different approach would be that the iterator does not pass all the previous sentences on to the checker again, but that the grammar checker does this itself implicitly if it needs to. After all, the grammar checker is always given the whole text along with the sentence start position. But the grammar checker implementation needs to be aware that by doing so it may encounter sentences in languages it does not know about, which would usually not have been passed to this specific checker.
 
Going with the preferred way of having the grammar checker scan previous text implicitly if need be, interactive checking looks like this:
 
The document determines the first paragraph to be checked (for example the one where the cursor is displayed). To make it a little less complicated to determine whether the whole document has been processed, we would probably like to start checking at the beginning of the paragraph and not at a specific sentence within it, even if the cursor is placed e.g. in the last sentence (this can be discussed though).
 
When the starting paragraph is determined, the document accesses the ''GrammarCheckingIterator'' and provides data similar to that for automatic checking:
* the unique reference to the document
* an FPIterator object that has been initialized with the FP object where checking should start (usually the paragraph where the cursor is located)
* the start-of-sentence position of the first sentence (here 0)
* the flag, now indicating interactive checking
* a reference to an ''XGrammarCheckingResultListener'' interface, implemented by the dialog, that the ''GrammarCheckingIterator'' uses as call-back to provide the dialog with the text, data and results to be displayed. (Remark: if we do it synchronously, we can get the results as a direct return value of the grammar checker. Keeping the API asynchronous would allow us to also do interactive checking in the background.)
 
The GCIterator waits for the current background processing step of the selected grammar checker to end (thus blocking the main thread) and creates a new entry for the queue, but now it places that entry at the start of the queue instead of at the end. This way interactive checking takes precedence over automatic checking, and the latest UI-triggered request is at the top of the queue and gets processed next. (Alternatively, if interactive checking is done in the thread too, the entry is just placed in the queue and the call returns.)
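The queueing rule described above (automatic requests go to the end, interactive requests go to the start) maps naturally onto a double-ended queue. The sketch below is illustrative only: QueueEntry, CheckQueue and their fields are hypothetical names, and para_index merely stands in for the FPIterator state.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueueEntry:
    doc_id: str          # identifies the document
    para_index: int      # stand-in for the FPIterator position
    sentence_start: int  # start-of-sentence position within the FP
    interactive: bool    # False: automatic (background) checking


class CheckQueue:
    """Hypothetical queue of sentences waiting to be checked."""

    def __init__(self):
        self._entries = deque()

    def request(self, entry):
        if entry.interactive:
            self._entries.appendleft(entry)  # UI requests are served next
        else:
            self._entries.append(entry)      # background work waits its turn

    def next_entry(self):
        return self._entries.popleft() if self._entries else None
```

Because every interactive request is pushed to the front, the most recent UI-triggered request is always the next one processed.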
 
As long as no error is found by the grammar checkers, the iteration and the tasks to be done in each iteration are the same as for automatic checking, apart from the flag for new queue entries indicating interactive checking and those entries being added at the start of the queue (and most probably no thread being used for the check).

For the sake of simplicity, this text assumes a single grammar checking dialog used by all checkers.
 
If one or more of the grammar checkers report an error in the current sentence, the error reports from all the checkers are collected, and the grammar checking dialog is started (if not already open, see below) and filled with the necessary data by the ''GrammarCheckingIterator'' (the text and the complete list of errors). The iterator does not wait for the dialog to be finished or to advance to the next sentence; it continues with its own tasks (e.g. entering its main loop and starting to check a sentence from another document). The dialog shows only the very sentence the error was found in and has to allow for at least
* showing all the error positions (preferably all at once),
* reviewing each error (displaying the detailed information about that error) and the suggestions for corrections,
* modifying the sentence's text freely,
* changing the language of text parts or of all the text,
* ignoring the errors and continuing with the next sentence,
* committing the changes made and continuing with checking (as long as the paragraph was not modified or invalidated in the meantime),
* if that very paragraph was modified in the meantime, discarding the changes (that are not yet applied) via a button and restarting checking with the sentence the cursor is currently in (which may be in a completely different paragraph) by adding that sentence to the top of the queue (if anything is left),
* if the paragraph was invalidated (deleted), discarding the changes in the dialog as well and calling ''getNextPara'' to continue checking, thus adding the next sentence to be checked (if anything is left) to the top of the queue,
* or canceling the interactive checking and closing the dialog.
 
If the changes are committed, they are applied to the paragraph using the ''XFlatParagraph'' interface.

Then, if there is still text left in the paragraph, the next sentence is added at the start of the queue (as described above). If the paragraph has been processed completely, the ''getNextPara'' function is called to get the next paragraph to be checked. If no such paragraph is found, the iteration is finished and the dialog can be closed. Otherwise we continue by putting an entry for interactively checking the first sentence of the newly found paragraph at the start of the queue. (Either way, the entry needs to have the ''XGrammarCheckingResultListener'' reference set in order to provide the dialog with new data to be displayed when the next sentence with errors is found.)

Then the dialog is left open, and the ''GrammarCheckingIterator'' takes control again and can proceed with the next entry from the start of the queue. This way the process continues until the next error is found or the iteration over the document is finished.

If the dialog is closed (either because the iteration has finished or because the cancel button was pressed), the interactive checking is stopped simply by not adding another entry to the queue.
 
Please note that because the starting point for grammar checking the whole document may vary (be it automatic or interactive), different errors may be reported! For example: in German it is correct to write dolphin either as "Delfin" or as "Delphin". Still, one would probably want to enforce consistent use of only one of the two spellings. Thus, if a grammar checker wants to enforce this, it has to keep track internally of which spelling was encountered first and reject the other spelling henceforward.
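The effect of the starting point can be illustrated with a toy checker that accepts whichever spelling it meets first. This is only a sketch: ConsistentSpellingChecker and check_from are hypothetical names, and a real checker would of course track far more state than a single preferred word.

```python
import re


class ConsistentSpellingChecker:
    """Hypothetical checker that accepts whichever variant it sees first."""

    def __init__(self):
        self.preferred = None

    def check(self, text):
        errors = []
        for m in re.finditer(r"Del(?:f|ph)in", text):
            word = m.group(0)
            if self.preferred is None:
                self.preferred = word  # first spelling encountered wins
            elif word != self.preferred:
                errors.append((m.start(), len(word)))
        return errors


def check_from(paragraphs, start):
    """Check all paragraphs once, beginning at index `start` and wrapping around."""
    checker = ConsistentSpellingChecker()
    flagged = {}
    n = len(paragraphs)
    for step in range(n):
        i = (start + step) % n
        flagged[i] = checker.check(paragraphs[i])
    return flagged
```

Starting at the first paragraph flags the spelling in the second one, while starting at the second paragraph flags the spelling in the first, even though the document is identical.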
 
Side note: The dialog needs to implement the XComponent interface and the ''GrammarCheckingIterator'' needs to be its listener.
 
===== Using the context menu with grammar checking =====
Opening the context menu by right-clicking on a text part that is marked as incorrect requires yet another approach. The differences here are:
* Only a single sentence should be checked (but to do this correctly the grammar checker may still need to scan all the previous text in the paragraph).
* Only those errors/corrections that belong to the respective marked text part should be displayed (or part of them if the list gets too long). That is, the corrections are needed only for a subset of all the errors in a sentence, which may leave some room for optimization.
 
Thus, when the right-click takes place, the document (when creating the menu, which is to be done in the main thread) calls the respective function of the ''GrammarCheckingIterator'', and an entry similar to that for interactive checking of that very sentence is added to the start of the queue. The only difference is that the entry contains some additional values:
* one for the starting position of the marked text part and one for its length. These indicate that the grammar checkers only need to find errors in that text range, and that the return value (which usually holds all errors/corrections for that sentence) only needs to cover that range as well. (On the other hand, it would be possible to retrieve all errors, thus behaving exactly like interactive checking, and just ignore the results that are outside the indicated range.)
* a flag indicating that this is for the context menu only (and thus there is no need for an iteration to be started, i.e. no further queue entry will be added implicitly when processing this entry)
* a reference to the ''XGrammarCheckingResultListener'' interface that the ''GrammarCheckingIterator'' uses to provide the context menu with the results. (Naturally this implementation of the interface is a different one than the one used in the dialog for interactive checking.)
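The second option mentioned above, retrieving all errors for the sentence and ignoring those outside the marked range, amounts to a simple overlap filter. The sketch below assumes errors are represented as (start, length, suggestions) tuples; that representation and the name errors_in_range are illustrative assumptions.

```python
def errors_in_range(errors, mark_start, mark_len):
    """Keep only errors that overlap [mark_start, mark_start + mark_len)."""
    mark_end = mark_start + mark_len
    return [
        e for e in errors
        if e[0] < mark_end and e[0] + e[1] > mark_start
    ]
```

For a sentence with errors at positions 0, 10 and 20 and a marked range starting at 10 with length 5, only the error at position 10 would be offered in the context menu.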
 
Since the call to the ''GrammarCheckingIterator'' is asynchronous, we need to wait a reasonable, limited amount of time (e.g. 3 seconds) to receive the results via the call-back. If we get them in time, we can show the context menu as planned. If not, since we can't wait forever, we have to display a fallback menu (either the regular one or one showing an entry like "grammar checking timed out").
 
Since the context menu may already be closed (either before or after the 3 seconds are over) when the ''GrammarCheckingIterator'' is finally ready to use the call-back function to provide the results, the context menu needs to implement the ''XComponent'' interface and the ''GrammarCheckingIterator'' must be its listener. It has to register as such as soon as the context menu calls the function to trigger grammar checking for the sentence.
 
Right before the context menu gets displayed, it should already be disposed. This would be necessary later anyway, and doing it now should prevent the call-back function from being executed belatedly if grammar checking was too slow (or did not return at all) and the fallback menu is displayed.
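The wait-with-timeout and early-dispose behaviour described above might be sketched like this. The names ContextMenuListener, on_result, wait_and_dispose and build_menu are hypothetical; the sketch uses a plain threading.Event in place of the UNO listener and XComponent machinery.

```python
import threading


class ContextMenuListener:
    """Hypothetical result listener used by the context menu."""

    def __init__(self):
        self._done = threading.Event()
        self._lock = threading.Lock()
        self._disposed = False
        self.result = None

    def on_result(self, result):
        """Call-back used by the iterator; ignored once disposed."""
        with self._lock:
            if self._disposed:
                return
            self.result = result
        self._done.set()

    def wait_and_dispose(self, timeout):
        """Wait up to `timeout` seconds, then refuse any later call-back."""
        self._done.wait(timeout)
        with self._lock:
            self._disposed = True
            return self.result


def build_menu(request_check, timeout=3.0):
    """Return the menu entries, falling back if checking is too slow."""
    listener = ContextMenuListener()
    request_check(listener)  # assumed to return immediately
    result = listener.wait_and_dispose(timeout)
    if result is None:
        return ["grammar checking timed out"]
    return list(result)
```

Disposing before the menu is shown means a belated call-back can no longer change what the user sees.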
 
When everything went fine and the user was able to select a specific correction, the ''XFlatParagraph'' interface provided as part of the ''XGrammarCheckingResult'' is used to make the changes in the text.
 
===== Checking several documents at the same time and mixing all the above tasks =====
===== Other applications of the iterator concept: =====
The idea of a global iterator that iterates over the document's text using the XFlatParagraphIterator interface, gives access to a paragraph through the XFlatParagraph interface and thereby does "some task" should be applicable to the following tasks as well:
 
* word count
* smart tags
* spell checking(?)
 
The different needs regarding the iteration order (or even skipping some paragraphs) might be met by using specific iterators, or else by giving the iteration function a specific context for the iteration. For example:
 
+
getNext( eActionContext )
+
 
+
where eActionContext might be one of
+
 
+
CONTEXT_WORD_COUNT,
+
 
+
CONTEXT_SMART_TAGS,
+
 
+
CONTEXT_GRAMMAR_CHECKING
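A minimal sketch of such a context-aware iteration function is given below. All names (''ActionContext'', ''Paragraph'', ''ContextIterator'', ''get_next'') are illustrative. The skipping rule is also only an assumption: here word counting visits every paragraph, while the markup contexts skip paragraphs already processed for that context.

```python
from enum import Enum, auto


class ActionContext(Enum):
    WORD_COUNT = auto()
    SMART_TAGS = auto()
    GRAMMAR_CHECKING = auto()


class Paragraph:
    def __init__(self, text, checked=()):
        self.text = text
        self.checked = set(checked)  # contexts already processed


class ContextIterator:
    def __init__(self, paragraphs):
        self._paragraphs = paragraphs
        self._pos = 0

    def get_next(self, context):
        # Return the next paragraph relevant for the given context, or None.
        while self._pos < len(self._paragraphs):
            para = self._paragraphs[self._pos]
            self._pos += 1
            if context is ActionContext.WORD_COUNT or context not in para.checked:
                return para
        return None
```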
+
 
+
===== Problems and questions currently left open =====
+
===== Grammar checking of mixed language text =====
+
It is believed that even for sentences that use several languages there is only a single language the whole sentence is in. (How that language is identified is a completely different matter and probably a complex task though!) Thus that sentence should only be grammar checked in that single language. For example:
+
 
+
The German word for television is Fernseher.
+
 
+
This sentence should be grammar checked in English and not German.
+
 
+
If possible though (for example if language attributes are set correctly) it should be noted that Fernseher is not English, and thus at the very least no spelling error for English should be reported for that word. It is probably also impossible to report any grammar error that involves embedded foreign words. Thus the best to hope for is probably that the foreign word is recognized as correct by the respective spell checker.
+
 
+
 
+
Even with a completely embedded sentence like
+
 
+
In Gallica Caesar said 'Alea iacta est.' and continued his battle.
+
 
+
the above text is in a single language, English, and not Latin. Whether any existing grammar checker is smart enough to cope with embedded sentences of a different language, I don't know. To keep it simple for the time being, the whole text should be grammar checked as one sentence in English and in only that language.
+
 
+
===== Grammar checking and spell checking at the same time =====
+
Should spell checking have an iterator of its own with a thread of its own? Or should spell checking be handled by the ''GrammarCheckingIterator'' as well?
+
 
+
===== Other Questions / problems: =====
+
* checking is limited to paragraphs (unless the implementation of ''XFlatParagraph'' chooses to hide something more behind it, which is unlikely). Though one could think of enumerations as a possible application for this behavior.
+
* in the case of several grammar checkers for one language, what do we do if they report different end-of-sentence positions? We really can't handle each checker individually here.
+
* does a grammar checker that requires knowledge of the previous text in this paragraph need to have that text presented even if it is in a language it does not know?
+
* How to achieve consistency of usage (e.g. spelling) when having grammar checkers in multiple languages? E.g. e-mail vs. email? Or does it need to be consistent on a per-language basis only?
+
* How to determine the language of a sentence? Use the language of the first word, or language guessing, or the language with the most words, ...?
+
* Problems related to a specific UI, namely the grammar checking dialog that is still to be defined, are not yet covered.
+
* The troublesome case of having for example three grammar checkers for one language and two of them wanting to use their own dialog while the third will go with the office internal one is left out. Because if all of them report errors in the same sentence and like to use their own dialog as well we will have to cope with switching between three dialogs just to edit a single sentence. That's just plain awful to even think about. And I doubt there will be even one user to appreciate such a scenario.
+
* Should the document (e.g. ''XFlatParagraph'') be in charge of determining the language for checking or should it be the ''GrammarCheckingIterator''? Probably the latter...
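One of the options listed above for determining the language of a sentence, "the language with the most words", could look like the following sketch. It assumes each word already carries a language tag (for example from the text's language attributes). The function name and the tie-breaking rule (the tied language that occurs first in the sentence wins) are assumptions, not a decision taken in this document.

```python
from collections import Counter


def sentence_language(words):
    """words: list of (text, language_tag) pairs in reading order."""
    if not words:
        return None
    counts = Counter(lang for _, lang in words)
    best = max(counts.values())
    # Break ties in favour of the language that appears first in the sentence.
    for _, lang in words:
        if counts[lang] == best:
            return lang
```

For the example sentence about "Fernseher", six words tagged as English outvote the single German one, so the sentence would be checked in English.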
+

Latest revision as of 13:51, 28 March 2010

Overview of the basic interfaces required
  1. XFlatParagraph implemented by a FlatParagraph object (FP)
    FP objects should be small wrappers, each of them created individually for a single iteration. If the same “real” paragraph is part of two parallel text markup processes there can be two different FP objects. The interface gives access to the "flat" text of a paragraph (meaning, e.g., that the content of fields will be included) by providing it as a simple string. All operations that need to specify sub-strings will use position and length parameters. Besides giving access to the string and allowing some simple manipulations of the text or its language attributes, this object specifically has two methods:
    • isChecked(css.text.TextMarkupType) that will yield true if the FP object has been marked as checked by the grammar checker (in case the TextMarkupType is GRAMMAR);
    • isModified() to indicate that its content has been changed or deleted since the creation of the object.
    If an FP is modified, the results of grammar checking for this specific paragraph have to be discarded and the paragraph needs to be processed again. If an FP is marked as “checked” it shall be skipped in further checks. Finally, this interface allows placing (and removing) visual markings that, in the case of grammar checking, mark incorrect text parts; other markups can of course have different meanings. This text markup is based on indices into the string belonging to the FP.
      Please note that when we talk about paragraphs in the following text, unless otherwise stated, we always mean an FP. This is not necessarily a paragraph in the document's sense: it can be a collection of them (e.g. a list), and it covers not only the flow text but also other text content like text frames, headers and footers etc. As only the document core can handle such FP objects efficiently, this is a document specific implementation.
  2. XFlatParagraphIterator implemented by the model of the document to check
    As this is a document type specific implementation (only the document core can know how to create and access FP objects in the most efficient way), objects implementing this interface have to be retrieved through a provider interface of the document model. Its implementation will usually also be bound to a specific implementation of the FP object. The FPIterator will know where to start the iteration: interactive checking starts at the current cursor position, background processing starts at the beginning of the document (with special consideration of the visual area, see below). So the iterator needs to know whether it is used for interactive or background checking. It also needs to know whether it is used for grammar checking or another text markup iteration because it needs that information to detect the next (non-checked) paragraph (see below).
    The most important method is getNextPara, which returns an XFlatParagraph interface to the next paragraph to be checked. Returning an empty reference means there is nothing left to be checked for now. An FPIterator object will keep track of the “current” FP object internally so that it knows how to create the “next paragraph”. The order of the iteration should probably be reading order but is entirely left to the implementation. Thus especially the following will be allowed:
    • The iteration shall skip paragraphs that have been already checked.
    • The iteration may end prematurely, for example if automatic grammar checking was meanwhile disabled. Besides that, the client of the FPIterator may want to terminate the iteration by releasing the FPIterator object.
    • A full iteration will automatically wrap-around at the end of the document and continue from the beginning until no more invalid FP objects are found.
    • Theoretically, for automatic grammar checking, it would also be OK to iterate more than once over the same paragraph, e.g. if it was modified while being checked.
    • As in general users expect the visible paragraphs to be checked first, a possible implementation of getNextPara is as follows:
      • get the first visible FP; if it is not already checked return it
      • proceed accordingly with all other visible FPs
      • check the FP following the current one and return it if not checked
      • proceed accordingly with all other FPs until the starting FP is reached
  3. XGrammarChecker implemented by all grammar checker components
    The grammar checker is always provided with the text of the whole paragraph. If it needs to do so it may check all the (previous) text in the paragraph, but it must only report errors within the bounds of the current sentence. Whether it returns all errors in that sentence at once (which is considered to be best for the user) or only the first one is left to the implementation for the time being. Keeping this interface as simple as possible basically makes it possible to wrap a C interface behind this API. This allows easier integration of grammar checkers without using UNO.
  4. XGrammarCheckingIterator implemented by the service css.linguistic2.GrammarCheckingIterator
    The object implementing this interface is the mediator between the grammar checkers and the document (which should not know about each other). In particular, it provides the grammar checking dialog and the context menu with the required data and interfaces to change the text. Decoupling text block access from checking makes it possible to avoid direct access to the document from the grammar checker and so makes the implementation of multi-threaded access much easier.
  5. XGrammarCheckingResultListener
    This interface provides a call-back function that is used by the GrammarCheckingIterator to provide the specific client with the result of the grammar checking and have it act accordingly (mark wrong parts, fill the context menu or have the dialog show the new sentence with its errors and corrections).
  6. XGrammarCheckingResultBroadcaster
    This interface allows an interested client to register as a listener and thus get informed about the grammar checking results.
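
The possible getNextPara order described under item 2 above (visible paragraphs first, then the paragraphs following the current one, wrapping around at the end of the document) can be sketched as follows. This is only an illustration under simplifying assumptions: the document is a plain list, and ''FP'', ''FPIterator'' and the ''visible''/''checked'' fields are hypothetical stand-ins for the real Writer objects.

```python
class FP:
    """Minimal flat-paragraph stand-in (hypothetical fields)."""

    def __init__(self, name, visible=False, checked=False):
        self.name = name
        self.visible = visible
        self.checked = checked


class FPIterator:
    def __init__(self, paragraphs, start=0):
        self._paras = paragraphs
        self._current = start  # index of the "current" FP

    def get_next_para(self):
        # 1. Visible paragraphs come first.
        for i, fp in enumerate(self._paras):
            if fp.visible and not fp.checked:
                self._current = i
                return fp
        # 2. Then walk on from the current FP, wrapping around at the end
        #    of the document until the starting FP is reached again.
        n = len(self._paras)
        for step in range(1, n + 1):
            i = (self._current + step) % n
            if not self._paras[i].checked:
                self._current = i
                return self._paras[i]
        return None  # empty reference: nothing left to check for now
```

A client would call ''get_next_para'' repeatedly, mark each returned paragraph as checked once it is done, and stop when ''None'' comes back.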