Editing Text Documents

From Apache OpenOffice Wiki


The previous section has already discussed a whole range of options for editing text documents, focusing on the com.sun.star.text.TextPortion and com.sun.star.text.Paragraph services, which grant access to paragraph portions as well as paragraphs. These services are appropriate for applications in which the content of a text is to be edited in one pass through a loop. However, this is not sufficient for many problems. Apache OpenOffice provides the com.sun.star.text.TextCursor service for more complicated tasks, including navigating backward within a document or navigating based on sentences and words rather than TextPortions.

The TextCursor

A TextCursor in the Apache OpenOffice API is comparable to the visible cursor used in an Apache OpenOffice document. It marks a certain point within a text document and can be navigated in various directions through the use of commands. The TextCursor objects available in Apache OpenOffice Basic should not, however, be confused with the visible cursor. These are two very different things.

Note for VBA users: Terminology differs from that used in VBA. In terms of scope of function, the Range object from VBA can be compared with the TextCursor object in Apache OpenOffice and not, as the name possibly suggests, with the Range object in Apache OpenOffice.


The TextCursor object in Apache OpenOffice, for example, provides methods for navigating and changing text which are included in the Range object in VBA (for example, MoveStart, MoveEnd, InsertBefore, InsertAfter). The corresponding counterparts of the TextCursor object in Apache OpenOffice are described in the following sections.

Navigating within a Text

The TextCursor object in Apache OpenOffice Basic acts independently from the visible cursor in a text document. A program-controlled position change of a TextCursor object has no impact whatsoever on the visible cursor. Several TextCursor objects can even be opened for the same document and used in various positions, which are independent of one another.

A TextCursor object is created using the createTextCursor call:

Dim Doc As Object
Dim Cursor As Object
 
Doc = ThisComponent
Cursor = Doc.Text.createTextCursor()

The Cursor object created in this way supports the com.sun.star.text.TextCursor service, which in turn provides a whole range of methods for navigating within text documents. The following example first moves the TextCursor ten characters to the left and then three characters to the right:

Cursor.goLeft(10, False)
Cursor.goRight(3, False)

A TextCursor can highlight a complete area. This can be compared with highlighting a passage in the text using the mouse. The second parameter (False in the previous calls) specifies whether the area passed over with the cursor movement is highlighted. For example, the TextCursor in the following example

Cursor.goRight(10, False)
Cursor.goLeft(3, True)

first moves ten characters to the right without highlighting, and then moves back three characters and highlights this. The area highlighted by the TextCursor therefore begins after the seventh character in the text and ends after the tenth character.

Here are the central methods that the com.sun.star.text.TextCursor service provides for navigation:

goLeft (Count, Expand)
jumps Count characters to the left.
goRight (Count, Expand)
jumps Count characters to the right.
gotoStart (Expand)
jumps to the start of the text document.
gotoEnd (Expand)
jumps to the end of the text document.
gotoRange (TextRange, Expand)
jumps to the specified TextRange object.
gotoStartOfWord (Expand)
jumps to the start of the current word.
gotoEndOfWord (Expand)
jumps to the end of the current word.
gotoNextWord (Expand)
jumps to the start of the next word.
gotoPreviousWord (Expand)
jumps to the start of the previous word.
isStartOfWord ()
returns True if the TextCursor is at the start of a word.
isEndOfWord ()
returns True if the TextCursor is at the end of a word.
gotoStartOfSentence (Expand)
jumps to the start of the current sentence.
gotoEndOfSentence (Expand)
jumps to the end of the current sentence.
gotoNextSentence (Expand)
jumps to the start of the next sentence.
gotoPreviousSentence (Expand)
jumps to the start of the previous sentence.
isStartOfSentence ()
returns True if the TextCursor is at the start of a sentence.
isEndOfSentence ()
returns True if the TextCursor is at the end of a sentence.
gotoStartOfParagraph (Expand)
jumps to the start of the current paragraph.
gotoEndOfParagraph (Expand)
jumps to the end of the current paragraph.
gotoNextParagraph (Expand)
jumps to the start of the next paragraph.
gotoPreviousParagraph (Expand)
jumps to the start of the previous paragraph.
isStartOfParagraph ()
returns True if the TextCursor is at the start of a paragraph.
isEndOfParagraph ()
returns True if the TextCursor is at the end of a paragraph.

The text is divided into sentences on the basis of sentence symbols. Periods are, for example, interpreted as symbols indicating the end of sentences. (In English, at least, they must be followed by a space, tab, or return for this to work.)

The Expand parameter is a Boolean value which specifies whether the area passed over during navigation is to be highlighted. The relative navigation methods (goLeft, goRight, and the word, sentence, and paragraph jumps) furthermore return a Boolean value which specifies whether the navigation was successful or whether the action was terminated for lack of text. gotoStart, gotoEnd, and gotoRange return no value.
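
As a minimal sketch of how the Boolean return value can drive a loop, the following macro counts the paragraphs that a TextCursor reaches in the main text. The Sub name and the Count variable are illustrative, and the sketch assumes that a text document is open as ThisComponent. Content inside tables or frames may not be visited by this cursor.

```basic
Sub CountParagraphs
  Dim Doc As Object
  Dim Cursor As Object
  Dim Count As Long

  Doc = ThisComponent
  Cursor = Doc.Text.createTextCursor()
  Cursor.gotoStart(False)

  ' The cursor starts in the first paragraph, so counting begins at 1.
  Count = 1

  ' gotoNextParagraph returns False once there is no further paragraph.
  Do While Cursor.gotoNextParagraph(False)
    Count = Count + 1
  Loop

  MsgBox "Paragraphs: " & Count
End Sub
```

Testing the return value directly in the loop condition avoids a separate flag variable and stops the loop as soon as the cursor can no longer move.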

The com.sun.star.text.TextCursor service also provides several methods for managing the highlighted area of a TextCursor:

collapseToStart ()
resets the highlighting and positions the TextCursor at the start of the previously highlighted area.
collapseToEnd ()
resets the highlighting and positions the TextCursor at the end of the previously highlighted area.
isCollapsed ()
returns True if the TextCursor does not cover any highlighting at present.
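
The following sketch illustrates these three methods together. It assumes that the open document contains at least five characters; the Sub name is hypothetical.

```basic
Sub CollapseDemo
  Dim Doc As Object
  Dim Cursor As Object

  Doc = ThisComponent
  Cursor = Doc.Text.createTextCursor()
  Cursor.gotoStart(False)

  ' Highlight the first five characters of the text.
  Cursor.goRight(5, True)
  MsgBox "Collapsed after highlighting: " & Cursor.isCollapsed()

  ' Drop the highlighting; the cursor now sits at the end
  ' of the previously highlighted area.
  Cursor.collapseToEnd()
  MsgBox "Collapsed after collapseToEnd: " & Cursor.isCollapsed()
End Sub
```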

Formatting Text with TextCursor

The com.sun.star.text.TextCursor service supports all the character and paragraph properties that were presented at the start of this chapter.

The following example shows how these can be used in conjunction with a TextCursor. It passes through a complete document and formats the first word of every sentence in bold type.

Dim Doc As Object   
Dim Cursor As Object
Dim Proceed As Boolean
 
Doc = ThisComponent
Cursor = Doc.Text.createTextCursor
 
Do 
  Cursor.gotoEndOfWord(True)
  Cursor.CharWeight = com.sun.star.awt.FontWeight.BOLD
  Proceed = Cursor.gotoNextSentence(False)
  Cursor.gotoNextWord(False)
Loop While Proceed

The example first obtains a document object for the text that is currently open. Then it iterates through the entire text, sentence by sentence, highlights the first word of each sentence, and formats it in bold.

Retrieving and Modifying Text Contents

If a TextCursor contains a highlighted area, then this text is available by means of the String property of the TextCursor object. The following example uses the String property to display the first word of each sentence in a message box:

Dim Doc As Object   
Dim Cursor As Object
Dim Proceed As Boolean
 
Doc = ThisComponent
Cursor = Doc.Text.createTextCursor
 
Do 
  Cursor.gotoEndOfWord(True)
  MsgBox Cursor.String
  Proceed = Cursor.gotoNextSentence(False)
  Cursor.gotoNextWord(False)
Loop While Proceed

The first word of each sentence can be modified in the same way using the String property:

Dim Doc As Object   
Dim Cursor As Object
Dim Proceed As Boolean
 
Doc = ThisComponent
Cursor = Doc.Text.createTextCursor
 
Do 
  Cursor.gotoEndOfWord(True)
  Cursor.String = "Ups"
  Proceed = Cursor.gotoNextSentence(False)
  Cursor.gotoNextWord(False)
Loop While Proceed

If the TextCursor contains a highlighted area, an assignment to the String property replaces this with the new text. If there is no highlighted area, the text is inserted at the present TextCursor position.
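
The insertion case can be sketched as follows: the cursor is moved to the end of the text without highlighting, so the assignment adds text rather than replacing any. The Sub name and the inserted string are illustrative.

```basic
Sub AppendText
  Dim Doc As Object
  Dim Cursor As Object

  Doc = ThisComponent
  Cursor = Doc.Text.createTextCursor()

  ' Move to the end without highlighting, so nothing is replaced.
  Cursor.gotoEnd(False)

  ' With no highlighted area, the string is inserted at the cursor position.
  Cursor.String = " (appended by macro)"
End Sub
```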

Inserting Control Codes

In some situations, it is not the actual text of a document, but rather its structure that needs modifying. Apache OpenOffice provides control codes for this purpose. These are inserted in the text and influence its structure. The control codes are defined in the com.sun.star.text.ControlCharacter group of constants. The following control codes are available in Apache OpenOffice:

PARAGRAPH_BREAK
paragraph break.
LINE_BREAK
line break within a paragraph.
SOFT_HYPHEN
possible point for syllabification.
HARD_HYPHEN
obligatory point for syllabification.
HARD_SPACE
protected space that is not spread out or compressed in justified text.

To insert the control codes, you need not only the cursor but also the associated text object of the document. The following example inserts a paragraph break after the 20th character of a text:

Dim Doc As Object   
Dim Cursor As Object
 
Doc = ThisComponent
Cursor = Doc.Text.createTextCursor
Cursor.goRight(20, False)
Doc.Text.insertControlCharacter(Cursor, _
    com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)

The False parameter in the call of the insertControlCharacter method ensures that the area currently highlighted by the TextCursor remains after the insert operation. If the True parameter is passed here, then insertControlCharacter replaces the current text.
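
The other control codes are inserted in the same way. As a sketch under the assumption that the document starts with at least one complete sentence, the following macro places a line break after the first sentence; the Sub name is hypothetical.

```basic
Sub BreakAfterFirstSentence
  Dim Doc As Object
  Dim Cursor As Object

  Doc = ThisComponent
  Cursor = Doc.Text.createTextCursor()

  ' Position the cursor at the end of the first sentence, without highlighting.
  Cursor.gotoStart(False)
  Cursor.gotoEndOfSentence(False)

  ' False: insert at the cursor position without replacing any text.
  Doc.Text.insertControlCharacter(Cursor, _
      com.sun.star.text.ControlCharacter.LINE_BREAK, False)
End Sub
```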

Searching for Text Portions

Often a text needs to be searched for a particular term so that the corresponding passage can be edited. All Apache OpenOffice documents provide a special interface for this purpose, and this interface always works on the same principle: before a search, a SearchDescriptor must first be created. This defines what Apache OpenOffice searches for in a document. A SearchDescriptor is an object which supports the com.sun.star.util.SearchDescriptor service and can be created by means of the createSearchDescriptor method of a document:

Dim SearchDesc As Object
SearchDesc = Doc.createSearchDescriptor

Once the SearchDescriptor has been created, it receives the text to be searched for:

SearchDesc.searchString="any text"

In terms of its function, the SearchDescriptor is best compared with the search dialog from Apache OpenOffice. In a similar way to the search window, the settings needed for a search can be set in the SearchDescriptor object.

The properties are provided by the com.sun.star.util.SearchDescriptor service:

SearchBackwards (Boolean)
searches through the text backward rather than forward.
SearchCaseSensitive (Boolean)
takes uppercase and lowercase characters into consideration during the search.
SearchRegularExpression (Boolean)
treats the search expression like a regular expression.
SearchStyles (Boolean)
searches through the text for the specified paragraph style.
SearchWords (Boolean)
only searches for complete words.

The Apache OpenOffice SearchSimilarity (or “fuzzy match”) function is also available in Apache OpenOffice Basic. With this function, Apache OpenOffice searches for an expression that may be similar to but not exactly the same as the search expression. The number of additional, deleted and modified characters for these expressions can be defined individually. Here are the associated properties of the com.sun.star.util.SearchDescriptor service:

SearchSimilarity (Boolean)
performs a similarity search.
SearchSimilarityAdd (Short)
number of characters which may be added for a similarity search.
SearchSimilarityExchange (Short)
number of characters which may be replaced as part of a similarity search.
SearchSimilarityRemove (Short)
number of characters which may be removed as part of a similarity search.
SearchSimilarityRelax (Boolean)
takes all deviation rules into consideration at the same time for the search expression.

Once the SearchDescriptor has been prepared as requested, it can be applied to the text document. The Apache OpenOffice documents provide the findFirst and findNext methods for this purpose:

Found = Doc.findFirst (SearchDesc)
 
Do Until IsNull(Found)
  ' Edit search results...
  Found = Doc.findNext( Found.End, SearchDesc)
Loop

The example finds all matches in a loop. Each call returns a TextRange object which refers to the found text passage, or Null if there is no further match.
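
Combining the descriptor properties with this loop, a minimal sketch that counts case-sensitive, whole-word matches might look as follows. The search term "Peter", the Sub name, and the Hits counter are illustrative assumptions.

```basic
Sub CountWholeWordMatches
  Dim Doc As Object
  Dim SearchDesc As Object
  Dim Found As Object
  Dim Hits As Long

  Doc = ThisComponent
  SearchDesc = Doc.createSearchDescriptor()
  SearchDesc.SearchString = "Peter"       ' illustrative search term
  SearchDesc.SearchCaseSensitive = True   ' "peter" is not a match
  SearchDesc.SearchWords = True           ' "Peterson" is not a match

  Found = Doc.findFirst(SearchDesc)
  Do Until IsNull(Found)
    Hits = Hits + 1
    ' Continue searching from the end of the current match.
    Found = Doc.findNext(Found.End, SearchDesc)
  Loop

  MsgBox "Matches: " & Hits
End Sub
```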

Example: Similarity Search

This example shows how a text can be searched for the word "turnover" and the results formatted in bold type. A similarity search is used so that not only the word "turnover" is found, but also the plural form "turnovers" and inflected forms such as "turnover's". The found expressions differ by up to two letters from the search expression:

Dim SearchDesc As Object
Dim Doc As Object
 
Doc = ThisComponent
SearchDesc = Doc.createSearchDescriptor
SearchDesc.SearchString="turnover"
SearchDesc.SearchSimilarity = True
SearchDesc.SearchSimilarityAdd = 2
SearchDesc.SearchSimilarityExchange = 2
SearchDesc.SearchSimilarityRemove = 2
SearchDesc.SearchSimilarityRelax = False
Found = Doc.findFirst (SearchDesc)
 
Do Until IsNull(Found)
  Found.CharWeight = com.sun.star.awt.FontWeight.BOLD
  Found = Doc.findNext( Found.End, SearchDesc)
Loop

Note for VBA users: The basic idea of search and replace in Apache OpenOffice is comparable to that used in VBA. Both interfaces provide you with an object through which the properties for searching and replacing can be defined. This object is then applied to the required text area in order to perform the action. Whereas the responsible auxiliary object in VBA can be reached through the Find property of the Range object, in Apache OpenOffice Basic it is created by the createSearchDescriptor or createReplaceDescriptor call of the document object. The available search properties and methods also differ.


As in the old API from Apache OpenOffice, searching and replacing text in the new API is also performed using the document object. Whereas previously there was an object called SearchSettings especially for defining the search options, in the new API searches are performed using a SearchDescriptor object, or a ReplaceDescriptor object for automatically replacing text. These objects cover not only the options, but also the current search text and, if necessary, the associated text replacement. The descriptor objects are created using the document object, filled in as required, and then passed back to the document object as parameters for the search methods.

Replacing Text Portions

Just as with the search function, the replacement function from Apache OpenOffice is also available in Apache OpenOffice Basic. The two functions are handled identically. A replacement also first needs a special object which records the parameters for the process. It is called a ReplaceDescriptor and supports the com.sun.star.util.ReplaceDescriptor service. All the properties of the SearchDescriptor described in the previous section are also supported by ReplaceDescriptor. For example, during a replacement process, case sensitivity can also be activated and deactivated, and similarity searches can be performed.

The following example demonstrates the use of a ReplaceDescriptor for replacing text within an Apache OpenOffice document.

Dim I As Long
Dim Doc As Object
Dim Replace As Object
Dim BritishWords(5) As String
Dim USWords(5) As String
 
BritishWords() = Array("colour", "neighbour", "centre", "behaviour", _
   "metre", "through")
USWords() = Array("color", "neighbor", "center", "behavior", _
   "meter", "thru")
 
Doc = ThisComponent
Replace = Doc.createReplaceDescriptor
 
For I = 0 To 5
  Replace.SearchString = BritishWords(I)
  Replace.ReplaceString = USWords(I)
  Doc.replaceAll(Replace)
Next I

The expressions for searching and replacing are set using the SearchString and ReplaceString properties of the ReplaceDescriptor. The actual replacement is then carried out by the replaceAll method of the document object, which replaces all occurrences of the search expression.
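
The replaceAll method also returns the number of replacements it made, which can be useful for reporting. The following is a minimal sketch; the names ReplaceDesc and Replaced and the Sub name are illustrative.

```basic
Sub ReplaceAndReport
  Dim Doc As Object
  Dim ReplaceDesc As Object
  Dim Replaced As Long

  Doc = ThisComponent
  ReplaceDesc = Doc.createReplaceDescriptor()
  ReplaceDesc.SearchString = "colour"
  ReplaceDesc.ReplaceString = "color"
  ReplaceDesc.SearchWords = True   ' only replace complete words

  ' replaceAll returns how many occurrences were replaced.
  Replaced = Doc.replaceAll(ReplaceDesc)
  MsgBox Replaced & " occurrence(s) replaced."
End Sub
```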

Example: Searching and Replacing Text with Regular Expressions

The replacement function of Apache OpenOffice is particularly effective when used in conjunction with regular expressions. These provide the option of defining a variable search expression with placeholders and special characters rather than a fixed value.

The regular expressions supported by Apache OpenOffice are described in detail in the online help section for Apache OpenOffice. Here are a few examples:

  • A period within a search expression stands for any character. The search expression sh.rt therefore can stand for both shirt and short.
  • The character ^ marks the start of a paragraph. All occurrences of the name Peter that are at the start of a paragraph can therefore be found using the search expression ^Peter.
  • The character $ marks a paragraph end. All occurrences of the name Peter that are at the end of a paragraph can therefore be found using the search expression Peter$.
  • A * indicates that the preceding character may be repeated any number of times. It can be combined with the period as a placeholder for any character. The temper.*e expression, for example, can stand for the expressions temperance and temperature.
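
Regular expressions work with searching as well as replacing. As a sketch, the ^Peter expression from the list above can be used with a SearchDescriptor to format the name in bold wherever it opens a paragraph. The Sub name is hypothetical.

```basic
Sub BoldLeadingName
  Dim Doc As Object
  Dim SearchDesc As Object
  Dim Found As Object

  Doc = ThisComponent
  SearchDesc = Doc.createSearchDescriptor()
  SearchDesc.SearchRegularExpression = True
  SearchDesc.SearchString = "^Peter"   ' Peter at the start of a paragraph

  Found = Doc.findFirst(SearchDesc)
  Do Until IsNull(Found)
    Found.CharWeight = com.sun.star.awt.FontWeight.BOLD
    Found = Doc.findNext(Found.End, SearchDesc)
  Loop
End Sub
```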

The following example shows how all empty lines in a text document can be removed with the help of the regular expression ^$ :

Dim Doc As Object
Dim Replace As Object
 
Doc = ThisComponent
Replace = Doc.createReplaceDescriptor
Replace.SearchRegularExpression = True
Replace.SearchString = "^$"
Replace.ReplaceString = ""
 
Doc.replaceAll(Replace)

You also might want to have a look at the Alternative dialog Find & Replace for Writer (AltSearch) extension which has extended options.


Content on this page is licensed under the Public Documentation License (PDL).