Difference between revisions of "Writer/Text Formatting"

Latest revision as of 10:14, 30 June 2018

This page is in a DRAFT stage.

1 Introduction
2 Text Formatting - Main Objectives
3 Prerequisites – Where do we start?
4 Portions - The Basic Concept of Text Formatting
5 Attributes
- 5.1 Attribute Arrays
- 5.2 Attribute Handler and Attribute Stacks
6 Font Objects
- 6.1 The Script Type Data Structure
7 Attribute Iterators
8 Text Information
9 Main Objectives - A closer Look
10 Details on selected Topics
11 Files in the Star Writer Project

Introduction

This paper can only give a rough summary of the text formatting component of StarWriter. Many important concepts and aspects are not yet discussed here, but maybe they will be one day, since this paper is a work in progress. I hope this will be useful for somebody.

Text Formatting - Main Objectives

Calculation of Paragraph Sizes

The main objective of the text formatting process is the calculation of paragraph sizes. Depending on the direction of writing (left to right in western countries, top to bottom in the Asian world), either a width or a height is given by the environment; the other value results from the number and sizes of the lines in the paragraph. For this, line breaks have to be calculated. A paragraph has to be split into several parts if there is not sufficient space for it. The information obtained from the text formatting process is cached to improve performance. More on this objective can be found in section 9.1.

Visualization

In addition, the text formatting is responsible for repainting the parts of a paragraph that overlap with a given rectangle. The rectangular areas to be repainted are obtained from the layout. For this, an output device (usually a monitor or printer) is selected. The output device knows about different fonts and how to paint them. A more detailed description of repaint events is given in section 9.2.

Calculation of Character Positions

A distinction has to be made between document coordinates and paragraph positions. While document coordinates refer to the more physical aspects of pages, paragraph positions refer to the logical position in the string representing a paragraph. Basically, a paragraph is a string containing Unicode characters. The text formatting maintains both values and must be able to convert document coordinates into paragraph positions and vice versa. For example, when the user clicks into the document with the mouse, the appropriate position in the paragraph has to be calculated, enabling the user to modify the paragraph at the chosen position. On the other hand, converting paragraph positions into document coordinates takes place each time the cursor position has to be updated. A combination of both aspects occurs when the cursorUp (or cursorDown) key is pressed. First, the current paragraph position has to be transformed into document coordinates. Next, the y-coordinate is shifted to the line above (or below) the current line, and finally the resulting point is transferred back to a paragraph position. This will also be discussed in section 9.3.

Invalidation

As already mentioned above, formatting information is cached for performance reasons. Changing text, modifying attributes, or moving a paragraph within the document invalidates this information, which then has to be recalculated.

Prerequisites – Where do we start?

Document Coordinates

A paragraph has access to its x- and y-position, which refer to the physical aspect of a window. (0,0) is the upper left corner of the output window; the default value for the upper left text border on the first page of a document is (1702, 1702). Note that these coordinates do not depend on the current resolution. They are measured in twips, i.e., 1/20 pt.
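To make the unit concrete, here is a minimal conversion sketch. The function names are hypothetical and not part of the Writer code base; it relies only on the definition above (1 twip = 1/20 pt) and on 1 pt = 1/72 inch. With these definitions, the default border of 1702 twips corresponds to 85.1 pt, or roughly 3 cm.

```cpp
#include <cassert>

// 1 twip = 1/20 pt and 1 pt = 1/72 inch, hence 1440 twips per inch.
const long kTwipsPerPoint = 20;
const long kTwipsPerInch = 20 * 72;

// Convert a length in twips to typographic points.
inline double twipsToPoints(long twips) {
    return static_cast<double>(twips) / kTwipsPerPoint;
}

// Convert a length in twips to centimeters (1 inch = 2.54 cm).
inline double twipsToCentimeters(long twips) {
    return static_cast<double>(twips) / kTwipsPerInch * 2.54;
}

// Convert a non-negative length in points to twips, rounded to the nearest twip.
inline long pointsToTwips(double points) {
    assert(points >= 0.0);
    return static_cast<long>(points * kTwipsPerPoint + 0.5);
}
```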

Width and Height of the Environment

A paragraph has to fit into its environment. If a fixed width is given, a paragraph can make a request for more height, if necessary. If this cannot be granted, the paragraph has to be split.

Character Attributes

Usually, character attributes (e.g., font, color, size...) have start and end values referring to paragraph positions. Each paragraph has an array storing character attributes, which determine how different words or characters of the paragraph are visualized or printed.

Paragraph Attributes

Paragraph attributes (e.g., margins, line spacing, hyphenation...) are associated with a paragraph. These are the default attributes for the whole paragraph.

Output Device and Font

Formatting information refers to a reference device, usually the installed printer. HTML documents require the current window to be the reference device.

To calculate line breaks, mainly width and height of words are required. For this, font objects can be used. Font objects store a font family, font size, font style, etc. A font object can be passed to an output device, which returns the width and height of a given word.

Break Iterator

A BreakIterator is used to find possible break positions in a given string. The string and the index of the first character that does not fit on the line are handed over to the BreakIterator. If hyphenation is disabled, the returned value is the end of the last word that fits on this line. If hyphenation is enabled, the end of the last syllable that fits on the line can be obtained from the result.
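The non-hyphenating case can be illustrated with a small sketch. This is not the real BreakIterator, which is locale-aware and knows about many more break opportunities; the function name lastWordEnd is hypothetical, and the sketch assumes that a plain space is the only text delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the end (one past the last character) of the last word that
// still fits on the line, given the index of the first character that
// does not fit. Returns 0 if no word fits completely.
// Assumption: only ' ' separates words.
std::size_t lastWordEnd(const std::string& text, std::size_t firstNotFitting) {
    assert(firstNotFitting <= text.size());
    if (firstNotFitting == text.size())
        return text.size();  // everything fits

    std::size_t pos = firstNotFitting;
    // If we are inside a word, step back to the delimiter in front of it.
    while (pos > 0 && text[pos] != ' ')
        --pos;
    // Step back over the delimiter(s) to the end of the previous word.
    while (pos > 0 && text[pos - 1] == ' ')
        --pos;
    return pos;
}
```

For the string "The quick brown fox" with the "o" of "brown" (index 12) as the first character that does not fit, the sketch returns 9, so "The quick" stays on the line. A return value of 0 corresponds to the situation in which a word is wider than the line.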

Portions - The Basic Concept of Text Formatting

Starting point for text formatting is a SwTxtFrm object, which receives calls when user interventions take place or parts of a document have to be repainted. The SwTxtFrm object initiates the construction of a data structure, which efficiently supports the repaint, reformat, and invalidation process. This data structure mainly consists of so-called text portions and is accessible via an SwParaPortion object.

A text frame has access to an SwTxtNode object, which basically represents the text string (XubString) and the attribute array of this paragraph (illustration 1). Once it has been generated, the SwParaPortion data structure with the formatting information is stored in a cache memory for efficiency reasons.

An SwParaPortion object itself consists of several further text portions. Lines are represented by SwLineLayout objects; each line consists of different derivatives of the SwLinePortion class (illustration 2). Note that text portions do not store words or characters. They only represent a formatted part of text. For this, they store their width, height, and the maximum ascent of characters, all referring to document coordinates. They also store the number of characters (length) they are representing. Note that there are portions with width = 0 and length = 1 (e.g., hole portions used to swallow spaces at the end of a line) and portions with width ≠ 0 and length = 0 (e.g., portions used to represent notes).

Every portion derives from SwLinePortion, which in turn derives from SwPosSize. The main types of portions and their attributes are introduced in the following sections.

SwPosSize

SwPosSize
USHORT nWidth // portion width in document coordinates
USHORT nHeight // portion height in document coordinates

SwLinePortion

This class represents a basic text portion.

SwLinePortion : public SwPosSize{abstract}
SwLinePortion pPortion // pointer to next portion
xub_StrLen nLineLength // # characters and spaces represented by this portion
USHORT nAscent // maximum ascent

insert(), append() // insert and append other portions
USHORT GetWhichPortion() // portion identification
virtual void Paint(SwTxtPaintInfo) = 0 // paint the portion
virtual sal_Bool Format(SwTxtFormatInfo) // format the portion

SwTxtPortion

These portions represent parts of the paragraph string. They provide functionality for calculating line breaks with respect to the given environment.

class SwTxtPortion : public SwLinePortion
BreakLine(SwTxtFormatInfo, SwTxtGuess) // break a line

SwLineLayout

In addition, this class has a pointer to the next line in the paragraph. It can be regarded as representing one line of text.

SwLineLayout : public SwTxtPortion
SwLineLayout pNext // pointer to next line
USHORT nRealHeight // height of this line including line spacing

SwParaPortion

This portion represents the paragraph text. The SwRepaint and SwCharRange objects are updated, if the appropriate events are triggered.

SwParaPortion: public SwLineLayout
SwRepaint aRepaint // region to repaint
SwCharRange aReformat // paragraph position to reformat

Other Portions

There are many other portions, which serve very special purposes. Among these are portions representing

fields (SwExpandPortion)
enumeration (SwNumberPortion)
line breaks (SwBreakPortion)
tabs (SwTabPortion)

and many others. Have a look at their declarations in the *.hxx files.
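The idea that a line is a chain of portions which store only sizes and character counts can be sketched as follows. This is a strongly simplified model, not the actual class hierarchy: the names Portion and Line, the single struct for all portion kinds, and the use of std::unique_ptr are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Simplified stand-in for SwLinePortion: sizes and a character count,
// but no text.
struct Portion {
    unsigned width = 0;      // in document coordinates (twips)
    unsigned height = 0;
    unsigned ascent = 0;
    std::size_t length = 0;  // number of characters represented
    std::unique_ptr<Portion> next;
};

// Simplified stand-in for SwLineLayout: a singly linked chain of portions.
struct Line {
    std::unique_ptr<Portion> first;
    Portion* last = nullptr;

    void append(unsigned width, unsigned height, unsigned ascent, std::size_t length) {
        assert(ascent <= height);
        std::unique_ptr<Portion> p(new Portion);
        p->width = width;
        p->height = height;
        p->ascent = ascent;
        p->length = length;
        Portion* raw = p.get();
        if (last) last->next = std::move(p); else first = std::move(p);
        last = raw;
    }

    // Line width: sum of all portion widths.
    unsigned width() const {
        unsigned sum = 0;
        for (const Portion* p = first.get(); p; p = p->next.get()) sum += p->width;
        return sum;
    }

    // Number of characters of the paragraph string covered by this line.
    std::size_t length() const {
        std::size_t sum = 0;
        for (const Portion* p = first.get(); p; p = p->next.get()) sum += p->length;
        return sum;
    }

    // Line height (without line spacing): the tallest portion.
    unsigned height() const {
        unsigned max = 0;
        for (const Portion* p = first.get(); p; p = p->next.get())
            if (p->height > max) max = p->height;
        return max;
    }
};
```

A hole portion would be appended as append(0, h, a, 1) and a note portion as a portion with a width but length 0, matching the two special cases mentioned above.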

Attributes

Within one paragraph it is (of course) possible to apply different formatting attributes to different parts of the text (illustration 3). The next sections explain the handling of attributes.

Attribute Arrays

Each attribute has a start value, pointing to a position in the paragraph string. Usually attributes also have end values, indicating at which position they become invalid. Portion borders are affected by attribute changes, i.e., a change of attributes always requires the beginning of a new portion. Illustration 4 depicts portion borders and attribute changes with respect to the example given in illustration 3. Note that the shown portions can be smaller than in the illustration, but never overlap with an attribute change. Attributes are organized in an attribute array, sorted by their start value.
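The rule that every attribute change forces a portion border can be sketched like this. The Attribute struct and the portionBorders function are hypothetical names; the sketch only derives the coarsest possible borders from the start and end values, while real portions may be split further (for example at line ends or script changes).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an entry of the attribute array.
struct Attribute {
    std::size_t start;  // first paragraph position covered
    std::size_t end;    // position at which the attribute becomes invalid
};

// Returns the sorted positions at which a new portion has to begin
// (plus the end of the text), derived from attribute changes only.
std::vector<std::size_t> portionBorders(const std::vector<Attribute>& attrs,
                                        std::size_t textLength) {
    std::set<std::size_t> borders;
    borders.insert(0);
    borders.insert(textLength);
    for (const Attribute& a : attrs) {
        assert(a.start <= a.end && a.end <= textLength);
        borders.insert(a.start);
        borders.insert(a.end);
    }
    return std::vector<std::size_t>(borders.begin(), borders.end());
}
```

For a text of length 15 with attributes covering [3, 8) and [5, 12), the borders are 0, 3, 5, 8, 12, and 15, i.e., at least five portions are needed.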

Attribute Handler and Attribute Stacks

Attributes influence the painting and formatting process. During painting, the portions representing the text to print are traversed, collecting for each portion the attributes being set. These attributes are used to generate a font object, which in turn is responsible for printing the text represented by this portion during output mode. During formatting, the portions are set up and their width, height and number of characters are determined considering the font generated for this portion.

For generating the appropriate font for a given set of attributes, an attribute handler is used. An attribute handler consists of a number of attribute stacks, one stack for each kind of attribute. When the start of an attribute is reached while traversing the attribute array, the attribute is pushed onto the appropriate stack and the current font is changed according to this attribute. When the end of an attribute is reached in the attribute array, the attribute is popped from its stack and the remaining top attribute on this stack is used to change the font again. The state of the attribute stack collection during the traversal of the text represents the state of the current font. The attribute stacks are initialized with default attributes, which are specified for the whole paragraph.
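A minimal sketch of the stack idea follows. The class AttributeHandler and its string-based interface are assumptions made for illustration and do not mirror the real attribute handler. In particular, the sketch assumes that attributes of the same kind are properly nested, so that the ending attribute is always the top of its stack.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One stack per kind of attribute (e.g. "weight", "size").
// Assumption: attributes of one kind end in reverse order of their start.
class AttributeHandler {
public:
    // Initialize the stack of a kind with the paragraph default.
    void setDefault(const std::string& kind, const std::string& value) {
        stacks_[kind] = std::vector<std::string>(1, value);
    }

    // Start of an attribute: push it; it now determines the font.
    void push(const std::string& kind, const std::string& value) {
        assert(stacks_.count(kind) == 1);
        stacks_[kind].push_back(value);
    }

    // End of an attribute: pop it; the remaining top takes over.
    void pop(const std::string& kind) {
        std::vector<std::string>& stack = stacks_.at(kind);
        assert(stack.size() > 1);  // never pop the paragraph default
        stack.pop_back();
    }

    // Value that currently determines the font for this kind.
    const std::string& current(const std::string& kind) const {
        return stacks_.at(kind).back();
    }

private:
    std::map<std::string, std::vector<std::string>> stacks_;
};
```

After setDefault("weight", "normal") and push("weight", "bold"), current("weight") yields "bold"; after the matching pop it falls back to "normal".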

Font Objects

Font objects are generated during the traversal of the set of portions representing the paragraph. The current font object is attached to the output device and destroyed when it becomes outdated. The hierarchy of different font classes is shown in illustration 5.

To take Asian or other languages into consideration, an SwFont object consists of three SwSubFonts (Latin, CJK, and CTL, i.e., complex text layout). The SwFont::nActual field indicates the current script, i.e., the currently active subfont.

Font objects are modified by character attributes. The attribute handler (see section 5.2) changes the current font object for each attribute pushed or popped from the stacks.

Font objects are able to calculate the width (in document coordinates) of a given string. This is used for the determination of line breaks and portion sizes.

The Script Type Data Structure

It is possible to define different fonts for different scripts, e.g., to have a Times New Roman, 12 pt. font for "Latin" characters, while "Asian" characters are shown using an "Andale UT", 20 pt. font. For this, it is essential to know the ranges of the different script types. The SwScriptInfo class is a data structure maintaining this information.

Internally, two arrays are used, one for the position of the next script change, another for the type of script (illustration 6).

An SwScriptInfo object is part of each paragraph and has to be updated when a new character is entered. Referring to the example in illustration 6, entering a character at position 39 invalidates the array at positions 1, 2, and 3. A change of script type necessarily means a portion change, since different fonts are used for different script types.
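The two-array layout can be sketched as follows. Since illustration 6 is not reproduced here, the exact layout is an assumption: the sketch takes changes[i] to be the position at which script run i ends and types[i] to be the script of that run. The names ScriptInfo, scriptAt, and nextChange are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Script { Latin, Asian, Complex };

// Two parallel arrays: the end position of each script run and its type.
struct ScriptInfo {
    std::vector<std::size_t> changes;  // ascending; changes[i] ends run i
    std::vector<Script> types;         // types[i] is the script of run i

    // Script type of the character at paragraph position pos.
    Script scriptAt(std::size_t pos) const {
        assert(!types.empty() && changes.size() == types.size());
        for (std::size_t i = 0; i < changes.size(); ++i)
            if (pos < changes[i]) return types[i];
        return types.back();
    }

    // Position of the next script change after pos (a portion border).
    std::size_t nextChange(std::size_t pos) const {
        assert(!changes.empty());
        for (std::size_t i = 0; i < changes.size(); ++i)
            if (pos < changes[i]) return changes[i];
        return changes.back();
    }
};
```

In this model, inserting a character shifts every change position behind the insertion point, which is why all later array entries become invalid.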

Attribute Iterators

There are two associated kinds of objects involved in all processes that operate on portions (format, paint, cursor positioning):

Attribute Iterators (SwAttrIter)
Text Information (SwTxtInfo)

During one of these processes, the SwTxtFrm object generates an iterator and an info object. Depending on the current action, the iterator can be an SwTxtPaint, an SwTxtFormat, or an SwTxtCursor iterator. These iterators traverse the portions of a paragraph; at the same time they search the attribute array for attribute changes. SwTxtInfo objects are used to communicate information between iterators and portions. The SwTxtInfo class is introduced in chapter 8.

These are the most frequently used iterator classes:

SwAttrIter

Base class for all iterators.

SwAttrIter
SwFont pFnt // font object, results from evaluating attributes
SwAttrSet pAttrSet // attribute set
OutputDevice pLastOut
xub_StrLen nStartIndex, nEndIndex // indices to attribute array
xub_StrLen nPos // index to string, last position that has been looked up

void Chg(SwTxtAttr pHt) // pushes the attribute onto the appropriate stack and changes the font
void Rst(SwTxtAttr pHt) // pops the attribute from its stack
xub_StrLen GetNextAttr() // next attribute change position
sal_Bool Seek(xub_StrLen nPos) // changes font member, considering attributes at position nPos
sal_Bool SeekAndChg(xub_StrLen nPos, OutputDevice pOut) // changes font member and changes font at output device according to attributes at position nPos

SwTxtIter

The SwTxtIter class is derived from SwAttrIter. It can be regarded as an iterator with two objectives: iterating over attributes in the attribute array and iterating over lines of a paragraph.

SwTxtIter : public SwAttrIter
xub_StrLen nStart // start position of current line, updated during iteration
SwLineLayout pCurr // current line
SwLineLayout pPrev // previous line

SwLineLayout GetNext() // pCurr->GetNext()
void CharToLine(xub_StrLen) // sets line iterator to first line intersecting a specified position in text string
SwLineLayout TwipsToLine(SwTwips) // sets line iterator to first line intersecting a specified position in document coordinates
void CalcRealHeight(sal_Bool bNewLine)
void CalcAscentAndHeight(KSHORT rAscent, KSHORT rHeight)

SwTxtCursor

This iterator is used for cursor positioning purposes. It is generated by the text frame in case a repositioning of the cursor is necessary. Have a look at section 9.3 for more details.

SwTxtCursor : public SwTxtIter
sal_Bool GetCharRect(SwRect, xub_StrLen) // converts paragraph position to document coordinates
xub_StrLen GetCrsrOfst(SwPosition pPos, Point rPoint) // converts document coordinates to paragraph position, result in pPos

SwTxtPainter

During a repaint event (of a rectangular area), the text frame generates an SwTxtPainter object. The DrawTxtLine method of the painter is called by the text frame for each line intersecting the repaint rectangle. Within this method the SwTxtPainter redirects the painting task to the portions in the current line by calling their paint methods. For this, the information collected from the attribute array is encapsulated in the appropriate SwTxtInfo object and passed to the portions. In fact, the actual painting is done by font objects, which are called by the info structures. Have a look at section 9.2 for a more detailed view of this iterator.

SwTxtPainter : public SwTxtCursor
void DrawTxtLine(SwRect) // draws current line

SwTxtFormatter

Each time a reformatting has to be performed (e.g., insertion/deletion of text), the text frame generates an SwTxtFormatter object. The procedure is similar to the painting process and will be discussed in detail in section 9.1.

SwTxtFormatter : public SwTxtPainter
xub_StrLen FormatLine(xub_StrLen nStart)

Text Information

Different tasks require different sets of information. These are the most frequently used classes encapsulating information:

SwTxtInfo

This is the base class for text information classes.

SwTxtInfo
SwParaPortion pPara // information always refers to a paragraph

SwTxtSizeInfo

TxtSizeInfo objects are able to calculate the width of a given string. This is used for calculating line breaks and the number of characters fitting into a portion. A call of the GetTxtSize method is redirected to the current font object.

SwTxtSizeInfo : public SwTxtInfo
OutputDevice pOut // the output device
SwFont pFnt // a font object
SwTxtFrm pFrm // the text frame
xub_StrLen nIdx, nLen // start index and length of current portion

SwPosSize GetTxtSize(OutputDevice, XubString, xub_StrLen)

SwTxtPaintInfo

Actually, the TxtPaintInfo objects are not only responsible for encapsulating information for the paint process, in fact they are an active part of the paint process. The DrawText method is called by the portions to be painted and it redirects the painting task to the font.

SwTxtPaintInfo : public SwTxtSizeInfo
Point aPos // output position
SwRect aPaintRect // the update rectangle

void DrawText(SwLinePortion, xub_StrLen)

SwTxtFormatInfo

The SwTxtFormatInfo class maintains all important information for the formatting process. Besides this, it calls the external word hyphenation tool.

SwTxtFormatInfo : public SwTxtPaintInfo
SwLineLayout pRoot // the root of the current line (pCurr)
SwLinePortion pLast // the last portion
xub_StrLen nLineStart // current line start in rTxt
USHORT nLeft // left margin
USHORT nRight // right margin
USHORT nFirst // left margin of first line
USHORT nRealWidth // "real" line width
USHORT nWidth // "virtual" line width
USHORT nLineHeight // height after CalcLine
sal_Bool bInterHyph // interactive hyphenation
sal_Bool bAutoHyph // automatic hyphenation
.....
xub_StrLen FormatLine(xub_StrLen nStart)

Main Objectives - A closer Look

For easier understanding, the main objectives of text formatting, as mentioned in chapter 2, can be examined independently, although for example, the text formatting process always involves a repaint event and makes it necessary to calculate a new cursor position. The basic operations initiated by the iterators are shown in illustration 7:

The main sections (9.1, 9.2, 9.3) of this chapter represent the main tasks of the text formatting process.

Text Formatting

Text formatting is one of the main tasks of a word processor. Formatting information has to correspond to the attributes defined by the user. The following flow roughly shows the actions from a user intervention to the resulting formatting information.

1. The user inserts two characters by copy and paste at position 75 of a paragraph.
2. The text frame is notified by its SwTxtFrm::Modify method.
3. The invalid range (75-76) is stored in an SwCharRange object within the appropriate SwParaPortion object.
4. The layout calls the SwTxtFrm::Format method. This method checks if a reformatting process is necessary due to an invalid range in a paragraph.
5. An SwTxtFormatter and an SwTxtFormatInfo object are generated.
6. The SwTxtFormatter::FormatLine method is called for each line which has to be reformatted.
7. The SwTxtFormatter::BuildPortions method is called and builds as many portions as fit on the current line.
8. A first guess about how many characters are represented by the next portion to be built is the number of characters up to the next change of attributes or script type. This is done in the SwTxtFormatter::FormatLine method.
9. The SwLinePortion::Format method calculates the width and height of the current portion. In case it doesn't fit into the current line, the break iterator returns a suitable line break position. A more detailed description of this is given in section 9.1.1.
10. The current portion is appended to the portion list of this line, and the information structure is updated according to the new situation. Steps 8 to 10 are repeated as long as there is sufficient space for more portions in the current line.
11. Finally, the height and "real" height (considering line spacing) are determined.

Illustration 8 shows the corresponding function calls for this procedure.

The following sections give a short introduction to common tasks to be considered during text formatting.

Line Breaks and the Break Iterator

In section 9.1 the text formatting process has been discussed. The instance responsible for finding suitable line breaks is the break iterator. Breaks can be text delimiters (like spaces, tabs...) or, in hyphenation mode, a hyphenation possibility. User defined soft hyphens are also provided. The process described in this section is located between steps 9 and 10 of section 9.1.

The functionality of the break iterator is wrapped in the SwTxtGuess class. The operations performed during text formatting are as follows:

1. For each text portion an SwTxtGuess object is created, which is responsible for calculating whether the current text portion still fits into the current line. For a portion which fits into the line, nothing has to be done.
2. Depending on the current font, the SwTxtGuess object determines which paragraph position would be the last one to fit into the current line. For this, the output device has to sum up the widths of the characters, comparing the result with the given line width. In hyphenation mode, the possible hyphen character at the end of a line is also considered during this calculation.
3. If the character at this position is a text delimiter, no line break has to be performed. This has to be eliminated in upcoming versions, because in some cases a text delimiter is not necessarily a possible line break.
4. Otherwise, the break iterator is called to find a suitable line break position. In hyphenation mode, this is the end of the last syllable fitting on the current line. We have to make sure that soft hyphens defined by the user are also considered as possible line breaks.
5. The line break is stored within the SwTxtGuess object and is required during the further text formatting process. The portion widths and lengths have to be adjusted according to these line breaks. For hyphenated words, an additional hyphen portion representing the hyphen (which is of course not part of the paragraph string) has to be generated and added to the end of the line.

Some actions and results occurring during this process are depicted in illustration 9.

Line Break Handling

During text formatting, the SwTxtPortion::Format method evaluates the information obtained from an SwTxtGuess object. Five different cases are distinguished:

1. The current portion still fits on the current line.
2. The current portion does not fit on the current line but a valid hyphenation position has been found within the portion.
3. The current portion does not fit on the current line but a valid word end has been found within the portion.
4. The current portion does not fit on the current line and the current portion does not have a valid line break position, but a valid line break position has been found within the current line.
5. The current portion does not fit on the current line and neither the current portion nor the current line has a valid line break position.

The handling of the first three cases is discussed in section 9.1.1. The fifth case has to be handled, for example, if you insert a word which is wider than a line, and you do not allow hyphenation or the word does not have a hyphenation position. In this situation, a BreakCut is performed: the word is cut at the end of the current line.
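The five cases can be written down as a small decision function. This is only a sketch: the GuessResult struct, its boolean fields, and the classify function are hypothetical, and the precedence of a hyphenation position over a word end when both exist within the portion is an assumption that merely follows the order of the list above.

```cpp
#include <cassert>

enum class BreakCase {
    Fits,                  // case 1
    HyphenationInPortion,  // case 2
    WordEndInPortion,      // case 3
    BreakInLine,           // case 4: leads to an underflow event
    NoBreak                // case 5: leads to a BreakCut
};

// Hypothetical summary of what a guess object found out.
struct GuessResult {
    bool fits;                  // portion fits on the current line
    bool hyphenationInPortion;  // valid hyphenation position in the portion
    bool wordEndInPortion;      // valid word end in the portion
    bool breakInLine;           // valid break earlier in the current line
};

BreakCase classify(const GuessResult& g) {
    if (g.fits) return BreakCase::Fits;
    if (g.hyphenationInPortion) return BreakCase::HyphenationInPortion;
    if (g.wordEndInPortion) return BreakCase::WordEndInPortion;
    if (g.breakInLine) return BreakCase::BreakInLine;
    return BreakCase::NoBreak;
}
```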

The fourth case demonstrates a situation which requires breaking the straightforward formatting direction of the portions. Imagine the following case:

Attribute

What happens if this word has to be hyphenated because the "e" doesn't fit on the current line? The above example consists of two text portions with different character attributes. During the formatting process, the first portion has already been appended to the current line. Getting the correct hyphenation position ("Attrib-ute") requires splitting the last portion and inserting a hyphenation portion between the parts. Because the last portion has to be formatted again, this is called an underflow event. These are the actions performed in the above case:

1. The SwTxtGuess::Guess method determines the hyphenation position.
2. The SwTxtPortion::Format method triggers an underflow event, because the hyphenation position is not in or at the beginning of the current portion.
3. During an underflow event, the SwTxtFormatter::BuildPortions method does not try to generate a new portion. It makes the previous portion ("Attribu") the current portion and adjusts the current line width to a value which is 1 twip smaller than the previous portion would require.
4. The current portion is formatted with the new line width, forcing the break iterator to calculate a new line break. The values for the current portion (width, length, etc.) are set, and a hyphenation portion (with no length but a width for the hyphenation character) is appended to the current portion.
5. The rest of the string ("ute") is formatted in the usual way.

Repaint Events

Repaint events are handled quite similarly to format events, apart from the fact that formatting information is usually already available.

1. The layout notifies the text frame that a repaint for certain areas has to take place.
2. The SwTxtFrm::Paint(SwRect) method is called.
3. An SwTxtPainter and an SwTxtPaintInfo object are generated.
4. The SwTxtPainter::DrawTextLine method is called for each line which has a non-empty intersection with the repaint area.
5. For each portion which is affected by the repaint event, the appropriate font is generated by examining the attribute array. This font is part of an SwTxtPaintInfo object.
6. The virtual SwLinePortion::Paint(SwTxtPaintInfo) method is called, having the portions paint themselves by using the underlying rendering engine. For this, the SwTxtPaintInfo object is passed to the portion, communicating the font and output device.

Cursor Positioning

Cursor positioning requires the conversion from paragraph positions to document coordinates and vice versa. The main methods for this conversion are:

SwTxtFrm::GetCharRect(SwRect, SwPosition, SwCrsrMoveState): This method determines a rectangular area covering the character at a specified paragraph position.
SwTxtFrm::GetCrsrOfst(SwPosition, Point, SwCrsrMoveState): This method is responsible for finding the right position in a paragraph, for example when using the mouse to place the cursor.

For example, a paragraph position is converted into a cursor position by executing the following steps:

1. The frame containing the paragraph is determined.
2. An SwTxtCursor and an SwTxtSizeInfo object are constructed.
3. SwTxtCursor::GetCharRect is called. The line containing the character results from iterating over the lines and considering the number of characters per line.
4. The y-coordinate results from summing over the heights of the skipped lines.
5. The portion containing the character is determined by summing over the number of characters of each portion in this line.
6. The x-coordinate results from summing over the widths of the skipped portions and the widths of the remaining characters up to the required character. For this, the appropriate font object is used to calculate their size.
7. The width of the rectangle corresponds to the width of the character.
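The summation described in these steps can be sketched as follows. The types Portion, Line, and Rect and the function charRect are hypothetical simplifications. Two assumptions are made: coordinates are relative to the upper left corner of the paragraph, and each portion uses a fixed-width font, so that a character is width / length twips wide (the real code asks the font object instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Portion { long width; std::size_t length; };
struct Line { long height; std::vector<Portion> portions; };
struct Rect { long x, y, width, height; };

// Number of characters represented by a line.
std::size_t lineLength(const Line& line) {
    std::size_t sum = 0;
    for (const Portion& p : line.portions) sum += p.length;
    return sum;
}

// Converts a paragraph position into a rectangle covering the character.
// Assumption: fixed-width font within a portion.
Rect charRect(const std::vector<Line>& lines, std::size_t pos) {
    long y = 0;
    for (const Line& line : lines) {
        const std::size_t len = lineLength(line);
        if (pos >= len) {        // character is in a later line:
            pos -= len;          // skip this line and
            y += line.height;    // sum up its height
            continue;
        }
        long x = 0;
        for (const Portion& p : line.portions) {
            if (pos >= p.length) {  // character is in a later portion
                pos -= p.length;
                x += p.width;       // sum up the widths of skipped portions
                continue;
            }
            const long charWidth = p.width / static_cast<long>(p.length);
            return Rect{x + static_cast<long>(pos) * charWidth, y,
                        charWidth, line.height};
        }
    }
    assert(false && "position lies beyond the paragraph");
    return Rect{0, 0, 0, 0};
}
```

The reverse direction (document coordinates to paragraph position) walks the same structure, but compares accumulated heights and widths against the given point instead of counting characters.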

Both methods are combined, when using the cursorUp (cursorDown) key:

1. GetCharRect is called with the current position in the paragraph.
2. The resulting rectangle is "shifted" to the previous line.
3. The new position is converted into a position in the paragraph.

Cursor Positions inside Fields

Usually traveling into fields is not allowed. But there is one exception, when it comes to accessibility. For accessibility it is important to obtain the position of each character inside a paragraph. That includes positions inside fields. A field is represented by one special character in the paragraph string. So it must be possible to specify positions inside a field. This is done by using the SwCrsrMoveState structure defined in sw/inc/crstate.hxx. This structure has a pointer to an SwSpecialPos struct:

struct SwSpecialPos
xub_StrLen nCharOfst // the position inside the field
USHORT nLineOfst // this is used for fields which cover more than one line
BYTE nExtendRange // this is used for special positions ( < 0 or > string length )

So if you want to get the position of the second character inside a field, which has the position 5 in the paragraph string, you simply call the GetCharRect function with the SwPosition encapsulating the string position 5 and pass a SwSpecialPos structure with nCharOfst = 2. This also works for fields with follow fields. A follow field is a part of a field, which doesn't have a representation in the paragraph string because the original field has been split into several pieces (e.g., if there are script changes inside the field).

Details on selected Topics

Fly Formatting

Everything you put into your document apart from plain text or tables is called a fly. Examples are frames or drawing objects. It is a task of the text formatting to consider the space required for these things, i.e., we insert fly portions into our lines to indicate that this space is reserved for a fly. Intersections of flys and a given rectangle can be determined by the SwTxtFly::GetFrame(SwRect) method. The algorithm for fly positioning is explained with regard to illustration 10, showing the common case where the wrap option is set to "parallel", i.e., the text is supposed to float around the fly:

Note, that in our example, a line spacing of "double" is set. Two flys are potentially overlapping the text (denoted by dotted rectangles). These are the steps performed for calculating fly positions during text formatting:

1. When formatting the second line ("this String"), we make a first guess on its final height by assuming that the whole line has the same font.
2. We already consider line spacing at this point and "scan" the light gray region, which is supposed to be the final placement of the line, for collisions with flys. If any collisions are found, we reserve space for the flys by inserting fly portions into the line. In our example, we find fly 1 intersecting the light gray area. This means the second line has three portions: one text portion representing the string "this", a fly portion reserving space for the fly, and a third portion representing the string "string".
3. The line height is calculated, based on our first assumption that "this" and "string" are the text portions in this line.
4. Having calculated this "real" line height, we can make a more precise scan for fly portions within the darker gray area. If flys found during step 2 no longer intersect the line or if new flys are found intersecting the line, the line has to be formatted again, starting with the line height calculated in step 3. In our example, fly 1 no longer intersects the line, while fly 2 does. We proceed with step 2, considering the dark gray area while scanning for flys.

Note that our example would cause an endless loop in the algorithm as described so far. While executing step 2 with the darker gray area, a collision with fly 2 is detected, resulting in two portions for the line: one text portion "this" and one fly portion. A new calculation of the line height gives the same result as our first execution of step 1. To avoid the endless loop, we stop the algorithm if a new pass would scan an area lying higher (with respect to the y coordinate of its upper border) than the area of the last scan. The final result is that "this String" is distributed over two lines.
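The scan, reformat and rescan cycle with its termination rule can be sketched as follows. This is a minimal illustration and not the StarWriter code: Rect, flysIn, settleFlys and the areaFor callback are hypothetical names, areaFor stands in for steps 3 and 4 (formatting the line with the given fly portions and returning the area it would then occupy), and maxPasses is an extra safety limit that the text does not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Axis-aligned rectangle in document coordinates; right/bottom are exclusive.
struct Rect { long left, top, right, bottom; };

bool intersects(const Rect& a, const Rect& b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Indices of all flys that overlap the scanned area.
std::vector<std::size_t> flysIn(const std::vector<Rect>& flys, const Rect& area) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < flys.size(); ++i)
        if (intersects(flys[i], area)) hits.push_back(i);
    return hits;
}

// Repeats the scan until the set of colliding flys is stable, or until the
// next scan area would start higher than the previous one.
std::vector<std::size_t> settleFlys(
    const std::vector<Rect>& flys, Rect area,
    const std::function<Rect(const std::vector<std::size_t>&)>& areaFor,
    int maxPasses = 8)
{
    assert(maxPasses > 0);
    std::vector<std::size_t> hits = flysIn(flys, area);           // step 2
    for (int pass = 0; pass < maxPasses; ++pass) {
        const Rect next = areaFor(hits);                           // step 3
        if (next.top < area.top) break;       // would scan higher again: stop
        std::vector<std::size_t> rescanned = flysIn(flys, next);   // step 4
        if (rescanned == hits) break;         // same flys: the line is stable
        area = next;
        hits = rescanned;
    }
    return hits;
}
```

With two flys arranged as in the example (the first touching only the light gray area, the second only the darker one), the loop ends with the second fly reserved, which corresponds to the result described above.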

Kerning Portions

When using different scripts in one document, we offer the feature “Apply spacing between different scripts”. This additional space between two text portions representing text in different scripts is realized by inserting a SwKernPortion between them. You can see this in the SwTxtFormatter::BuildPortions function. In order to cover all situations, we have to be able to append or prepend a kerning portion. Most cases are covered by the code that appends the kerning portion, but when dealing with fields, it can be easier to prepend the kerning portion in front of the current portion. We do not want this feature to interfere with the “Allow hanging punctuation” feature used for Asian languages, see section 10.3. For this reason, we only add an additional gap between two characters of different scripts if neither of them is a punctuation character.
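The rule for when a gap is inserted can be written down in a few lines. The following is only a sketch of the decision, assuming a simplified per-character description; Script, CharInfo and needsScriptSpacing are hypothetical names and do not appear in the Writer sources.

```cpp
// Simplified script classification of a character (assumed for this sketch).
enum class Script { Latin, Asian, Complex };

struct CharInfo {
    Script script;
    bool isPunctuation;
};

// True if an additional gap belongs between two neighbouring characters:
// they must belong to different scripts and neither may be punctuation.
bool needsScriptSpacing(const CharInfo& left, const CharInfo& right) {
    return left.script != right.script &&
           !left.isPunctuation && !right.isPunctuation;
}
```

A Latin letter following an Asian character gets the gap, while a Latin dot following the same character does not, so the dot remains available for hanging punctuation.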

Hanging Punctuation

Hanging punctuation is used in some Asian languages. Some characters, especially punctuation characters, are allowed to extend beyond the borders. The characters that are allowed to be hanging characters are defined as “Not at start of line” in the options dialog. In order to have the break iterator recognize Latin characters as possible hanging punctuation characters for Asian languages, we do the following: if a new portion does not fit into the current line and its script is different from that of the last portion, we temporarily change the language passed to the break iterator to the language set for the last portion. This way, a Latin dot behind a Japanese text portion is treated as Japanese, and the break iterator reports that the Latin dot is allowed to lie outside the boundaries.

Text Output

The WYSIWYG mode is a combination of online and printed layout. The reason for this is that, on the one hand, we do not want the result on the screen to differ too much from the printed output; on the other hand, the result on the screen should not depend entirely on the selected printer and printer driver. The algorithm for computing output positions for the screen works like this:

1. We make a first assumption about each character position on the screen by calling the OutputDevice::GetTextArray function for the selected printer. The result is a so-called kerning array, which contains the positions of the characters of the string relative to the first character.
2. The width of the current character is calculated by calling OutputDevice::GetCharWidth for the current screen font.
3. The screen position (nScreenPos) of the next character equals the last screen position plus the width of the current character.
4. We calculate the output position of the next character by using this formula: outputPosition = ( 3 * nScreenPos + printed position of next character ) / 4
5. The width of the current character is subtracted from the resulting value, and the result is stored in the kerning array as the output position for the current character. We then take the next character and repeat steps 2 – 5.

Blanks are treated specially. If we find a blank during step 2, we let the output position for this blank be the position obtained from the printer during step 1. This gives us fixed positions in the screen output that agree with the printer.

We also consider the previous character during this algorithm. If the previous character is a blank, the output position of the current character is its position as it occurs in the kerning array.
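The weighting of screen and printer positions, together with the special handling of blanks, can be illustrated with a small function. This is a simplified sketch under several assumptions: blendPositions and its parameters are hypothetical names, printerPos[i] is taken to be the printer position of character i relative to the first character, the blended value is used as the "last screen position" for the following character, and the rewriting of the kerning array from step 5 is replaced by storing one output position per character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// printerPos[i]:  x position of character i as reported for the printer,
//                 relative to the first character (step 1).
// screenWidth[i]: width of character i in the screen font (step 2).
// Returns the x position at which each character is drawn on the screen.
std::vector<long> blendPositions(const std::string& text,
                                 const std::vector<long>& printerPos,
                                 const std::vector<long>& screenWidth)
{
    assert(printerPos.size() == text.size());
    assert(screenWidth.size() == text.size());
    std::vector<long> out(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        // The first character, blanks, and characters following a blank
        // are pinned to the printer position.
        const bool pinned = i == 0 || text[i] == ' ' || text[i - 1] == ' ';
        if (pinned) {
            out[i] = printerPos[i];
            continue;
        }
        const long nScreenPos = out[i - 1] + screenWidth[i - 1];   // step 3
        out[i] = (3 * nScreenPos + printerPos[i]) / 4;              // step 4
    }
    return out;
}
```

Within a word the screen metrics dominate with a weight of three quarters, so glyphs do not collide on screen, while every blank pulls the output back to the printer layout.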

The nDelta Member of SwParaPortion

During text formatting, we only want to format lines that have changed because of an input event. Changes to one line can influence the whole paragraph, because they can introduce new line breaks. The paragraph has a member of type SwCharRange, which represents the range that has been changed by a user interaction. Simply typing a character results in a range starting at the current input position with length = 1. The nDelta member of the SwParaPortion is the sum of all added and deleted characters; a value of -2 means that two characters have been deleted from this paragraph. This value is considered when the paragraph is reformatted. During formatting of the paragraph, we first skip each line that does not lie inside the SwCharRange. The new length of each formatted line is compared to its old length, and the difference between these two values is subtracted from the nDelta value. If the new end of a line no longer lies inside the reformatting range and nDelta equals 0, we have reached a stable situation and do not have to reformat any following lines.
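The bookkeeping with nDelta can be sketched as a loop over the old line lengths. The code below is a simplified model, not the actual formatter: ReformatResult, reformat and the breakLine callback are hypothetical, breakLine(start) is assumed to return the length of the line that begins at position start in the changed text (0 if no text is left), and lines ending at or before the start of the changed range are treated as untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct ReformatResult {
    std::vector<int> lengths;   // line lengths after reformatting
    int linesFormatted;         // number of lines that had to be formatted
};

ReformatResult reformat(const std::vector<int>& oldLengths,
                        int rangeStart, int rangeLen, int nDelta,
                        const std::function<int(int)>& breakLine)
{
    assert(rangeStart >= 0 && rangeLen >= 0);
    ReformatResult result{ {}, 0 };
    std::size_t i = 0;
    int pos = 0;

    // Skip lines that end before the changed range begins.
    while (i < oldLengths.size() && pos + oldLengths[i] <= rangeStart) {
        result.lengths.push_back(oldLengths[i]);
        pos += oldLengths[i];
        ++i;
    }

    const int rangeEnd = rangeStart + rangeLen;
    bool stable = false;
    while (!stable) {
        const int newLen = breakLine(pos);
        if (newLen <= 0) break;                        // no text left
        const int oldLen = i < oldLengths.size() ? oldLengths[i] : 0;
        nDelta -= newLen - oldLen;                     // account for the change
        result.lengths.push_back(newLen);
        ++result.linesFormatted;
        pos += newLen;
        if (i < oldLengths.size()) ++i;
        stable = pos >= rangeEnd && nDelta == 0;       // nothing left to absorb
    }

    // Once stable, all following lines keep their old lengths.
    for (; stable && i < oldLengths.size(); ++i)
        result.lengths.push_back(oldLengths[i]);
    return result;
}
```

For a paragraph with old line lengths 10, 7 and 10, typing one character at position 12 gives nDelta = 1. If the second line can take the extra character, its length grows to 8, nDelta drops to 0, and the third line is never touched.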

Font Caches

Font caches are used for faster construction of fonts and faster access to them.

SwFontCache

The SwFontCache is used for faster construction of a font. The SwFontCache stores SwFontObj objects, each of which encapsulates an SwFont object. The keys for the cache are attribute sets, the values are the fonts. You need an SwFontAccess object to obtain the appropriate font for a given attribute set. If the SwFontAccess::Get method does not find a font object for the attribute set, it generates a new entry for the cache.
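The get-or-create behaviour of such a cache might look like the following sketch. AttrSet, Font and FontCache are simplified stand-ins invented for this example; they only show the lookup pattern and say nothing about how SwFontObj or SwFontAccess are actually implemented.

```cpp
#include <map>
#include <string>
#include <tuple>

// A tiny stand-in for an attribute set (assumed fields).
struct AttrSet {
    std::string family;
    int height;
    bool bold;
    bool italic;
    bool operator<(const AttrSet& o) const {
        return std::tie(family, height, bold, italic) <
               std::tie(o.family, o.height, o.bold, o.italic);
    }
};

// Stand-in for a constructed font.
struct Font { AttrSet attrs; };

class FontCache {
public:
    // Returns the cached font for the attribute set, building it on a miss.
    const Font& get(const AttrSet& key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            ++constructed_;                       // the expensive path
            it = entries_.emplace(key, Font{key}).first;
        }
        return it->second;
    }
    int constructed() const { return constructed_; }

private:
    std::map<AttrSet, Font> entries_;
    int constructed_ = 0;
};
```

Asking twice for the same attribute set returns the same font object and constructs it only once.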

SwFntCache

The SwFntCache is used to find the appropriate output font for a given font set by the user. The font set by the user does not have to be the font used for the output. It is passed over to the printer, which returns the font, which is used for printing. These two fonts (the user font and the printer font) are compared and the result is the font which is best suited for screen output. Since this is an expensive operation, the result of the comparison is cached. The key for the cache is a font, the result is the font for the screen. For faster access to the cached objects, each font in the cache has a magic number, which points to a position in the array of the stored output fonts. You need to have an SwFntAccess object to access the cache. If the key font you want to ask the cache for does not have a magic number, the key font is compared with all the other key fonts in the cache.
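The role of the magic number can be illustrated with a small cache in which the key remembers its slot. This is a sketch under assumptions: FontKey, ScreenFontCache and the resolve callback are hypothetical names, resolve stands in for the expensive comparison of user font and printer font, and the screen font is represented by a plain string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Key font; magic is the index of its cache entry, or -1 if unknown.
struct FontKey {
    std::string name;
    int height;
    int magic;
};

class ScreenFontCache {
public:
    // resolve(key) performs the expensive printer/screen font comparison.
    template <class Resolve>
    std::string get(FontKey& key, Resolve resolve) {
        // Fast path: the magic number points directly at the entry.
        if (key.magic >= 0 &&
            static_cast<std::size_t>(key.magic) < entries_.size())
            return entries_[key.magic].screenFont;

        // No magic number: compare the key with all cached key fonts.
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].name == key.name &&
                entries_[i].height == key.height) {
                key.magic = static_cast<int>(i);
                return entries_[i].screenFont;
            }
        }

        // Miss: resolve once and remember the result.
        entries_.push_back(Entry{key.name, key.height, resolve(key)});
        key.magic = static_cast<int>(entries_.size() - 1);
        return entries_.back().screenFont;
    }

private:
    struct Entry {
        std::string name;
        int height;
        std::string screenFont;
    };
    std::vector<Entry> entries_;
};
```

A key that has been looked up once carries its magic number, so later lookups skip the comparison with the other key fonts entirely.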

Drop Caps

Calculating drop caps can lead to some difficulties with endless loops, quite similar to the situation described in section 10.1. A drop cap portion is built by the SwTxtFormatter::NewDropCapPortion function. A first guess for the height of the drop cap is made by guessing the line height (which has not been calculated yet, because the line is still unformatted) and multiplying it by the number of lines the drop cap should cover.

A drop cap portion can consist of several parts, one part for each attribute or script change:

SwDropPortion : public SwTxtPortion
SwDropPortionPart* pPart // several parts due to script / attribute changes
USHORT nLines // number of lines
USHORT nDropHeight
USHORT nDropDescent // distance to next line
KSHORT nDistance // distance to the next portion
short nY // Y offset of the baseline for text output

SwDropPortionPart
SwDropPortionPart* pFollow // the next drop portion part
SwFont* pFnt // the font used for output
xub_StrLen nLen // the length of the part
USHORT nWidth // the width of the part

The nLen, pFollow and pFnt fields of the SwDropPortionParts are assigned during the building of the portion.

The widths of the drop cap parts are calculated in the SwDropPortion::Format function. We even want to allow different font sizes inside a drop cap. For this we have to calculate a common scaling factor for all drop cap parts, in order to:

Let them have the same baseline within the drop cap.
Let them have a height that comes quite close to the drop cap's height.

The scaling factor and the common baseline for the drop portion parts are calculated in the SwTxtFormatter::CalcFontSize function (see also illustration 11):

1. For a first guess of the scaling factor, we divide the wished drop cap height by the biggest font height.
2. The common scaling factor is applied to all fonts used within the drop cap portion.
3. We calculate a bounding rectangle for all glyphs of the same part.
4. The rectangle is shifted to a common baseline by subtracting the ascent of the font used from the rectangle's top.
5. The union of these shifted rectangles is used to determine the ascent and height of the whole drop cap text. The ascent is the distance from the union's top to the common baseline, the descent is the distance from the common baseline to the bottom of the union.
6. A new and better scaling factor is obtained by dividing the wished height by the height of the union. We continue with step 2 until we get a factor that brings the union's height quite close to the wished height.
7. The final scaling factor is applied to the fonts of all drop portion parts and the descent of the final union is stored in the SwDropPortion as nY.
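The iterative refinement of the scaling factor can be sketched like this. The model is deliberately simple and rests on assumptions: Part, DropFit and fitDropCap are hypothetical names, the ink extents of each part are given as fractions of its font height relative to the baseline, scaled sizes are rounded to whole units to mimic real font sizes, the correction in step 6 is applied by multiplying the current factor by wished height / union height, and tolerance and maxRounds are limits added for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Part {
    double fontHeight;  // unscaled font height of this part
    double inkAbove;    // glyph extent above the baseline, fraction of height
    double inkBelow;    // glyph extent below the baseline, fraction of height
};

struct DropFit {
    double factor;  // common scaling factor
    long ascent;    // union top to common baseline
    long descent;   // common baseline to union bottom
};

DropFit fitDropCap(const std::vector<Part>& parts, long wishedHeight,
                   long tolerance = 1, int maxRounds = 10)
{
    assert(!parts.empty() && wishedHeight > 0);
    double biggest = 0.0;
    for (const Part& p : parts) biggest = std::max(biggest, p.fontHeight);
    assert(biggest > 0.0);

    DropFit fit{ wishedHeight / biggest, 0, 0 };               // step 1
    for (int round = 0; round < maxRounds; ++round) {
        long ascent = 0, descent = 0;
        for (const Part& p : parts) {
            const long size = std::lround(p.fontHeight * fit.factor); // step 2
            // Steps 3-5: bounding boxes on a common baseline, then the union.
            ascent = std::max(ascent, std::lround(size * p.inkAbove));
            descent = std::max(descent, std::lround(size * p.inkBelow));
        }
        fit.ascent = ascent;
        fit.descent = descent;
        const long unionHeight = ascent + descent;
        if (unionHeight == 0 ||
            std::labs(unionHeight - wishedHeight) <= tolerance)
            break;                                             // close enough
        fit.factor *= static_cast<double>(wishedHeight) / unionHeight; // step 6
    }
    return fit;
}
```

The descent of the returned fit corresponds to the value that step 7 stores as nY.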

Returning from SwTxtFormatter::CalcFontSize, we can now calculate the widths of the drop portion parts. For each part, the font for the part is set at the SwTxtFormatInfo object, and the text represented by this part is formatted with respect to its font. The sum of the widths is the width of the drop portion.

We continue with the formatting process until all lines the drop cap should cover are formatted. Then we compare the final size of the lines with the height of the drop cap portion. If they differ, we have to format the drop cap and the lines once more; the new wished height for the drop cap portion is now the size of the lines. This can lead to endless loops, therefore we only allow the drop portion either to grow or to shrink continuously.

Spellchecking

There are two modes for spell checking: online spell checking and interactive spell checking. The online mode shows red wavy lines under misspelled words while the text is being typed. The interactive mode can be accessed by pressing F7.

Online (Auto) Spell Checking

During online mode, a so-called wrong list is built and updated each time the user modifies the text. The wrong list contains the words that have been identified as misspelled. The invalid range for the wrong list is set in the SwTxtFrm::Modify function. This range has to be checked again (SwTxtFrm::_AutoSpell). Each word inside the range is checked again: misspelled words remain in the wrong list, newly misspelled words are added, and words that are now correct are removed. The words for auto completion are collected as well. Finally a rectangle is returned that indicates the area of change, so that the red wavy lines are repainted correctly.
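Updating a wrong list for an invalidated range can be sketched as follows. This is a minimal model, not SwTxtFrm::_AutoSpell: WrongWord and recheckRange are hypothetical names, words are assumed to be runs of ASCII letters, the isCorrect callback stands in for the spell checker, and the collection of auto completion words and the repaint rectangle are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct WrongWord { std::size_t pos, len; };

static bool isLetter(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Rechecks the words touching [from, to) and updates the wrong list in place.
void recheckRange(const std::string& text, std::size_t from, std::size_t to,
                  const std::function<bool(const std::string&)>& isCorrect,
                  std::vector<WrongWord>& wrong)
{
    assert(from <= to && to <= text.size());
    // Extend the range so that it covers whole words.
    while (from > 0 && isLetter(text[from - 1])) --from;
    while (to < text.size() && isLetter(text[to])) ++to;

    // Forget old entries inside the range; they are decided anew below.
    wrong.erase(std::remove_if(wrong.begin(), wrong.end(),
                    [from, to](const WrongWord& w) {
                        return w.pos < to && w.pos + w.len > from;
                    }),
                wrong.end());

    // Check every word inside the range again.
    for (std::size_t i = from; i < to; ) {
        if (!isLetter(text[i])) { ++i; continue; }
        const std::size_t start = i;
        while (i < to && isLetter(text[i])) ++i;
        if (!isCorrect(text.substr(start, i - start)))
            wrong.push_back(WrongWord{start, i - start});
    }
    std::sort(wrong.begin(), wrong.end(),
              [](const WrongWord& a, const WrongWord& b) { return a.pos < b.pos; });
}
```

Only the words touching the invalidated range are checked again; entries outside the range stay in the list untouched.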

Interactive Spell Checking

The interactive spell checking (SwTxtNode::Spell) works on a given range in the text. You can specify the range by selecting some text, otherwise the whole text is assumed to be checked. The interactive spell checking comes up with a dialog if a wrong word has been found. If we trigger an interactive spell checking while the online mode is enabled, the interactive spell checking only considers the words listed in the wrong list. Otherwise it checks all words in the given range.

Vertical Formatting

Some languages require a vertical formatting of text, e.g., Chinese or Japanese. For them a vertical text formatting and layout has to be integrated. The basic idea for the text formatting is to swap frames, i.e., the width and height of a frame are swapped. The three main tasks of the text formatting (formatting paragraphs, painting of text and cursor travelling) are performed on swapped frames and afterwards the results are translated back.

The advantage is that most of the code inside the text formatting does not have to be adapted to vertical formatting. There are many functions in the SwTxtFrm class to do the conversion between horizontal and vertical formatting. As an example, consider the calculation of the cursor position. First the frame that currently contains the cursor is determined. Before calculating the correct screen coordinates of the cursor inside the text frame, the frame is swapped. Now we assume that we deal with a common horizontally formatted text frame. The usual functions for calculating the cursor position are called, and finally the swapped frame and the resulting rectangle for the cursor position are rotated back.
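The swap-and-rotate-back idea can be illustrated for a single rectangle. The sketch assumes vertical text whose lines run from top to bottom and progress from right to left, so that the start of the first line sits in the frame's top right corner; Rect, swapped and rotateBack are hypothetical helpers and not members of SwTxtFrm.

```cpp
#include <utility>

struct Rect { long x, y, width, height; };

// The frame as the horizontal formatting code sees it: same origin,
// width and height exchanged.
Rect swapped(Rect frame) {
    std::swap(frame.width, frame.height);
    return frame;
}

// Converts a rectangle computed inside the swapped frame (for example a
// cursor rectangle) back into the coordinates of the real, vertical frame.
Rect rotateBack(const Rect& realFrame, const Rect& r) {
    const long alongLine   = r.x - realFrame.x;             // offset within the line
    const long acrossLines = r.y - realFrame.y + r.height;  // offset of r's far edge
    return Rect{ realFrame.x + realFrame.width - acrossLines,
                 realFrame.y + alongLine,
                 r.height,      // extents are exchanged as well
                 r.width };
}
```

For a frame at (100, 50) that is 200 wide and 400 high, a cursor computed at (110, 60) with size 2 x 12 in the swapped frame ends up at (278, 60) with size 12 x 2 in the real frame.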

Asian Grid Mode

In some Asian countries, people use a grid layout for writing. Usually a page has 10 rows and 20 columns. Above (or optionally below) the main cells there is a line reserved for ruby characters. An Asian character should snap to the grid, i.e., it is centered inside a cell. Punctuation characters are an exception: they are aligned to the right or left inside a cell. The main functionality is provided inside the txtnode/fntcache.cxx file, especially in the GetTxtSize(), GetCrsrOfst() and DrawText() functions. The GetTxtSize() function returns only multiples of a cell width as the width of Asian text. The DrawText() function centers the characters inside their cells by considering their real width and height.

Western text should be centered inside as many cells as it needs. This is achieved by inserting SwKernPortions between Asian and Western portions (SwTxtFormatter::BuildPortions). First a new SwKernPortion with width = 0 is inserted, and a pointer to it is stored before formatting a Western text segment. When the end of the Western text segment is reached, another SwKernPortion is appended at the end of the Western text. The number of grid cells required for the Western text is determined, and the remaining space inside these cells is distributed to the SwKernPortions in front of and behind the Western text. Attention: due to an underflow event, the first SwKernPortion could have been deleted.
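The arithmetic behind centering Western text in the grid is short enough to show directly. The following sketch assumes that the remaining space is split evenly between the two kern portions, with any odd unit going to the trailing one; GridFit and centerInGrid are hypothetical names, and the case of a deleted first SwKernPortion is not handled.

```cpp
#include <cassert>

struct GridFit {
    long cells;   // number of grid cells the Western text occupies
    long before;  // width for the kern portion in front of the text
    long after;   // width for the kern portion behind the text
};

GridFit centerInGrid(long textWidth, long cellWidth) {
    assert(cellWidth > 0 && textWidth >= 0);
    const long cells = (textWidth + cellWidth - 1) / cellWidth;  // round up
    const long rest = cells * cellWidth - textWidth;             // unused space
    const long before = rest / 2;
    return GridFit{cells, before, rest - before};
}
```

A Western segment that is 250 units wide in a grid with cells of 100 units needs three cells and gets 25 units of kerning on each side.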

Alphabetic Index Sorting

Entries for an alphabetic index can be inserted via the Insert – Indexes and Tables – Entry dialog. Some languages require additional information on how the entries should be sorted. Some Asian languages, for example, use additional phonetic strings for sorting. Each entry is represented by an SwTOXIndex object. Illustration 14 shows the class model for the main components of an alphabetic index. The SwTOXInternational class has a Compare function, which compares two SwTOXIndex objects considering language-specific rules.
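How a phonetic string can take part in the comparison is sketched below. This is only an illustration of the idea: IndexEntry, sortKey and compareEntries are hypothetical names, and plain std::string comparison stands in for the language-specific collation that the real Compare function has to apply.

```cpp
#include <string>

struct IndexEntry {
    std::string text;     // the entry as shown in the index
    std::string reading;  // optional phonetic reading; empty if none
};

// Entries with a phonetic reading are sorted by it, others by their text.
const std::string& sortKey(const IndexEntry& e) {
    return e.reading.empty() ? e.text : e.reading;
}

// Returns a value <0, 0 or >0, like std::string::compare.
int compareEntries(const IndexEntry& a, const IndexEntry& b) {
    const int byKey = sortKey(a).compare(sortKey(b));
    return byKey != 0 ? byKey : a.text.compare(b.text);  // tie-break on text
}
```

An entry whose text would sort last can thus be placed first if its reading sorts first.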

Files in the Star Writer Project

inc/atrhndl.hxx

The attribute handler, which maintains the collection of attribute stacks used to change font objects, is specified here (see section 5.2).

class SwAttrHandler

inc/drawfont.hxx

The script type data structure (see section 6.1) is defined here.

class SwScriptInfo
class SwDrawTextInfo

text/inftxt.hxx

This file contains specifications for classes encapsulating information required during the formatting/painting/cursor positioning processes (see chapter 8).

class SwLineInfo
class SwTxtInfo
class SwTxtSizeInfo : public SwTxtInfo
class SwTxtPaintInfo : public SwTxtSizeInfo
class SwTxtFormatInfo : public SwTxtPaintInfo

text/itratr.hxx

The base class of all iterators (see chapter 7):

class SwAttrIter

text/itrform2.hxx

Iterator class for text formatting purposes (see chapter 7 and section 9.1).

class SwTxtFormatter : public SwTxtPainter

text/itrpaint.hxx

Iterator class controlling the painting process (see chapter 7 and section 9.2).

class SwTxtPainter : public SwTxtCursor

text/itrtxt.hxx

Some other iterator classes for more special operations.

class SwTxtIter : public SwAttrIter
class SwTxtMargin : public SwTxtIter
class SwTxtAdjuster : public SwTxtMargin
class SwTxtCursor : public SwTxtAdjuster

text/pordrop.hxx

Special portion used for initials.

class SwDropPortion : public SwTxtPortion

text/porexp.hxx

Expanding portions for fields, blanks, and notes.

class SwExpandPortion : public SwTxtPortion
class SwBlankPortion : public SwExpandPortion
class SwPostItsPortion : public SwExpandPortion

text/porfld.hxx

Different kinds of field portions are defined here.

class SwFldPortion : public SwExpandPortion
class SwHiddenPortion : public SwFldPortion
class SwNumberPortion : public SwFldPortion
class SwBulletPortion : public SwNumberPortion
class SwGrfNumPortion : public SwNumberPortion
class SwCombinedPortion : public SwFldPortion

text/porfly.hxx

Portions used for frames.

class SwFlyPortion : public SwFixPortion
class SwFlyCntPortion : public SwLinePortion

text/porfnt.hxx

Footnote portions and portions required for widows/orphans handling are defined here.

class SwFtnPortion : public SwExpandPortion
class SwFtnNumPortion : public SwNumberPortion
class SwQuoVadisPortion : public SwFldPortion
class SwErgoSumPortion : public SwFldPortion

text/porhyph.hxx

Portions introduced during hyphenation are defined in this file.

class SwHyphPortion : public SwExpandPortion
class SwHyphStrPortion : public SwHyphPortion
class SwSoftHyphPortion : public SwHyphPortion
class SwSoftHyphStrPortion : public SwHyphStrPortion

text/porlay.hxx

General text structuring portions, see also chapter 4.

class SwLineLayout : public SwTxtPortion
class SwParaPortion : public SwLineLayout

text/porlin.hxx

Abstract base class for all portions.

class SwLinePortion: public SwPosSize

text/pormulti.hxx

Portions used for multi line style.

class SwMultiPortion : public SwLinePortion
class SwDoubleLinePortion : public SwMultiPortion
class SwRubyPortion : public SwMultiPortion
class SwRotatedPortion : public SwMultiPortion

text/porref.hxx

References are represented by these portions.

class SwRefPortion : public SwTxtPortion
class SwIsoRefPortion : public SwRefPortion

text/portab.hxx

Portions used for different types of tabulators.

class SwTabPortion : public SwFixPortion
class SwTabLeftPortion : public SwTabPortion
class SwTabRightPortion : public SwTabPortion
class SwTabCenterPortion : public SwTabPortion
class SwTabDecimalPortion : public SwTabPortion

text/portox.hxx

These portions are used for tables of contents.

class SwToxPortion : public SwTxtPortion
class SwIsoToxPortion : public SwToxPortion

text/portxt.hxx

Simple text portions are the most frequently used portions.

class SwTxtPortion : public SwLinePortion
class SwHolePortion : public SwLinePortion

text/possize.hxx

Base class of all portions, stores width and height of a portion (in document coordinates).

class SwPosSize

↑ Complex Text Layout: Term for languages whose writing system needs complex transformations in order to visualize the text stored in memory. Examples are bidirectional scripts like Arabic or Hebrew, languages using clustered characters like Thai or languages with characters, whose visual representation depends on their context (e.g., ligatures).


