Difference between revisions of "OpenOffice.org Internship/Tasks/Proper paragraphs"

From Apache OpenOffice Wiki
Revision as of 17:50, 18 September 2010

The task is to implement correct import of text paragraphs. The current version of the extension imports only single lines, which is quite inconvenient when we try to edit the text.

Line importing in current extension

XPDF gives no notification when a BT tag is encountered, so we cannot tell when a new text object begins. Instead, a line is recognized by the positions of consecutive glyphs (more precisely, of the rectangles containing them). If two consecutive rectangles are close enough to each other, they are treated as belonging to the same line. This solution is not perfect, because we have to decide what "close enough" means.
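The "close enough" test could be sketched as follows. This is only an illustration: the GlyphRect type, the sameLine function and both tolerance factors are assumptions made for this example and are not taken from the extension code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical axis-aligned rectangle around one glyph (y grows downward).
struct GlyphRect {
    double x1, y1, x2, y2; // left, top, right, bottom
};

// Returns true when `next` plausibly continues the line ending with `prev`.
// gapFactor and baselineFactor are assumed tuning values, scaled by the
// height of the previous glyph so the test does not depend on font size.
bool sameLine(const GlyphRect& prev, const GlyphRect& next,
              double gapFactor = 1.5, double baselineFactor = 0.5)
{
    const double height = prev.y2 - prev.y1;
    const double gap = next.x1 - prev.x2;              // horizontal distance
    const double shift = std::fabs(next.y2 - prev.y2); // bottom-edge offset
    return gap <= gapFactor * height
        && gap >= -height                              // tolerate small overlaps
        && shift <= baselineFactor * height;
}
```

Scaling the thresholds by the glyph height is one possible answer to the "close enough" question; fixed thresholds in page units would be another.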

Idea of paragraph importing

To import whole paragraphs I suggest a solution similar to the one described above, but applied to lines and paragraphs instead of glyphs and lines: when lines are close enough, they are treated as one paragraph. Several cases may occur, but most of them are quite easy.
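The same idea applied to lines might look like the sketch below. It assumes that lines arrive in reading order and that only the vertical gap matters; LineBox, groupIntoParagraphs and maxGapFactor are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical vertical extent of one text line (y grows downward).
struct LineBox {
    double top, bottom;
};

// Groups consecutive lines into paragraphs: a line joins the current
// paragraph when the gap to the previous line is at most
// maxGapFactor times that line's height (an assumed threshold).
std::vector<std::vector<LineBox>> groupIntoParagraphs(
    const std::vector<LineBox>& lines, double maxGapFactor = 0.8)
{
    std::vector<std::vector<LineBox>> paragraphs;
    for (const LineBox& line : lines) {
        bool joins = false;
        if (!paragraphs.empty()) {
            const LineBox& last = paragraphs.back().back();
            const double height = last.bottom - last.top;
            const double gap = line.top - last.bottom;
            joins = gap >= 0.0 && gap <= maxGapFactor * height;
        }
        if (!joins)
            paragraphs.emplace_back(); // start a new paragraph
        paragraphs.back().push_back(line);
    }
    return paragraphs;
}
```

A real implementation would probably also have to compare horizontal alignment and font properties, which are left out here.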

Moreover, glyph processing is quite complex. It would be better to encapsulate it and delegate it to a standalone class. This would reduce the mess in PDFIProcessor, which contains methods responsible for every kind of processing. The main goal is to make PDFIProcessor a wrapper around smaller classes with separate responsibilities; this approach has a lot of advantages.

Another solution

Another solution would be to modify Gfx and OutDev from XPDF. As stated at the beginning of this page, there is no information about when BT is encountered, so the solution would be to change the code so that OutDev is informed about it. Unfortunately, I see some problems with this solution: a BT object sometimes contains much more than a single paragraph, and another problem is the position of glyphs within drawn text objects. Moreover, it requires changes in the makefile (the extension code).

Description of implemented solution

  1. Changes in PDFIProcessor

All responsibility for glyph processing has been moved to the CharGlyphsProcessor class, which is initialized by passing a PDFIProcessor object to its only constructor. It is not the whole object that is passed, but only the few required functions, exposed through the facade design pattern; modifying PDFIProcessor content from within CharGlyphsProcessor is discouraged. PDFIProcessor owns a CharGlyphsProcessor object, and instead of running processGlyphLine in the drawGlyphs function, it executes CharGlyphsProcessor::process, which starts processing of the current glyph.
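The facade arrangement can be illustrated with a minimal sketch. All class and method names below (Processor, ProcessorFacade, GlyphsProcessor, emitText, resetState) are stand-ins invented for this example; they do not reproduce the real PDFIProcessor interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the large processor class; members are hypothetical.
class Processor {
public:
    void emitText(const std::string& text) { m_emitted.push_back(text); }
    void resetState() { m_emitted.clear(); } // not meant for glyph code
    const std::vector<std::string>& emitted() const { return m_emitted; }
private:
    std::vector<std::string> m_emitted;
};

// Facade: forwards only the calls the glyph code needs, hides the rest.
class ProcessorFacade {
public:
    explicit ProcessorFacade(Processor& processor) : m_processor(processor) {}
    void emitText(const std::string& text) { m_processor.emitText(text); }
private:
    Processor& m_processor;
};

// Glyph-processing class that sees the processor only through the facade.
class GlyphsProcessor {
public:
    explicit GlyphsProcessor(ProcessorFacade facade) : m_facade(facade) {}
    void process(const std::string& glyph) { m_facade.emitText(glyph); }
private:
    ProcessorFacade m_facade;
};
```

Because GlyphsProcessor holds only the facade, a call such as resetState is simply not reachable from it, which is the protection described above.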

  2. CharGlyphProcessor

The only constructor of the class receives a facade to the PDFIProcessor class. This allows the required methods of PDFIProcessor to be called while hiding the rest, which prevents its content from being modified externally. Objects of the class hold a pointer to the paragraph currently being built. The main function of the class is the "process" method, which receives the arguments rFontMatrix, aRect and the char to draw, and starts building the paragraph structure. The class first tries to add every new glyph to the current paragraph; if that fails, it tries to create a new line within the current paragraph. If that fails as well, the paragraph is dropped and a new one is created to replace it. Every time a new glyph or line is added, the paragraph properties need to be updated so that it can be correctly determined whether the next glyphs or lines may belong to it. The second public method is "drop", which drops the content explicitly when the PDF "end of text object" command is encountered while parsing.
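The three-step fallback in "process" can be sketched as below. The geometry is deliberately simplified and entirely assumed: a glyph is reduced to a point, and kTolerance and kMaxLineStep are made-up thresholds. The real method also takes rFontMatrix and aRect, which this sketch leaves out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

struct Glyph { double x, y; };            // hypothetical: position only

constexpr double kTolerance = 0.5;        // assumed same-line tolerance
constexpr double kMaxLineStep = 20.0;     // assumed maximum line distance

class Paragraph {
public:
    // Succeeds when the glyph lies on the current line.
    bool addGlyph(const Glyph& g) {
        if (m_lines.empty() || std::fabs(g.y - m_lineY) > kTolerance)
            return false;
        m_lines.back().push_back(g);
        return true;
    }
    // Succeeds when the glyph can start the next line of this paragraph.
    bool addLine(const Glyph& g) {
        if (!m_lines.empty()) {
            const double step = g.y - m_lineY;
            if (step <= kTolerance || step > kMaxLineStep)
                return false;
        }
        m_lines.push_back(std::vector<Glyph>{g});
        m_lineY = g.y;                    // update paragraph properties
        return true;
    }
    std::size_t lineCount() const { return m_lines.size(); }
private:
    std::vector<std::vector<Glyph>> m_lines;
    double m_lineY = 0.0;
};

class GlyphCascade {
public:
    void process(const Glyph& g) {
        if (m_current && m_current->addGlyph(g)) return; // 1. same line
        if (m_current && m_current->addLine(g)) return;  // 2. new line
        drop();                                          // 3. new paragraph
        m_current.reset(new Paragraph());
        m_current->addLine(g);
    }
    // Closes the current paragraph; here it only records its line count.
    void drop() {
        if (m_current) {
            m_finished.push_back(m_current->lineCount());
            m_current.reset();
        }
    }
    const std::vector<std::size_t>& finished() const { return m_finished; }
private:
    std::unique_ptr<Paragraph> m_current;
    std::vector<std::size_t> m_finished;
};
```

In this sketch "drop" merely records how many lines the finished paragraph had, so that the behaviour is observable; what the extension does with a dropped paragraph is not covered here.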

CharGlyphParagraph

The object contains a list of the lines that belong to the paragraph. It provides a public method "add" for adding new glyphs, which returns true when the glyph was added successfully and false otherwise. A simple mathematical equation determines whether such an addition is possible. Another public method is "drop", which calls the "drop" function of every line in the paragraph.

CharGlyphLine

The class represents a single line within a paragraph. Like CharGlyphParagraph, CharGlyphLine provides public methods "add" and "drop" with similar behavior. In addition, an object of the class holds a list of the glyphs in the current line.
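How "add" and "drop" could be layered across the paragraph and line classes is sketched below. The names Line and Paragraph, the Glyph fields and the maxGap rule are assumptions for illustration; the actual equation used by the extension is not given on this page, and "drop" here simply discards the glyphs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Glyph { double x; double width; }; // hypothetical glyph properties

class Line {
public:
    explicit Line(double maxGap) : m_maxGap(maxGap) {}
    // Returns true when the glyph follows the last one closely enough.
    bool add(const Glyph& g) {
        if (!m_glyphs.empty()) {
            const Glyph& last = m_glyphs.back();
            const double gap = g.x - (last.x + last.width);
            if (gap < 0.0 || gap > m_maxGap)
                return false;
        }
        m_glyphs.push_back(g);
        return true;
    }
    void drop() { m_glyphs.clear(); }
    std::size_t size() const { return m_glyphs.size(); }
private:
    double m_maxGap;
    std::vector<Glyph> m_glyphs;
};

class Paragraph {
public:
    explicit Paragraph(double maxGap) : m_maxGap(maxGap) {}
    // Delegates to the last line, creating the first line on demand.
    bool add(const Glyph& g) {
        if (m_lines.empty())
            m_lines.emplace_back(m_maxGap);
        return m_lines.back().add(g);
    }
    // Calls drop on every line, then forgets the lines.
    void drop() {
        for (Line& line : m_lines)
            line.drop();
        m_lines.clear();
    }
    std::size_t glyphCount() const {
        std::size_t total = 0;
        for (const Line& line : m_lines)
            total += line.size();
        return total;
    }
private:
    double m_maxGap;
    std::vector<Line> m_lines;
};
```

The boolean result of "add" is what lets the caller fall back to creating a new line or a new paragraph when a glyph does not fit.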

CharGlyph

The class represents a glyph object with all its properties, together with the functions that operate on it.

Dropping
