OpenOffice.org Internship/Tasks/Proper paragraphs
The task is to implement correct import of text paragraphs. The current version of the extension can import only single lines, which is quite inconvenient when we try to edit the text.
Line importing in the current extension
XPDF gives no notification when the BT operator (begin text object) is met, so we cannot determine when a new text object starts. Instead, a line is recognized by the positions of consecutive glyphs (more precisely, of the rectangles containing the glyphs). If two consecutive rectangles are close enough to each other, they are treated as belonging to the same line. This solution is not perfect, because we have to decide what "close enough" means.
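The line detection described above can be sketched as a single pass over the glyph rectangles. This is a minimal illustration, not the extension's actual code: the types `GlyphRect` and `Line`, the functions `closeEnough` and `groupIntoLines`, and the thresholds `maxGap` and `maxBaselineShift` are all hypothetical, and the chosen threshold values are exactly the "close enough" problem mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical bounding box of one glyph in page coordinates
// (x grows to the right, y grows downwards).
struct GlyphRect {
    double x, y, width, height;
};

// A line is the run of consecutive glyphs judged to belong together.
using Line = std::vector<GlyphRect>;

// Hypothetical "close enough" test: the horizontal gap between the two
// rectangles is small (small overlaps from kerning are allowed) and their
// bottom edges, roughly the baseline, are at about the same height.
bool closeEnough(const GlyphRect& prev, const GlyphRect& next,
                 double maxGap, double maxBaselineShift) {
    const double gap = next.x - (prev.x + prev.width);
    const double shift = (next.y + next.height) - (prev.y + prev.height);
    return std::fabs(gap) <= maxGap && std::fabs(shift) <= maxBaselineShift;
}

// Walks the glyphs in the order they arrive and starts a new line whenever
// two consecutive rectangles are not close enough.
std::vector<Line> groupIntoLines(const std::vector<GlyphRect>& glyphs,
                                 double maxGap, double maxBaselineShift) {
    assert(maxGap >= 0.0 && maxBaselineShift >= 0.0);
    std::vector<Line> lines;
    for (const GlyphRect& g : glyphs) {
        if (lines.empty() ||
            !closeEnough(lines.back().back(), g, maxGap, maxBaselineShift)) {
            lines.push_back(Line{});
        }
        lines.back().push_back(g);
    }
    return lines;
}
```

Because the decision only looks at two consecutive rectangles, a jump back to the left margin on the next row is enough to start a new line.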
Idea of paragraph importing
To import whole paragraphs, I suggest a solution similar to the one described above, but instead of glyphs and lines we consider lines and paragraphs: when consecutive lines are close enough, they are treated as one paragraph. Several cases may occur, but most of them are quite easy to handle.
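The same idea lifted one level up might look like the sketch below. It assumes lines have already been recognized and reduced to bounding boxes; `LineBox`, `sameParagraph`, `groupIntoParagraphs`, `maxGapRatio` and `maxHeightDiff` are hypothetical names, and the rule (small vertical gap relative to the line height, similar line height as a rough proxy for the same font size) is only one possible definition of "close enough". Cases such as first-line indentation or centered text are not handled here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical bounding box of an already recognized text line
// (y grows downwards, so a line below has a larger top value).
struct LineBox {
    double left, top, width, height;
};

using Paragraph = std::vector<LineBox>;

// Hypothetical rule: the next line continues the paragraph when it starts
// below the previous line, the empty space between them is at most
// maxGapRatio times the previous line's height, and both lines have
// roughly the same height.
bool sameParagraph(const LineBox& prev, const LineBox& next,
                   double maxGapRatio, double maxHeightDiff) {
    const double gap = next.top - (prev.top + prev.height);
    return gap >= 0.0 && gap <= maxGapRatio * prev.height &&
           std::fabs(next.height - prev.height) <= maxHeightDiff;
}

// Groups consecutive lines into paragraphs, starting a new paragraph
// whenever sameParagraph() rejects the pair.
std::vector<Paragraph> groupIntoParagraphs(const std::vector<LineBox>& lines,
                                           double maxGapRatio,
                                           double maxHeightDiff) {
    assert(maxGapRatio >= 0.0 && maxHeightDiff >= 0.0);
    std::vector<Paragraph> paragraphs;
    for (const LineBox& line : lines) {
        if (paragraphs.empty() ||
            !sameParagraph(paragraphs.back().back(), line,
                           maxGapRatio, maxHeightDiff)) {
            paragraphs.push_back(Paragraph{});
        }
        paragraphs.back().push_back(line);
    }
    return paragraphs;
}
```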
Moreover, glyph processing is quite complex. It would be better to use encapsulation and delegate glyph processing to a standalone class. This would reduce the mess in pdfiprocessor, which currently contains methods responsible for every kind of processing. The main goal is to make pdfiprocessor a wrapper around smaller classes with separate responsibilities. This approach has several advantages; for example, each class can be understood, tested and changed on its own.
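A rough sketch of the proposed delegation follows. The classes `Glyph`, `GlyphProcessor` and `Processor` and their methods are hypothetical and do not reproduce the real pdfiprocessor interface; the point is only that the wrapper forwards glyph events and holds no line-building logic of its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical glyph as delivered to the processor: its text and position.
struct Glyph {
    std::string text;
    double x, y, width;
};

// Hypothetical standalone class that owns everything about glyph handling.
class GlyphProcessor {
public:
    explicit GlyphProcessor(double maxGap) : maxGap_(maxGap) {
        assert(maxGap >= 0.0);
    }

    // Appends the glyph to the current line or starts a new one.
    void addGlyph(const Glyph& g) {
        if (lines_.empty() || !continuesLine(g)) lines_.emplace_back();
        lines_.back() += g.text;
        last_ = g;
    }

    const std::vector<std::string>& lines() const { return lines_; }

private:
    // Same row and a small horizontal gap after the previous glyph.
    bool continuesLine(const Glyph& g) const {
        const double gap = g.x - (last_.x + last_.width);
        return std::fabs(g.y - last_.y) < 0.5 && gap >= 0.0 && gap <= maxGap_;
    }

    double maxGap_;
    Glyph last_{};
    std::vector<std::string> lines_;
};

// Hypothetical thin wrapper: it only forwards events to the helper
// responsible for them.
class Processor {
public:
    explicit Processor(double maxGap) : glyphs_(maxGap) {}
    void drawGlyph(const Glyph& g) { glyphs_.addGlyph(g); }
    std::size_t lineCount() const { return glyphs_.lines().size(); }
    const std::vector<std::string>& lines() const { return glyphs_.lines(); }

private:
    GlyphProcessor glyphs_;
    // Further helpers (e.g. for paragraphs or images) would be added here.
};
```

With this split, the paragraph logic could become another helper class without touching glyph handling.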
Another solution would be to modify Gfx and OutDev from XPDF. As mentioned at the beginning of this page, there is no notification when BT is met, so the code would have to be changed to inform OutDev about it. Unfortunately, I see some problems with this solution. First, a single BT text object sometimes contains much more than one paragraph. Second, there is the problem of positioning glyphs within the draw text objects. Moreover, it requires changes in the makefile (the extension code).
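The first problem can be illustrated with a toy interpreter. This does not use XPDF's real Gfx or OutDev interfaces: `Op`, `TextObjectListener`, `interpret` and `Collector` are hypothetical stand-ins, assuming only that BT and ET are the PDF operators that begin and end a text object. Even with a begin/end notification, one text object can carry two paragraphs, so their positions would still have to be analysed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified content-stream operation; a real PDF interpreter
// handles far more operators and operands.
struct Op {
    std::string name;  // "BT", "ET", "Td" or "Tj"
    std::string text;  // operand of Tj
    double ty = 0.0;   // vertical offset operand of Td
};

// Hypothetical callback interface standing in for a modified output device.
struct TextObjectListener {
    virtual ~TextObjectListener() = default;
    virtual void beginTextObject() = 0;
    virtual void endTextObject() = 0;
    virtual void showText(const std::string& text) = 0;
};

// Forwards BT and ET to the listener, the kind of notification the
// modification would add.
void interpret(const std::vector<Op>& ops, TextObjectListener& out) {
    for (const Op& op : ops) {
        if (op.name == "BT") out.beginTextObject();
        else if (op.name == "ET") out.endTextObject();
        else if (op.name == "Tj") out.showText(op.text);
        // "Td" only moves the text position; this sketch ignores it.
    }
}

// Collects the text of each BT ... ET object into one string.
struct Collector : TextObjectListener {
    std::vector<std::string> objects;
    bool inside = false;
    void beginTextObject() override { objects.emplace_back(); inside = true; }
    void endTextObject() override { inside = false; }
    void showText(const std::string& t) override {
        if (inside) objects.back() += t;
    }
};
```

Feeding it one BT ... ET block with two Tj strings separated by a large Td yields a single text object containing both paragraphs.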