Pdf Import Extension/Current Architecture
Revision as of 16:14, 12 November 2007
Currently, the PDF import extension uses xpdf to parse the PDF file and to generate low-level output operations, from which an ODF document is then synthesized.
This is somewhat cumbersome: xpdf is GPL-licensed, while OOo is LGPL-licensed, so xpdf has to run completely out of process. A dedicated replacement parser is being developed (filter/source/pdfimport/pdfparse), though it will take some time to reach parity with xpdf.
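The out-of-process arrangement can be sketched as follows, assuming a POSIX system: a helper program (here any shell command standing in for the xpdf-based executable) writes its results to stdout, and the calling process only reads text lines from a pipe. The function name runHelper and the use of popen() are illustrative assumptions; this is not necessarily how the extension itself launches xpdf.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: run a command in a separate process and collect its
// stdout line by line. Only a text stream crosses the process boundary.
// Uses POSIX popen()/pclose(), so this is not portable to non-POSIX systems.
// commandLine is passed to /bin/sh unescaped: quote untrusted file names.
std::vector<std::string> runHelper(const std::string& commandLine)
{
    std::FILE* pipe = popen(commandLine.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("could not start helper process");

    std::vector<std::string> lines;
    std::string current;
    int c;
    while ((c = std::fgetc(pipe)) != EOF) {
        if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current += static_cast<char>(c);
        }
    }
    if (!current.empty())
        lines.push_back(current);   // last line had no trailing newline

    // The exit status returned by pclose() is ignored in this sketch;
    // a real importer should check it and report helper failures.
    pclose(pipe);
    return lines;
}
```

Keeping the interface to a plain text stream means the two sides share no code or address space, which is the property the out-of-process design relies on.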
The current import process looks like this:
[Figure: PDF import architecture (Pdf_architecture.png)]
That is, once triggered from the framework filter configuration, the importer component passes the filename of the PDF file to the xpdf executable, which loads and parses the file and writes fairly low-level drawing commands (such as "put a glyph at position (x,y)") to stdout. The office process reads this output back and builds a tree structure from it, page by page. That tree is then processed to combine glyphs, polygons, etc. into units that make more sense to the user: draw shapes and actual paragraphs of text.
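A minimal sketch of the read-back and combining steps. The line-based command format ("page", "glyph <x> <y> <c>"), the types Glyph and Page, and the functions readCommands and combineLines are hypothetical illustrations, not the protocol or classes the importer actually uses. The combining step only joins glyphs with an identical baseline into text lines, which is far simpler than building real paragraphs and shapes.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical command format, one command per line (NOT the real protocol):
//   page               starts a new page
//   glyph <x> <y> <c>  places the single character <c> at (x, y)
struct Glyph { double x; double y; char ch; };
struct Page  { std::vector<Glyph> glyphs; };

// Read the command stream (e.g. the helper's stdout) into page-wise storage.
std::vector<Page> readCommands(std::istream& in)
{
    std::vector<Page> pages;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream cmd(line);
        std::string op;
        if (!(cmd >> op))
            continue;                      // skip blank lines
        if (op == "page") {
            pages.emplace_back();
        } else if (op == "glyph") {
            Glyph g{};
            if (!(cmd >> g.x >> g.y >> g.ch))
                continue;                  // malformed command: skip it
            if (pages.empty())
                pages.emplace_back();      // glyph before any "page": open one
            pages.back().glyphs.push_back(g);
        }
        // Any other command is ignored in this sketch.
    }
    return pages;
}

// Join glyphs sharing a baseline into text lines, top line first
// (PDF y coordinates grow upwards), characters ordered left to right.
// Baselines must match exactly here; real input would need a tolerance,
// and two glyphs at the same (x, y) overwrite each other.
std::vector<std::string> combineLines(const Page& page)
{
    std::map<double, std::map<double, char>> byBaseline;  // y -> (x -> char)
    for (const Glyph& g : page.glyphs)
        byBaseline[g.y][g.x] = g.ch;

    std::vector<std::string> lines;
    for (auto it = byBaseline.rbegin(); it != byBaseline.rend(); ++it) {
        std::string text;
        for (const auto& entry : it->second)
            text += entry.second;
        lines.push_back(text);
    }
    return lines;
}
```

For a page consisting of `glyph 16 700 i`, `glyph 10 700 H` and `glyph 10 680 x`, combineLines yields the two lines "Hi" and "x".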
Tree classes
This is the inheritance graph of the classes representing the graphical document tree:
[Figure: tree node inheritance graph (Pdfimport-tree-nodes.png)]
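To illustrate the general shape of such a hierarchy, here is a minimal composite-pattern sketch: every node owns its children, geometric nodes carry a bounding box, and a document contains pages that contain paragraphs, text and polygons. The class names (Element, GraphicalElement, TextElement, PolyElement, ParagraphElement, PageElement, DocumentElement) and the helper countKind are hypothetical and are not claimed to match the classes in the diagram.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base node: owns its children.
struct Element {
    virtual ~Element() = default;
    virtual std::string kind() const = 0;

    // Create a child of type T, take ownership, return a non-owning pointer.
    template <class T, class... Args>
    T* addChild(Args&&... args)
    {
        auto child = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = child.get();
        children.push_back(std::move(child));
        return raw;
    }

    std::vector<std::unique_ptr<Element>> children;
};

// Intermediate (abstract) node for anything with a position and size.
struct GraphicalElement : Element {
    double x = 0, y = 0, w = 0, h = 0;
};

struct TextElement : GraphicalElement {
    explicit TextElement(std::string t) : text(std::move(t)) {}
    std::string kind() const override { return "text"; }
    std::string text;
};

struct PolyElement : GraphicalElement {
    std::string kind() const override { return "polygon"; }
};

// Produced by the combining pass: groups text into a paragraph.
struct ParagraphElement : GraphicalElement {
    std::string kind() const override { return "paragraph"; }
};

struct PageElement : Element {
    std::string kind() const override { return "page"; }
};

struct DocumentElement : Element {
    std::string kind() const override { return "document"; }
};

// Count the nodes of a given kind in the subtree rooted at e.
int countKind(const Element& e, const std::string& k)
{
    int n = (e.kind() == k) ? 1 : 0;
    for (const auto& c : e.children)
        n += countKind(*c, k);
    return n;
}
```

Owning children through std::unique_ptr keeps tree destruction automatic, while the passes that restructure the tree can still work on raw, non-owning pointers.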
Output generation classes
The following shows the interface and the two existing classes that generate the actual document output:
[Figure: output generation classes]
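A sketch of how one output interface with two interchangeable generators might look. The section does not say which document types the two existing classes target; the sketch assumes one that keeps absolute positions (drawing-style) and one that emits flowing paragraphs (text-document-style). All names (TextBlock, PageData, OutputGenerator, DrawStyleGenerator, WriterStyleGenerator) and the pseudo-XML markup are made up; the markup is not ODF.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Minimal, hypothetical stand-in for the processed document tree.
struct TextBlock { double x; double y; std::string text; };
struct PageData  { std::vector<TextBlock> blocks; };

// Hypothetical output interface: turn processed pages into markup.
class OutputGenerator {
public:
    virtual ~OutputGenerator() = default;
    virtual std::string generate(const std::vector<PageData>& pages) const = 0;
};

// Drawing-style output: every block keeps its absolute position on its page.
// (No XML escaping of the text in this sketch.)
class DrawStyleGenerator : public OutputGenerator {
public:
    std::string generate(const std::vector<PageData>& pages) const override
    {
        std::ostringstream out;
        for (const PageData& page : pages) {
            out << "<page>";
            for (const TextBlock& b : page.blocks)
                out << "<frame x=\"" << b.x << "\" y=\"" << b.y << "\">"
                    << b.text << "</frame>";
            out << "</page>";
        }
        return out.str();
    }
};

// Text-document-style output: blocks become flowing paragraphs and
// their positions are dropped.
class WriterStyleGenerator : public OutputGenerator {
public:
    std::string generate(const std::vector<PageData>& pages) const override
    {
        std::ostringstream out;
        for (const PageData& page : pages)
            for (const TextBlock& b : page.blocks)
                out << "<p>" << b.text << "</p>";
        return out.str();
    }
};
```

Because callers only see OutputGenerator, the importer can pick the generator from the target document type while sharing all earlier processing steps.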
Low-level event input
The following shows the interface and the existing implementation that receive the low-level output commands from the PDF file (the "draw glyph at (x,y)" kind of input):
[Figure: low-level event input classes]
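A sketch of what such a callback interface and one implementation might look like. DrawEventSink, PageCollector and their methods are hypothetical names; a real interface would presumably also need callbacks for paths, fonts, images and so on. The implementation simply files incoming glyph events page by page, as input for the later tree-building and combining passes.

```cpp
#include <string>
#include <vector>

// Hypothetical callback interface for the low-level drawing events.
class DrawEventSink {
public:
    virtual ~DrawEventSink() = default;
    virtual void startPage(double width, double height) = 0;
    virtual void endPage() = 0;
    virtual void drawGlyph(double x, double y, const std::string& glyph) = 0;
};

struct GlyphEvent { double x; double y; std::string glyph; };

struct CollectedPage {
    double width = 0;
    double height = 0;
    std::vector<GlyphEvent> glyphs;
};

// Implementation that stores incoming events page by page.
class PageCollector : public DrawEventSink {
public:
    void startPage(double width, double height) override
    {
        pages_.push_back(CollectedPage{width, height, {}});
        inPage_ = true;
    }

    void endPage() override { inPage_ = false; }

    void drawGlyph(double x, double y, const std::string& glyph) override
    {
        if (!inPage_)
            return;   // events outside a page are ignored in this sketch
        pages_.back().glyphs.push_back(GlyphEvent{x, y, glyph});
    }

    const std::vector<CollectedPage>& pages() const { return pages_; }

private:
    std::vector<CollectedPage> pages_;
    bool inPage_ = false;
};
```

Separating the event interface from its implementation lets the same input source feed either the real processing code or a lightweight test double.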
There is one more class of this type in the unit test directory filter/source/pdfimport/test.