Pdf Import Extension/Current Architecture
Currently, the PDF import extension uses xpdf to parse the PDF file and to generate a stream of low-level output operations, from which an ODF document is then synthesized.
This is a bit cumbersome: xpdf is GPL-licensed, so it has to run completely out-of-process from OOo (which is LGPL-licensed). A dedicated replacement parser is in the making (filter/source/pdfimport/pdfparse), but it will take some time to be on par with xpdf.
Currently, the way PDF files get imported looks like this:
That is, once triggered from the framework filter configuration, the importer component passes the filename of the PDF file to the xpdf executable, which loads and parses it and writes a series of fairly low-level drawing commands (like "put a glyph at position (x,y)") to stdout. The office process reads this output back and puts it into a tree structure, page by page. That tree is afterwards processed to combine glyphs, polygons etc. into pieces that make more sense to the user (draw shapes and actual paragraphs of text).
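The combining step described above can be pictured with a small sketch. It is not the extension's actual code: the GlyphEvent and TextRun types, the combineGlyphs function and the tolerance value are all hypothetical stand-ins, and the real tree classes carry far more information (fonts, transformations, clipping).

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-in for one low-level "put a glyph at (x,y)" event.
struct GlyphEvent {
    double x;          // left edge of the glyph
    double y;          // baseline position
    double width;      // advance width
    std::string text;  // character(s) the glyph stands for
};

// A run of glyphs merged into one piece of text.
struct TextRun {
    double x;
    double y;
    double width;
    std::string text;
};

// Merge glyphs that share a baseline and directly follow each other.
// 'tolerance' is an assumed slack value, not one taken from the importer.
std::vector<TextRun> combineGlyphs(const std::vector<GlyphEvent>& glyphs,
                                   double tolerance = 0.5)
{
    assert(tolerance >= 0.0);
    std::vector<TextRun> runs;
    for (const GlyphEvent& g : glyphs) {
        if (!runs.empty()) {
            TextRun& last = runs.back();
            const bool sameLine = std::fabs(last.y - g.y) <= tolerance;
            const bool adjacent =
                std::fabs((last.x + last.width) - g.x) <= tolerance;
            if (sameLine && adjacent) {
                // Extend the current run instead of starting a new one.
                last.text += g.text;
                last.width = (g.x + g.width) - last.x;
                continue;
            }
        }
        runs.push_back(TextRun{g.x, g.y, g.width, g.text});
    }
    return runs;
}
```

Feeding it two adjacent glyphs on one baseline and a third glyph further down the page yields two runs, the first holding both characters.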
Tree classes
This is the inheritance graph of the classes representing the graphical document tree:
Output generation classes
This is the interface and the two existing classes generating actual document output:
Specifically, the ContentSink interface is defined like this:
struct ContentSink
{
    virtual ~ContentSink() {}

    /// Total number of pages for upcoming document
    virtual void setPageNum( sal_Int32 nNumPages ) = 0;
    virtual void startPage( const ::com::sun::star::geometry::RealSize2D& rSize ) = 0;
    virtual void endPage() = 0;

    virtual void hyperLink( const ::com::sun::star::geometry::RealRectangle2D& rBounds,
                            const ::rtl::OUString& rURI ) = 0;

    virtual void pushState() = 0;
    virtual void popState() = 0;

    virtual void setFlatness( double ) = 0;
    virtual void setTransformation( const ::com::sun::star::geometry::AffineMatrix2D& rMatrix ) = 0;
    virtual void setLineDash( const ::com::sun::star::uno::Sequence<double>& dashes,
                              double start ) = 0;
    virtual void setLineJoin( sal_Int8 lineJoin ) = 0;
    virtual void setLineCap( sal_Int8 lineCap ) = 0;
    virtual void setMiterLimit(double) = 0;
    virtual void setLineWidth(double) = 0;
    virtual void setFillColor( const ::com::sun::star::rendering::ARGBColor& rColor ) = 0;
    virtual void setStrokeColor( const ::com::sun::star::rendering::ARGBColor& rColor ) = 0;
    virtual void setBlendMode( sal_Int8 blendMode ) = 0;
    virtual void setFont( const FontAttributes& rFont ) = 0;

    virtual void strokePath( const ::com::sun::star::uno::Reference<
                                 ::com::sun::star::rendering::XPolyPolygon2D >& rPath ) = 0;
    virtual void fillPath( const ::com::sun::star::uno::Reference<
                               ::com::sun::star::rendering::XPolyPolygon2D >& rPath ) = 0;
    virtual void eoFillPath( const ::com::sun::star::uno::Reference<
                                 ::com::sun::star::rendering::XPolyPolygon2D >& rPath ) = 0;

    virtual void intersectClip(const ::com::sun::star::uno::Reference<
                                   ::com::sun::star::rendering::XPolyPolygon2D >& rPath) = 0;
    virtual void intersectEoClip(const ::com::sun::star::uno::Reference<
                                     ::com::sun::star::rendering::XPolyPolygon2D >& rPath) = 0;

    virtual void drawGlyphs( const rtl::OUString& rGlyphs,
                             const ::com::sun::star::geometry::RealRectangle2D& rRect,
                             const ::com::sun::star::geometry::Matrix2D& rFontMatrix ) = 0;

    /// issued when a sequence of associated glyphs is drawn
    virtual void endText() = 0;

    /// draws given bitmap as a mask (using current fill color)
    virtual void drawMask(const ::com::sun::star::uno::Sequence<
                              ::com::sun::star::beans::PropertyValue>& xBitmap,
                          bool bInvert ) = 0;

    /// Given image must already be color-mapped and normalized to sRGB.
    virtual void drawImage(const ::com::sun::star::uno::Sequence<
                               ::com::sun::star::beans::PropertyValue>& xBitmap ) = 0;

    /** Given image must already be color-mapped and normalized to sRGB.

        maskColors must contain two sequences of color components
     */
    virtual void drawColorMaskedImage(const ::com::sun::star::uno::Sequence<
                                          ::com::sun::star::beans::PropertyValue>& xBitmap,
                                      const ::com::sun::star::uno::Sequence<
                                          ::com::sun::star::uno::Any>& xMaskColors ) = 0;
    virtual void drawMaskedImage(const ::com::sun::star::uno::Sequence<
                                     ::com::sun::star::beans::PropertyValue>& xBitmap,
                                 const ::com::sun::star::uno::Sequence<
                                     ::com::sun::star::beans::PropertyValue>& xMask,
                                 bool bInvertMask) = 0;
    virtual void drawAlphaMaskedImage(const ::com::sun::star::uno::Sequence<
                                          ::com::sun::star::beans::PropertyValue>& xImage,
                                      const ::com::sun::star::uno::Sequence<
                                          ::com::sun::star::beans::PropertyValue>& xMask) = 0;
};
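The pushState()/popState() pair suggests that an implementation keeps a stack of graphics states which the set...() calls modify. A minimal sketch of that idea follows, assuming a reduced state with three fields and plain C++ types instead of the UNO types. StateTrackingSink and GraphicsState are hypothetical names, the initial values are placeholders, and the sketch does not derive from the real ContentSink.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced stand-in for a graphics state; the field selection and the
// initial values are assumptions made for this sketch.
struct GraphicsState {
    double lineWidth = 1.0;
    double miterLimit = 10.0;
    double flatness = 1.0;
};

// Hypothetical sink fragment covering only some state-related calls.
class StateTrackingSink {
public:
    StateTrackingSink() : m_states(1) {}  // start with one default state

    // Save the current state by duplicating the top of the stack.
    void pushState() { m_states.push_back(m_states.back()); }

    // Restore the previously saved state.
    void popState()
    {
        // The initial state is never popped; an unbalanced pop would
        // indicate a problem on the producer side.
        assert(m_states.size() > 1);
        m_states.pop_back();
    }

    void setLineWidth(double w)  { m_states.back().lineWidth = w; }
    void setMiterLimit(double m) { m_states.back().miterLimit = m; }
    void setFlatness(double f)   { m_states.back().flatness = f; }

    const GraphicsState& current() const { return m_states.back(); }
    std::size_t depth() const { return m_states.size(); }

private:
    std::vector<GraphicsState> m_states;
};
```

With this layout, a setter called between pushState() and popState() only affects the copy on top of the stack, so the earlier values reappear after the pop.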
Low-level event input
This is the interface and the existing implementation receiving the low-level output commands from the pdf file (the "draw glyph at (x,y)" type of input):
There's one more class of this type in the unit test directory filter/source/pdfimport/test, implementing a stub device that just checks basic event generation sanity.
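What "basic event generation sanity" could mean is sketched below. This is an illustration only and not the test class from filter/source/pdfimport/test: SanityCheckSink and its reduced, argument-free methods are hypothetical, and the checks chosen here (page events properly paired, no drawing outside a page, announced page count reached) are assumptions.

```cpp
#include <cassert>

// Hypothetical stub that records whether events arrive in a sane order.
class SanityCheckSink {
public:
    void setPageNum(int nNumPages)
    {
        assert(nNumPages >= 0);
        m_expectedPages = nNumPages;
    }

    void startPage()
    {
        if (m_inPage) m_ok = false;   // startPage while a page is open
        m_inPage = true;
    }

    void endPage()
    {
        if (!m_inPage) m_ok = false;  // endPage without startPage
        m_inPage = false;
        ++m_seenPages;
    }

    void drawGlyphs()
    {
        if (!m_inPage) m_ok = false;  // drawing outside of any page
    }

    // True if all events were well ordered and the page count matches.
    bool isSane() const
    {
        return m_ok && !m_inPage && m_seenPages == m_expectedPages;
    }

private:
    int  m_expectedPages = 0;
    int  m_seenPages = 0;
    bool m_inPage = false;
    bool m_ok = true;
};
```

A test would replay the events for a known document into such a stub and then check isSane() once the input is exhausted.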