Architecture/To-Dos
This page collects various architectural deficiencies of OpenOffice.org (aka the pet peeves of various people), and lists the areas where work is in progress to improve the architecture.
Depending on the specific counting algorithm, OOo consists of approximately 7E6 (7 million) lines of code (the overwhelming majority being C++, with every other language (Java, Perl, Basic, Python) an order of magnitude less). This sheer size is a problem in and of itself: the code base is notorious for crashing various software engineering tools, or slowing them to a crawl, from debuggers to dependency analysis to reverse design extraction.
The code itself varies greatly in quality, style, and age (age invariably affecting quality and style, if you recall the history and evolution of C++), with some parts virtually unmodified for 10+ years and others only recently written from scratch.
Taken together, this leads to a lot of complexity and redundancy, which is very hard to remove.
Facing this amount of code, the big rules must be:
- simplify
- remove internal redundancy
- remove external redundancy (use external projects wherever possible)
- remove unused or dead code
- remove legacy functionality that no longer provides noticeable value (e.g. binfilter)
- refactor for orthogonality
- make subsystems implement independent functionality
- make those subsystems freely combinable
- carry that to the UI level (no artificial restrictions on what one can do with UI objects - e.g. shapes can be rotated, and clearly text frames should, too)
Infrastructure Improvements
- Speeding up the build system, and maybe even making it consider global dependencies (currently, OOo has the notion of modules, which approximately map to top-level directories in the build tree. Automatic build-time dependency calculation is currently only available on the intra-module level).
- Making the actual design more accessible, improving upon existing solutions like LXR or Bonsai. Ultimately, this should make refactorings of the source code both much easier and much safer than today, by providing information on where and how specific functionality is used. A prerequisite for that would be a parser that really knows about C++; gccxml might be a starting point.
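One way to picture what considering global dependencies could mean: if every module declares the modules it depends on, the set of modules affected by a change can be computed across module boundaries. The following is only a minimal sketch in standard C++; the `DependencyGraph` data model, the `affectedModules` helper, and the module names used with it are hypothetical and do not reflect how the existing build tooling is implemented.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Maps a module to the modules it depends on (hypothetical data model).
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

// Returns the changed module plus every module that depends on it,
// directly or transitively, i.e. everything that may need a rebuild.
std::set<std::string> affectedModules(const DependencyGraph& deps,
                                      const std::string& changed) {
    // Invert the graph: module -> modules that depend on it.
    std::map<std::string, std::vector<std::string>> dependents;
    for (const auto& entry : deps)
        for (const auto& dep : entry.second)
            dependents[dep].push_back(entry.first);

    // Breadth-first walk over the inverted edges.
    std::set<std::string> affected{changed};
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        for (const auto& user : dependents[current]) {
            if (affected.insert(user).second)  // true only for newly seen modules
                pending.push(user);
        }
    }
    return affected;
}
```

With a made-up graph in which `gui` depends on `base` and `app` depends on `gui`, a change in `gui` yields `gui` and `app`, while a change in `base` yields all three.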
Runtime System Improvements
This is about making the implementation languages safer and easier to use. What follows could also be subsumed under "transparency on the implementation level". When something can be used transparently, or appears transparent to a user, it is an implementation aspect she need not care about. Being able to program in an environment that is transparent with regard to many such aspects lets the developer focus on the problem at hand, without having to litter her code with mundane tasks such as memory management or locking.
- Make threading transparent. Currently, fulfilling the contract of a UNO component regarding thread safety is
- tedious work, because normally each involved object has to acquire and release a mutex on method entry and exit, respectively
- almost impossible to get right, let alone to verify (no races, no deadlocks), because of the sheer mass of involved objects and mutexes (the number of distinct states that would have to be checked for a proper verification is intractable for anything but the most trivial examples). The upcoming extended Binary UNO threading model makes thread safety transparent by automatically locking and unlocking when entering or exiting components, on a much coarser level than single methods.
- Make other mundane stuff transparent, such as memory management (via garbage collection, or refcounting via smart pointers or UNO references) or transactionality (changes take place either completely or not at all). A component that behaves non-transactionally in the face of an error makes recovery rather hard. There's more to transactionality than exception safety: imagine two users collaborating on the same document.
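The idea behind the threading item above, locking once on entry to a group of objects instead of inside every method, can be sketched as follows. This is a minimal illustration in standard C++ under stated assumptions, not the Binary UNO implementation: `Counter`, `Apartment`, `enter`, and `runDemo` are hypothetical names introduced only for this example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical component that contains no locking code of its own.
class Counter {
public:
    void increment() { ++value_; }
    int value() const { return value_; }
private:
    int value_ = 0;
};

// Hypothetical coarse-grained guard: one mutex protects a whole group of
// objects. Callers enter once; the objects inside stay free of locking code.
class Apartment {
public:
    template <typename F>
    auto enter(F&& work) -> decltype(work()) {
        std::lock_guard<std::mutex> guard(mutex_);  // locked on entry, released on exit
        return work();
    }
private:
    std::mutex mutex_;
};

// Several threads mutate the same component, always through the apartment.
int runDemo(int threads, int incrementsPerThread) {
    Apartment apartment;
    Counter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i)
                apartment.enter([&] { counter.increment(); });
        });
    }
    for (auto& th : pool) th.join();
    return apartment.enter([&] { return counter.value(); });
}
```

The component code stays free of mutex handling; correctness then depends on every access going through the single entry point, which is far easier to review than a mutex per object.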
General Refactoring Improvements
For many reasons the OpenOffice.org codebase is difficult to understand and navigate. One of the reasons is a lack of cleanup in the code. There is a never-ending list of things that ought to be done; add some of your own.
- Actually remove deprecated things. Things like String and UniString need to go. svtools and tools have loads of stuff that is duplicated elsewhere or is deprecated. Getting rid of these sorts of things will make maintaining application code much easier.
- Document things. Some of the code has comments that at one time were correct. Some code has German comments. While most of the OpenOffice.org programmers speak German, there is an unofficial understanding that German comments mean "don't touch."
Code Improvements
Remove unused code
Binary loading/saving stuff in ItemSets; depends on EditEngine loading/saving (only used for the clipboard) - MT
Remove duplicate code
Consolidate code that was copied and slightly modified.
- BigPointerArray vs. SvPointerArray
- RTL Strings with Tools Strings
Consolidate Text Engines
- Text Engine
- Writer Engine
- Edit Engine
Replace code with 3rd party
Replace self-made containers with STL containers.
- Tools Container
- SvPointerArray, BigPointerArray
- "GetPos" is mostly used incorrectly -> remove it (algorithmic complexity too high: O(n*n))
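To illustrate how a position lookup ends up quadratic: if the lookup is a linear search and it is called once per element while already walking the container, n calls of O(n) each add up to O(n*n). The sketch below assumes exactly that; `LegacyPointerArray` is a hypothetical stand-in written for this example and is not the actual SvPointerArray or Tools Container interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a self-made pointer container whose GetPos()
// is assumed to search linearly. NOT the real SvPointerArray interface.
class LegacyPointerArray {
public:
    void Insert(const int* p) { items_.push_back(p); }
    std::size_t Count() const { return items_.size(); }
    const int* GetObject(std::size_t i) const { return items_[i]; }

    // Linear search: O(n) per call. Counts comparisons for the demonstration.
    std::size_t GetPos(const int* p) const {
        for (std::size_t i = 0; i < items_.size(); ++i) {
            ++comparisons;
            if (items_[i] == p) return i;
        }
        return items_.size();  // not found
    }

    mutable std::size_t comparisons = 0;
private:
    std::vector<const int*> items_;
};

// Anti-pattern: asking for the position of each element while already
// walking the container. n calls to an O(n) search give O(n*n) work.
std::size_t sumOfPositionsQuadratic(const LegacyPointerArray& arr) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < arr.Count(); ++i)
        sum += arr.GetPos(arr.GetObject(i));
    return sum;
}

// With an STL container the position is already known from the loop index,
// so no search is needed at all: O(n) overall.
std::size_t sumOfPositionsLinear(const std::vector<const int*>& items) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        sum += i;
    return sum;
}
```

Both functions compute the same result; the first performs n*(n+1)/2 pointer comparisons for n distinct elements, the second none.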
Improve modularity
?Clear "Mission Statements" for modules?
VCL
Get rid of the internal event queue.
Application-specific Improvements
One of the lingering problems on the application level is the fact that, in spite of modularized lower-level functionality, application functionality cannot be shared between OOo's applications (except via embedding of a whole application (OLE)). This is because neither Calc nor Writer has a reusable application engine, such as a text engine providing text editing and layout functionality, or a table engine providing formula and calculation support. Draw/Impress already uses a shared engine, dubbed 'Drawing Layer'. But there's still considerable functionality hidden in the application code that is worth extracting. The missing Writer engine in particular manifests itself in duplicated text editing functionality in EditEngine and TextEngine (used by Impress and Calc for their corresponding text functionality).
Another area of improvement is rendering. Currently, all applications' graphical output is based on the OutputDevice class, which provides only very basic rendering facilities (in fact, besides largely extended text output functionality (to handle OOo's i18n requirements), this interface has basically remained unchanged for a long time). Specifically, things like performant alpha compositing or anti-aliased geometry rendering are extremely hard to achieve with the current design. Therefore, starting with OOo 2.0, the XCanvas interface is slated to gradually replace OutputDevice in all applications.
Writer
- break up the monolith
- make the import filters more modular
- port rendering to XCanvas
Calc
- break up the monolith
- speed improvements
- port rendering to XCanvas
Draw/Impress
- break up the monolith
- become more decoupled from sfx2
- redesign API (performance)
- port Drawing Layer to XCanvas (see DrawingPrimitives for one of the preconditions)