Revision as of 15:59, 1 February 2006

OpenOffice.org Modularization

The question often arises why OpenOffice.org is not more modular. Modularization can be viewed from several angles:

Views on Modularization

User View

From the user's point of view, there are several modules:

Word Processor
Spreadsheet Module
Impress Module
Database Module
Core Modules
Desktop Integration
Filter Modules

and many more.

Architectural View

The Architectural OverView gives a heavily stripped-down picture. In reality, there are up to 20 layers below an application module. All code that can be shared among the application modules has been moved to core modules. Examples of such layers are:

System Abstraction Layer
Infrastructure Layer
Framework Layer

These layers can in turn contain further layers, such as helper and wrapper layers.
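The layering rule can be checked mechanically. The following C++ sketch assumes that each module has been assigned a layer number (0 for the lowest layer) and reports every dependency that points from a lower layer up to a higher one. The function name, the module names in the usage, and the layer numbers are hypothetical. Dependencies within the same layer are deliberately allowed, because a layer may contain sub-layers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A directed dependency: "first" uses "second".
using Dependency = std::pair<std::string, std::string>;

// Returns every dependency that points upward, i.e. from a lower layer to a
// higher one (layer 0 is the lowest, e.g. a system abstraction layer).
// Modules missing from the layer map are reported as well, since their place
// in the architecture is undefined.
std::vector<Dependency> findLayerViolations(
    const std::map<std::string, int>& layerOf,
    const std::vector<Dependency>& deps) {
    std::vector<Dependency> violations;
    for (const auto& dep : deps) {
        auto from = layerOf.find(dep.first);
        auto to = layerOf.find(dep.second);
        if (from == layerOf.end() || to == layerOf.end() ||
            to->second > from->second) {
            violations.push_back(dep);
        }
    }
    return violations;
}
```

For example, with the layers system_abstraction = 0, infrastructure = 1, framework = 2 and word_processor = 3, a dependency from infrastructure to framework is reported. A dependency from word_processor to framework is not.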

Source Code

The source code itself is grouped into more than 150 modules, each of which can be built in one pass.

Each compilation unit and each C++ class can also be viewed as a module.

Libraries

Libraries are often built on a per-module basis (one library per source code/CVS module); sometimes several libraries are built in one CVS module. Note: in many places, the runtime dependencies of the libraries differ from the build sequence of the CVS modules.
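A build sequence follows from the build-time dependencies between modules. The sketch below is illustrative only. It assumes the dependencies are already available as a map from each module to the modules it needs. It uses Kahn's topological sort to produce an order in which every module comes after its prerequisites, and it reports failure when the dependencies contain a cycle. DepGraph, buildOrder and the module names in the usage are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <queue>
#include <set>
#include <string>
#include <vector>

// needs[m] lists the modules that must be built before module m.
using DepGraph = std::map<std::string, std::set<std::string>>;

// Kahn's algorithm: returns a build order (prerequisites first), or
// std::nullopt if the dependencies contain a cycle.
std::optional<std::vector<std::string>> buildOrder(const DepGraph& needs) {
    std::map<std::string, int> pending;                     // unmet prerequisites per module
    std::map<std::string, std::vector<std::string>> users;  // reverse edges
    for (const auto& [module, prereqs] : needs) {
        pending[module];  // make sure every module is known (count starts at 0)
        for (const auto& p : prereqs) {
            pending[p];   // a prerequisite may have no entry of its own
            ++pending[module];
            users[p].push_back(module);
        }
    }
    std::queue<std::string> ready;
    for (const auto& [module, count] : pending)
        if (count == 0) ready.push(module);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string m = ready.front();
        ready.pop();
        order.push_back(m);
        for (const auto& u : users[m])
            if (--pending[u] == 0) ready.push(u);
    }
    // Modules that never became ready lie on (or behind) a cycle.
    if (order.size() != pending.size()) return std::nullopt;
    return order;
}
```

The same graph analysis can be run separately on the runtime dependencies of the libraries. Comparing the two results makes visible the places where the runtime dependencies and the build sequence differ.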

Goals of Modularization

Looking at the list of problems, some terms recur often. Turned around, they suggest the following goals (the items are not necessarily distinct):

Reducing dependencies
- implementation dependencies (improve maintainability, testability, correctness)
- build time dependencies (reduce build effort and thereby accelerate development)
- run time dependencies (improve runtime efficiency)
Maintainability
- clear points of responsibility - one piece of code for each task
- changes carry less risk, because they usually affect only a clearly separated part of the code
- code is easier to understand if each module's tasks and interfaces are clearly defined
Testability
- modules can be tested in isolation
Runtime efficiency
- only needed modules are loaded, leading to less memory usage and faster startup
Correctness
- fewer regressions, because changes have fewer side effects
- better tests, because tests can concentrate on sharply separated units
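The runtime-efficiency goal of loading only the modules that are needed can be sketched with a registry. The registry stores one factory per module and creates a module only when it is first requested. This is a minimal in-process sketch under stated assumptions. Module, NamedModule and ModuleRegistry are hypothetical names. The factory also stands in for the step that would load a shared library in a real application, and that step is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface implemented by every application module.
class Module {
public:
    virtual ~Module() = default;
    virtual std::string name() const = 0;
};

// Trivial module used to demonstrate the registry.
class NamedModule : public Module {
public:
    explicit NamedModule(std::string n) : name_(std::move(n)) {}
    std::string name() const override { return name_; }
private:
    std::string name_;
};

// Keeps a factory per module and instantiates a module only on first use.
// (A real implementation would load a shared library inside the factory.)
class ModuleRegistry {
public:
    using Factory = std::function<std::unique_ptr<Module>()>;

    void registerFactory(const std::string& id, Factory factory) {
        factories_[id] = std::move(factory);
    }

    // Returns the module, creating it on the first request.
    Module& get(const std::string& id) {
        auto it = loaded_.find(id);
        if (it == loaded_.end()) {
            auto f = factories_.find(id);
            if (f == factories_.end())
                throw std::out_of_range("unknown module: " + id);
            it = loaded_.emplace(id, f->second()).first;
        }
        return *it->second;
    }

    std::size_t loadedCount() const { return loaded_.size(); }

private:
    std::map<std::string, Factory> factories_;
    std::map<std::string, std::unique_ptr<Module>> loaded_;
};
```

Registering factories is cheap. Memory is spent only for the modules that are actually requested, so a user who never opens the database module never pays for it.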

Principles of Modularization

How does one recognize a module? What is a "good" module?

encapsulation
unambiguously defined, consistent and minimal interfaces
dependencies shall always be tree-shaped, without cycles (small dependency cycles of up to about 5 compilation units or 2 larger modules may sometimes be necessary)
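Violations of the no-cycles principle can be found automatically. One standard technique, shown here only as an illustration and not as the project's actual tooling, is to compute the strongly connected components of the dependency graph with Tarjan's algorithm. Every component with more than one unit is a group of units that depend on each other circularly. The sketch assumes compilation units are numbered 0..n-1. It flags components larger than the tolerance of about 5 units named in the principle. The "2 larger modules" allowance is not modeled. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Adjacency list: g[u] lists the units that unit u depends on.
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm: returns the strongly connected components of g.
std::vector<std::vector<int>> stronglyConnectedComponents(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> indexOf(n, -1), low(n, 0);
    std::vector<bool> onStack(n, false);
    std::vector<int> stack;
    std::vector<std::vector<int>> components;
    int counter = 0;

    std::function<void(int)> visit = [&](int u) {
        indexOf[u] = low[u] = counter++;
        stack.push_back(u);
        onStack[u] = true;
        for (int v : g[u]) {
            if (indexOf[v] == -1) {          // not visited yet
                visit(v);
                low[u] = std::min(low[u], low[v]);
            } else if (onStack[v]) {         // back edge into the current component
                low[u] = std::min(low[u], indexOf[v]);
            }
        }
        if (low[u] == indexOf[u]) {          // u is the root of a component
            std::vector<int> component;
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack[w] = false;
                component.push_back(w);
            } while (w != u);
            components.push_back(component);
        }
    };
    for (int u = 0; u < n; ++u)
        if (indexOf[u] == -1) visit(u);
    return components;
}

// Returns the dependency cycles that exceed the tolerated size.
std::vector<std::vector<int>> oversizedCycles(const Graph& g, std::size_t maxUnits = 5) {
    std::vector<std::vector<int>> result;
    for (const auto& c : stronglyConnectedComponents(g))
        if (c.size() > maxUnits) result.push_back(c);
    return result;
}
```

The recursion depth grows with the size of the graph. This is unproblematic for a few hundred units, but an iterative variant would be safer for very large graphs.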

Review of the Current Structure

Possibilities

The current structure does not only have problems; it also offers some possibilities on which further modularization can build:

The system abstraction layer provides a high degree of platform independence and, although OpenOffice.org is a huge project, makes porting relatively easy.
UNO makes it possible to write components that enforce modularization, because they are accessible only via well-defined interfaces.
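UNO itself is not shown here. As a language-level analogy only, the following plain C++ sketch illustrates the same principle. Clients obtain an object through an abstract interface and a factory function, so they cannot name or depend on the implementation class. WordCounter and createWordCounter are hypothetical names and are not UNO types or API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Public interface: the only thing clients of the component ever see.
class WordCounter {
public:
    virtual ~WordCounter() = default;
    virtual std::size_t countWords(const std::string& text) const = 0;
};

// Factory: clients get an instance without knowing the implementation class.
std::unique_ptr<WordCounter> createWordCounter();

// ---- implementation side (would normally live in its own compilation unit) ----
namespace {
class SimpleWordCounter final : public WordCounter {
public:
    std::size_t countWords(const std::string& text) const override {
        std::size_t words = 0;
        bool inWord = false;
        for (unsigned char c : text) {
            if (std::isspace(c)) {
                inWord = false;
            } else if (!inWord) {  // first character of a new word
                inWord = true;
                ++words;
            }
        }
        return words;
    }
};
}  // namespace

std::unique_ptr<WordCounter> createWordCounter() {
    return std::make_unique<SimpleWordCounter>();
}
```

Because SimpleWordCounter lives in an anonymous namespace, it could be replaced by a different implementation without touching any client code.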

General Problems

List problems here if they occur in many places or are of a fundamental nature.

Selective installation of application modules does not save the user any disk space.
For developers, the number of modules and their dependencies is impossible to keep track of. This leads to
- maintainability problems
- code complexity
- long build times
Architectural and source code modules are not clearly separated from each other by interfaces. Instead, clients depend on implementation details of a module. Often it is not clearly defined what is meant to be interface and what was meant as implementation only. Consequences:
- complexity - as it is unclear which functions of a module are for public use and which are not
- avoidable build time - as dependent source code needs to be rebuilt when implementation details of a module are changed
- maintainability problems - because when implementation details of a module are changed, it is unclear which client code will be affected and how
There is a lot of duplicated code performing the same tasks in parallel use cases. This leads to problems with
- performance - because of increased code size in memory
- maintainability - because changing one feature often requires changes in many different places in the code; sometimes nobody even knows what all those places are
There are large circular dependencies among compilation units: in several cases, a hundred or even a few hundred compilation units are circularly dependent on each other. This means:
- testing problems - none of them can be tested in isolation, so unit tests are nearly impossible
- maintainability problems - many regressions, because a change in one of those files may cause side effects in a few hundred others
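A common way to break such a cycle is dependency inversion. The lower-level unit depends on a small abstract interface instead of on the concrete class that uses it. The C++ sketch below is a hypothetical example, not code from the OpenOffice.org sources. Document knows only the ChangeListener interface, so StatusBar depends on Document but not the other way round. Document can also be unit-tested in isolation with any stand-in ChangeListener.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Small interface owned by the lower-level unit.
class ChangeListener {
public:
    virtual ~ChangeListener() = default;
    virtual void documentChanged(std::size_t wordCount) = 0;
};

// Lower-level unit: knows nothing about the concrete classes that listen to it.
class Document {
public:
    void addListener(ChangeListener* listener) { listeners_.push_back(listener); }

    void appendWord(const std::string& word) {
        words_.push_back(word);
        for (ChangeListener* l : listeners_) l->documentChanged(words_.size());
    }

private:
    std::vector<std::string> words_;
    std::vector<ChangeListener*> listeners_;  // non-owning pointers
};

// Higher-level unit: depends on Document's interface, never the reverse.
class StatusBar : public ChangeListener {
public:
    void documentChanged(std::size_t wordCount) override {
        text_ = std::to_string(wordCount) + " words";
    }
    const std::string& text() const { return text_; }

private:
    std::string text_;
};
```

The dependency now runs in one direction only. As a result, a change in StatusBar can no longer force a rebuild or cause side effects in Document.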

Specific Suggestions for Improvement

List suggestions here if they relate to concrete locations in the code or to specific architectural concepts.
