Architecture/Source Code Inventory

Owner: Kay Ramme, Stefan Zimmermann
Type: analysis
State: draft

Introduction

Recent surveys and current experiences with the project have raised concern about the existing "barrier of entrance" for potential contributors, which may keep developers, for example, from becoming active members of the community. This "barrier of entrance" surely has many dimensions. Among them may be the complexity of the source code, the build environment, the lack of modularity, or simply the sheer number of items involved in the product.

http://marketing.openoffice.org/ooocon2006/presentations/wednesday_c10.odp

Therefore, Kay Ramme and Stefan Zimmermann stepped up to determine the sub-dimensions of complexity, to find and develop measures that quantify the code base of the OpenOffice.org project, and to provide potential improvement teams with data describing these sub-dimensions of complexity. This is a call for help: everybody who wants to contribute their experiences and ideas is more than welcome.

Motto

The overarching motto we agree on is: Less [code] is better! The word "code" is actually optional.

If we say "less", we in turn need to know how much we have now, which means we need to quantify our (code) base. As a first step, though, we think we should focus on the following specific areas:

- dead code
- redundancy
- cyclomatic complexity (McCabe)
- (useless features)
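As an illustration of the McCabe measure listed above, the cyclomatic complexity of a single function can be approximated as the number of decision points plus one. The following Python sketch is not an existing OOo tool: the function name `cyclomatic_complexity` and the set of counted tokens (`if`, `for`, `while`, `case`, `catch`, `&&`, `||`, `?`) are assumptions, and comments and literals are removed with a simplified regex rather than a real C++ parser.

```python
import re

# Comments and string/character literals are blanked out first so that
# keywords inside them are not counted (simplified; not a full C++ lexer).
COMMENT_OR_LITERAL = re.compile(
    r"//[^\n]*"              # line comment
    r"|/\*.*?\*/"            # block comment
    r'|"(?:\\.|[^"\\])*"'    # string literal
    r"|'(?:\\.|[^'\\])*'",   # character literal
    re.DOTALL,
)

# Assumed set of decision points: branch/loop keywords, case labels,
# catch handlers, short-circuit operators and the ternary operator.
DECISION_POINT = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def cyclomatic_complexity(function_source: str) -> int:
    """Approximate McCabe complexity of one function: decision points + 1."""
    code = COMMENT_OR_LITERAL.sub(" ", function_source)
    return len(DECISION_POINT.findall(code)) + 1


example = """
int clamp_sum(int a, int b) {
    // an 'if' inside a comment is ignored
    const char* label = "while && if";
    if (a > 0 && b > 0) {
        return a + b;
    }
    for (int i = 0; i < a; ++i) {
        b += i;
    }
    return b > 10 ? 10 : b;
}
"""
print(cyclomatic_complexity(example))  # 5
```

Counting keywords this way matches the "Keywords" item in the detailed list below: it needs no compiler, but it will miscount constructs such as macros that expand to branches.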

Possible data collection plan

Data to be collected:

Initially this is quantitative data, ranging from the number of files, lines of code (in its two forms, LINES and SLOC according to the DSI concept), and the number of classes and methods to lines of code per function, etc. File dependencies, file scattering, and file location will also come into focus.
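To make the difference between LINES and SLOC concrete, here is a minimal sketch that classifies each physical line of a C/C++ file as code, comment-only, or blank. It assumes SLOC can be approximated as "lines containing at least one non-comment character", which is a simplification of the DSI counting rules; the names `LineCounts` and `count_lines` are hypothetical, and comment markers inside string literals are not handled.

```python
from dataclasses import dataclass


@dataclass
class LineCounts:
    lines: int    # LINES: every physical line
    sloc: int     # lines with at least one non-comment character (SLOC approximation)
    comment: int  # lines containing only comment text
    blank: int    # empty or whitespace-only lines


def count_lines(source: str) -> LineCounts:
    """Classify each physical line of C/C++ source (string literals not handled)."""
    counts = LineCounts(0, 0, 0, 0)
    in_block_comment = False
    for raw in source.splitlines():
        counts.lines += 1
        text = raw.strip()
        if not text:
            counts.blank += 1
            continue
        has_code = False
        i = 0
        while i < len(text):
            if in_block_comment:
                end = text.find("*/", i)
                if end == -1:
                    break  # the block comment continues on the next line
                in_block_comment = False
                i = end + 2
            elif text.startswith("//", i):
                break  # the rest of the line is a comment
            elif text.startswith("/*", i):
                in_block_comment = True
                i += 2
            else:
                if not text[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            counts.sloc += 1
        else:
            counts.comment += 1
    return counts


sample = """/* File header
   spanning two lines */
#include <cstdio>

int main() {  // entry point
    // say hello
    std::puts("hi");
    return 0;
}
"""
c = count_lines(sample)
print(c)  # LineCounts(lines=9, sloc=5, comment=3, blank=1)
# Only comment-only lines enter this ratio; trailing comments count as code lines.
print(f"comment/source ratio: {c.comment / c.sloc:.2f}")
```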

Purpose of Data Collection: Ultimately, the goal is to provide ideas on how to simplify the project in order to lower the "barrier of entrance" for contributors, and to determine whether maintenance capability or maintainability can be expressed in measurable terms.

What Insight The Data Will Provide: The data, once counted and compared, will give us information about dependencies and redundancies in the code, as well as about the purpose/duty of specific code sections.
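The dependency information could, for example, be derived from `#include` directives. The sketch below builds a file inclusion graph and reports one include cycle if any exists, using a depth-first search with three node colors. It assumes includes are resolved by plain file name only, ignoring include paths and conditional compilation; `include_graph` and `find_cycle` are hypothetical helpers.

```python
import re
from typing import Dict, List

# Matches #include "x.hxx" and #include <x.hxx>; conditional compilation is ignored.
INCLUDE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.MULTILINE)


def include_graph(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each file to the files it includes, keeping only files we know about."""
    return {
        name: [target for target in INCLUDE.findall(text) if target in files]
        for name, text in files.items()
    }


def find_cycle(graph: Dict[str, List[str]]) -> List[str]:
    """Return one include cycle (first file repeated at the end), or [] if none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GREY
        path.append(node)
        for target in graph[node]:
            if color[target] == GREY:  # back edge: target is already on the path
                return path[path.index(target):] + [target]
            if color[target] == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return []

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return []


sources = {
    "a.hxx": '#include "b.hxx"\n',
    "b.hxx": '#include "c.hxx"\n#include <vector>\n',
    "c.hxx": '#include "a.hxx"\n',
    "main.cxx": '#include "a.hxx"\n',
}
g = include_graph(sources)
print(find_cycle(g))  # ['a.hxx', 'b.hxx', 'c.hxx', 'a.hxx']
```

The same graph also yields the inclusion hierarchy itself; system headers such as `<vector>` are dropped here because they are not part of the scanned file set.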

How It Will Help Potential Improvement Teams: The teams will be able to decide whether to eliminate, consolidate, refactor, or modularize code, or simply to leave the possible effects of the multiple dimensions of complexity out of consideration.
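A decision to consolidate code first needs candidates. One simple way to spot redundancy is to look for identical runs of consecutive lines across files; the sketch below does this after collapsing whitespace and dropping blank lines. The window size of 4 lines, the function names, and the choice not to strip comments are assumptions for illustration; dedicated clone detectors usually work on tokens or syntax trees instead.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def normalized_lines(text: str) -> List[Tuple[int, str]]:
    """(line number, line) pairs with runs of whitespace collapsed; blank lines dropped."""
    result = []
    for number, line in enumerate(text.splitlines(), start=1):
        norm = " ".join(line.split())
        if norm:
            result.append((number, norm))
    return result


def duplicate_windows(files: Dict[str, str], window: int = 4) -> List[List[Tuple[str, int]]]:
    """Group identical runs of `window` normalized lines across all files.

    Each group lists (file, first line number) for every occurrence.
    A duplicate longer than `window` shows up as several overlapping groups.
    """
    occurrences = defaultdict(list)
    for name, text in files.items():
        lines = normalized_lines(text)
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(norm for _, norm in lines[i:i + window])
            key = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            occurrences[key].append((name, lines[i][0]))
    return [locs for locs in occurrences.values() if len(locs) > 1]


files = {
    "x.cxx": "int a = 0;\nint b = 1;\nint c = a + b;\nreturn c;\n",
    "y.cxx": "// copy\nint a = 0;\nint  b = 1;\nint c = a + b;\nreturn c;\n",
}
print(duplicate_windows(files))  # [[('x.cxx', 1), ('y.cxx', 2)]]
```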

What Will Be Done With The Data After Collection: The teams will use the data to arrive at code complexity measures, which may be able to distinguish code that is "easy to maintain" from code that is "not so easy to maintain" :). In any case, the data will be used to continuously draw a picture of what the OpenOffice.org code base is about and how it develops over time.

What data we think should be collected, and why (detailed)

Counts
- Files to handle
  - evaluation of possible consolidation efforts (scattering together with location)
  - comparison with industry "best practice" data
  - ratio of product source to product build environment
- LINES
  - size estimates
  - comment line / source line ratio
  - best practice comparison
  - language to language comparison
- SLOC (source lines of code) according to DSI concept (delivered source instructions)
  - use in COCOMO II (Constructive Cost Model II)
  - PM (person month) estimates
  - TDEV (development time) estimates
- Preprocessor directives
  - creating the file inclusion hierarchy
  - comparing definition count (constants and macros) with "best practice"
- Keywords
  - calculate cyclomatic complexity (McCabe)
  - compare with "best practice"
- Statements
  - compare with DSI
  - estimate statement density per method and per file
- Classes
  - class hierarchy (inheritance depth)
  - dependencies (circular)
  - "is a" - "has a" relationships (ratio)
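The SLOC counts in the list above feed into COCOMO II, where nominal effort is PM = A · Size^E · ∏EM with E = B + 0.01 · ΣSF (Size in thousands of SLOC), and development time is TDEV = C · PM^F with F = D + 0.2 · (E − B). The sketch below simply evaluates these formulas. The constants A = 2.94, B = 0.91, C = 3.67, D = 0.28 are the published COCOMO II.2000 calibration values as we understand them and should be checked against the model definition manual; schedule compression (SCED) is not modeled, and the input size and scale-factor sum in the usage lines are made-up numbers, not measurements of OpenOffice.org.

```python
from math import prod

# COCOMO II.2000 calibration constants (assumption: verify against the
# model definition manual before relying on the numbers).
A, B = 2.94, 0.91  # effort coefficient and base exponent
C, D = 3.67, 0.28  # schedule coefficient and base exponent


def cocomo2_estimate(ksloc: float, scale_factor_sum: float,
                     effort_multipliers=()) -> tuple:
    """Return (PM, TDEV in months) for a size given in thousands of SLOC.

    PM   = A * Size**E * prod(EM),  E = B + 0.01 * sum(SF)
    TDEV = C * PM**F,               F = D + 0.2 * (E - B)
    Schedule compression (SCED) is not modeled.
    """
    e = B + 0.01 * scale_factor_sum
    pm = A * ksloc ** e * prod(effort_multipliers)  # prod(()) == 1
    f = D + 0.2 * (e - B)
    tdev = C * pm ** f
    return pm, tdev


# Made-up input: 500 KSLOC and an assumed scale-factor sum of 18.97.
pm, tdev = cocomo2_estimate(500, 18.97)
print(f"{pm:.0f} person-months over {tdev:.1f} months")
```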

Any ideas and experiences about what to collect, and why, are welcome.

Links

Wikipedia gives a good introduction to software metrics, their pros and cons, and the various approaches.
Thorsten kindly created a page with various (source) code tools, including tools that generate metrics. See Other Tools.
Official OOo statistics - http://stats.openoffice.org
Wikipedia on software maintenance: wikipedia:Software_maintenance.

To be continued ...
