Architecture/Source Code Inventory

Owner: Kay Ramme, Stefan Zimmermann
Type: analysis
State: draft

Introduction

Recent surveys and current experiences with the project have raised concern about the existing "barrier of entrance" for potential contributors, which may, for example, hinder developers from becoming active members of the community. This "barrier of entrance" surely has many dimensions. Some of these dimensions may be the complexity of the source code, the build environment, the lack of modularity, or simply the sheer mass of items involved in the product.

http://marketing.openoffice.org/ooocon2006/presentations/wednesday_c10.odp

Therefore Kay Ramme and Stefan Zimmermann stepped up to determine the sub-dimensions of complexity, to find and develop measures to quantify the code base of the OpenOffice.org project, and to provide data that describes these sub-dimensions of complexity to potential improvement teams. This is a call for help: everybody who wants to contribute experiences and ideas is more than welcome.

Motto

The overarching motto we agree on is: Less [code] is better!, where the word "code" is actually optional.

If we say "less", we in turn need to know how much we have now; that means we need to quantify our (code) base. We think we should focus in a first step on the following specific areas:

  • dead code
  • redundancy
  • cyclomatic complexity (McCabe; see the sketch below)
  • (unused/useless features)

After these focus areas are addressed, we may focus more on finding indicators for some of the properties described in the next section, "The Zen of Programming" ;)
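
As a starting point for the cyclomatic complexity item, here is a minimal McCabe counter sketch (in Python, over C/C++-like sources). It approximates complexity as 1 plus the number of decision points; the exact keyword set is an assumption and itself part of the operational definition still to be developed.

  import re

  # Decision points that open a branch in C/C++-like code. The keyword set
  # is an assumption; the agreed operational definition may differ.
  DECISION_RE = re.compile(r'\b(?:if|for|while|case|catch)\b|&&|\|\||\?')

  def cyclomatic_complexity(function_body):
      """McCabe complexity, approximated as 1 + number of decision points."""
      return 1 + len(DECISION_RE.findall(function_body))

  print(cyclomatic_complexity("if (a && b) { for (;;) { } } else { }"))  # -> 4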

The Zen of Programming

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it.
  • Now is better than never.
  • Although never is often better than right now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea -- let's do more of those!

(cited from the Zen of Python by Tim Peters)

Possible Data Collection Plan

Data to be collected:

At first this is quantitative data, ranging from the number of files and lines of code (in its characteristics LINES and SLOC, according to the DSI concept) to the number of classes, methods, lines of code per function, etc.; but file dependencies, file scattering, and file location will also come into the focus of the investigation. (A first counting sketch follows below.)
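
To make the first of these data points concrete, here is a minimal counting sketch: it walks a source tree and tallies the number of files and raw physical lines (LINES) per file type. The extension list is an assumption for illustration, not an agreed operational definition.

  import os
  from collections import Counter

  # Hypothetical extension list for illustration only.
  SOURCE_EXTS = {".c", ".cxx", ".cpp", ".h", ".hxx", ".java"}

  def tally(root):
      """Count files and raw physical lines (LINES) per file type under root."""
      files, lines = Counter(), Counter()
      for dirpath, _, names in os.walk(root):
          for name in names:
              ext = os.path.splitext(name)[1].lower()
              if ext not in SOURCE_EXTS:
                  continue
              with open(os.path.join(dirpath, name), errors="replace") as f:
                  lines[ext] += sum(1 for _ in f)
              files[ext] += 1
      return files, lines

  files, lines = tally(".")  # run from the top of a source tree
  for ext in sorted(files):
      print(f"{ext}: {files[ext]} files, {lines[ext]} LINES")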

Purpose of Data Collection:

Ultimately, the goal is to provide ideas on how to simplify the project, in order to lower the "barrier of entrance" for contributors, and to determine whether maintenance capability or maintainability can be expressed.

What Insight The Data Will Provide:

The data, when counted and compared, will provide us with information about dependencies and redundancies in the code, as well as about the purpose/duty of specific code sections.

How It Will Help Potential Improvement Teams:

The teams will be able to decide whether to eliminate, consolidate, refactor, or modularize code, or simply to abandon it, taking into consideration the possible effects of the multiple dimensions of complexity.


What Will Be Done With The Data After Collection:

The teams will use the data to arrive at code complexity measures, which may be able to distinguish code that is "easy to maintain" from code that is "not so easy to maintain" :). The data will certainly also be used to continuously draw a picture of what the OpenOffice.org code base is about and how it develops over time.

What Data We Think We Should Collect, and Why (Detailed)

Source Code Size Metrics

On the way to developing an "Operational Definition", the sub-page "Size Metrics" details how we measure the potential data points mentioned here, e.g. what exactly a "Source Line of Code" (SLOC) is. The sketch below illustrates one possible reading of these categories.
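
As one possible reading of these categories, the following sketch splits a C/C++ file into blank, comment-only, and code-carrying lines, so that LINES is the total physical line count and SLOC the code-carrying subset. It is a simplification: mixed code/comment lines and comment markers inside string literals are not handled.

  def classify_lines(text):
      """Split C/C++ source text into blank, comment-only and code lines:
      LINES = all physical lines, SLOC = the code-carrying subset."""
      blank = comment = code = 0
      in_block = False
      for raw in text.splitlines():
          line = raw.strip()
          if in_block:                      # inside a /* ... */ block
              comment += 1
              in_block = "*/" not in line
          elif not line:
              blank += 1
          elif line.startswith("//"):
              comment += 1
          elif line.startswith("/*"):
              comment += 1
              in_block = "*/" not in line
          else:
              code += 1                     # counts toward SLOC
      return {"LINES": blank + comment + code,
              "blank": blank, "comment": comment, "SLOC": code}

  print(classify_lines("int main() {\n  /* greet\n     the user */\n  return 0; // ok\n}\n"))
  # -> {'LINES': 5, 'blank': 0, 'comment': 2, 'SLOC': 3}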

Code Metrics

  • LINES
    • size estimates
    • best practice comparison
    • language to language comparison
  • LOC
    • size estimates
    • comment line / source line / blank line ratio
    • best practice comparison
    • language to language comparison
  • SLOC (source lines of code)
    • compiler relevant lines of code
    • relate to DSI
  • DSI (delivered source instructions)
    • use in COCOMO II (Constructive Cost Model II; see the sketch after this list)
    • PM (person month) estimates
    • TDEV (development time) estimates
  • Preprocessor directives
    • creating file inclusion hierarchy
    • comparing definition count (constants and macros) with "best practice"
  • Keywords
    • calculate cyclomatic complexity (McCabe; see the sketch in the "Motto" section)
    • compare with "best practice"
  • Statements
    • compare with DSI
    • estimate statement density per method and per file
  • Classes
    • class hierarchy (inheritance depth)
    • dependencies (circular)
    • "is a" - "has a" relationships (ratio)
  • Methods per Class
    • size of class
    • maintainability estimations
    • interfaces (external)
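
For the DSI/COCOMO II item above, here is a worked sketch of the COCOMO II.2000 effort (PM) and schedule (TDEV) equations using the published calibration constants. The nominal inputs (all effort multipliers at 1.0, sum of scale factors 18.97) are assumptions for illustration; a real estimate needs rated cost drivers for this project.

  # COCOMO II.2000 calibration constants (Boehm et al.).
  A, B, C, D = 2.94, 0.91, 3.67, 0.28

  def cocomo2(ksloc, sum_scale_factors=18.97, effort_multipliers=1.0):
      """Effort (person-months) and schedule (months) from size in KSLOC.
      Defaults assume nominal ratings for all drivers (an assumption)."""
      e = B + 0.01 * sum_scale_factors          # scale exponent
      pm = A * ksloc ** e * effort_multipliers  # PM: person-month estimate
      f = D + 0.2 * (e - B)                     # schedule exponent
      tdev = C * pm ** f                        # TDEV: development time
      return pm, tdev

  pm, tdev = cocomo2(100.0)  # e.g. a hypothetical 100 KSLOC component
  print(f"PM ~ {pm:.0f} person-months, TDEV ~ {tdev:.0f} months")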

File Metrics

  • Files to handle
    • evaluation of possible consolidation efforts (scattering together with location)
    • comparison with industry "best practice" data
    • ratio of product source to product build environment
  • File inclusion hierarchy (see the sketch after this list)
    • inclusion depth
    • file dependencies
    • consolidation opportunities
  • Commented Code Sections
    • potential sub instance of dead/unneeded code
    • consolidation opportunities
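
For the file inclusion hierarchy item above, a sketch of how the hierarchy could be derived from the preprocessor directives: scan each file for #include lines, record the edges, and compute an inclusion depth per file. Matching included files by basename only is a simplifying assumption that ignores the real include search paths.

  import os, re

  INCLUDE_RE = re.compile(r'^\s*#\s*include\s+[<"]([^>"]+)[>"]', re.M)

  def include_graph(root, exts={".c", ".cxx", ".cpp", ".h", ".hxx"}):
      """Map each source file (by basename) to the files it #includes."""
      graph = {}
      for dirpath, _, names in os.walk(root):
          for name in names:
              if os.path.splitext(name)[1] in exts:
                  with open(os.path.join(dirpath, name), errors="replace") as f:
                      graph[name] = [os.path.basename(i)
                                     for i in INCLUDE_RE.findall(f.read())]
      return graph

  def depth(graph, name, seen=()):
      """Longest include chain below name; cycles are cut off (and hint
      at the circular dependencies mentioned above)."""
      if name in seen:
          return 0
      return max((1 + depth(graph, child, seen + (name,))
                  for child in graph.get(name, [])), default=0)

  g = include_graph(".")
  for name in sorted(g):
      print(name, "-> inclusion depth", depth(g, name))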

Call for Help

Any ideas and experiences about what to collect, how, and why are welcome.

Links

To be continued ...
