Architecture/Source Code Inventory

From Apache OpenOffice Wiki

Revision as of 13:38, 17 November 2006

Owner: Kay Ramme, Stefan Zimmermann
Type: analysis
State: draft

Introduction

Recent surveys and current experiences with the project have raised concern about the existing "barrier of entrance" that may hinder potential contributors, e.g. developers, from becoming active members of the community. This "barrier of entrance" surely has many dimensions. Some of them may be the complexity of the source code, the build environment, the lack of modularity, or simply the sheer mass of items involved in the product.

http://marketing.openoffice.org/ooocon2006/presentations/wednesday_c10.odp

Therefore Kay Ramme and Stefan Zimmermann stepped up to determine the sub-dimensions of complexity, to find and develop measures to quantify the code base of the OpenOffice.org project, and to provide data that describes these sub-dimensions of complexity to potential improvement teams. This is a call for help: everyone who wants to contribute their experience and ideas is more than welcome.

Motto

The overarching motto we agree on is: Less [code] is better!, where the word "code" is actually optional.

If we say "less", we in turn need to know how much we have now, which means we need to quantify our (code) base. We think we should focus in a first step on these specific areas:

  • dead code
  • redundancy
  • cyclomatic complexity (McCabe)
  • (unused/useless features)

After these focus areas are addressed, we may concentrate on finding indicators for some of the properties described in the next section, "The Zen of Programming" ;)
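
One of these focus areas, cyclomatic complexity (McCabe), can be approximated cheaply with a keyword scan: complexity = number of decision points + 1. The sketch below assumes C/C++ sources; it deliberately ignores comments, string literals, and similar edge cases, so it is an estimate, not a precise measurement:

```python
# Rough cyclomatic complexity (McCabe) estimate for a C/C++ function body.
# Each decision point (branch keyword, short-circuit operator, ternary)
# adds one path; the baseline straight-line path contributes the "+ 1".
import re

DECISION_RE = re.compile(r'\b(if|for|while|case|catch)\b|&&|\|\||\?')

def cyclomatic_estimate(body: str) -> int:
    return 1 + len(DECISION_RE.findall(body))

snippet = """
if (a && b) { x(); } else { y(); }
for (int i = 0; i < n; ++i) {
    switch (m) { case 1: f(); break; case 2: g(); break; }
}
"""
# decision points: if, &&, for, case, case
print(cyclomatic_estimate(snippet))
```

A scan like this can be run over a whole code base to flag functions whose estimated complexity exceeds a "best practice" threshold.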

The Zen of Programming

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it
  • Now is better than never.
  • Although never is often better than right now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea -- let's do more of those!

(cited from the Zen of Python by Tim Peters)

Possible Data Collection Plan

Data to be collected:

At first it is quantitative data, ranging from number of files, lines of code (in its variants LINES and SLOC according to the DSI concept), number of classes, methods, lines of code per function, etc.; but file dependencies, scattering, and location will also come into the focus of investigation.
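
As a sketch of the distinction between LINES and SLOC, the classifier below splits a source file into total, blank, comment, and source lines. It assumes C++-style line comments only; block comments (/* ... */) and comment markers inside string literals are ignored for brevity, and the sample content is made up for illustration:

```python
# Minimal LINES / blank / comment / SLOC breakdown for C++-style sources.
# LINES counts every physical line; SLOC counts lines that carry code.
def classify(source: str) -> dict:
    counts = {"LINES": 0, "blank": 0, "comment": 0, "SLOC": 0}
    for line in source.splitlines():
        counts["LINES"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("//"):
            counts["comment"] += 1
        else:
            counts["SLOC"] += 1
    return counts

sample = "// header\n\nint main() {\n    return 0; // exit\n}\n"
print(classify(sample))
```

The comment/source and blank-line ratios listed under the detailed counts below fall directly out of such a breakdown.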

Purpose of Data Collection:

Ultimately, the goal is to provide ideas on how to simplify the project, to lower the "barrier of entrance" for contributors, and to determine whether maintenance capability or maintainability can be expressed.

What Insight The Data Will Provide:

The data, when counted and compared, will provide us with information about dependencies and redundancies in the code, as well as about the purpose/duty of specific code sections.

How It Will Help potential Improvement Teams:

The teams will be able to decide whether to eliminate, consolidate, refactor, or modularize code, or to simply abandon it, taking into consideration the possible effects of the multiple dimensions of complexity.


What Will Be Done With The Data After Collection:

The teams will use the data to arrive at code complexity measures, which may be able to distinguish code that is "easy to maintain" from code that is "not so easy to maintain" :). The data will certainly also be used to continuously draw a picture of what the OpenOffice.org code base is about and how it develops over time.

What data we want to collect and why (Detailed)

Source Code Size Metrics

Code Metrics

  • LINES
    • size estimates
    • best practice comparison
    • language to language comparison
  • LOC
    • size estimates
    • comment line / source line / blank line ratio
    • best practice comparison
    • language to language comparison
  • SLOC (source lines of code)
    • compiler relevant lines of code
    • relate to DSI
  • DSI (delivered source instructions)
    • use in COCOMO II (Constructive Cost Model II)
    • PM (person month) estimates
    • TDEV (development time) estimates
  • Pre Processor directives
    • creating file inclusion hierarchy
    • comparing definition count (constants and macros) with "best practice"
  • Keywords
    • calculate cyclomatic complexity (McCabe)
    • compare with "best practice"
  • Statements
    • compare with DSI
    • estimate statement density per method, file
  • Classes
    • class hierarchy (inheritance depth)
    • dependencies (circular)
    • "is a" - "has a" relationships (ratio)
  • Methods per Class
    • size of class
    • maintainability estimations
    • interfaces (external)
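
To illustrate how the SLOC/DSI counts feed the COCOMO II PM and TDEV estimates listed above, here is a hedged sketch. The coefficients (A = 2.94, B = 1.0997, C = 3.67, D = 0.3179) are the nominal values commonly quoted for COCOMO II.2000; a real estimate would need calibrated scale factors and effort multipliers for this project:

```python
# COCOMO II style effort (PM, person-months) and schedule (TDEV, months)
# estimate from size in KSLOC, using nominal coefficients. Illustrative
# only: real use requires project-specific calibration.
def cocomo_ii_nominal(ksloc: float):
    A, B, C, D = 2.94, 1.0997, 3.67, 0.3179
    pm = A * ksloc ** B    # effort in person-months
    tdev = C * pm ** D     # development time in months
    return pm, tdev

pm, tdev = cocomo_ii_nominal(100.0)  # a hypothetical 100 KSLOC module
print(f"PM ~ {pm:.0f} person-months, TDEV ~ {tdev:.1f} months")
```

Even as a rough model, this shows why the SLOC and DSI counts matter: effort grows slightly faster than linearly with size, so consolidation pays off directly.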

File Metrics

  • Files to handle
    • evaluation of possible consolidation efforts (scattering together with location)
    • comparison with "best practice" data from industry
    • ratio of product source to product build environment
  • File inclusion hierarchy
    • inclusion depth
    • file dependencies
    • consolidation opportunities
  • Commented Code Sections
    • potential instance of dead/unneeded code
    • consolidation opportunities
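
The file inclusion hierarchy and inclusion depth listed above can be derived from the pre-processor's #include directives. In this sketch the file names and contents are hypothetical stand-ins; a real scan would read the files from disk:

```python
# Build a file inclusion graph from '#include "..."' directives and
# report each file's inclusion depth (longest chain of local includes).
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

sources = {  # file name -> file content (stand-in for reading from disk)
    "app.cxx":  '#include "ui.hxx"\n#include "core.hxx"\n',
    "ui.hxx":   '#include "core.hxx"\n',
    "core.hxx": '#include "base.hxx"\n',
    "base.hxx": '',
}

graph = {name: INCLUDE_RE.findall(text) for name, text in sources.items()}

def inclusion_depth(name, seen=()):
    # depth 0 for a file with no local includes; cycles are cut off via 'seen'
    children = [c for c in graph.get(name, []) if c not in seen]
    if not children:
        return 0
    return 1 + max(inclusion_depth(c, seen + (name,)) for c in children)

for name in sources:
    print(name, inclusion_depth(name))
```

The same graph also exposes the circular dependencies and consolidation opportunities mentioned above: files included from many places are candidates for stable interfaces, and cycles flag spots to untangle.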

How we may want to count

Call for Help

Any ideas and experience about what to collect, how, and why are welcome.

Links

To be continued ...
