26 May 2011

Ending the curse of software maintenance

Metrics collected by Capers Jones in the US show that the percentage of software professionals working on the maintenance of existing (legacy) applications rather than on the development of new applications rose from 52% to 73% between 1995 and 2000. Undoubtedly part of the increase is related to Y2K projects, but the numbers have not come down since 2000; instead, Capers Jones estimated that by 2010 79% of software professionals were focused on maintenance.


The diagram above is a visualisation of Capers Jones' metrics expressed as a percentage of the total US population. Firstly, the data indicates that most business processes are now supported by software, and that opportunities to identify manual processes that can be automated are becoming increasingly rare. Secondly, since the number of software professionals required continues to rise, these figures also point to a fundamental problem relating to the notations and techniques that are used to specify software.

Writing new software requires much less effort than understanding and changing existing software. Today most applications are written by software professionals, not by the people who need the application. This is a consequence of programming languages that, to the average software user, are roughly as understandable as legal language. Even expert programmers need considerable time to understand software written by others.

In the 1980s the focus of IT system development shifted from automation of previously manual processes to process optimisation (system integration via the use of network technologies) and to the development of more and more complex (more configurable) products and services. As a result, the required amount of source code grew substantially, and the complexity of networked software systems increasingly exceeded human cognitive limits.

The typical method of extending or modifying a software system has always been, and still is, to proceed in small increments: add or change a few handfuls of instructions, then observe how the modified system behaves. In the 1990s software complexity reached the point where no individual could understand a typical system in all its details, but the experimental approach to software modification still seemed to work beyond this point.

The larger the system, the more time is spent on re-familiarisation with the code relative to the time spent implementing a change. Eventually 80% or more of staff time is consumed by re-familiarisation and by quality assurance activities that attempt to detect and eliminate unintended side effects. Knowledge is encoded in the system, and must be re-encoded in a human brain before modified system behaviour can be encoded.
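The dynamic can be illustrated with a minimal sketch (our own toy model, not part of Capers Jones' data): assume the effort to implement a change stays roughly constant, while re-familiarisation effort grows in proportion to the accumulated size of the code base. The re-familiarisation share then climbs towards the 80% figure cited above.

```python
# Toy model: implementation effort per change is constant, re-familiarisation
# effort grows with accumulated code size. Both rates are illustrative assumptions.
IMPLEMENTATION_EFFORT = 1.0    # effort units per change (assumed constant)
REFAMILIARISATION_RATE = 0.05  # assumed re-familiarisation effort per unit of code size

for code_size in (1, 10, 20, 40, 80):
    refam = REFAMILIARISATION_RATE * code_size
    share = refam / (refam + IMPLEMENTATION_EFFORT)
    print(f"code size {code_size:3d}: re-familiarisation share = {share:.0%}")

# code size   1: re-familiarisation share = 5%
# code size  10: re-familiarisation share = 33%
# code size  20: re-familiarisation share = 50%
# code size  40: re-familiarisation share = 67%
# code size  80: re-familiarisation share = 80%
```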

The amount of source specifications underpinning a modern business software system (such as a Logistics Management system) is typically measured in several millions of Lines of Code (LoC); 1 million LoC fill 20,000 pages and are comparable in complexity to 50 books of 400 pages of legal text. The pictures below are the result of tool-based analysis of several millions of lines of production-grade software code. The lines depict the dependencies between the services in a modern service-oriented architecture that is no more than 5 years old.
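As a back-of-the-envelope check of those figures, the only assumption needed is roughly 50 lines of code per printed page, which is what 1 million LoC across 20,000 pages implies:

```python
# Rough conversion of code volume into printed pages and books.
LOC = 1_000_000
LINES_PER_PAGE = 50    # assumption: ~50 lines of code per printed page
PAGES_PER_BOOK = 400   # as stated in the text

pages = LOC / LINES_PER_PAGE    # 20,000 pages
books = pages / PAGES_PER_BOOK  # 50 books of 400 pages each
print(f"{LOC:,} LoC ~ {pages:,.0f} pages ~ {books:.0f} books of {PAGES_PER_BOOK} pages")
# 1,000,000 LoC ~ 20,000 pages ~ 50 books of 400 pages
```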


To understand the effort spent by a software team working on the development and maintenance of a non-trivial software system, it is useful to distinguish three categories of activities that are performed by business analysts, architects, database & information designers, software developers, and testers:

  1. Creative exploration of the domain: articulation of new concepts and validation of these concepts in the form of examples and prototype development

  2. Execution of familiar rituals: making use of familiar concepts in the context of familiar structural and behavioural patterns

  3. Re-familiarisation with the encoding: reading the code, related visual models, documentation; discussions with colleagues who developed the code that must be modified; empirical tests to confirm code behaviour before proceeding with modification

Plotted over time, starting with the development of a completely new application, the effort invariably shifts towards re-familiarisation with the encoding, to the extent that more than 80% of the effort falls into this category. Once this stage is reached, changes to the software become very expensive, the number of unintended errors introduced with changes rises, and the total duration from change request to availability of new functionality increases significantly.


The diagram above illustrates the cost of re-familiarisation, which is mainly the result of using poor notations (traditional text-oriented programming languages) and poor separation of concerns (fragmented semantic identities as a result of using file-based tools). This curse of software maintenance can only be avoided by developing intuitive domain-specific notations, and by making use of denotational semantics to avoid a fragmentation of semantic identities. In a nutshell, this is the motivation that led to the development of the Gmodel semantic database.

With appropriate notations and tooling, the cost of re-familiarisation with the encoding of the domain can be reduced by a factor of at least 3 to 5, sometimes perhaps even more. As a result, software can be delivered faster, with fewer errors, and at a lower cost, as illustrated below.
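To put the claim in perspective, a simple illustrative calculation (ours, combining the 80% re-familiarisation share and the factor of 3 to 5 quoted above) shows how the total effort per change would shrink:

```python
# If re-familiarisation accounts for 80% of effort and is cut by a factor of 3-5,
# what happens to the total effort per change?
REFAM_SHARE = 0.80  # share of effort spent on re-familiarisation (from the text)

for factor in (3, 5):
    new_total = (1 - REFAM_SHARE) + REFAM_SHARE / factor
    print(f"reduction factor {factor}: total effort drops to {new_total:.0%} "
          f"of today's level (roughly {1 / new_total:.1f}x throughput)")

# reduction factor 3: total effort drops to 47% of today's level (roughly 2.1x throughput)
# reduction factor 5: total effort drops to 36% of today's level (roughly 2.8x throughput)
```

In other words, even though only the re-familiarisation portion is reduced, the overall effect is a two- to three-fold improvement in delivery capacity.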


Once re-familiarisation is no longer the main cost driver, the organisation can either concentrate on innovation (the green part of the diagram), or on broadening the portfolio of functionality to optimise support for specific kinds of products (the white part of the diagram), or on an appropriate mix of innovation and optimisation.

The curse of software maintenance will not go away overnight. Building new applications is becoming increasingly easy, integrating new applications with legacy systems is not hard, and switching off peripheral legacy systems is easy, but replacing critical legacy systems with equivalent modern systems remains a risky, highly labour-intensive, and expensive undertaking.

The good news is that new software specified in the form of domain-specific semantic models is no longer subject to classical software design degradation. The effort required for re-familiarisation with the domain no longer increases over time; the design evolves as new insights into the domain are incorporated, and the need for major exercises of re-writing software fades away.
