26 May 2011

Ending the curse of software maintenance

Metrics collected by Capers Jones in the US show that the percentage of software professionals working on the maintenance of existing applications (legacy) rather than the development of new applications rose from 52% to 73% between 1995 and 2000. Undoubtedly part of the increase is related to Y2K projects, but the numbers have not come down after 2000; instead, Capers Jones estimated that by 2010 79% of software professionals were focused on maintenance.

The diagram above is a visualisation of Capers Jones' metrics expressed as a percentage of the total US population. Firstly, the data indicates that most business processes are now supported by software, and that opportunities to identify manual processes that can be automated are becoming increasingly rare. Secondly, since the number of software professionals required continues to rise, these figures also point to a fundamental problem relating to the notations and techniques that are used to specify software.

Writing new software requires much less effort than understanding and changing existing software. Today most applications are written by software professionals, and not by the people who have a need for the application. This is a consequence of programming languages that, to the average software user, are about as understandable as legal language. Even expert programmers require large amounts of time to understand software written by others.

In the 1980s the focus of IT system development shifted from automation of previously manual processes to process optimisation (system integration via the use of network technologies) and to the development of more and more complex (more configurable) products and services. As a result, the required amount of source code grew substantially, and the complexity of networked software systems increasingly exceeded human cognitive limits.

The typical method of extending or modifying a software system has always been, and still is, to proceed in small increments: adding or changing a few handfuls of instructions, and then observing how the modified system behaves. In the 1990s software complexity reached the point where no individual could understand a typical system in all its details, but the experimental approach to software modification still seemed to work beyond this point.

The larger the system, the longer the time that is spent on re-familiarisation with the code versus the time spent on implementing a change. Eventually 80% or more of staff time is consumed by re-familiarisation and by quality assurance activities that attempt to detect and eliminate unintended side effects. Knowledge is encoded in the system, and must be re-encoded in a human brain before modified system behaviour can be encoded.

The amount of source specifications underpinning a modern business software system (such as a Logistics Management system) is typically measured in several millions of Lines of Code (LoC) – 1 million LoC fills roughly 20,000 pages, comparable in complexity to 50 books of 400 pages of legal text each. The pictures below are the result of tool-based analysis of several millions of lines of production-grade software code. The lines depict the dependencies between the services in a modern service oriented architecture that is no more than 5 years old.
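A quick back-of-the-envelope check of these figures (assuming roughly 50 lines of code per printed page, which is the ratio the article's numbers imply):

```python
# Sanity check of the LoC-to-books comparison above, assuming
# ~50 lines of code per printed page (an assumption, not a measurement).
loc = 1_000_000
lines_per_page = 50
pages_per_book = 400

pages = loc // lines_per_page      # 20,000 pages
books = pages // pages_per_book    # 50 books of 400 pages

print(pages, books)
```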

To understand the effort spent by a software team working on the development and maintenance of a non-trivial software system, it is useful to distinguish three categories of activities that are performed by business analysts, architects, database & information designers, software developers, and testers:

  1. Creative exploration of the domain: articulation of new concepts and validation of these concepts in the form of examples and prototype development

  2. Execution of familiar rituals: making use of familiar concepts in the context of familiar structural and behavioural patterns

  3. Re-familiarisation with the encoding: reading the code, related visual models, documentation; discussions with colleagues who developed the code that must be modified; empirical tests to confirm code behaviour before proceeding with modification

Plotted over time, starting with the development of a completely new application, the effort invariably shifts towards re-familiarisation with the encoding, to the extent that more than 80% of the effort falls into this category. Once this stage is reached, changes to the software become very expensive, the number of unintended errors introduced with changes rises, and the total duration from change request to availability of new functionality increases significantly.

The diagram above illustrates the cost of re-familiarisation, which is mainly the result of using poor notations (traditional text-oriented programming languages) and poor separation of concerns (fragmented semantic identities as a result of using file-based tools). This curse of software maintenance can only be avoided by developing intuitive domain specific notations, and by making use of denotational semantics to avoid a fragmentation of semantic identities. In a nutshell, this is the motivation that led to the development of the Gmodel semantic database.

With appropriate notations and tooling, the cost of re-familiarisation with the encoding of the domain can be reduced at least by a factor of 3 to 5, sometimes perhaps even more. As a result, software can be delivered faster, with fewer errors, and at a lower cost, as illustrated below.
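The effect of that factor of 3 to 5 on overall effort can be sketched with some simple arithmetic (the formula below is a simplification introduced here for illustration, not the author's model):

```python
# If re-familiarisation consumes a share r of total effort (the text
# cites 80% or more) and better notations cut that share by a factor k
# (3 to 5 per the text), the remaining total effort is (1 - r) + r / k
# of the original.
def remaining_effort(r=0.8, k=4):
    return (1.0 - r) + r / k

for k in (3, 4, 5):
    print(k, round(remaining_effort(k=k), 2))
```

Even the conservative factor of 3 cuts total effort to under half of the original, which is where the faster, cheaper delivery claimed above comes from.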

Once re-familiarisation is no longer the main cost driver, the organisation can either concentrate on innovation (the green part of the diagram), or on broadening the portfolio of functionality to optimise support for specific kinds of products (the white part of the diagram), or on an appropriate mix of innovation and optimisation.

The curse of software maintenance will not go away overnight. Building new applications is becoming increasingly easy, integrating new applications with legacy systems is not hard, and switching off peripheral legacy systems is easy, but replacing critical legacy systems with equivalent modern systems remains a risky, highly labour-intensive, and expensive undertaking.

The good news is that all new software that is specified in the form of domain specific semantic models is no longer subject to classical software design degradation. The effort required for re-familiarisation with the domain no longer increases over time – the design evolves as new insights into the domain are incorporated, and the need for major exercises of re-writing software fades away.

06 October 2010

Advanced Modelling Made Simple with Gmodel

Semantics is one of the unfortunate words that has been hijacked by everyday IT jargon. The link between the Semantic Web and semantic modelling is comparable to the link between C++ and object orientation. Just as C++ is the result of incorporating object oriented concepts into an existing programming language, the Semantic Web is the result of incorporating semantic concepts into the web technology platform that is defined by W3C standards, which include HTML, URI, HTTP etc. And likewise, just as C++ is not the nirvana of object orientation, public domain ontologies and Semantic Web tools do not cover all aspects of semantic modelling.

This week Sofismo presented the Gmodel platform for semantic modelling & language engineering at the first Workshop on Model-Driven Interoperability at the MoDELS conference in Oslo. An introductory article on Gmodel can be downloaded as part of the workshop proceedings, and corresponding slides are included below.

18 May 2010

On Pitfalls of Software Product Development

Software product development teams – and people in general - commonly over-estimate their ability to convey information in documents, diagrams, and in discussions. To make matters worse, they typically have too much faith in the validity of their personal mental models to frame the problems that need to be solved. As a result, misinterpretations often remain undetected for months, milestones are missed, and deliverables don’t meet expectations. Many failures are avoidable by recognising the role of customers - and of communication and collaboration - in software product development.

17 May 2010

Software Product Line Engineering Essentials

It is advisable to distinguish between a domain engineering process and an application development process. The application development process in turn comes in two flavours, a mature application development process and an experimental application development process.

The artefacts that we regularly produce in collaboration with various subject matter experts contain deep domain-specific knowledge (in insurance product design, automated building control, financial product design, ...). The required knowledge comes in three parts:
  1. SCIENCE: The first part of knowledge has been obtained using the scientific method.

  2. DOMAIN: The second part constitutes the accumulated wisdom from many years of working (developing heuristics) in the particular domain.

  3. MOMO: The third part relates to expertise in modelling, abstraction, and modularisation (a catalyst in the process of formalising knowledge).

In many software development projects the amount of relevant knowledge of type SCIENCE is minimal, knowledge of type DOMAIN is critical for project success, and knowledge of type MOMO is not available within the organisation.

This gets to the heart of the debate about the degree to which software development involves science!

Mature Application Development

If optimal - domain/organisation-specific - methods & tools are available to produce the artefacts that constitute the outcome of the project, then no knowledge of type SCIENCE is required, as all such knowledge is encapsulated in the software production method & tools.

Domain Engineering

If a project includes the development of domain/organisation-specific tools needed to develop the desired solution, then significant knowledge of type MOMO is required, and some knowledge of type SCIENCE may be required. Such projects are typically risky, and must be broken into separate domain engineering and application development streams to contain and incrementally eliminate the risks.

Such projects are only economically viable if the subject matter experts involved have been active participants in at least two prior software development efforts in the particular domain. Otherwise the available knowledge of type DOMAIN is insufficient for the development of domain/organisation-specific methods & tools.

Experimental Application Development

If knowledge of type DOMAIN is lacking, then the software development project is by definition based on trial and error, and the approach must be highly agile in order to minimise waste. Luke Hohmann speaks about the need to burn the first one or two pancakes. Using science to improve such projects is impossible until the pancakes have been burnt.


If sufficient knowledge of type DOMAIN is available, then it can be combined with knowledge of type MOMO to develop optimal domain/organisation-specific methods & tools.

The application of knowledge of type MOMO can be expressed mathematically, making use of set theory, group theory, model theory, and concepts from the theory of denotational semantics. These mathematical theories provide a solid foundation for domain engineering - which can be described as combining knowledge of type DOMAIN with knowledge of type MOMO.

Once domain engineering has been performed, application development only requires knowledge of type DOMAIN, and further domain "engineering" is only needed if the project runs into limitations of the domain/organisation-specific method & tools.

24 April 2010

The one methodology that works for all product development teams

... is unique: it is highly context-specific.

The conclusion that every product development organisation or project team should develop and follow a context-specific methodology is inescapable. The best we can do is pay attention to deep context-specific knowledge, and record this knowledge in modular methodology building blocks that are tied to a specific scope of applicability.

It is helpful to look at the value chain of an organisation from the perspective of a decision making process, and to think about the way in which decisions tend to be recorded, and the way in which decisions are implemented. This quickly leads to the familiar concept of a work product or artefact, and to the concepts of artefact producers and consumers.

In any non-trivial value chain there are roles and systems that produce template artefacts that are intended for completion by other roles and systems further downstream. A closer analysis of this observation leads to the conclusion that normal business operation encompasses a continuous extension and evolution of the organisational vocabulary. These vocabulary changes need to be fed back into the organisation's methodology; otherwise the methodology slowly but surely becomes less and less useful.

Distinguishing between software development and operational business is becoming more and more anachronistic. Many decisions are directly recorded in software, and they have a direct impact on the organisation and its operation.

Artefact producers routinely make the mistake of assuming that their work is done when an artefact has been handed over to downstream consumers. Communication and collaboration is never that simple.

The desired intent and the semantics of a vocabulary (and syntax) can only be aligned through extensive instantiation and semantic processing of example artefacts.

Any multi-step value chain directly leads to the need for a highly iterative product design and development process. The easier it is to
  1. define artefact templates,
  2. instantiate artefacts,
  3. and attach appropriate semantic processing,
the faster artefact producers and consumers are able to establish a shared understanding of the products that are being designed.
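As an illustration only (the class and names below are invented for this sketch and are not part of any Gmodel API), the three steps above can be reduced to a few lines of Python:

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """An artefact template, intended for completion by downstream consumers."""
    name: str
    fields: list                                # slots a consumer must fill in
    checks: list = field(default_factory=list)  # attached semantic processing

    def instantiate(self, **values):
        missing = [f for f in self.fields if f not in values]
        if missing:
            raise ValueError(f"incomplete artefact, missing: {missing}")
        for check in self.checks:               # run semantic checks on the instance
            check(values)
        return values

def non_negative_amount(values):
    if values["amount"] < 0:
        raise ValueError("amount must be non-negative")

# The producer defines the template; a downstream consumer completes it.
invoice = Template(name="invoice",
                   fields=["customer", "amount"],
                   checks=[non_negative_amount])
print(invoice.instantiate(customer="ACME", amount=100))
```

Each rejected instantiation is a misunderstanding surfaced early, which is what drives the shared understanding between producers and consumers.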

It is not uncommon for deep organisation-specific domain knowledge to be lacking. All knowledge regarding a particular process may for example be "encrypted" in programming languages, and none of the authors of the software may still be alive/available.

Attempting to reconstruct deep knowledge from old source code of a large software system can amount to economic suicide. Additionally, it is far from clear to what extent the encrypted decision making process (knowledge) is optimal or desirable in today's context. In this scenario organisations are desperate for a silver bullet methodology that can act as a substitute for lost knowledge.

Unfortunately the only medicine that can address the issue of lost or lacking knowledge head-on is a proper analysis of the context (value chain) in which the software must/should operate today.

The good news is that the required problem domain analysis techniques are available. The bad news is that there is no silver bullet that eliminates the need for having (or obtaining) an in-depth understanding of the value chain of an organisation.

26 March 2010

What Is Software?

It is interesting to see children grow up with the web and with software. Computers and hardware are becoming a non-topic. My 7-year-old son only cares about web access. His tools are Google, Wikipedia, and various other web sites. He knows more about Google's location based services than I do, and he rarely touches any of the hardware toys that earlier generations grew up with. The things that he produces on the computer are artefacts that involve several layers of pure software abstraction: emails, pictures, web sites, music, videos.

Ultimately humans will exchange fewer and fewer hardware goods or artefacts, and more and more software artefacts. And as we make software artefacts more intuitive and easy to use, it even makes sense to consider human face to face communication as a form of software - it's certainly not hardware. But the shift to the software-centric paradigm will only be complete once the users of the old hardware-centric paradigm have died out.

-- Jorn