Over the years, working on large-scale software systems, I have observed that many production systems tend to approach a similar level of complexity, largely independent of the technology, architecture and programming language used. It is as if complexity “fills up” to a glass ceiling which then becomes difficult to transcend. My theory is that this glass ceiling is a sort of “cognitive horizon,” a psychological limit on what humans can comprehend.
I would like to propose a name for this dynamic: the Law of Tangental Complexity. The tenet of this law may be summarised as follows: in any sufficiently large system, complexity will tend to “fill up” to a cognitive horizon and thereafter behave as a tangent to that horizon. The metaphor of a horizon represents the similarity to our natural field of view. The metaphor of a tangent implies that complexity may move beyond our field of view laterally, i.e. by adding breadth of functionality rather than increasingly complicated functionality. The latter does not seem to happen once people have difficulty comprehending the complexities of a system. Breadth of functionality, however, continues to be added. The graphic below illustrates the process.
1) At the beginning of a project interfaces are clean, dependencies are few and the code base is small. Complexity is low.
2) The project is a success. More functionality, inter-dependencies and layers of abstraction are added. Complexity is still manageable, so it continues to increase. It is now approaching the cognitive horizon.
3) Commercial pressures prompt more functionality to be added. Complexity now passes the cognitive horizon. Individual team members understand individual aspects of the system. Technical leads understand the whole but not all parts in full detail. Staff turnover and attrition lead to loss of knowledge. New hires are trained. Knowledge of the system becomes “patchy.”
4) Commercial pressures persist. Team members begin to sense the complexity and increasingly refrain from adding more complicated features, but continue to add breadth. Because dependencies in the system are now less well understood, technical debt results.
In any large-scale system, layers of abstraction and dependencies tend to be built up to accommodate new and modified functionality until the team as a whole begins to lose sight of the complexity of the system. I suggest that the limiting factors are those described by Cognitive Load Theory, together with team dynamics, and not primarily technology. For these limits to be reached, the system must typically first be successful. Any successful tool, software or otherwise, has a tendency to be progressively put to uses unintended by its original designers, until it breaks. The same dynamic plays out in large software systems. The tools of craftsmen might snap or break because material fatigue sets in or for similar reasons. Software, by contrast, does not directly “break”: it becomes unmanageable or exhibits a high number of feature defects.
Once technical debt has been realised, successful project managers will direct their teams to refactor to make the system manageable once more. Another typical response is to hire more capable engineers. Initially at least, this will appear to resolve the issue, as a more capable team with better skills will have greater cognitive abilities, and this expands the cognitive horizon. Architecture is revised, better solutions are deployed. More and enhanced functionality can be accommodated. The project team is lauded. The system becomes more successful as more customers are acquired. Within a period of time, the system returns to being unmanageable as its complexity “fills up” to the new cognitive horizon.
The problem has actually been compounded, as the system is now “trapped in a pincer movement” between a much wider cognitive horizon and the law of diminishing returns: recruiting more exceptionally talented engineers who can cope with the cognitive horizon of the system proves less fruitful upon later iterations of this cycle. Less talented engineers have no hope of approaching the complexities of the system at all. At this stage, a replacement for the system is commissioned and the life cycle of the original system ends. Usually, technological reasons will be advanced as to why the original system is incapable of meeting new demands. One example I have witnessed is the migration of a large-scale system from Java to C++. Among other considerations, Java garbage collection had introduced unacceptable pauses given evolving latency requirements. Other solutions might have been conceivable, including a low-pause or no-pause collector. But in truth the complexities of the “memory management” abstraction were beyond the horizon of the team.
Traditional Mitigation Strategies
- One mitigation strategy is to limit the complexity of the system and follow the Unix philosophy's “Rule of Parsimony”. This is a challenge for two reasons: 1) Commercial pressures will tend to ensure that the naysayers are ignored and those who suggest that additional features can be accommodated are listened to. 2) Quantifying complexity in any system is inherently difficult, and no consensus exists on how this should be done.
- A second strategy involves refactoring a complex system into two or more smaller subsystems, each of lesser complexity. In practice this too may prove a challenge, as inter-dependencies between the newly created subsystems may cause the complexity of the whole to be greater than the sum of the complexities of the parts.
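The caveat on the second strategy can be made concrete with a toy model. The sketch below is illustrative only: the module names, the complexity proxy (module count plus dependency count) and the `cross_weight` of 2 for dependencies that cross a subsystem boundary are all assumptions made for this example, not an established metric.

```python
def internal_complexity(modules, deps):
    """Toy proxy: module count plus dependencies that stay inside the set."""
    inside = [(a, b) for a, b in deps if a in modules and b in modules]
    return len(modules) + len(inside)


def cross_dependencies(part_a, part_b, deps):
    """Dependencies that cross the boundary between the two subsystems."""
    return [(a, b) for a, b in deps
            if (a in part_a and b in part_b) or (a in part_b and b in part_a)]


def split_report(part_a, part_b, deps, cross_weight=2):
    """Compare the sum of the parts with the whole after a split.

    cross_weight is an assumed penalty: a dependency that crosses a
    subsystem boundary is taken to cost more than an internal one.
    """
    sum_of_parts = (internal_complexity(part_a, deps)
                    + internal_complexity(part_b, deps))
    crossing = cross_dependencies(part_a, part_b, deps)
    whole = sum_of_parts + cross_weight * len(crossing)
    return {"sum_of_parts": sum_of_parts, "cross": len(crossing), "whole": whole}


# Hypothetical dependency graph: (dependent, dependency) pairs.
DEPS = [("orders", "pricing"), ("orders", "billing"), ("billing", "ledger"),
        ("pricing", "ledger"), ("ledger", "orders")]
FRONT = {"orders", "pricing"}
BACK = {"billing", "ledger"}

monolith = internal_complexity(FRONT | BACK, DEPS)
report = split_report(FRONT, BACK, DEPS)
print(monolith, report)
```

Under these assumptions each subsystem looks simple in isolation, yet the three boundary-crossing dependencies push the whole above both the sum of the parts and the original monolith. A different weighting would of course give a different answer, which is precisely the measurement problem noted in the first strategy.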
Proposed Mitigation Strategies
- One cannot change what one cannot measure. Given that software engineers work with a largely fixed toolset (IDEs, compilers, profilers, etc.), one strategy might involve equipping compilers with a metric for complexity. The absence of consensus is no reason not to attempt a metric. Many measures of complexity exist, covering space, time and computational workflow, and many more are conceivable. One suggested approach is to incorporate the results of escape analysis and the related shape analysis performed by whole-program optimizing compilers. Classical examples of compilers incorporating such analysis include the Moscow ML and Stalin Scheme compilers. These don’t emit a complexity metric but could be augmented to do so. To be useful, more mainstream compilers such as GCC would also need to be modified. Equally, profilers might emit a metric on the number of threads in concurrent programs and the ways in which synchronisation objects connect the workflow of such threads.
- If layering abstractions adds complexity, then having malleable abstractions becomes paramount in managing complexity. Malleable abstraction is a synonym for meta-programming. Yet few programming languages are optimized for meta-programming. Indeed, the orthogonality of features in most programming languages collapses when meta-programming is introduced. This has generally made meta-programming the exclusive playground of elite programmers and has given the technique a reputation for being intractable. One classic example of the breakdown in feature orthogonality in the face of meta-programming techniques is C++. Indeed, there is but one family of languages that has stood the test of time here: the Lisp family. Lisp is optimized for macro or meta-programming in a way that no other programming language is. Traditionally this has forced the programmer to work directly in an abstract syntax tree, the core data structure of a compiler, adding to the unpopularity of the language. More recent advances have removed this constraint; see the Julia programming language for further reading on the subject.
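To give a feel for the first proposal, a complexity metric emitted by the toolchain, here is a minimal sketch using Python's standard `ast` module. It is not the escape or shape analysis suggested above. It swaps in a much simpler, cyclomatic-style count of decision points per function, purely to show what a tool-emitted metric might look like. The function names and the choice of which syntax nodes count as branches are assumptions of this example.

```python
import ast

# Syntax nodes treated as decision points (an assumed, simplified selection).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)


def decision_points(func_node):
    """Cyclomatic-style count: 1 plus the number of branching constructs.

    Nested function bodies are included in the enclosing function's count.
    """
    count = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra paths, one per extra operand.
            count += len(node.values) - 1
    return count


def complexity_report(source):
    """Map each function name in the source text to its decision-point count."""
    tree = ast.parse(source)
    return {node.name: decision_points(node)
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"

def identity(x):
    return x
'''

report = complexity_report(SAMPLE)
print(report)
```

A count like this says nothing about aliasing, object lifetimes or thread interaction, which is where the analyses mentioned above would add real information. The point is only that a number reported routinely at build time gives a team something to watch as it approaches its cognitive horizon.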
Comment is invited.