Writing Software With Infinite Money

When talking about engineering projects, people often bring up the classic "speed, low cost, quality -> pick 2". There is a lot of truth to it, and in many companies, pressures to reduce delays and costs are far more powerful than pressures to drive up quality. Most executives only want the product to be good enough to survive, and even in life-critical industries such as air travel, execs sometimes don't even care about the "survive" part, but I digress.

One interesting question to ask yourself, though, is: if you were granted the opportunity to pick speed and quality, if you were granted infinite resources to write software with the sole objective of delivering a perfect product fast… would you even know how to do it?
If the business exec villains were not there to corrupt your craft, if money were infinite on your project, would you know how to write good software fast, in a life-critical industry?
Turns out, in my case, until recently, the answer was no. Of course I would introduce more process, do more QA and testing, beef up CI, etc., but could I guarantee that my life-critical software would be bug-free? Most likely not.

When I realized this, I started doing some research to figure out how software is written in life-critical industries in the first place. The French ecosystem does not lack defense companies working in complex industries, so I started looking into how people approach software engineering in those environments.

A few days into my research, I came across this presentation, written by people involved in the software tooling used to write the Rafale fighter jet source code.

One key point associates bugs with a lack of determinism, which is something I knew instinctively but had not yet put precise words to. I had noticed many times that as entropy increases in your source code, your system becomes less deterministic and you get more unwanted, unplanned behaviors (bugs). That is why we heavily use linters of all sorts at Shares to keep source code entropy to a minimum.

A second key point of the presentation is that research in software engineering is alive and well, and that plenty of people work on improving our tooling (think of improvements to type systems over the years, TypeScript for example), even though many of those tools never make it to the startup world.

A third key point of this presentation, on slide 19, struck me and triggered a mental paradigm shift. It mentions that Turing-complete languages are too hard to check. Even before trying to define what "checking" means, I had always assumed that the emergence of Turing-complete languages represented progress, that it was the direction of history. I had never connected the dots with the entropy problem I knew about: maybe the capabilities of Turing-complete languages are also their curse, opening the door to constant entropy increases that make it harder and harder to stay deterministic.
I don’t think that Turing-complete languages are bad, but are they the future? I’m not so sure anymore.

So what does "checking" mean anyway, when it comes to software? Over the past decades, theories have emerged around the concept of "formal checking", which consists in proving that a given piece of software complies with a set of invariants.
Those invariants depend on your use case, for example:

  • your money balance should never become negative no matter what
  • your aircraft should never stall
  • a train's braking power should never be activated above a certain threshold

No matter what happens in those systems, no matter when, no matter the order of events.
Concretely, formal checking consists in defining those invariants in a particular formalism, and then using a solver to explore all possible states (either via brute force or via symbolic methods, depending on the specification language used).
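To make this concrete, here is a minimal sketch of what such a specification could look like in TLA+ for the first invariant above (the module name, the bounds, and the action names are all mine, purely for illustration, not taken from any real system):

  ---------------------------- MODULE Balance ----------------------------
  EXTENDS Integers

  VARIABLE balance

  \* Start from an arbitrary small positive balance.
  Init == balance = 10

  \* Deposits are capped so the state space stays finite for the checker.
  Deposit == balance < 20 /\ (\E amt \in 1..5 : balance' = balance + amt)

  \* A withdrawal is only allowed when it does not overdraw the account.
  Withdraw == \E amt \in 1..5 : balance - amt >= 0 /\ balance' = balance - amt

  Next == Deposit \/ Withdraw

  Spec == Init /\ [][Next]_balance

  \* The invariant: it must hold in every reachable state, whatever the order of events.
  NonNegative == balance >= 0
  =========================================================================

The solver explores every state reachable through any interleaving of Deposit and Withdraw, and reports the exact sequence of steps leading to any state where NonNegative is false.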

There are 2 important points to note:

  • in order to keep the problem tractable, the specification language used has to be very restricted, and Turing-complete languages do not fit the bill
  • you can use a dedicated language to write pseudocode and check that pseudocode, then use another, more common language to implement the algorithm, but there is always a risk that your implementation does not exactly match the pseudocode (in which case the results of formal checking may no longer apply to your system).

One popular tool for formal checking is TLA+. You write your algorithm in the TLA+ specification language as exhaustively-testable pseudocode. Then its model checker (TLC) can demonstrate whether the algorithm complies with the set of invariants.
From Wikipedia:
“Since TLA+ specifications are written in a formal language, they are amenable to finite model checking. The model checker finds all possible system behaviours up to some number of execution steps, and examines them for violations of desired invariance properties such as safety and liveness. TLA+ specifications use basic set theory to define safety (bad things won’t happen) and temporal logic to define liveness (good things eventually happen).”
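In practice, once the specification and its invariants are written, the model checker is pointed at them through a small model configuration file. A sketch, reusing the hypothetical Balance spec above (file contents only; the surrounding setup depends on your tooling):

  SPECIFICATION Spec
  INVARIANT NonNegative

TLC then enumerates all reachable states of Spec and, if NonNegative is ever violated, prints the trace of steps that leads to the offending state, which is usually far more actionable than a failing test.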

One of my favorite use cases is AWS S3, where TLA+ is used to ensure data integrity and consistency across their distributed infrastructure; the paper can be found here.
AWS S3 is, in my opinion, one of the most foundational cloud services of the 21st century so far: storage is now completely abstracted away, and the levels of reliability at AWS S3 make it a true utility, like tap water or electricity.

The interesting point for me is that many AWS competitors also provide cloud object storage, but I have insider information from at least 2 competitors in the EU confirming that formal methods are not used there, and are not even on the radar, meaning that their object storage services are nowhere near AWS's level of reliability.
Technology in software engineering is advancing fast, and most people just accept the dominance of some US players as a fact of life, without trying to understand it.
The cold hard truth is that those dominant players are more technologically advanced than the others. Acquiring those technologies should be a priority for any reasonably sized tech company.
But you don't know what you don't know: in a lot of startups, teams do not think hard enough about this, and many companies still don't understand that they're only as good as their software.

To conclude, excuses for badly written software may be legitimate, but at the end of the day they're just that: excuses. If you were given infinite money, would you even know how to write good software fast?
Once you answer that question, you can truly begin the improvement journey and implement some of the ideas that make fighter jets fly.
