The software industry handles failure poorly.

The startup world venerates failure. Well, sort of. It venerates business failure, so long as it happens in a way that the investors approve of. The VCs will manage the careers of the founders, and no one else involved with the failing company (i.e. the people who lose their jobs) is considered to matter. “Failure is awesome!” “Go out and fail!” There’s plenty of talk among technology people about “failing fast”, and yet what I’ve observed is that our industry handles failure quite poorly.

Of course, the software industry is full of failure. In the corporate context, the term failure is somewhat subjective, but I’ve heard that 60 to 80 percent of software projects fail. I don’t doubt those numbers at all. I’m going to put it higher: I’d guess that 97 percent of software efforts fail. Admittedly, my standard of success is high. How do I get to 97%? Well, about 40 percent of projects are cancelled before they deliver anything. That’s a clear case of failure (“total hard failure”). Another 30 percent can be salvaged and reformed into something, but they underperform the expectations set by their headcount and resources and are written off as embarrassments (“partial hard failure”).

In other words, 70 out of 100 software projects are hard failures. That’s not controversial. Of the remaining 30, half (or 15% of the whole) are soft failures: well-made, good products that go unused, or that are regarded as failures, for purely political reasons. In other words, there was no technical failure, but the product did not succeed politically. The software itself was just fine– possibly, quite excellent– but it’s not going to get an engineer promoted. This leaves 15% that could be seen as successes. Of those, four-fifths (12% of the whole) devolve quickly into disliked legacy monsters: even though they get launched and become critical to the business, they are viewed with enough disdain that, while the managers of those projects might get promoted, the engineers are under constant criticism for the shortcomings of the system (which may have more to do with time constraints than with engineering shortfalls). This leaves only 3% of software projects that are able to (a) solve the problem, (b) do so using a resource budget deemed acceptable, (c) succeed politically, and (d) continue to be well-regarded enough to make the case for an engineer’s promotion. The other 97% of projects? Many of them add some value to the business but, from a careerist perspective, they’re a total waste of time, because they don’t confer the political capital that would give their engineers the credibility to work on better, harder, more critical problems.

Given the low success rate of software projects, and given that many of the causes of failure aren’t the engineers’ fault, it’s probably not surprising that this industry has such a high burnout rate. Our success rate is comparable to what one might expect out of an R&D lab. No one holds a researcher in low regard if many of her ideas fail; if that doesn’t happen, it often means that her aims aren’t ambitious enough. Sub-50% success rates are expected in R&D, with the understanding that the successes will pay for the exploration costs (a polite term for the costs of the failures). We, however, are held responsible, and sometimes fired, for software project failures, which are treated as if they were never acceptable.

The 70% that I call hard failures can lead to project cancellations and, in bad economic times, lost jobs. So can the 15% that are soft failures (i.e. working software rejected for political reasons), although that’s less common. Successfully completing something that goes unused won’t get a person fired, but it doesn’t protect her from layoffs. The 12% that evolve into hated legacy assets rarely end jobs. In fact, they create jobs within the company, but they’re usually undesirable maintenance jobs. Those projects turn out well for managers but, for engineers, it’s rare that they make a case for promotion. Managers can say that they “successfully delivered” these projects and it’s not really a lie, but the programmers are expected to stick around and maintain them. Taking all of this in, one sees that the projects that can really make a case for someone as a programmer require a large number of things to go right, and almost nothing to go wrong.

With the battle scars that come with age, programmers tend to develop prejudices that make little sense to anyone else. I’ve heard people say that it’s “impossible” to write production code in a dynamically typed language. It’s not. I personally prefer static typing, but high-quality production code is written in dynamic languages every day. I’ve heard others bash practices and tools based on previous experiences with software failure. It’s a habit of attributing a failure to every cause that was anywhere near it, and in software that habit doesn’t make much sense.

When negative outcomes are rare, it’s sensible to attribute them to all causes. Let’s consider traffic accidents. On a drive, the positive outcome (no collision) is common and the negative one is very rare. Most traffic accidents have multiple causes: the driver was exhausted, it was snowing, and the intersection was poorly designed. It’s rare that such accidents happen because one thing goes wrong; it usually takes multiple things going wrong at the same time. The conclusions that are drawn, however, are valid: driving while exhausted is dangerous, snow makes driving more dangerous, and badly-designed intersections are dangerous. Blaming all causes makes sense.

However, when negative outcomes are common, it’s often because a single cause can ruin the whole thing. This is the nature of software. True successes are damn rare, most projects fail, and small things can cause great work to be all for naught. This is worsened by the fact that, in the corporate theater, people in power can easily obfuscate the causes of things going wrong (or, to put it more bluntly, shift blame). In the example of the traffic accident, there were multiple causes that were genuinely involved. In a corporate failure, only one cause is necessary, but the executives can put forward multiple causes that might plausibly have been involved, and they’ll frequently do so, for obvious self-serving reasons.

Let’s say, for a concrete example, that a software project is started in Python, because it’s what the engineering team knows and likes using. (It’s not my favorite language, but it works.) Four months in, there’s an executive shake-up and a new CTO comes in with a strong pro-Java bias, because Java is what he knows. In addition to this technical bias, he needs to Make Decisions on his first day, “to prove he’s serious”, so he waltzes over to one of the teams under him (this team) and, without taking the time to determine why the team chose the tools it did, says: “There’s no way that Python can scale; use Java instead.”

What happens? Over time, the strong Python engineers leave, while the weak ones stay and begrudgingly switch over to Java. Additionally, the language overhaul turns out to be more complicated than expected and, due to time constraints, it can’t be completed. So the project ends up being a Python/Java hybrid app with poor communication between the two halves. The app is delivered, but it’s slow and buggy, because the good Python engineers have left and the weak ones weren’t able to write performant code in any language. The real cause of the failure? A CTO who interfered with a functioning team. The official version of events? The CTO was right; Python simply “cannot scale”. It can’t be the idiot CTO who killed that project; Guido van Rossum himself did it.

The worst part isn’t that this pattern of behavior exists; it’s how quickly software engineers buy into the “official” story and attribute negative outcomes to the technologies that their projects relied upon. I’ve met so many software engineers who absolutely hate technologies that weren’t to blame, at all, for the project failures attributed to them. Did the project really fail because of Python? Or because test-driven development just takes too damn much time? Or because XML is a hideous mess of a standard (which it is, but that’s almost always irrelevant)? I strongly doubt it. In almost all cases, I’d guess that the project failed because of managerial or political factors (read: executive incompetence) and that the technologies were simply blamed because the “official” narratives get written by (no surprise) executives. Programmers and technologies have been a dumping ground for executives’ failures for decades and, as bad as that is, it’s even worse that so many programmers are stupid enough to believe these official narratives– and go on to future jobs and spout idiocies like “Python cannot scale.”

Software project failures are painful, especially for programmers. Although failure is common in our industry, we don’t really get permission to fail. Companies can’t fire everyone when a software project fails, because even the most evil companies don’t want that much turnover. Still, if a project lands in hard failure, it’s pretty much a guarantee that at least one person has to lose a job, and the manager’s going to make sure that it isn’t him. Soft failures don’t necessarily end jobs, but they stall careers, make people nervous, and push people to quit.

After a few years in this industry, everyone has enough experience with failed projects (including, again, successful projects that failed politically, or that devolved into mediocrity for political reasons, feature creep being a common one) to have battle scars, and that gives each of us a long list of technologies and patterns that we find ourselves hating. Don’t get me wrong, either: many of our bugbears are legitimately hideous. I can’t stand the singleton directories (e.g. “com”) of Java projects, the broken handling of loop variables in Python closures, or the monkey-patching facilities of Ruby. These are all ugly things, but I think it’s extremely rare that any one of them has ever single-handedly caused a project to fail.
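To make the Python one concrete– this is a minimal sketch of my own, not something lifted from any particular failed project– here is the loop-variable gotcha in question: closures capture the variable itself, not its value at the moment the closure is created.

    # Each lambda closes over the variable i, not the value i had when the
    # lambda was created, so every callback sees the final value of i.
    callbacks = [lambda: i for i in range(3)]
    print([f() for f in callbacks])   # prints [2, 2, 2], not [0, 1, 2]

    # The usual workaround binds the current value as a default argument.
    callbacks = [lambda i=i: i for i in range(3)]
    print([f() for f in callbacks])   # prints [0, 1, 2]

It’s an irritating wart, and it has produced real bugs, but it’s exactly the kind of thing that annoys a team rather than sinks a project.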

To me, it’s weird and inconsistent. The startup world venerates failure, but only when founders do it. A VC-backed founder who steers a business into a reef is lauded for taking the risk and either “acqui-hired” or given another company. A programmer who fails (or, more often, who has the misfortune of working on a project that fails, which is often not her fault) is just regarded as a loser. Programmers are expected to have career narratives of one success after another. This is inconsistent with the fact that about 40 percent of software projects fail totally, and another 57 percent fall into murky non-success or uninspiring mediocrity. Worse yet, programmers often don’t know (due to managerial secrecy and the pervasive cover-up culture) the real reasons why their projects failed and, even if they do know, can’t speak the truth in public or in a professional context such as a job interview (where “bad-mouthing” an ex-manager is taken to be a crime as serious as murder or rape). Thus, they come up with a myriad of faulty explanations for why things went badly on previous projects. “What we learned… is that Python cannot scale.” That’s not true at all, but it’s more socially acceptable to say that on a job interview than to say the truth for most values of “the truth” (which involve frank incompetence at executive levels).

Writ large, these non-truths get repeated often enough that they generate a cloud of tech hate. Every language, every framework, every tool, just sucks. Python “can’t scale” and Java programmers “are all morons” and C “is insecure and dangerous” and Haskell “is impossible to hire for”. That’s the narrative. Some tools do suck and, far more often, a tool is simply put to a purpose it doesn’t fit (e.g. Java when one wants rapid development, or Python for low-latency systems). However, nuance is lost in the cloud of embittered tech-bashing, and the image that it projects of our work as a whole (and of us) is that most of what we build is pure garbage. Is it any wonder, with such a culture, that programming remains a low-status profession?