Systems Engineering

Preventing Failures in Systems Engineering

Most systems engineering failures, even those in new, one-of-a-kind high-tech systems, do not involve previously unknown phenomena, or black swans. As appealing as the black swan metaphor is, the real reasons for most failures are, in fact, rather prosaic and predictable white swans. Sorenson and Marais identified a set of 22 “real reasons”, ranging from “conducted poor requirements engineering” to “created inadequate procedures”.

In complex development projects, neither traditional engineering management data nor big data analysis can consistently and accurately pinpoint issues. Failures, even though they may appear simple in retrospect, are often the result of a complex network of decisions, many of them locally and temporally rational. Modeling and predicting such events requires complex models, and complex models need large amounts of historical data to give accurate predictions and insights. Cutting-edge projects, however, do not have an abundance of historical data. More (and possibly better) data are needed.

In these complex scenarios, there is a potential to augment existing engineering management data with “wisdom of the crowd” information. Wisdom of the crowd (WoC) refers to the hypothesis that the collective opinion of a large number of non-experts (e.g., novice engineers) is a better signal of the health of a project than the opinion of a single expert (say, an experienced manager). For instance, employees give their best assessment of the timeline and budget of a project, given their knowledge of the system. These assessments are then combined using a machine learning algorithm to predict the probability that the project will be successful or that the system will fail. Unfortunately, with bonuses and salaries depending on contracts, it is challenging to ensure that employees report their project estimates truthfully.
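As a rough illustration of how individual assessments might be turned into crowd-level inputs for a predictive model, here is a minimal Python sketch. It is not the project's actual method: the function name `crowd_signal`, the chosen features (median estimate, median slip, share of respondents expecting an overrun), and the sample numbers are all assumptions made for illustration.

```python
from statistics import median


def crowd_signal(estimates_months, planned_months):
    """Summarize individual schedule estimates as simple crowd-level features.

    Hypothetical helper: the feature set is an assumption, not the
    project's actual WoC indicators.
    """
    if not estimates_months:
        raise ValueError("need at least one estimate")
    crowd_median = median(estimates_months)
    # Fraction of respondents who expect the project to run past the plan.
    overrun_share = sum(e > planned_months for e in estimates_months) / len(estimates_months)
    return {
        "median_estimate": crowd_median,
        "median_slip": crowd_median - planned_months,
        "overrun_share": overrun_share,
    }


# Made-up schedule estimates (in months) from six employees.
signal = crowd_signal([14, 18, 16, 12, 20, 17], planned_months=15)
print(signal)
```

The median is used here because it is less sensitive than the mean to a few extreme or strategically biased estimates. Features like these could then be fed, alongside traditional management data, into whatever learning algorithm is chosen.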

Our effort leverages two main ideas: (1) risk assessment based on the “real reasons” for systems engineering failures, and (2) combining existing data with Wisdom of the Crowd (WoC) indicators to uncover the correlations between various (unreliable) traditional and crowd-derived measures and the measurable outcome (success, failure, or delay).

Click here to see a list of the questions we use as crowd signals.

Understanding Failures in Systems Engineering

The long-term objective of this research is to contribute to the fundamental understanding of why failures occur in systems engineering. Despite our best efforts, systems engineering projects continue to fail, and the rate of failures shows no sign of decreasing. Current approaches based on methods, tools, and processes are not working. We need to do more than propose better processes: we need a foundational basis that is informative and can be adapted to a broad range of circumstances and industries (e.g., mining, oil and gas, chemical, and aerospace) to guide design and operational choices that help prevent or mitigate failures.

The overarching goal for this research project is to improve understanding of systems engineering project failures, such as budget and schedule exceedances and quality concerns, to help mitigate them.

Goal 1: Identify problems in past accidents and project failures and link these problems to remediation measures.

Summary: We studied 63 systems engineering failures (30 accidents + 33 project failures) and identified similar causes, such as problems with testing, project management, and requirements engineering. We subsequently linked these causes to recommendations experts made in accident reports and built a “cause-recommendation” network, which we are deploying in JavaScript. Stay tuned: we will post the network on this website.
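To make the idea of a “cause-recommendation” network concrete, the following Python sketch stores it as a two-way mapping between causes and recommendations. This is only an illustrative data structure, not the deployed JavaScript tool: the class name, its methods, and the recommendation strings are hypothetical, and only the cause labels are taken from the summary above.

```python
from collections import defaultdict


class CauseRecommendationNetwork:
    """Bipartite mapping between failure causes and recommendations (illustrative)."""

    def __init__(self):
        self._recs_by_cause = defaultdict(set)
        self._causes_by_rec = defaultdict(set)

    def link(self, cause, recommendation):
        """Record that a recommendation was made in response to a cause."""
        self._recs_by_cause[cause].add(recommendation)
        self._causes_by_rec[recommendation].add(cause)

    def recommendations_for(self, cause):
        """Return the recommendations linked to a cause, sorted by name."""
        return sorted(self._recs_by_cause.get(cause, ()))

    def causes_addressed_by(self, recommendation):
        """Return the causes linked to a recommendation, sorted by name."""
        return sorted(self._causes_by_rec.get(recommendation, ()))


network = CauseRecommendationNetwork()
# The recommendation texts below are invented placeholders.
network.link("testing", "test under realistic operating conditions")
network.link("requirements engineering", "review requirements with end users")
network.link("project management", "review requirements with end users")

print(network.recommendations_for("testing"))
print(network.causes_addressed_by("review requirements with end users"))
```

Keeping both directions of the mapping makes it cheap to ask either question: which recommendations address a given cause, and which causes a given recommendation is meant to address.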

Goal 2: Gather data from large-scale aerospace companies on how they conduct systems engineering, with a focus on how they study and report their own systems engineering failures and relate these findings to our study on past project failures.

Summary: We surveyed and interviewed systems engineers at two large-scale aerospace companies, and found that their failure investigations focus on technical problems that systems engineers can solve. In our investigation of 63 systems engineering failures, we found causes related to technical problems as well as “people” problems (mismanagement, human factors problems, and others). In contrast, these aerospace companies did not report any “people” problems in their internal reports.

Goal 3: Determine whether students who have taken more systems engineering courses at Purdue are better able to identify “pain points” in systems engineering as identified by accident investigators.

Summary: In our study of 63 systems engineering failures, we found “pain points”: decisions made before the accident took place that accident-causation experts identified as significantly contributing to the accident. We wrote 8 survey questions on some of these pain points and distributed them to current Purdue AAE students. We are currently analyzing results from this survey to determine whether performance in systems engineering classes correlates with the students’ ability to identify these pain points.
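One simple way such a relationship could be examined is with a Pearson correlation coefficient, sketched below in Python. This is an assumption about the kind of analysis involved, not a description of the study's actual method, and the sample data (courses taken and pain points identified out of 8) are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Invented data: systems engineering courses taken, and pain points
# correctly identified (out of 8), for six hypothetical students.
courses_taken = [0, 1, 1, 2, 3, 4]
pain_points_identified = [2, 3, 4, 4, 6, 7]

r = pearson_r(courses_taken, pain_points_identified)
print(f"r = {r:.2f}")
```

With survey scores on a short 0 to 8 scale, a rank-based measure such as Spearman's correlation may be a more appropriate choice. In either case, a correlation alone would not show that coursework causes better identification of pain points.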

Click here to view the data we have generated in our research.

Click here to view and use our failure analysis tool. Please give it a try, and let us know what you think!